Counting gaps per sequence

There are several ways to count sequence gaps and to visualise the results. By default, the count_gaps_per_seq() method returns a matrix of counts, with no ability to visualise them. Setting the argument unique=True restricts the counts to gaps uniquely induced by each sequence, which can be a useful indicator of highly divergent sequences.
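
To make "uniquely induced" concrete, here is a minimal pure-Python sketch. It assumes the phrase means an alignment column in which exactly one sequence has a non-gap character, so that this sequence alone is responsible for the gaps in all the others. The function name `count_unique_induced_gaps` and the toy alignment are hypothetical, and this is an illustration of the idea rather than cogent3's actual implementation.

```python
def count_unique_induced_gaps(aligned, gap="-"):
    """Count, per sequence, the columns where it alone has a non-gap character.

    `aligned` maps sequence names to equal-length aligned strings.
    This is an assumed reading of "uniquely induced", not cogent3's code.
    """
    names = list(aligned)
    length = len(aligned[names[0]])
    counts = {name: 0 for name in names}
    for col in range(length):
        # names of the sequences that have a real character in this column
        non_gap = [name for name in names if aligned[name][col] != gap]
        if len(non_gap) == 1:
            # every other sequence is gapped here because of this one sequence
            counts[non_gap[0]] += 1
    return counts


# hypothetical toy alignment: seq2 carries a 3-column insertion
toy = {
    "seq1": "ACG---TTA",
    "seq2": "ACGGGGTTA",
    "seq3": "ACG---TT-",
}
toy_counts = count_unique_induced_gaps(toy)
print(toy_counts)
```

In this toy case only the three inserted columns of seq2 are counted. The trailing gap in seq3 is ignored because two other sequences have a character in that column.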

[1]:
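from cogent3 import load_aligned_seqs

# load the brca1 alignment as DNA sequences
aln = load_aligned_seqs('../../tests/data/brca1.fasta', moltype='dna')

# count only the gaps uniquely induced by each sequence
counts = aln.count_gaps_per_seq(unique=True)
counts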
[1]:
FlyingFox  DogFaced  FreeTaile  LittleBro  TombBat  RoundEare  FalseVamp
        0         0          0          0        0          0          0
LeafNose  Horse  Rhino  Pangolin  Cat  Dog  Llama  Pig  Cow  Hippo  SpermWhale
       0      0      0         0    0    0      0    0    3      0           0
HumpbackW  Mole  Hedgehog  TreeShrew  FlyingLem  Galago  HowlerMon  Rhesus
        0     0         0          3          0       3         21       0
Orangutan  Gorilla  Human  Chimpanzee  Jackrabbit  FlyingSqu  OldWorld  Mouse
        0        0      0           0           0         57         0      0
Rat  NineBande  HairyArma  Anteater  Sloth  Dugong  Manatee  AfricanEl
  0          0          0         0      0       0        0          0
AsianElep  RockHyrax  TreeHyrax  Aardvark  GoldenMol  Madagascar  Tenrec
        0          0          0         0          0           0       6
LesserEle  GiantElep  Caenolest  Phascogale  Wombat  Bandicoot
        0          6          0           0       0          0
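
Since most sequences have a count of zero, the few non-zero entries are the ones worth inspecting. The sketch below ranks them, using the non-zero counts copied by hand from the output above into a plain dict. The helper `rank_by_unique_gaps` and the threshold of 6 are hypothetical choices for illustration, and how best to convert the returned counts object into a dict is left as an assumption.

```python
# non-zero counts copied by hand from the output above
unique_gap_counts = {
    "Cow": 3,
    "TreeShrew": 3,
    "Galago": 3,
    "HowlerMon": 21,
    "FlyingSqu": 57,
    "Tenrec": 6,
    "GiantElep": 6,
}


def rank_by_unique_gaps(counts, minimum=1):
    """Return (name, count) pairs with count >= minimum, largest first."""
    flagged = [(name, n) for name, n in counts.items() if n >= minimum]
    # sorted() is stable, so ties keep their original relative order
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# the threshold is an arbitrary, illustrative cut-off
ranked = rank_by_unique_gaps(unique_gap_counts, minimum=6)
for name, n in ranked:
    print(f"{name}\t{n}")
```

Sequences near the top of this ranking, such as FlyingSqu and HowlerMon here, are candidates for closer inspection as potentially divergent.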

Plotting counts of unique gaps

Three plot types are supported: bar chart, violin plot and box plot. In all cases, placing the mouse pointer over a data point shows hover text with the sequence name.

Displaying unique gaps as a bar chart

[2]:
counts = aln.count_gaps_per_seq(unique=True, drawable='bar')
counts.show(width=500)

Displaying unique gaps as a violin plot

[3]:
counts = aln.count_gaps_per_seq(unique=True, drawable='violin')
counts.show(width=300, height=500)

Displaying unique gaps as a box plot

[4]:
counts = aln.count_gaps_per_seq(unique=True, drawable='box')
counts.show(width=300, height=500)