Draw sequence logos

Sequence logos display sequence information. They are used extensively to display transcription factor binding sites (TFBS), and they can also be applied to sequence alignments more generally.

Drawing a logo for a TFBS

We use the TFBS for the TATA box binding protein.

[1]:
from cogent3.parse import jaspar

_, pwm = jaspar.read("../data/tbp.jaspar")
freqarr = pwm.to_freq_array()
freqarr[:5]  # illustrating the contents of the MotifFreqsArray
[1]:
       T      C      A      G
0  0.080  0.373  0.157  0.391
1  0.794  0.118  0.041  0.046
2  0.090  0.000  0.905  0.005
3  0.961  0.026  0.008  0.005
4  0.077  0.000  0.910  0.013
[2]:
logo = freqarr.logo()
logo.show(height=250, width=500)
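
Logos conventionally scale each position by its information content, which for DNA is 2 bits minus the Shannon entropy of the column. The following is a minimal sketch of that calculation in plain Python, using the five rows shown in the output above. The `information` helper is hypothetical and written only for illustration. It is not the cogent3 implementation, and it omits any small-sample correction.

```python
import math

# Rows copied from the frequency table shown above (columns: T, C, A, G).
freqs = [
    [0.080, 0.373, 0.157, 0.391],
    [0.794, 0.118, 0.041, 0.046],
    [0.090, 0.000, 0.905, 0.005],
    [0.961, 0.026, 0.008, 0.005],
    [0.077, 0.000, 0.910, 0.013],
]


def information(column, alphabet_size=4):
    """Hypothetical helper: information in bits for one position.

    Computed as log2(alphabet size) minus the Shannon entropy.
    """
    total = sum(column)
    # Renormalise, because the displayed values are rounded.
    probs = [f / total for f in column]
    # Zero frequencies contribute nothing to the entropy.
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return math.log2(alphabet_size) - entropy


info = [information(row) for row in freqs]
for position, bits in enumerate(info):
    print(position, round(bits, 3))
```

Position 0, where the four frequencies are fairly even, carries much less information than positions 2 to 4, where a single base dominates. This matches the usual reading of a logo, in which conserved positions have the tallest letter stacks.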

Drawing a sequence logo from a multiple sequence alignment

This can be done for an entire alignment, but rendering can take some time, so here we use only the first 311 aligned columns. Note that gap characters are included in the display.

[3]:
from cogent3 import load_aligned_seqs

aln = load_aligned_seqs("../data/brca1-bats.fasta", moltype="dna")
logo = aln[:311].seqlogo(height=300, width=500, wrap=60, vspace=0.05)
logo.show()
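
To illustrate what including gap characters means for per-column frequencies, here is a minimal sketch in plain Python. The `toy_alignment` sequences are invented for illustration, and the tallying shown is an assumption about one reasonable approach, not a description of how cogent3 computes its logos.

```python
from collections import Counter

# Hypothetical toy alignment; "-" marks a gap.
toy_alignment = {
    "s1": "ACG-T",
    "s2": "ACGAT",
    "s3": "A-GAT",
}

col_freqs = []
# zip(*...) walks the alignment one column at a time.
for column in zip(*toy_alignment.values()):
    counts = Counter(column)  # gaps are counted like any other character
    n = sum(counts.values())
    col_freqs.append({char: count / n for char, count in counts.items()})

for position, freqs in enumerate(col_freqs):
    print(position, freqs)
```

In this sketch the gap is treated as a fifth state, so a column with one gap among three sequences gives the gap a frequency of one third rather than dropping it from the denominator.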

Sequence logo of protein alignment

The only difference here is that the logo uses the built-in colour scheme of the protein MolType.

[4]:
aa = aln.get_translation(incomplete_ok=True)[:120]
logo = aa.seqlogo(width=500, height=200, wrap=50, vspace=0.1)
logo.show()