Dotplot

The dotplot (Gibbs and McIntyre) is a useful technique for visually comparing two sequences. All cogent3 sequence collection classes (SequenceCollection, Alignment and ArrayAlignment) have a dotplot method.

The method returns a drawable object. Below it is demonstrated on a collection of unaligned sequences.

from cogent3 import load_unaligned_seqs

seqs = load_unaligned_seqs("../data/SCA1-cds.fasta", moltype="dna")
draw = seqs.dotplot()
draw.show()

If sequence names are not provided, two sequences are chosen at random (see the help output below). The plot title reports the parameter values used to define a match. window is the length of the sequence segments being compared. threshold is the number of identical positions within a window required for the two segments to be considered a match. gap (set with the min_gap argument) is the largest gap permitted between adjacent matching segments for them to be joined into a single line.
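To make the matching rule concrete, here is a brute-force sketch in plain Python of the window/threshold criterion. It is an illustration under our own assumptions (every pair of window-length segments is compared position by position, and no gap merging is done), not cogent3's implementation; the function name match_windows is hypothetical.

```python
def match_windows(seq1, seq2, window, threshold):
    """Return (i, j) start positions where seq1[i:i+window] and
    seq2[j:j+window] are identical at >= threshold positions.

    Hypothetical brute-force sketch, not cogent3's implementation.
    Cost is O(len(seq1) * len(seq2) * window).
    """
    hits = []
    for i in range(len(seq1) - window + 1):
        seg1 = seq1[i : i + window]
        for j in range(len(seq2) - window + 1):
            seg2 = seq2[j : j + window]
            # count positions where the two segments carry the same base
            identical = sum(a == b for a, b in zip(seg1, seg2))
            if identical >= threshold:
                hits.append((i, j))
    return hits


hits = match_windows("ACGTACGT", "ACGTTCGT", window=4, threshold=3)
print(hits)
```

Because threshold=3 is below window=4, the single A/T mismatch at position 4 is tolerated and the main diagonal from (0, 0) to (4, 4) is unbroken. The off-diagonal hits (0, 4) and (4, 0) come from the repeated ACGT motif in the first sequence.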

Modifying the matching parameters

If we set window and threshold to the same value, every position in a window must be identical, which is equivalent to an exact k-mer match approach.

draw = seqs.dotplot(name1="Human", name2="Mouse", window=8, threshold=8)
draw.show()
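When window equals threshold the test reduces to exact equality of k-mers (with k = window), so the all-against-all comparison can be replaced by a hash lookup. The following is a minimal sketch assuming plain Python strings; exact_kmer_hits is a hypothetical helper, not part of cogent3.

```python
from collections import defaultdict


def exact_kmer_hits(seq1, seq2, k):
    """Return (i, j) pairs where seq1[i:i+k] == seq2[j:j+k].

    Hypothetical sketch: the k-mers of seq2 are indexed in a dict so each
    k-mer of seq1 is looked up rather than compared against every window.
    """
    index = defaultdict(list)
    for j in range(len(seq2) - k + 1):
        index[seq2[j : j + k]].append(j)
    hits = []
    for i in range(len(seq1) - k + 1):
        # .get() avoids inserting empty entries into the defaultdict
        for j in index.get(seq1[i : i + k], []):
            hits.append((i, j))
    return hits


print(exact_kmer_hits("ACGTACGT", "ACGTTCGT", k=4))
```

For "ACGTACGT" against "ACGTTCGT" only the two copies of ACGT in the first sequence match, at (0, 0) and (4, 0). The single A/T mismatch at position 4 breaks the diagonal that a threshold below the window size would tolerate.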

Displaying the dotplot for the reverse complement

draw = seqs.dotplot(name1="Human", name2="Mouse", rc=True)
draw.show()

NOTE: clicking on an entry in the legend hides that trace; clicking it again shows it.
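Conceptually, the reverse complement plot asks whether a segment of one sequence matches a segment of the other sequence read on the opposite strand. One way to sketch this, assuming plain DNA strings without ambiguity codes, is to compare the first sequence against the reverse complement of the second. The helpers reverse_complement and kmer_hits below are hypothetical and are not cogent3's internals.

```python
# complement table for unambiguous DNA (IUPAC ambiguity codes not handled)
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_hits(seq1, seq2, k):
    """All (i, j) with seq1[i:i+k] == seq2[j:j+k] (exact matches only)."""
    return [
        (i, j)
        for i in range(len(seq1) - k + 1)
        for j in range(len(seq2) - k + 1)
        if seq1[i : i + k] == seq2[j : j + k]
    ]


seq1 = "GATTACA"
seq2 = "TGTAATC"  # the reverse complement of seq1
rc2 = reverse_complement(seq2)

forward = kmer_hits(seq1, seq2, k=4)  # no shared 4-mers on the same strand
on_rc = kmer_hits(seq1, rc2, k=4)  # a full diagonal against the other strand
# a hit (i, j) against rc2 corresponds to seq2[len(seq2) - j - k : len(seq2) - j]
print(forward, on_rc)
```

The forward comparison finds nothing, while the comparison against the reverse complement recovers a complete diagonal, which is the kind of signal the rc=True option is intended to reveal.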

Setting plot attributes

I’ll modify the title and figure width.

draw = seqs.dotplot(name1="Human", name2="Mouse", rc=True, title="SCA1", width=400)
draw.show()

All options

help(seqs.dotplot)
Help on method dotplot in module cogent3.core.alignment:

dotplot(name1=None, name2=None, window=20, threshold=None, min_gap=0, width=500, title=None, rc=False, show_progress=False) method of cogent3.core.alignment.SequenceCollection instance
    make a dotplot between specified sequences. Random sequences
    chosen if names not provided.

    Parameters
    ----------
    name1, name2 : str or None
        names of sequences. If one is not provided, a random choice is made
    window : int
        k-mer size for comparison between sequences
    threshold : int
        windows where the sequences are identical >= threshold are a match
    min_gap : int
        permitted gap for joining adjacent line segments, default is no gap
        joining
    width : int
        figure width. Figure height is computed based on the ratio of
        len(seq1) / len(seq2)
    title
        title for the plot
    rc : bool or None
        include dotplot of reverse compliment also. Only applies to Nucleic
        acids moltypes
    Returns
    -------
    a Drawable or AnnotatedDrawable
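The min_gap parameter controls when adjacent line segments are joined. The sketch below illustrates the idea under assumptions of our own: segments are half-open (start, end) intervals lying on the same diagonal, and two consecutive segments are joined when the distance between them is at most min_gap. merge_segments is a hypothetical helper, and cogent3's exact rule for measuring the gap may differ.

```python
def merge_segments(segments, min_gap=0):
    """Join collinear segments on one diagonal.

    segments are half-open (start, end) intervals in seq1 coordinates.
    Consecutive segments are joined when (next start - current end) is
    <= min_gap. Hypothetical sketch of the idea behind cogent3's min_gap.
    """
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= min_gap:
            # close enough to the previous segment: extend it
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


segs = [(0, 10), (12, 20), (30, 35)]
print(merge_segments(segs))  # gaps of 2 and 10 exceed 0: nothing joined
print(merge_segments(segs, min_gap=2))  # first two segments joined
print(merge_segments(segs, min_gap=10))  # everything joined
```

Raising min_gap therefore trades resolution for continuity: short interruptions such as a single indel or a run of mismatches no longer split a diagonal into separate lines.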