Molecular types¶

The MolType object resolves ambiguity codes into the states they represent and, in the other direction, provides the correct ambiguity code when recoding. It also maintains the mappings between different kinds of alphabets, sequences and alignments.

If your analysis involves handling ambiguous states, or translation via a genetic code, it’s critical to specify the appropriate moltype.

Available molecular types¶

[1]:

from cogent3 import available_moltypes

available_moltypes()

[1]:

Specify a moltype by the string 'Abbreviation' (case insensitive).
Abbreviation	Number of states	Moltype
ab	2	MolType(('a', 'b'))
dna	4	MolType(('T', 'C', 'A', 'G'))
rna	4	MolType(('U', 'C', 'A', 'G'))
protein	21	MolType(('A', 'C', 'D', 'E', 'F', 'G', ...
protein_with_stop	22	MolType(('A', 'C', 'D', 'E', 'F', 'G', ...
text	52	MolType(('a', 'b', 'c', 'd', 'e', 'f', ...
bytes	256	MolType(('\x00', '\x01', '\x02', '\x03'...

7 rows x 3 columns

For functions that take a moltype argument, pass the string listed under the “Abbreviation” column. For example:

from cogent3 import load_aligned_seqs

seqs = load_aligned_seqs("path/to/data.fasta", moltype="dna")

Getting a `MolType`¶

[2]:

from cogent3 import get_moltype

dna = get_moltype("dna")
dna

[2]:

MolType(('T', 'C', 'A', 'G'))

Using a `MolType` to get ambiguity codes¶

This reuses the dna instance created above.

[3]:

dna.ambiguities

[3]:

{'?': ('T', 'C', 'A', 'G', '-'),
 '-': ('-',),
 'N': ('A', 'C', 'T', 'G'),
 'R': ('A', 'G'),
 'Y': ('C', 'T'),
 'W': ('A', 'T'),
 'S': ('C', 'G'),
 'K': ('T', 'G'),
 'M': ('C', 'A'),
 'B': ('C', 'T', 'G'),
 'D': ('A', 'T', 'G'),
 'H': ('A', 'C', 'T'),
 'V': ('A', 'C', 'G'),
 'T': ('T',),
 'C': ('C',),
 'A': ('A',),
 'G': ('G',)}
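
Going the other way, from a set of observed states to the single code that denotes them, amounts to inverting a mapping like the one above. The following is a minimal pure-Python sketch of that idea, not cogent3's implementation. The names AMBIGUITIES, CODE_FOR_STATES and code_for are hypothetical, and the table copies only the non-gap entries from the output above.

```python
# Non-gap entries copied from the dna.ambiguities output above.
# This is an illustrative table, not cogent3's internal data structure.
AMBIGUITIES = {
    "N": ("A", "C", "T", "G"),
    "R": ("A", "G"),
    "Y": ("C", "T"),
    "W": ("A", "T"),
    "S": ("C", "G"),
    "K": ("T", "G"),
    "M": ("C", "A"),
    "B": ("C", "T", "G"),
    "D": ("A", "T", "G"),
    "H": ("A", "C", "T"),
    "V": ("A", "C", "G"),
    "T": ("T",),
    "C": ("C",),
    "A": ("A",),
    "G": ("G",),
}

# Invert the mapping: an unordered set of states -> the code that denotes it.
CODE_FOR_STATES = {frozenset(states): code for code, states in AMBIGUITIES.items()}


def code_for(states):
    """Return the code for a collection of bases (hypothetical helper)."""
    return CODE_FOR_STATES[frozenset(states)]


print(code_for("AG"))    # R
print(code_for("TC"))    # Y
print(code_for("ACGT"))  # N
```

Using frozenset keys makes the lookup independent of the order in which the states are supplied.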

`MolType` definition of degenerate codes¶

[4]:

dna.degenerates

[4]:

{'N': ('A', 'C', 'T', 'G'),
 'R': ('A', 'G'),
 'Y': ('C', 'T'),
 'W': ('A', 'T'),
 'S': ('C', 'G'),
 'K': ('T', 'G'),
 'M': ('C', 'A'),
 'B': ('C', 'T', 'G'),
 'D': ('A', 'T', 'G'),
 'H': ('A', 'C', 'T'),
 'V': ('A', 'C', 'G'),
 '?': 'TCAG-'}
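
To illustrate what a degenerate code stands for, the sketch below enumerates every fully resolved sequence that a degenerate sequence could represent. It is plain Python written for this example, not a cogent3 API. DEGENERATES holds a small subset of the codes listed above, and expand is a hypothetical helper.

```python
from itertools import product

# A small subset of the degenerate codes shown above (illustrative only).
DEGENERATES = {"R": "AG", "Y": "CT", "N": "ACTG"}


def expand(seq):
    """List all resolved sequences compatible with seq (hypothetical helper)."""
    # Each position contributes either its degenerate expansion or itself.
    choices = [DEGENERATES.get(base, base) for base in seq]
    return ["".join(combo) for combo in product(*choices)]


print(expand("ARY"))  # ['AAC', 'AAT', 'AGC', 'AGT']
```

The number of resolved sequences is the product of the expansion sizes at each position, so this approach is only practical for short sequences with few degenerate characters.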

Nucleic acid `MolType` and complementing¶

[5]:

dna.complement("AGG")

[5]:

'TCC'
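
Complementing is a per-character substitution. The sketch below reproduces the result above for the four canonical DNA bases using only the standard library. It is an assumption-laden illustration and not how cogent3 implements complement: it ignores ambiguity codes and gaps, and COMPLEMENT, complement and reverse_complement are names chosen for this example.

```python
# Translation table for the four canonical DNA bases only.
# Ambiguity codes and gap characters are left unchanged by str.translate.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Complement each base without changing the order."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return complement(seq)[::-1]


print(complement("AGG"))          # TCC
print(reverse_complement("AGG"))  # CCT
```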

Making sequences¶

Use either the top-level cogent3.make_seq function or the make_seq method on the MolType instance.

[6]:

seq = dna.make_seq("AGGCTT", name="seq1")
seq

[6]:

DnaSequence(AGGCTT)

Verify sequences¶

[7]:

rna = get_moltype("rna")
rna.is_valid("ACGUACGUACGUACGU")

[7]:

True
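
Conceptually, a validity check asks whether every character of a sequence belongs to the alphabet of the molecular type. The sketch below shows that idea in plain Python. It is a simplification: it accepts only the four canonical RNA states, whereas a library check may also admit gap or ambiguity characters. RNA_CHARS and this is_valid function are hypothetical and unrelated to the MolType method of the same name.

```python
# Canonical RNA states only; gaps and ambiguity codes are deliberately excluded.
RNA_CHARS = frozenset("UCAG")


def is_valid(seq, alphabet=RNA_CHARS):
    """Return True if every character of seq is in alphabet."""
    return set(seq) <= alphabet


print(is_valid("ACGUACGUACGUACGU"))  # True
print(is_valid("ACGT"))              # False, T is not an RNA state
```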

Making a custom `MolType`¶

We demonstrate this by customising DNA so that it accepts "." as a gap character.

[8]:

from cogent3.core import moltype as mt

DNAgapped = mt.MolType(seq_constructor=mt.DnaSequence,
                       motifset=mt.IUPAC_DNA_chars,
                       ambiguities=mt.IUPAC_DNA_ambiguities,
                       complements=mt.IUPAC_DNA_ambiguities_complements,
                       pairs=mt.DnaStandardPairs,
                       gaps='.')
seq = DNAgapped.make_seq('ACG.')
seq

[8]:

DnaSequence(ACG.)

Warning

At present, constructing a custom MolType that overrides a builtin one affects the original (in this instance, the DnaSequence class). All subsequent calls to the original class within the running process are affected. The code below resets the moltype attribute so that the rest of the documentation can be executed.

[9]:

from cogent3 import DNA
from cogent3.core.sequence import DnaSequence
DnaSequence.moltype = DNA
