Alphabets

Alphabet and MolType

Each MolType instance has an associated Alphabet, available through its alphabet attribute.

>>> from cogent3 import DNA, PROTEIN
>>> print(DNA.alphabet)
('T', 'C', 'A', 'G')
>>> print(PROTEIN.alphabet)
('A', 'C', 'D', 'E', ...

Each Alphabet instance refers back to its MolType through its moltype attribute.

>>> PROTEIN.alphabet.moltype == PROTEIN
True

Creating tuple alphabets

Use get_word_alphabet to build a tuple alphabet of fixed-length words from an existing alphabet, for example dinucleotides (word length 2) or trinucleotides (word length 3).

>>> dinuc_alphabet = DNA.alphabet.get_word_alphabet(2)
>>> print(dinuc_alphabet)
('TT', 'CT', 'AT', 'GT', ...
>>> trinuc_alphabet = DNA.alphabet.get_word_alphabet(3)
>>> print(trinuc_alphabet)
('TTT', 'CTT', 'ATT', ...
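In the truncated output above, the first character of each word appears to change fastest. The following is a minimal standard-library sketch that reproduces that ordering. It is an illustration only, not how cogent3 implements get_word_alphabet, and the helper name word_alphabet is hypothetical.

```python
from itertools import product


def word_alphabet(alphabet, word_length):
    """Return every word of the given length over alphabet.

    Hypothetical helper for illustration; not part of cogent3.
    """
    # product() varies the LAST position fastest, so each tuple is reversed
    # to make the FIRST position vary fastest instead.
    return tuple(
        "".join(reversed(chars)) for chars in product(alphabet, repeat=word_length)
    )


dna = ("T", "C", "A", "G")
dinucs = word_alphabet(dna, 2)
trinucs = word_alphabet(dna, 3)

print(dinucs[:4])   # ('TT', 'CT', 'AT', 'GT')
print(trinucs[:3])  # ('TTT', 'CTT', 'ATT')
print(len(dinucs), len(trinucs))  # 16 64
```

An alphabet of n characters yields n**k words of length k, which is why the DNA dinucleotide and trinucleotide alphabets have 16 and 64 members.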

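Convert a sequence into integers

The to_indices method maps each character to its position in the alphabet. With the DNA ordering ('T', 'C', 'A', 'G'), 'T' maps to 0, 'A' to 2 and 'G' to 3.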

>>> seq = 'TAGT'
>>> indices = DNA.alphabet.to_indices(seq)
>>> indices
[0, 2, 3, 0]
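The mapping can be pictured as a dictionary lookup from character to position. The sketch below is pure Python and assumes nothing about cogent3 internals. The function name to_indices is reused here only as a hypothetical stand-alone helper.

```python
def to_indices(seq, alphabet):
    """Map each character of seq to its position in alphabet.

    Hypothetical stand-alone helper; not the cogent3 method.
    """
    # Build the character -> index lookup once, then apply it per character.
    index_of = {char: i for i, char in enumerate(alphabet)}
    return [index_of[char] for char in seq]


dna = ("T", "C", "A", "G")
indices = to_indices("TAGT", dna)
print(indices)  # [0, 2, 3, 0]
```

In this sketch a character that is not in the alphabet raises KeyError; how cogent3 handles such characters is not covered here.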

Convert integers to a sequence

The from_indices method performs the reverse mapping and returns a list of characters.

>>> seq = DNA.alphabet.from_indices([0, 2, 3, 0])
>>> seq
['T', 'A', 'G', 'T']

Alternatively, from_ordinals_to_seq returns a sequence object instead of a list of characters:

>>> seq = DNA.alphabet.from_ordinals_to_seq([0, 2, 3, 0])
>>> seq
DnaSequence(TAGT)
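Converting indices back to characters is plain positional indexing into the alphabet, so encoding and decoding round-trip. A minimal sketch under that assumption follows. The helper from_indices is a hypothetical stand-alone function, and a plain string stands in for the sequence object.

```python
dna = ("T", "C", "A", "G")


def from_indices(indices, alphabet):
    """Return the characters at the given positions (hypothetical helper)."""
    return [alphabet[i] for i in indices]


chars = from_indices([0, 2, 3, 0], dna)
as_string = "".join(chars)  # plain str used in place of a sequence object

# Encoding the decoded string again gives back the original indices.
round_trip = [dna.index(char) for char in as_string]

print(chars)       # ['T', 'A', 'G', 'T']
print(as_string)   # TAGT
print(round_trip)  # [0, 2, 3, 0]
```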