Available genetic codes

[1]:
from cogent3 import available_codes

available_codes()
[1]:
Specify a genetic code using either 'Name' or Code ID (as an integer or string)
Code ID  Name
1        Standard Nuclear
2        Vertebrate Mitochondrial
3        Yeast Mitochondrial
4        Mold, Protozoan, and Coelenterate Mitochondrial, and Mycoplasma/Spiroplasma Nuclear
5        Invertebrate Mitochondrial
6        Ciliate, Dasycladacean and Hexamita Nuclear
9        Echinoderm and Flatworm Mitochondrial
10       Euplotid Nuclear
11       Bacterial Nuclear and Plant Plastid
12       Alternative Yeast Nuclear
13       Ascidian Mitochondrial
14       Alternative Flatworm Mitochondrial
15       Blepharisma Nuclear
16       Chlorophycean Mitochondrial
20       Trematode Mitochondrial
22       Scenedesmus obliquus Mitochondrial
23       Thraustochytrium Mitochondrial

17 rows x 2 columns
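
The table caption says a code can be given by name or by Code ID, as an integer or a string. The following is a minimal pure-Python sketch of how such an identifier could be resolved to a single ID. It is not cogent3's implementation: `CODES` holds only three rows copied from the table above, and `resolve_code_id` is a hypothetical helper introduced for illustration.

```python
# Three rows copied from the table above; the full table has 17 rows.
CODES = {
    1: "Standard Nuclear",
    2: "Vertebrate Mitochondrial",
    4: "Mold, Protozoan, and Coelenterate Mitochondrial, and Mycoplasma/Spiroplasma Nuclear",
}


def resolve_code_id(identifier):
    """Hypothetical helper: map an int ID, a numeric string, or a name to a code ID."""
    if isinstance(identifier, int):
        candidate = identifier
    elif identifier.strip().isdigit():
        candidate = int(identifier)
    else:
        # Fall back to an exact match on the "Name" column.
        matches = [cid for cid, name in CODES.items() if name == identifier]
        if not matches:
            raise ValueError(f"unknown genetic code name: {identifier!r}")
        return matches[0]
    if candidate not in CODES:
        raise ValueError(f"unknown genetic code ID: {candidate!r}")
    return candidate


# All three spellings refer to the same code.
print(resolve_code_id(1), resolve_code_id("1"), resolve_code_id("Standard Nuclear"))
```

An identifier that matches no row raises `ValueError` in this sketch, so a mistyped name fails loudly instead of silently selecting another code.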

Where a method of a cogent3 object takes a gc argument, you can pass the number shown in the “Code ID” column.

For example:

[2]:
from cogent3 import load_aligned_seqs

nt_seqs = load_aligned_seqs("../data/brca1-bats.fasta", moltype="dna")
nt_seqs[:21]
[2]:
           0
TombBat    TGTGGCACAAGTACTCATGCC
FlyingFox  ..........A.G........
DogFaced   ..........A..........
FreeTaile  .........GA..........
LittleBro  .........GA..........

5 x 21 dna alignment

We specify the genetic code and, by setting incomplete_ok=True, that codons which are incomplete because they contain a gap are converted to ?.

[3]:
aa_seqs = nt_seqs.get_translation(gc=1, incomplete_ok=True)
aa_seqs[:20]
[3]:
           0
TombBat    CGTSTHASSVQHENSSLLLT
FlyingFox  ...NA....L....-...Y.
DogFaced   ...N...N.L........Y.
FreeTaile  ...D.....L..........
LittleBro  ...D.....L..........

5 x 20 protein alignment
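
To make the handling of incomplete codons concrete, here is a small pure-Python sketch of codon-by-codon translation of an aligned sequence. It is an illustration under stated assumptions, not cogent3's code: `TOY_TABLE` contains only the four codons the example needs, `translate_gapped` is a hypothetical helper, and keeping a fully gapped codon as a gap character is an assumption made here.

```python
# Only the codons needed for this toy example; a real table has 64 entries.
TOY_TABLE = {"TGT": "C", "GGC": "G", "ACA": "T", "AGT": "S"}


def translate_gapped(seq, table, gap="-"):
    """Translate an aligned DNA string codon by codon (illustrative only)."""
    residues = []
    # Ignore any trailing bases that do not make up a whole codon.
    for start in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[start:start + 3]
        if codon == gap * 3:
            residues.append(gap)  # assumption: a fully gapped codon stays a gap
        elif gap in codon:
            residues.append("?")  # incomplete codon: some, not all, positions are gaps
        else:
            residues.append(table[codon])
    return "".join(residues)


# Codons: TGT | GG- | ACA | --- | AGT
print(translate_gapped("TGTGG-ACA---AGT", TOY_TABLE))  # prints "C?T-S"
```

The second codon, `GG-`, cannot be resolved to an amino acid, so it is reported as `?` instead of raising an error.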

Getting a genetic code with get_code()

This function can be used directly to get a genetic code. We will get the code with ID 4.

[4]:
from cogent3 import get_code

gc = get_code(4)
gc
[4]:
Mold, Protozoan, and Coelenterate Mitochondrial, and Mycoplasma/Spiroplasma Nuclear
aa             IUPAC code  codons
Alanine        A           GCT,GCC,GCA,GCG
Cysteine       C           TGT,TGC
Aspartic Acid  D           GAT,GAC
Glutamic Acid  E           GAA,GAG
Phenylalanine  F           TTT,TTC
Glycine        G           GGT,GGC,GGA,GGG
Histidine      H           CAT,CAC
Isoleucine     I           ATT,ATC,ATA
Lysine         K           AAA,AAG
Leucine        L           TTA,TTG,CTT,CTC,CTA,CTG
Methionine     M           ATG
Asparagine     N           AAT,AAC
Proline        P           CCT,CCC,CCA,CCG
Glutamine      Q           CAA,CAG
Arginine       R           CGT,CGC,CGA,CGG,AGA,AGG
Serine         S           TCT,TCC,TCA,TCG,AGT,AGC
Threonine      T           ACT,ACC,ACA,ACG
Valine         V           GTT,GTC,GTA,GTG
Tryptophan     W           TGA,TGG
Tyrosine       Y           TAT,TAC
STOP           *           TAA,TAG
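
The table above differs from the standard code in one place: TGA encodes tryptophan instead of a stop. The sketch below checks this without cogent3. It rebuilds both codes from 64-character amino-acid strings in the NCBI base order (T, C, A, G). The two strings are typed in by hand for this illustration (the code 4 string is the standard one with TGA changed to W, as in the table above), and the variable names are assumptions, not part of the cogent3 API.

```python
from itertools import product

BASES = "TCAG"  # NCBI ordering of the bases within the amino-acid strings
CODONS = ["".join(c) for c in product(BASES, repeat=3)]

# One amino acid per codon, in the order of CODONS ("*" marks a stop).
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE4_AAS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

standard = dict(zip(CODONS, STANDARD_AAS))
code4 = dict(zip(CODONS, CODE4_AAS))

# Codons whose meaning differs between the two codes.
differences = {c: (standard[c], code4[c]) for c in CODONS if standard[c] != code4[c]}
print(differences)  # {'TGA': ('*', 'W')}

# Codons assigned to tryptophan under code 4, matching the "W" row above.
trp_codons = sorted(c for c, aa in code4.items() if aa == "W")
print(trp_codons)  # ['TGA', 'TGG']
```

A sequence containing an in-frame TGA would therefore be truncated at that codon under code 1 but read through as W under code 4, which is why choosing the right code matters before translating.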