Available genetic codes
[1]:
from cogent3 import available_codes
available_codes()
[1]:
Code ID | Name |
---|---|
1 | Standard Nuclear |
2 | Vertebrate Mitochondrial |
3 | Yeast Mitochondrial |
4 | Mold, Protozoan, and Coelenterate Mitochondrial, and Mycoplasma/Spiroplasma Nuclear |
5 | Invertebrate Mitochondrial |
6 | Ciliate, Dasycladacean and Hexamita Nuclear |
9 | Echinoderm and Flatworm Mitochondrial |
10 | Euplotid Nuclear |
11 | Bacterial Nuclear and Plant Plastid |
12 | Alternative Yeast Nuclear |
13 | Ascidian Mitochondrial |
14 | Alternative Flatworm Mitochondrial |
15 | Blepharisma Nuclear |
16 | Chlorophycean Mitochondrial |
20 | Trematode Mitochondrial |
22 | Scenedesmus obliquus Mitochondrial |
23 | Thraustochytrium Mitochondrial |
17 rows x 2 columns
Where a cogent3 object method has a gc argument, you can pass the number shown in the “Code ID” column.
For example:
[2]:
from cogent3 import load_aligned_seqs
nt_seqs = load_aligned_seqs("../data/brca1-bats.fasta", moltype="dna")
nt_seqs[:21]
[2]:
0 | |
TombBat | TGTGGCACAAGTACTCATGCC |
FlyingFox | ..........A.G........ |
DogFaced | ..........A.......... |
FreeTaile | .........GA.......... |
LittleBro | .........GA.......... |
5 x 21 dna alignment
We specify the genetic code and request that incomplete codons (those containing a gap) are converted to ?.
[3]:
aa_seqs = nt_seqs.get_translation(gc=1, incomplete_ok=True)
aa_seqs[:20]
[3]:
0 | |
TombBat | CGTSTHASSVQHENSSLLLT |
FlyingFox | ...NA....L....-...Y. |
DogFaced | ...N...N.L........Y. |
FreeTaile | ...D.....L.......... |
LittleBro | ...D.....L.......... |
5 x 20 protein alignment
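To make the handling of incomplete codons concrete, here is a minimal, library-independent sketch. It is not cogent3's implementation: the `translate` helper and `CODON_TABLE` name are hypothetical, the table is built from the NCBI amino acid string for the standard code, and the rule that a fully gapped codon becomes `-` is an assumption based on the output above.

```python
from itertools import product

BASES = "TCAG"  # NCBI ordering of bases within a codon table
# Amino acids for genetic code 1 (Standard Nuclear), in NCBI codon order
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)
}


def translate(seq, table=CODON_TABLE, incomplete_ok=False):
    """Hypothetical helper: translate an aligned, in-frame DNA string."""
    protein = []
    # Step through complete triplets only; trailing bases are ignored.
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        if codon == "---":
            # Assumption: a codon made entirely of gaps stays a gap.
            protein.append("-")
        elif "-" in codon:
            # A codon with some, but not all, positions gapped is incomplete.
            if not incomplete_ok:
                raise ValueError(f"incomplete codon {codon!r} at position {i}")
            protein.append("?")
        else:
            protein.append(table[codon])
    return "".join(protein)


# First 21 nucleotides of TombBat from the alignment above
tomb_bat = translate("TGTGGCACAAGTACTCATGCC")
# An invented sequence with one incomplete codon and one fully gapped codon
gapped = translate("TGT-GCACA---", incomplete_ok=True)
print(tomb_bat)
print(gapped)
```

The first call reproduces the opening residues of the TombBat translation shown above. The second uses an invented sequence purely to illustrate the `?` and `-` cases.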
Getting a genetic code with get_code()
The get_code() function returns a genetic code directly. Here we get the code with ID 4.
[4]:
from cogent3 import get_code
gc = get_code(4)
gc
[4]:
aa | IUPAC code | codons |
---|---|---|
Alanine | A | GCT,GCC,GCA,GCG |
Cysteine | C | TGT,TGC |
Aspartic Acid | D | GAT,GAC |
Glutamic Acid | E | GAA,GAG |
Phenylalanine | F | TTT,TTC |
Glycine | G | GGT,GGC,GGA,GGG |
Histidine | H | CAT,CAC |
Isoleucine | I | ATT,ATC,ATA |
Lysine | K | AAA,AAG |
Leucine | L | TTA,TTG,CTT,CTC,CTA,CTG |
Methionine | M | ATG |
Asparagine | N | AAT,AAC |
Proline | P | CCT,CCC,CCA,CCG |
Glutamine | Q | CAA,CAG |
Arginine | R | CGT,CGC,CGA,CGG,AGA,AGG |
Serine | S | TCT,TCC,TCA,TCG,AGT,AGC |
Threonine | T | ACT,ACC,ACA,ACG |
Valine | V | GTT,GTC,GTA,GTG |
Tryptophan | W | TGA,TGG |
Tyrosine | Y | TAT,TAC |
STOP | * | TAA,TAG |
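The table above differs from the standard code in how it reads TGA. The following sketch, which does not use cogent3, builds both codon tables from the NCBI amino acid strings for codes 1 and 4 and lists the codons whose meaning changes. The names `NCBI_AAS` and `codon_table` are hypothetical and exist only for this illustration.

```python
from itertools import product

BASES = "TCAG"  # NCBI ordering of bases within a codon table
# NCBI amino acid strings, keyed by Code ID
NCBI_AAS = {
    1: "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    4: "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
}


def codon_table(code_id):
    """Hypothetical helper: map each codon to its one-letter amino acid."""
    codons = ("".join(c) for c in product(BASES, repeat=3))
    return dict(zip(codons, NCBI_AAS[code_id]))


standard = codon_table(1)
code4 = codon_table(4)

# Codons translated differently by the two codes: (standard, code 4)
differences = {
    codon: (standard[codon], code4[codon])
    for codon in standard
    if standard[codon] != code4[codon]
}
# Stop codons remaining under code 4
stops_code4 = sorted(codon for codon, aa in code4.items() if aa == "*")
print(differences)
print(stops_code4)
```

Under these tables the only change is that TGA encodes tryptophan rather than a stop, which matches the Tryptophan and STOP rows shown above.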