Applying a time-reversible codon model

We display the full set of codon models available.

[1]:
from cogent3 import available_models

available_models("codon")
[1]:
Specify a model using 'Abbreviation' (case sensitive).
Model Type Abbreviation Description
codon CNFGTR Conditional nucleotide frequency codon substitution model, GTR variant (with params analogous to the nucleotide GTR model). Yap, Lindsay, Easteal and Huttley, 2010, Mol Biol Evol 27: 726-734
codon CNFHKY Conditional nucleotide frequency codon substitution model, HKY variant (with kappa, the ratio of transitions to transversions) Yap, Lindsay, Easteal and Huttley, 2010, Mol Biol Evol 27: 726-734
codon MG94HKY Muse and Gaut 1994 codon substitution model, HKY variant (with kappa, the ratio of transitions to transversions) Muse and Gaut, 1994, Mol Biol Evol, 11, 715-24
codon MG94GTR Muse and Gaut 1994 codon substitution model, GTR variant (with params analogous to the nucleotide GTR model) Muse and Gaut, 1994, Mol Biol Evol, 11, 715-24
codon GY94 Goldman and Yang 1994 codon substitution model. N Goldman and Z Yang, 1994, Mol Biol Evol, 11(5):725-36.
codon Y98 Yang's 1998 substitution model, a derivative of the GY94. Z Yang, 1998, Mol Biol Evol, 15(5):568-73
codon H04G Huttley 2004 CpG substitution model. Includes a term for substitutions to or from CpG's. GA Huttley, 2004, Mol Biol Evol, 21(9):1760-8
codon H04GK Huttley 2004 CpG substitution model. Includes a term for transition substitutions to or from CpG's. GA Huttley, 2004, Mol Biol Evol, 21(9):1760-8
codon H04GGK Huttley 2004 CpG substitution model. Includes a general term for substitutions to or from CpG's and an adjustment for CpG transitions. GA Huttley, 2004, Mol Biol Evol, 21(9):1760-8
codon GNC General Nucleotide Codon, a non-reversible codon model. Kaehler, Yap, Huttley, 2017, Genome Biol Evol 9(1): 134–49

10 rows x 3 columns

Using the conditional nucleotide frequency codon model

The CNFGTR model (Yap et al., 2010) is the most robust of the time-reversible codon models available (Kaehler et al., 2017). By default, this model does not optimise the codon frequencies; it uses the average estimated from the alignment instead. Here we pass optimise_motif_probs=True through sm_args, which configures the model to optimise the root motif probabilities along with the other parameters.

[2]:
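from cogent3.app import io, evo

# load the alignment as DNA sequences
loader = io.load_aligned(format="fasta", moltype="dna")
aln = loader("../data/primate_brca1.fasta")
# sm_args is passed on to the substitution model; optimise_motif_probs=True
# makes the codon frequencies free parameters
model = evo.model("CNFGTR",
                  tree="../data/primate_brca1.tree",
                  sm_args=dict(optimise_motif_probs=True))
# fit the model to the alignment and display a summary of the result
result = model(aln)
result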
[2]:
CNFGTR
key    lnL           nfp    DLC     unique_Q
       -6739.3067    77     True
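
To make the default behaviour described above more concrete, here is a minimal sketch of what "the average estimated from the alignment" could look like: codon frequencies obtained by counting in-frame codons across all sequences. This is an illustration only, not cogent3's implementation; the function name codon_frequencies and the toy sequences are hypothetical.

```python
from collections import Counter


def codon_frequencies(seqs):
    """Return observed codon frequencies pooled across all sequences.

    Hypothetical helper for illustration; assumes the sequences are
    in frame and skips codons containing gaps or ambiguity codes.
    """
    counts = Counter()
    for seq in seqs:
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            codon = seq[i : i + 3]
            if set(codon) <= set("ACGT"):
                counts[codon] += 1
    total = sum(counts.values())
    return {codon: n / total for codon, n in sorted(counts.items())}


# toy, made-up aligned sequences (not the BRCA1 data)
toy = ["ATGAAAGGT", "ATGAAGGGT"]
freqs = codon_frequencies(toy)
print(freqs)
```

With optimise_motif_probs=True, values like these would serve only as a starting point, since the frequencies are then estimated by maximum likelihood together with the other parameters.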
[3]:
result.lf
[3]:

CNFGTR

log-likelihood = -6739.3067

number of free parameters = 77

Global params
A/C A/G A/T C/G C/T omega
1.0656 3.9391 0.7851 1.9475 4.2265 0.7569
Edge params
edge parent length
Galago root 0.5330
HowlerMon root 0.1365
Rhesus edge.3 0.0659
Orangutan edge.2 0.0233
Gorilla edge.1 0.0075
Human edge.0 0.0182
Chimpanzee edge.0 0.0085
edge.0 edge.1 0.0000
edge.1 edge.2 0.0101
edge.2 edge.3 0.0352
edge.3 root 0.0228
Motif params
AAA AAC AAG AAT ACA ACC ACG ACT AGA AGC
0.0540 0.0242 0.0307 0.0543 0.0237 0.0063 0.0021 0.0297 0.0238 0.0280
AGG AGT ATA ATC ATG ATT CAA CAC CAG CAT
0.0122 0.0405 0.0226 0.0071 0.0141 0.0203 0.0228 0.0063 0.0220 0.0237
CCA CCC CCG CCT CGA CGC CGG CGT CTA CTC
0.0165 0.0043 0.0021 0.0239 0.0022 0.0012 0.0035 0.0058 0.0123 0.0065
CTG CTT GAA GAC GAG GAT GCA GCC GCG GCT
0.0098 0.0105 0.0703 0.0112 0.0263 0.0310 0.0154 0.0083 0.0036 0.0145
GGA GGC GGG GGT GTA GTC GTG GTT TAC TAT
0.0151 0.0072 0.0051 0.0139 0.0170 0.0077 0.0094 0.0210 0.0036 0.0171
TCA TCC TCG TCT TGC TGG TGT TTA TTC TTG
0.0220 0.0083 0.0039 0.0214 0.0038 0.0033 0.0201 0.0222 0.0051 0.0107
TTT
0.0146
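
The reported count of 77 free parameters can be reconstructed from the tables above. The sketch below is illustrative arithmetic, not cogent3 code. It assumes the standard genetic code (stop codons TAA, TAG and TGA, consistent with their absence from the motif params), an unrooted tree for the 7 species, and that optimising the motif probabilities contributes one free parameter per sense codon minus one, because the frequencies must sum to 1.

```python
from itertools import product

# assumption: standard genetic code, so these three codons are excluded
STOP_CODONS = {"TAA", "TAG", "TGA"}
all_codons = ["".join(c) for c in product("ACGT", repeat=3)]
sense_codons = [c for c in all_codons if c not in STOP_CODONS]

# global rate terms shown under "Global params"
rate_terms = ["A/C", "A/G", "A/T", "C/G", "C/T", "omega"]

# an unrooted binary tree with n tips has 2n - 3 edges, each with a length
n_tips = 7
n_edges = 2 * n_tips - 3

# frequencies sum to 1, so one of them is determined by the others
n_free_motif_probs = len(sense_codons) - 1

nfp = len(rate_terms) + n_edges + n_free_motif_probs
print(len(sense_codons), n_edges, nfp)
```

The three terms are 6 rate parameters, 11 branch lengths and 60 free codon frequencies, which sum to the 77 reported for the fitted model.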