{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Applying a time-reversible codon model\n", "\n", "We display the full set of codon models available." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Specify a model using 'Abbreviation' (case sensitive).
Model Type  Abbreviation  Description
codon       CNFGTR        Conditional nucleotide frequency codon substitution model, GTR variant (with params analagous to the nucleotide GTR model). Yap, Lindsay, Easteal and Huttley, 2010, Mol Biol Evol 27: 726-734
codon       CNFHKY        Conditional nucleotide frequency codon substitution model, HKY variant (with kappa, the ratio of transitions to transversions) Yap, Lindsay, Easteal and Huttley, 2010, Mol Biol Evol 27: 726-734
codon       MG94HKY       Muse and Gaut 1994 codon substitution model, HKY variant (with kappa, the ratio of transitions to transversions) Muse and Gaut, 1994, Mol Biol Evol, 11, 715-24
codon       MG94GTR       Muse and Gaut 1994 codon substitution model, GTR variant (with params analagous to the nucleotide GTR model) Muse and Gaut, 1994, Mol Biol Evol, 11, 715-24
codon       GY94          Goldman and Yang 1994 codon substitution model. N Goldman and Z Yang, 1994, Mol Biol Evol, 11(5):725-36.
codon       Y98           Yang's 1998 substitution model, a derivative of the GY94. Z Yang, 1998, Mol Biol Evol, 15(5):568-73
codon       H04G          Huttley 2004 CpG substitution model. Includes a term for substitutions to or from CpG's. GA Huttley, 2004, Mol Biol Evol, 21(9):1760-8
codon       H04GK         Huttley 2004 CpG substitution model. Includes a term for transition substitutions to or from CpG's. GA Huttley, 2004, Mol Biol Evol, 21(9):1760-8
codon       H04GGK        Huttley 2004 CpG substitution model. Includes a general term for substitutions to or from CpG's and an adjustment for CpG transitions. GA Huttley, 2004, Mol Biol Evol, 21(9):1760-8
codon       GNC           General Nucleotide Codon, a non-reversible codon model. Kaehler, Yap, Huttley, 2017, Gen Biol Evol 9(1): 134–49
\n", "

\n", "10 rows x 3 columns

" ], "text/plain": [ "Specify a model using 'Abbreviation' (case sensitive).\n", "===============================================================================================================================================================================================================================\n", "Model Type Abbreviation Description\n", "-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\n", " codon CNFGTR Conditional nucleotide frequency codon substitution model, GTR variant (with params analagous to the nucleotide GTR model). Yap, Lindsay, Easteal and Huttley, 2010, Mol Biol Evol 27: 726-734\n", " codon CNFHKY Conditional nucleotide frequency codon substitution model, HKY variant (with kappa, the ratio of transitions to transversions) Yap, Lindsay, Easteal and Huttley, 2010, Mol Biol Evol 27: 726-734\n", " codon MG94HKY Muse and Gaut 1994 codon substitution model, HKY variant (with kappa, the ratio of transitions to transversions) Muse and Gaut, 1994, Mol Biol Evol, 11, 715-24\n", " codon MG94GTR Muse and Gaut 1994 codon substitution model, GTR variant (with params analagous to the nucleotide GTR model) Muse and Gaut, 1994, Mol Biol Evol, 11, 715-24\n", " codon GY94 Goldman and Yang 1994 codon substitution model. N Goldman and Z Yang, 1994, Mol Biol Evol, 11(5):725-36.\n", " codon Y98 Yang's 1998 substitution model, a derivative of the GY94. Z Yang, 1998, Mol Biol Evol, 15(5):568-73\n", " codon H04G Huttley 2004 CpG substitution model. Includes a term for substitutions to or from CpG's. GA Huttley, 2004, Mol Biol Evol, 21(9):1760-8\n", " codon H04GK Huttley 2004 CpG substitution model. Includes a term for transition substitutions to or from CpG's. GA Huttley, 2004, Mol Biol Evol, 21(9):1760-8\n", " codon H04GGK Huttley 2004 CpG substitution model. 
Includes a general term for substitutions to or from CpG's and an adjustment for CpG transitions. GA Huttley, 2004, Mol Biol Evol, 21(9):1760-8\n", " codon GNC General Nucleotide Codon, a non-reversible codon model. Kaehler, Yap, Huttley, 2017, Gen Biol Evol 9(1): 134–49\n", "-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\n", "\n", "10 rows x 3 columns" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from cogent3 import available_models\n", "\n", "available_models(\"codon\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using the conditional nucleotide form codon model\n", "\n", "The CNFGTR model ([Yap et al](https://www.ncbi.nlm.nih.gov/pubmed/19815689)) is the most robust of the time-reversible codon models available ([Kaehler et al](https://www.ncbi.nlm.nih.gov/pubmed/28175284)). By default, this model does not optimise the codon frequencies but uses the average estimated from the alignment. We configure the model to optimise the root motif probabilities." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
CNFGTR
key  lnL         nfp  DLC   unique_Q
     -6739.3067  77   True
\n" ], "text/plain": [ "CNFGTR\n", "============================================\n", "key lnL nfp DLC unique_Q\n", "--------------------------------------------\n", " -6739.3067 77 True \n", "--------------------------------------------\n", "\n", "1 rows x 5 columns" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from cogent3.app import io, evo\n", "\n", "loader = io.load_aligned(format=\"fasta\", moltype=\"dna\")\n", "aln = loader(\"../data/primate_brca1.fasta\")\n", "model = evo.model(\"CNFGTR\", \n", " tree=\"../data/primate_brca1.tree\", \n", " sm_args=dict(optimise_motif_probs=True))\n", "result = model(aln)\n", "result" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "

CNFGTR

\n", "

log-likelihood = -6739.3067

\n", "

number of free parameters = 77

\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Global params
A/C     A/G     A/T     C/G     C/T     omega
1.0656  3.9391  0.7851  1.9475  4.2265  0.7569
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Edge params
edge        parent  length
Galago      root    0.5330
HowlerMon   root    0.1365
Rhesus      edge.3  0.0659
Orangutan   edge.2  0.0233
Gorilla     edge.1  0.0075
Human       edge.0  0.0182
Chimpanzee  edge.0  0.0085
edge.0      edge.1  0.0000
edge.1      edge.2  0.0101
edge.2      edge.3  0.0352
edge.3      root    0.0228
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Motif params
AAA     AAC     AAG     AAT     ACA     ACC     ACG     ACT     AGA     AGC
0.0540  0.0242  0.0307  0.0543  0.0237  0.0063  0.0021  0.0297  0.0238  0.0280
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
AGG     AGT     ATA     ATC     ATG     ATT     CAA     CAC     CAG     CAT
0.0122  0.0405  0.0226  0.0071  0.0141  0.0203  0.0228  0.0063  0.0220  0.0237
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
CCA     CCC     CCG     CCT     CGA     CGC     CGG     CGT     CTA     CTC
0.0165  0.0043  0.0021  0.0239  0.0022  0.0012  0.0035  0.0058  0.0123  0.0065
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
CTG     CTT     GAA     GAC     GAG     GAT     GCA     GCC     GCG     GCT
0.0098  0.0105  0.0703  0.0112  0.0263  0.0310  0.0154  0.0083  0.0036  0.0145
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
GGA     GGC     GGG     GGT     GTA     GTC     GTG     GTT     TAC     TAT
0.0151  0.0072  0.0051  0.0139  0.0170  0.0077  0.0094  0.0210  0.0036  0.0171
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
TCA     TCC     TCG     TCT     TGC     TGG     TGT     TTA     TTC     TTG
0.0220  0.0083  0.0039  0.0214  0.0038  0.0033  0.0201  0.0222  0.0051  0.0107
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
TTT
0.0146
\n" ], "text/plain": [ "CNFGTR\n", "log-likelihood = -6739.3067\n", "number of free parameters = 77\n", "========================================================\n", " A/C A/G A/T C/G C/T omega\n", "--------------------------------------------------------\n", "1.0656 3.9391 0.7851 1.9475 4.2265 0.7569\n", "--------------------------------------------------------\n", "==============================\n", " edge parent length\n", "------------------------------\n", " Galago root 0.5330\n", " HowlerMon root 0.1365\n", " Rhesus edge.3 0.0659\n", " Orangutan edge.2 0.0233\n", " Gorilla edge.1 0.0075\n", " Human edge.0 0.0182\n", "Chimpanzee edge.0 0.0085\n", " edge.0 edge.1 0.0000\n", " edge.1 edge.2 0.0101\n", " edge.2 edge.3 0.0352\n", " edge.3 root 0.0228\n", "------------------------------\n", "============================================================================\n", " AAA AAC AAG AAT ACA ACC ACG ACT\n", "----------------------------------------------------------------------------\n", "0.0540 0.0242 0.0307 0.0543 0.0237 0.0063 0.0021 0.0297\n", "----------------------------------------------------------------------------\n", "\n", "continued: \n", "============================================================================\n", " AGA AGC AGG AGT ATA ATC ATG ATT\n", "----------------------------------------------------------------------------\n", "0.0238 0.0280 0.0122 0.0405 0.0226 0.0071 0.0141 0.0203\n", "----------------------------------------------------------------------------\n", "\n", "continued: \n", "============================================================================\n", " CAA CAC CAG CAT CCA CCC CCG CCT\n", "----------------------------------------------------------------------------\n", "0.0228 0.0063 0.0220 0.0237 0.0165 0.0043 0.0021 0.0239\n", "----------------------------------------------------------------------------\n", "\n", "continued: \n", "============================================================================\n", " CGA CGC 
CGG CGT CTA CTC CTG CTT\n", "----------------------------------------------------------------------------\n", "0.0022 0.0012 0.0035 0.0058 0.0123 0.0065 0.0098 0.0105\n", "----------------------------------------------------------------------------\n", "\n", "continued: \n", "============================================================================\n", " GAA GAC GAG GAT GCA GCC GCG GCT\n", "----------------------------------------------------------------------------\n", "0.0703 0.0112 0.0263 0.0310 0.0154 0.0083 0.0036 0.0145\n", "----------------------------------------------------------------------------\n", "\n", "continued: \n", "============================================================================\n", " GGA GGC GGG GGT GTA GTC GTG GTT\n", "----------------------------------------------------------------------------\n", "0.0151 0.0072 0.0051 0.0139 0.0170 0.0077 0.0094 0.0210\n", "----------------------------------------------------------------------------\n", "\n", "continued: \n", "============================================================================\n", " TAC TAT TCA TCC TCG TCT TGC TGG\n", "----------------------------------------------------------------------------\n", "0.0036 0.0171 0.0220 0.0083 0.0039 0.0214 0.0038 0.0033\n", "----------------------------------------------------------------------------\n", "\n", "continued: \n", "==============================================\n", " TGT TTA TTC TTG TTT\n", "----------------------------------------------\n", "0.0201 0.0222 0.0051 0.0107 0.0146\n", "----------------------------------------------" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result.lf" ] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:c3dev] *", "language": "python", "name": "conda-env-c3dev-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", 
"nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.1" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }