{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Applying a time-reversible nucleotide model\n", "\n", "We display the available set of nucleotide substitution models." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
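<table>
<caption>Specify a model using 'Abbreviation' (case sensitive).</caption>
<thead><tr><th>Model Type</th><th>Abbreviation</th><th>Description</th></tr></thead>
<tbody>
<tr><td>nucleotide</td><td>JC69</td><td>Jukes and Cantor's 1969 model</td></tr>
<tr><td>nucleotide</td><td>K80</td><td>Kimura 1980</td></tr>
<tr><td>nucleotide</td><td>F81</td><td>Felsenstein's 1981 model</td></tr>
<tr><td>nucleotide</td><td>HKY85</td><td>Hasegawa, Kishino and Yanamo 1985 model</td></tr>
<tr><td>nucleotide</td><td>TN93</td><td>Tamura and Nei 1993 model</td></tr>
<tr><td>nucleotide</td><td>GTR</td><td>General Time Reversible nucleotide substitution model.</td></tr>
<tr><td>nucleotide</td><td>ssGN</td><td>strand-symmetric general Markov nucleotide (non-stationary, non-reversible). Kaehler, 2017, Journal of Theoretical Biology 420: 144–51</td></tr>
<tr><td>nucleotide</td><td>GN</td><td>General Markov Nucleotide (non-stationary, non-reversible). Kaehler, Yap, Zhang, Huttley, 2015, Sys Biol 64 (2): 281–93</td></tr>
<tr><td>nucleotide</td><td>BH</td><td>Barry and Hartigan Discrete Time substitution model Barry and Hartigan 1987. Biometrics 43: 261–76.</td></tr>
<tr><td>nucleotide</td><td>DT</td><td>Discrete Time substitution model (non-stationary, non-reversible). motif_length=2 makes this a dinucleotide model, motif_length=3 a trinucleotide model.</td></tr>
</tbody>
</table>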
\n", "

\n", "10 rows x 3 columns

" ], "text/plain": [ "Specify a model using 'Abbreviation' (case sensitive).\n", "======================================================================================================================================================================================\n", "Model Type Abbreviation Description\n", "--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\n", "nucleotide JC69 Jukes and Cantor's 1969 model\n", "nucleotide K80 Kimura 1980\n", "nucleotide F81 Felsenstein's 1981 model\n", "nucleotide HKY85 Hasegawa, Kishino and Yanamo 1985 model\n", "nucleotide TN93 Tamura and Nei 1993 model\n", "nucleotide GTR General Time Reversible nucleotide substitution model.\n", "nucleotide ssGN strand-symmetric general Markov nucleotide (non-stationary, non-reversible). Kaehler, 2017, Journal of Theoretical Biology 420: 144–51\n", "nucleotide GN General Markov Nucleotide (non-stationary, non-reversible). Kaehler, Yap, Zhang, Huttley, 2015, Sys Biol 64 (2): 281–93\n", "nucleotide BH Barry and Hartigan Discrete Time substitution model Barry and Hartigan 1987. Biometrics 43: 261–76.\n", "nucleotide DT Discrete Time substitution model (non-stationary, non-reversible). 
motif_length=2 makes this a dinucleotide model, motif_length=3 a trinucleotide model.\n", "--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\n", "\n", "10 rows x 3 columns" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from cogent3 import available_models\n", "\n", "available_models(\"nucleotide\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using the GTR model\n", "\n", "We specify the general time-reversible model ([Lanave et al.](https://www.ncbi.nlm.nih.gov/pubmed/6429346)) by its abbreviation. By default, this model does not optimise the nucleotide frequencies (motif probabilities) but uses the averages estimated from the alignment. Here we set `optimise_motif_probs=True` so that the motif probabilities are optimised as well." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
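<table>
<caption>GTR</caption>
<thead><tr><th>key</th><th>lnL</th><th>nfp</th><th>DLC</th><th>unique_Q</th></tr></thead>
<tbody>
<tr><td></td><td>-6992.5741</td><td>19</td><td>True</td><td></td></tr>
</tbody>
</table>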
\n" ], "text/plain": [ "GTR\n", "============================================\n", "key lnL nfp DLC unique_Q\n", "--------------------------------------------\n", " -6992.5741 19 True \n", "--------------------------------------------\n", "\n", "1 rows x 5 columns" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from cogent3.app import io, evo\n", "\n", "loader = io.load_aligned(format=\"fasta\", moltype=\"dna\")\n", "aln = loader(\"../data/primate_brca1.fasta\")\n", "model = evo.model(\"GTR\", \n", " tree=\"../data/primate_brca1.tree\", \n", " sm_args=dict(optimise_motif_probs=True))\n", "result = model(aln)\n", "result" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "

GTR

\n", "

log-likelihood = -6992.5741

\n", "

number of free parameters = 19

\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
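<table>
<caption>Global params</caption>
<thead><tr><th>A/C</th><th>A/G</th><th>A/T</th><th>C/G</th><th>C/T</th></tr></thead>
<tbody>
<tr><td>1.2296</td><td>5.2478</td><td>0.9472</td><td>2.3389</td><td>5.9666</td></tr>
</tbody>
</table>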
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
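<table>
<caption>Edge params</caption>
<thead><tr><th>edge</th><th>parent</th><th>length</th></tr></thead>
<tbody>
<tr><td>Galago</td><td>root</td><td>0.1727</td></tr>
<tr><td>HowlerMon</td><td>root</td><td>0.0448</td></tr>
<tr><td>Rhesus</td><td>edge.3</td><td>0.0215</td></tr>
<tr><td>Orangutan</td><td>edge.2</td><td>0.0077</td></tr>
<tr><td>Gorilla</td><td>edge.1</td><td>0.0025</td></tr>
<tr><td>Human</td><td>edge.0</td><td>0.0060</td></tr>
<tr><td>Chimpanzee</td><td>edge.0</td><td>0.0028</td></tr>
<tr><td>edge.0</td><td>edge.1</td><td>0.0000</td></tr>
<tr><td>edge.1</td><td>edge.2</td><td>0.0034</td></tr>
<tr><td>edge.2</td><td>edge.3</td><td>0.0119</td></tr>
<tr><td>edge.3</td><td>root</td><td>0.0076</td></tr>
</tbody>
</table>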
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
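<table>
<caption>Motif params</caption>
<thead><tr><th>A</th><th>C</th><th>G</th><th>T</th></tr></thead>
<tbody>
<tr><td>0.3792</td><td>0.1719</td><td>0.2066</td><td>0.2423</td></tr>
</tbody>
</table>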
\n" ], "text/plain": [ "GTR\n", "log-likelihood = -6992.5741\n", "number of free parameters = 19\n", "==============================================\n", " A/C A/G A/T C/G C/T\n", "----------------------------------------------\n", "1.2296 5.2478 0.9472 2.3389 5.9666\n", "----------------------------------------------\n", "==============================\n", " edge parent length\n", "------------------------------\n", " Galago root 0.1727\n", " HowlerMon root 0.0448\n", " Rhesus edge.3 0.0215\n", " Orangutan edge.2 0.0077\n", " Gorilla edge.1 0.0025\n", " Human edge.0 0.0060\n", "Chimpanzee edge.0 0.0028\n", " edge.0 edge.1 0.0000\n", " edge.1 edge.2 0.0034\n", " edge.2 edge.3 0.0119\n", " edge.3 root 0.0076\n", "------------------------------\n", "====================================\n", " A C G T\n", "------------------------------------\n", "0.3792 0.1719 0.2066 0.2423\n", "------------------------------------" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result.lf" ] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:c3dev] *", "language": "python", "name": "conda-env-c3dev-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.1" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }