{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Applying a time-reversible nucleotide model\n",
"\n",
"We display the available set of nucleotide substitution models."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"Specify a model using 'Abbreviation' (case sensitive).\n",
"\n",
"Model Type | \n",
"Abbreviation | \n",
"Description | \n",
"\n",
"\n",
"\n",
"nucleotide | \n",
"JC69 | \n",
"Jukes and Cantor's 1969 model | \n",
"
\n",
"\n",
"nucleotide | \n",
"K80 | \n",
"Kimura 1980 | \n",
"
\n",
"\n",
"nucleotide | \n",
"F81 | \n",
"Felsenstein's 1981 model | \n",
"
\n",
"\n",
"nucleotide | \n",
"HKY85 | \n",
"Hasegawa, Kishino and Yanamo 1985 model | \n",
"
\n",
"\n",
"nucleotide | \n",
"TN93 | \n",
"Tamura and Nei 1993 model | \n",
"
\n",
"\n",
"nucleotide | \n",
"GTR | \n",
"General Time Reversible nucleotide substitution model. | \n",
"
\n",
"\n",
"nucleotide | \n",
"ssGN | \n",
"strand-symmetric general Markov nucleotide (non-stationary, non-reversible). Kaehler, 2017, Journal of Theoretical Biology 420: 144–51 | \n",
"
\n",
"\n",
"nucleotide | \n",
"GN | \n",
"General Markov Nucleotide (non-stationary, non-reversible). Kaehler, Yap, Zhang, Huttley, 2015, Sys Biol 64 (2): 281–93 | \n",
"
\n",
"\n",
"nucleotide | \n",
"BH | \n",
"Barry and Hartigan Discrete Time substitution model Barry and Hartigan 1987. Biometrics 43: 261–76. | \n",
"
\n",
"\n",
"nucleotide | \n",
"DT | \n",
"Discrete Time substitution model (non-stationary, non-reversible). motif_length=2 makes this a dinucleotide model, motif_length=3 a trinucleotide model. | \n",
"
\n",
"\n",
"
\n",
"\n",
"10 rows x 3 columns
"
],
"text/plain": [
"Specify a model using 'Abbreviation' (case sensitive).\n",
"======================================================================================================================================================================================\n",
"Model Type Abbreviation Description\n",
"--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\n",
"nucleotide JC69 Jukes and Cantor's 1969 model\n",
"nucleotide K80 Kimura 1980\n",
"nucleotide F81 Felsenstein's 1981 model\n",
"nucleotide HKY85 Hasegawa, Kishino and Yanamo 1985 model\n",
"nucleotide TN93 Tamura and Nei 1993 model\n",
"nucleotide GTR General Time Reversible nucleotide substitution model.\n",
"nucleotide ssGN strand-symmetric general Markov nucleotide (non-stationary, non-reversible). Kaehler, 2017, Journal of Theoretical Biology 420: 144–51\n",
"nucleotide GN General Markov Nucleotide (non-stationary, non-reversible). Kaehler, Yap, Zhang, Huttley, 2015, Sys Biol 64 (2): 281–93\n",
"nucleotide BH Barry and Hartigan Discrete Time substitution model Barry and Hartigan 1987. Biometrics 43: 261–76.\n",
"nucleotide DT Discrete Time substitution model (non-stationary, non-reversible). motif_length=2 makes this a dinucleotide model, motif_length=3 a trinucleotide model.\n",
"--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\n",
"\n",
"10 rows x 3 columns"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from cogent3 import available_models\n",
"\n",
"available_models(\"nucleotide\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using the GTR model\n",
"\n",
"We specify the general time-reversible model ([Lanave et al](https://www.ncbi.nlm.nih.gov/pubmed/6429346)) by its abbreviation. By default, this model does not optimise the codon frequencies but uses the average estimated from the alignment. We configure the model to optimise the root motif probabilities."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"GTR\n",
"\n",
"key | \n",
"lnL | \n",
"nfp | \n",
"DLC | \n",
"unique_Q | \n",
"\n",
"\n",
"\n",
" | \n",
"-6992.5741 | \n",
"19 | \n",
"True | \n",
" | \n",
"
\n",
"\n",
"
\n"
],
"text/plain": [
"GTR\n",
"============================================\n",
"key lnL nfp DLC unique_Q\n",
"--------------------------------------------\n",
" -6992.5741 19 True \n",
"--------------------------------------------\n",
"\n",
"1 rows x 5 columns"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from cogent3.app import io, evo\n",
"\n",
"loader = io.load_aligned(format=\"fasta\", moltype=\"dna\")\n",
"aln = loader(\"../data/primate_brca1.fasta\")\n",
"model = evo.model(\"GTR\", \n",
" tree=\"../data/primate_brca1.tree\", \n",
" sm_args=dict(optimise_motif_probs=True))\n",
"result = model(aln)\n",
"result"
]
},
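{
"cell_type": "markdown",
"metadata": {},
"source": [
"The fit above reports `nfp = 19`. As a rough cross-check, this count can be reproduced by hand. This is a sketch under stated assumptions: the tree is treated as an unrooted binary tree over the 7 sequences, G/T is taken as the reference exchangeability term, and one of the four motif probabilities is determined by the other three. The variable names are illustrative and are not part of the cogent3 API."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical bookkeeping, not a cogent3 call\n",
"n_tips = 7  # sequences in the alignment\n",
"n_edges = 2 * n_tips - 3  # branch lengths of an unrooted binary tree\n",
"n_rate_terms = 5  # A/C, A/G, A/T, C/G, C/T (G/T assumed to be the reference)\n",
"n_free_motif_probs = 4 - 1  # the four probabilities sum to 1\n",
"\n",
"nfp = n_edges + n_rate_terms + n_free_motif_probs\n",
"print(n_edges, n_rate_terms, n_free_motif_probs, nfp)"
]
},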
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"GTR
\n",
"log-likelihood = -6992.5741
\n",
"number of free parameters = 19
\n",
"\n",
"\n",
"Global params\n",
"\n",
"A/C | \n",
"A/G | \n",
"A/T | \n",
"C/G | \n",
"C/T | \n",
"\n",
"\n",
"\n",
"1.2296 | \n",
"5.2478 | \n",
"0.9472 | \n",
"2.3389 | \n",
"5.9666 | \n",
"
\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"Edge params\n",
"\n",
"edge | \n",
"parent | \n",
"length | \n",
"\n",
"\n",
"\n",
"Galago | \n",
"root | \n",
"0.1727 | \n",
"
\n",
"\n",
"HowlerMon | \n",
"root | \n",
"0.0448 | \n",
"
\n",
"\n",
"Rhesus | \n",
"edge.3 | \n",
"0.0215 | \n",
"
\n",
"\n",
"Orangutan | \n",
"edge.2 | \n",
"0.0077 | \n",
"
\n",
"\n",
"Gorilla | \n",
"edge.1 | \n",
"0.0025 | \n",
"
\n",
"\n",
"Human | \n",
"edge.0 | \n",
"0.0060 | \n",
"
\n",
"\n",
"Chimpanzee | \n",
"edge.0 | \n",
"0.0028 | \n",
"
\n",
"\n",
"edge.0 | \n",
"edge.1 | \n",
"0.0000 | \n",
"
\n",
"\n",
"edge.1 | \n",
"edge.2 | \n",
"0.0034 | \n",
"
\n",
"\n",
"edge.2 | \n",
"edge.3 | \n",
"0.0119 | \n",
"
\n",
"\n",
"edge.3 | \n",
"root | \n",
"0.0076 | \n",
"
\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"Motif params\n",
"\n",
"A | \n",
"C | \n",
"G | \n",
"T | \n",
"\n",
"\n",
"\n",
"0.3792 | \n",
"0.1719 | \n",
"0.2066 | \n",
"0.2423 | \n",
"
\n",
"\n",
"
\n"
],
"text/plain": [
"GTR\n",
"log-likelihood = -6992.5741\n",
"number of free parameters = 19\n",
"==============================================\n",
" A/C A/G A/T C/G C/T\n",
"----------------------------------------------\n",
"1.2296 5.2478 0.9472 2.3389 5.9666\n",
"----------------------------------------------\n",
"==============================\n",
" edge parent length\n",
"------------------------------\n",
" Galago root 0.1727\n",
" HowlerMon root 0.0448\n",
" Rhesus edge.3 0.0215\n",
" Orangutan edge.2 0.0077\n",
" Gorilla edge.1 0.0025\n",
" Human edge.0 0.0060\n",
"Chimpanzee edge.0 0.0028\n",
" edge.0 edge.1 0.0000\n",
" edge.1 edge.2 0.0034\n",
" edge.2 edge.3 0.0119\n",
" edge.3 root 0.0076\n",
"------------------------------\n",
"====================================\n",
" A C G T\n",
"------------------------------------\n",
"0.3792 0.1719 0.2066 0.2423\n",
"------------------------------------"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result.lf"
]
}
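,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The model is called time-reversible because its rates satisfy detailed balance, $\\pi_i q_{ij} = \\pi_j q_{ji}$. The sketch below checks this numerically for the estimates displayed above, assuming the standard GTR form $q_{ij} = r_{ij} \\pi_j$ and assuming the omitted G/T term is the reference with value 1. It is plain Python for illustration, not the cogent3 implementation, and it ignores the overall scaling of the rate matrix."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative only: estimates copied from the fitted model displayed above\n",
"# G/T = 1.0 is an assumption (treated as the reference term)\n",
"rates = {'AC': 1.2296, 'AG': 5.2478, 'AT': 0.9472,\n",
"         'CG': 2.3389, 'CT': 5.9666, 'GT': 1.0}\n",
"pi = {'A': 0.3792, 'C': 0.1719, 'G': 0.2066, 'T': 0.2423}\n",
"\n",
"\n",
"def q(i, j):\n",
"    # unscaled rate from i to j: symmetric term times the target frequency\n",
"    key = ''.join(sorted(i + j))\n",
"    return rates[key] * pi[j]\n",
"\n",
"\n",
"# detailed balance must hold for every pair of distinct nucleotides\n",
"for i in 'ACGT':\n",
"    for j in 'ACGT':\n",
"        if i < j:\n",
"            assert abs(pi[i] * q(i, j) - pi[j] * q(j, i)) < 1e-12"
]
}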
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:c3dev] *",
"language": "python",
"name": "conda-env-c3dev-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.1"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"state": {},
"version_major": 2,
"version_minor": 0
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}