{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Extracting maximum likelihood estimates from a `model_result`\n", "\n", "If you want to get the stats out-of a fitted model, use the `evo.tabulate_stats()` app.\n", "\n", "We first fit a model." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from cogent3.app import io, evo\n", "\n", "loader = io.load_aligned(format=\"fasta\", moltype=\"dna\")\n", "aln = loader(\"../data/primate_brca1.fasta\")\n", "model = evo.model(\"GN\", tree=\"../data/primate_brca1.tree\")\n", "result = model(aln)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create and apply `tabulate_stats` app" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3x tabular_result('global params': Table, 'edge params': Table, 'motif params': Table)" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tabulator = evo.tabulate_stats()\n", "tabulated = tabulator(result)\n", "tabulated" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`tabulated` is a `tabular_result` instance which, like other result types, has `dict` like behaviour. It also contains key/value pairs for each model parameter type." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Edge parameters\n", "\n", "These are all parameters that differ between edges. Since the current model is time-homogeneous (a single rate matrix), only the table only has entries for the branch scalar (denoted \"length\")." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
edge params
edgeparentlength
Galagoroot0.1735
HowlerMonroot0.0450
Rhesusedge.30.0215
Orangutanedge.20.0078
Gorillaedge.10.0025
Humanedge.00.0061
Chimpanzeeedge.00.0028
edge.0edge.10.0000
edge.1edge.20.0033
edge.2edge.30.0121
edge.3root0.0077
\n", "

\n", "11 rows x 3 columns

" ], "text/plain": [ "edge params\n", "==============================\n", " edge parent length\n", "------------------------------\n", " Galago root 0.1735\n", " HowlerMon root 0.0450\n", " Rhesus edge.3 0.0215\n", " Orangutan edge.2 0.0078\n", " Gorilla edge.1 0.0025\n", " Human edge.0 0.0061\n", "Chimpanzee edge.0 0.0028\n", " edge.0 edge.1 0.0000\n", " edge.1 edge.2 0.0033\n", " edge.2 edge.3 0.0121\n", " edge.3 root 0.0077\n", "------------------------------\n", "\n", "11 rows x 3 columns" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tabulated[\"edge params\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**NOTE:** Unless the model is time-reversible, the lengths in that table are not ENS ([Kaehler et al](https://academic.oup.com/gbe/article-lookup/doi/10.1093/gbe/evw308)). As we used a non-stationary nucleotide model in this example, the length values are a scalar used to adjust the matrices during optimisation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Global parameters\n", "\n", "In this example, these are the elements of the rate matrix." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
global params
A>CA>GA>TC>AC>GC>TG>AG>CG>TT>A
0.87003.66690.91111.59252.12646.03238.21781.22880.62941.2498
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
T>C
3.4136
\n", "

\n", "1 rows x 11 columns

" ], "text/plain": [ "global params\n", "============================================================================\n", " A>C A>G A>T C>A C>G C>T G>A G>C\n", "----------------------------------------------------------------------------\n", "0.8700 3.6669 0.9111 1.5925 2.1264 6.0323 8.2178 1.2288\n", "----------------------------------------------------------------------------\n", "\n", "continued: global params\n", "==========================\n", " G>T T>A T>C\n", "--------------------------\n", "0.6294 1.2498 3.4136\n", "--------------------------\n", "\n", "\n", "1 rows x 11 columns" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tabulated[\"global params\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Motif parameters\n", "\n", "In the current example, these are estimates of the nucleotide probabilities in the unobserved ancestor." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
motif params
ACGT
0.37560.17680.20780.2398
\n", "

\n", "1 rows x 4 columns

" ], "text/plain": [ "motif params\n", "====================================\n", " A C G T\n", "------------------------------------\n", "0.3756 0.1768 0.2078 0.2398\n", "------------------------------------\n", "\n", "1 rows x 4 columns" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tabulated[\"motif params\"]" ] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:c3dev] *", "language": "python", "name": "conda-env-c3dev-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.1" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }