{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Testing a hypothesis -- non-stationary or time-reversible\n", "\n", "We evaluate whether the GTR model is sufficient for a data set, compared with the GN (non-stationary general nucleotide model)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from cogent3.app import io, evo, sample\n", "\n", "loader = io.load_aligned(format=\"fasta\", moltype=\"dna\")\n", "aln = loader(\"../data/primate_brca1.fasta\")" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "cogent3.app.result.hypothesis_result" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tree = \"../data/primate_brca1.tree\"\n", "sm_args = dict(optimise_motif_probs=True)\n", "\n", "null = evo.model(\"GTR\", tree=tree, sm_args=sm_args)\n", "alt = evo.model(\"GN\", tree=tree, sm_args=sm_args)\n", "hyp = evo.hypothesis(null, alt)\n", "result = hyp(aln)\n", "type(result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`result` is a `hypothesis_result` object. The `repr()` displays the likelihood ratio test statistic, degrees of freedom and associated p-value>" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Statistics
LRdfpvalue
9.381360.1532
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
hypothesiskeylnLnfpDLCunique_Q
null'GTR'-6992.574119True
alt'GN'-6987.883425True
\n" ], "text/plain": [ "Statistics\n", "======================\n", " LR df pvalue\n", "----------------------\n", "9.3813 6 0.1532\n", "----------------------\n", "============================================================\n", "hypothesis key lnL nfp DLC unique_Q\n", "------------------------------------------------------------\n", " null 'GTR' -6992.5741 19 True \n", " alt 'GN' -6987.8834 25 True \n", "------------------------------------------------------------" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this case, we accept the null given the p-value is > 0.05. We still use this object to demonstrate the properties of a `hypothesis_result`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## `hypothesis_result` has attributes and keys" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Accessing the test statistics" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(9.381277657315877, 6, 0.15324334527517805)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result.LR, result.df, result.pvalue" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The null hypothesis\n", "\n", "This model is accessed via the `null` attribute." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
GTR
keylnLnfpDLCunique_Q
-6992.574119True
\n" ], "text/plain": [ "GTR\n", "============================================\n", "key lnL nfp DLC unique_Q\n", "--------------------------------------------\n", " -6992.5741 19 True \n", "--------------------------------------------\n", "\n", "1 rows x 5 columns" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result.null" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "

GTR

\n", "

log-likelihood = -6992.5741

\n", "

number of free parameters = 19

\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Global params
A/CA/GA/TC/GC/T
1.22965.24780.94722.33895.9666
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Edge params
edgeparentlength
Galagoroot0.1727
HowlerMonroot0.0448
Rhesusedge.30.0215
Orangutanedge.20.0077
Gorillaedge.10.0025
Humanedge.00.0060
Chimpanzeeedge.00.0028
edge.0edge.10.0000
edge.1edge.20.0034
edge.2edge.30.0119
edge.3root0.0076
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Motif params
ACGT
0.37920.17190.20660.2423
\n" ], "text/plain": [ "GTR\n", "log-likelihood = -6992.5741\n", "number of free parameters = 19\n", "==============================================\n", " A/C A/G A/T C/G C/T\n", "----------------------------------------------\n", "1.2296 5.2478 0.9472 2.3389 5.9666\n", "----------------------------------------------\n", "==============================\n", " edge parent length\n", "------------------------------\n", " Galago root 0.1727\n", " HowlerMon root 0.0448\n", " Rhesus edge.3 0.0215\n", " Orangutan edge.2 0.0077\n", " Gorilla edge.1 0.0025\n", " Human edge.0 0.0060\n", "Chimpanzee edge.0 0.0028\n", " edge.0 edge.1 0.0000\n", " edge.1 edge.2 0.0034\n", " edge.2 edge.3 0.0119\n", " edge.3 root 0.0076\n", "------------------------------\n", "====================================\n", " A C G T\n", "------------------------------------\n", "0.3792 0.1719 0.2066 0.2423\n", "------------------------------------" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result.null.lf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The alternate hypothesis" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "

GN

\n", "

log-likelihood = -6987.8834

\n", "

number of free parameters = 25

\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Global params
A>CA>GA>TC>AC>GC>TG>AG>CG>TT>A
0.87003.66700.91111.59252.12646.03248.21781.22880.62941.2499
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
T>C
3.4136
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Edge params
edgeparentlength
Galagoroot0.1735
HowlerMonroot0.0450
Rhesusedge.30.0215
Orangutanedge.20.0078
Gorillaedge.10.0025
Humanedge.00.0061
Chimpanzeeedge.00.0028
edge.0edge.10.0000
edge.1edge.20.0033
edge.2edge.30.0121
edge.3root0.0077
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Motif params
ACGT
0.37560.17680.20780.2398
\n" ], "text/plain": [ "GN\n", "log-likelihood = -6987.8834\n", "number of free parameters = 25\n", "============================================================================\n", " A>C A>G A>T C>A C>G C>T G>A G>C\n", "----------------------------------------------------------------------------\n", "0.8700 3.6670 0.9111 1.5925 2.1264 6.0324 8.2178 1.2288\n", "----------------------------------------------------------------------------\n", "\n", "continued: \n", "==========================\n", " G>T T>A T>C\n", "--------------------------\n", "0.6294 1.2499 3.4136\n", "--------------------------\n", "\n", "==============================\n", " edge parent length\n", "------------------------------\n", " Galago root 0.1735\n", " HowlerMon root 0.0450\n", " Rhesus edge.3 0.0215\n", " Orangutan edge.2 0.0078\n", " Gorilla edge.1 0.0025\n", " Human edge.0 0.0061\n", "Chimpanzee edge.0 0.0028\n", " edge.0 edge.1 0.0000\n", " edge.1 edge.2 0.0033\n", " edge.2 edge.3 0.0121\n", " edge.3 root 0.0077\n", "------------------------------\n", "====================================\n", " A C G T\n", "------------------------------------\n", "0.3756 0.1768 0.2078 0.2398\n", "------------------------------------" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result.alt.lf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Saving hypothesis results\n", "\n", "You are advised to save these results as json using the standard json writer, or the db writer.\n", "\n", "This following would write the result into a `tinydb`.\n", "\n", "```python\n", "from cogent3.app.io import write_db\n", "\n", "writer = write_db(\"path/to/myresults.tinydb\", create=True, if_exists=\"overwrite\")\n", "writer(result)\n", "```" ] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:c3dev] *", "language": "python", "name": "conda-env-c3dev-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.1" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }