{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Testing a hypothesis -- non-stationary or time-reversible\n",
"\n",
"We evaluate whether the GTR model is sufficient for a data set, compared with the GN (non-stationary general nucleotide model)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from cogent3.app import io, evo, sample\n",
"\n",
"loader = io.load_aligned(format=\"fasta\", moltype=\"dna\")\n",
"aln = loader(\"../data/primate_brca1.fasta\")"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"cogent3.app.result.hypothesis_result"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tree = \"../data/primate_brca1.tree\"\n",
"sm_args = dict(optimise_motif_probs=True)\n",
"\n",
"null = evo.model(\"GTR\", tree=tree, sm_args=sm_args)\n",
"alt = evo.model(\"GN\", tree=tree, sm_args=sm_args)\n",
"hyp = evo.hypothesis(null, alt)\n",
"result = hyp(aln)\n",
"type(result)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`result` is a `hypothesis_result` object. The `repr()` displays the likelihood ratio test statistic, degrees of freedom and associated p-value>"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"Statistics\n",
"\n",
"LR | \n",
"df | \n",
"pvalue | \n",
"\n",
"\n",
"\n",
"9.3813 | \n",
"6 | \n",
"0.1532 | \n",
"
\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"\n",
"hypothesis | \n",
"key | \n",
"lnL | \n",
"nfp | \n",
"DLC | \n",
"unique_Q | \n",
"\n",
"\n",
"\n",
"null | \n",
"'GTR' | \n",
"-6992.5741 | \n",
"19 | \n",
"True | \n",
" | \n",
"
\n",
"\n",
"alt | \n",
"'GN' | \n",
"-6987.8834 | \n",
"25 | \n",
"True | \n",
" | \n",
"
\n",
"\n",
"
\n"
],
"text/plain": [
"Statistics\n",
"======================\n",
" LR df pvalue\n",
"----------------------\n",
"9.3813 6 0.1532\n",
"----------------------\n",
"============================================================\n",
"hypothesis key lnL nfp DLC unique_Q\n",
"------------------------------------------------------------\n",
" null 'GTR' -6992.5741 19 True \n",
" alt 'GN' -6987.8834 25 True \n",
"------------------------------------------------------------"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this case, we accept the null given the p-value is > 0.05. We still use this object to demonstrate the properties of a `hypothesis_result`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## `hypothesis_result` has attributes and keys"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Accessing the test statistics"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(9.381277657315877, 6, 0.15324334527517805)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result.LR, result.df, result.pvalue"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The null hypothesis\n",
"\n",
"This model is accessed via the `null` attribute."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"GTR\n",
"\n",
"key | \n",
"lnL | \n",
"nfp | \n",
"DLC | \n",
"unique_Q | \n",
"\n",
"\n",
"\n",
" | \n",
"-6992.5741 | \n",
"19 | \n",
"True | \n",
" | \n",
"
\n",
"\n",
"
\n"
],
"text/plain": [
"GTR\n",
"============================================\n",
"key lnL nfp DLC unique_Q\n",
"--------------------------------------------\n",
" -6992.5741 19 True \n",
"--------------------------------------------\n",
"\n",
"1 rows x 5 columns"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result.null"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"GTR
\n",
"log-likelihood = -6992.5741
\n",
"number of free parameters = 19
\n",
"\n",
"\n",
"Global params\n",
"\n",
"A/C | \n",
"A/G | \n",
"A/T | \n",
"C/G | \n",
"C/T | \n",
"\n",
"\n",
"\n",
"1.2296 | \n",
"5.2478 | \n",
"0.9472 | \n",
"2.3389 | \n",
"5.9666 | \n",
"
\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"Edge params\n",
"\n",
"edge | \n",
"parent | \n",
"length | \n",
"\n",
"\n",
"\n",
"Galago | \n",
"root | \n",
"0.1727 | \n",
"
\n",
"\n",
"HowlerMon | \n",
"root | \n",
"0.0448 | \n",
"
\n",
"\n",
"Rhesus | \n",
"edge.3 | \n",
"0.0215 | \n",
"
\n",
"\n",
"Orangutan | \n",
"edge.2 | \n",
"0.0077 | \n",
"
\n",
"\n",
"Gorilla | \n",
"edge.1 | \n",
"0.0025 | \n",
"
\n",
"\n",
"Human | \n",
"edge.0 | \n",
"0.0060 | \n",
"
\n",
"\n",
"Chimpanzee | \n",
"edge.0 | \n",
"0.0028 | \n",
"
\n",
"\n",
"edge.0 | \n",
"edge.1 | \n",
"0.0000 | \n",
"
\n",
"\n",
"edge.1 | \n",
"edge.2 | \n",
"0.0034 | \n",
"
\n",
"\n",
"edge.2 | \n",
"edge.3 | \n",
"0.0119 | \n",
"
\n",
"\n",
"edge.3 | \n",
"root | \n",
"0.0076 | \n",
"
\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"Motif params\n",
"\n",
"A | \n",
"C | \n",
"G | \n",
"T | \n",
"\n",
"\n",
"\n",
"0.3792 | \n",
"0.1719 | \n",
"0.2066 | \n",
"0.2423 | \n",
"
\n",
"\n",
"
\n"
],
"text/plain": [
"GTR\n",
"log-likelihood = -6992.5741\n",
"number of free parameters = 19\n",
"==============================================\n",
" A/C A/G A/T C/G C/T\n",
"----------------------------------------------\n",
"1.2296 5.2478 0.9472 2.3389 5.9666\n",
"----------------------------------------------\n",
"==============================\n",
" edge parent length\n",
"------------------------------\n",
" Galago root 0.1727\n",
" HowlerMon root 0.0448\n",
" Rhesus edge.3 0.0215\n",
" Orangutan edge.2 0.0077\n",
" Gorilla edge.1 0.0025\n",
" Human edge.0 0.0060\n",
"Chimpanzee edge.0 0.0028\n",
" edge.0 edge.1 0.0000\n",
" edge.1 edge.2 0.0034\n",
" edge.2 edge.3 0.0119\n",
" edge.3 root 0.0076\n",
"------------------------------\n",
"====================================\n",
" A C G T\n",
"------------------------------------\n",
"0.3792 0.1719 0.2066 0.2423\n",
"------------------------------------"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result.null.lf"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The alternate hypothesis"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"GN
\n",
"log-likelihood = -6987.8834
\n",
"number of free parameters = 25
\n",
"\n",
"\n",
"Global params\n",
"\n",
"A>C | \n",
"A>G | \n",
"A>T | \n",
"C>A | \n",
"C>G | \n",
"C>T | \n",
"G>A | \n",
"G>C | \n",
"G>T | \n",
"T>A | \n",
"\n",
"\n",
"\n",
"0.8700 | \n",
"3.6670 | \n",
"0.9111 | \n",
"1.5925 | \n",
"2.1264 | \n",
"6.0324 | \n",
"8.2178 | \n",
"1.2288 | \n",
"0.6294 | \n",
"1.2499 | \n",
"
\n",
"\n",
"
\n",
"\n",
"\n",
"T>C | \n",
"\n",
"\n",
"\n",
"3.4136 | \n",
"
\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"Edge params\n",
"\n",
"edge | \n",
"parent | \n",
"length | \n",
"\n",
"\n",
"\n",
"Galago | \n",
"root | \n",
"0.1735 | \n",
"
\n",
"\n",
"HowlerMon | \n",
"root | \n",
"0.0450 | \n",
"
\n",
"\n",
"Rhesus | \n",
"edge.3 | \n",
"0.0215 | \n",
"
\n",
"\n",
"Orangutan | \n",
"edge.2 | \n",
"0.0078 | \n",
"
\n",
"\n",
"Gorilla | \n",
"edge.1 | \n",
"0.0025 | \n",
"
\n",
"\n",
"Human | \n",
"edge.0 | \n",
"0.0061 | \n",
"
\n",
"\n",
"Chimpanzee | \n",
"edge.0 | \n",
"0.0028 | \n",
"
\n",
"\n",
"edge.0 | \n",
"edge.1 | \n",
"0.0000 | \n",
"
\n",
"\n",
"edge.1 | \n",
"edge.2 | \n",
"0.0033 | \n",
"
\n",
"\n",
"edge.2 | \n",
"edge.3 | \n",
"0.0121 | \n",
"
\n",
"\n",
"edge.3 | \n",
"root | \n",
"0.0077 | \n",
"
\n",
"\n",
"
\n",
"\n",
"\n",
"\n",
"Motif params\n",
"\n",
"A | \n",
"C | \n",
"G | \n",
"T | \n",
"\n",
"\n",
"\n",
"0.3756 | \n",
"0.1768 | \n",
"0.2078 | \n",
"0.2398 | \n",
"
\n",
"\n",
"
\n"
],
"text/plain": [
"GN\n",
"log-likelihood = -6987.8834\n",
"number of free parameters = 25\n",
"============================================================================\n",
" A>C A>G A>T C>A C>G C>T G>A G>C\n",
"----------------------------------------------------------------------------\n",
"0.8700 3.6670 0.9111 1.5925 2.1264 6.0324 8.2178 1.2288\n",
"----------------------------------------------------------------------------\n",
"\n",
"continued: \n",
"==========================\n",
" G>T T>A T>C\n",
"--------------------------\n",
"0.6294 1.2499 3.4136\n",
"--------------------------\n",
"\n",
"==============================\n",
" edge parent length\n",
"------------------------------\n",
" Galago root 0.1735\n",
" HowlerMon root 0.0450\n",
" Rhesus edge.3 0.0215\n",
" Orangutan edge.2 0.0078\n",
" Gorilla edge.1 0.0025\n",
" Human edge.0 0.0061\n",
"Chimpanzee edge.0 0.0028\n",
" edge.0 edge.1 0.0000\n",
" edge.1 edge.2 0.0033\n",
" edge.2 edge.3 0.0121\n",
" edge.3 root 0.0077\n",
"------------------------------\n",
"====================================\n",
" A C G T\n",
"------------------------------------\n",
"0.3756 0.1768 0.2078 0.2398\n",
"------------------------------------"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result.alt.lf"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Saving hypothesis results\n",
"\n",
"You are advised to save these results as json using the standard json writer, or the db writer.\n",
"\n",
"This following would write the result into a `tinydb`.\n",
"\n",
"```python\n",
"from cogent3.app.io import write_db\n",
"\n",
"writer = write_db(\"path/to/myresults.tinydb\", create=True, if_exists=\"overwrite\")\n",
"writer(result)\n",
"```"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:c3dev] *",
"language": "python",
"name": "conda-env-c3dev-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.1"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"state": {},
"version_major": 2,
"version_minor": 0
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}