Performing a relative rate test

Section author: Gavin Huttley

From cogent3 import all the components we need

>>> from cogent3 import load_aligned_seqs, load_tree
>>> from cogent3.evolve.models import HKY85
>>> from cogent3.maths import stats

Get your alignment and tree.

>>> aln = load_aligned_seqs("data/long_testseqs.fasta")
>>> t = load_tree(filename="data/test.tree")

Create a HKY85 model.

>>> sm = HKY85()

Make the controller object and limit the display precision (to decrease the chance that small differences in estimates cause tests of the documentation to fail).

>>> lf = sm.make_likelihood_function(t, digits=2, space=3)

Set the local clock for humans & Howler Monkey. This method is just a special interface to the more general set_param_rules method.

>>> lf.set_local_clock("Human", "HowlerMon")

Get the likelihood function object this object performs the actual likelihood calculation.

>>> lf.set_alignment(aln)

Optimise the function capturing the return optimised lnL, and parameter value vector.

>>> lf.optimise(show_progress=False)

View the resulting maximum-likelihood parameter values.

>>> lf.set_name("clock")
>>> print(lf)
clock
log-likelihood = -8751.9425
number of free parameters = 7
=====
kappa
-----
 4.10
-----
===========================
     edge   parent   length
---------------------------
    Human   edge.0     0.04
HowlerMon   edge.0     0.04
   edge.0   edge.1     0.04
    Mouse   edge.1     0.28
   edge.1     root     0.02
NineBande     root     0.09
 DogFaced     root     0.11
---------------------------
=========================
   A      C      G      T
-------------------------
0.37   0.19   0.21   0.23
-------------------------

We extract the log-likelihood and number of free parameters for later use.

>>> null_lnL = lf.get_log_likelihood()
>>> null_nfp = lf.get_num_free_params()

Clear the local clock constraint, freeing up the branch lengths.

>>> lf.set_param_rule('length', is_independent=True)

Run the optimiser capturing the return optimised lnL, and parameter value vector.

>>> lf.optimise(show_progress=False)

View the resulting maximum-likelihood parameter values.

>>> lf.set_name("non clock")
>>> print(lf)
non clock
log-likelihood = -8750.5889
number of free parameters = 8
=====
kappa
-----
 4.10
-----
===========================
     edge   parent   length
---------------------------
    Human   edge.0     0.03
HowlerMon   edge.0     0.04
   edge.0   edge.1     0.04
    Mouse   edge.1     0.28
   edge.1     root     0.02
NineBande     root     0.09
 DogFaced     root     0.11
---------------------------
=========================
   A      C      G      T
-------------------------
0.37   0.19   0.21   0.23
-------------------------

These two lnL’s are now used to calculate the likelihood ratio statistic it’s degrees-of-freedom and the probability of observing the LR.

>>> LR = 2 * (lf.get_log_likelihood() - null_lnL)
>>> df = lf.get_num_free_params() - null_nfp
>>> P = stats.chisqprob(LR, df)

Print this and look up a \(\chi^2\) with number of edges - 1 degrees of freedom.

>>> print("Likelihood ratio statistic = ", LR)
Likelihood ratio statistic =  2.7...
>>> print("degrees-of-freedom = ", df)
degrees-of-freedom =  1
>>> print("probability = ", P)
probability =  0.09...