Performing a parametric bootstrap

Section author: Gavin Huttley

This file contains an example for estimating the probability of a Likelihood ratio statistic obtained from a relative rate test. The bootstrap classes can take advantage of parallel architectures.

From cogent3 import all the components we need.

>>> from cogent3 import load_aligned_seqs, load_tree
>>> from cogent3.evolve import bootstrap
>>> from cogent3.evolve.models import HKY85
>>> from cogent3.maths import stats

Define the null model that takes an alignment object and returns a likelihood function properly assembled for optimising the likelihood under the null hypothesis. The sample distribution is generated using this model.

We will use a HKY model.

>>> def create_alt_function():
...     t = load_tree("data/test.tree")
...     sm = HKY85()
...     return sm.make_likelihood_function(t)

Define a function that takes an alignment object and returns an appropriately assembled function for the alternative model. Since the two models are identical bar the constraint on the branch lengths, we’ll use the same code to generate the basic likelihood function as for the alt model, and then apply the constraint here

>>> def create_null_function():
...     lf = create_alt_function()
...     # set the local clock for humans & howler monkey
...     lf.set_local_clock("Human", "HowlerMon")
...     return lf

Get our observed data alignment

>>> aln = load_aligned_seqs("data/long_testseqs.fasta")

Create a EstimateProbability bootstrap instance

>>> estimateP = bootstrap.EstimateProbability(create_null_function(),
...                                       create_alt_function(),
...                                       aln)

Specify how many random samples we want it to generate. Here we use a very small number of replicates only for the purpose of testing.

>>> estimateP.set_num_replicates(5)

Run it.

>>> estimateP.run(show_progress=False)

Get the estimated probability.

>>> p = estimateP.get_estimated_prob()

p is a floating point value, as you’d expect. Grab the estimated likelihoods (null and alternate) for the observed data.

>>> print('%.2f, %.2f' % estimateP.get_observed_lnL())
-8751.94, -8750.59