Applying a discrete-time, non-stationary nucleotide model

We fit a discrete-time Markov nucleotide model. This corresponds to a Barry and Hartigan 1987 model.

[1]:
from cogent3.app import io, evo

loader = io.load_aligned(format="fasta", moltype="dna")
aln = loader("../data/primate_brca1.fasta")
model = evo.model("BH", tree="../data/primate_brca1.tree")
result = model(aln)
result
[1]:
BH
key lnL nfp DLC unique_Q
-6941.4684 135 True

NOTE: DLC stands for diagonal largest in column and the value is a check on the identifiability of the model. unique_Q is not applicable to a discrete-time model and so remains as None.

Looking at the likelihood function, you will

[2]:
result.lf
[2]:

BH

log-likelihood = -6941.4684

number of free parameters = 135

Edge motif motif2 params
edge motif motif2 psubs
Galago T T 0.8751
Galago T C 0.0649
Galago T A 0.0409
Galago T G 0.0192
Galago C T 0.1126
... ... ... ...
edge.3 A G 0.0055
edge.3 G T 0.0000
edge.3 G C 0.0011
edge.3 G A 0.0039
edge.3 G G 0.9950
Motif params
A C G T
0.3774 0.1763 0.2058 0.2404

Get a tree with branch lengths as paralinear

This is the only possible length metric for a discrete-time process.

[3]:
tree = result.tree
fig = tree.get_figure()
fig.scale_bar = "top right"
fig.show(width=500, height=500)

Getting parameter estimates

For a discrete-time model, aside from the root motif probabilities, everything is edge specific. But note that the tabular_result has different keys from the continuous-time case, as demonstrated below.

[4]:
tabulator = evo.tabulate_stats()
stats = tabulator(result)
stats
[4]:
2x tabular_result('edge motif motif2 params': Table, 'motif params': Table)
[5]:
stats['edge motif motif2 params']
[5]:
edge motif motif2 params
edge motif motif2 psubs
Galago T T 0.8751
Galago T C 0.0649
Galago T A 0.0409
Galago T G 0.0192
Galago C T 0.1126
... ... ... ...
edge.3 A G 0.0055
edge.3 G T 0.0000
edge.3 G C 0.0011
edge.3 G A 0.0039
edge.3 G G 0.9950

176 rows x 4 columns