Apply a non-stationary nucleotide model to an alignment with a tree

We analyse an alignment with sequences from 7 primates.

[1]:
from cogent3.app import io

reader = io.load_aligned(format="fasta", moltype="dna")
aln = reader("../data/primate_brca1.fasta")
aln.names
[1]:
['Chimpanzee',
 'Galago',
 'Gorilla',
 'HowlerMon',
 'Human',
 'Orangutan',
 'Rhesus']

Specify the tree via a tree instance

[2]:
from cogent3 import load_tree
from cogent3.app import evo

tree = load_tree("../data/primate_brca1.tree")
gn = evo.model("GN", tree=tree)
gn
[2]:
model(type='model', sm='GN', tree='root', name=None, sm_args=None, lf_args=None, time_het=None, param_rules=None, opt_args=None, split_codons=False, show_progress=False, verbose=False)

Specify the tree via a path.

[3]:
gn = evo.model("GN", tree="../data/primate_brca1.tree")
gn
[3]:
model(type='model', sm='GN', tree='../data/primate_brca1.tree', name=None, sm_args=None, lf_args=None, time_het=None, param_rules=None, opt_args=None, split_codons=False, show_progress=False, verbose=False)

Apply the model to an alignment

[4]:
fitted = gn(aln)
fitted
[4]:
GN
key    lnL           nfp    DLC     unique_Q
       -6987.8834    25     True

In the output above, the unique_Q column is empty. This can happen because of numerical precision issues.
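DLC conventionally stands for "diagonal largest in column", a property of a substitution probability matrix in which each diagonal entry exceeds every other entry in its column. The following is a minimal sketch of such a check in plain Python. The function name `is_dlc` and both matrices are hypothetical illustrations; this is not cogent3's own implementation.

```python
def is_dlc(P):
    """Return True if every diagonal entry is the largest value in its column."""
    n = len(P)
    return all(
        P[j][j] > P[i][j]
        for j in range(n)
        for i in range(n)
        if i != j
    )


# Hypothetical 2x2 transition probability matrices (each row sums to 1).
conservative = [[0.9, 0.1],
                [0.2, 0.8]]
saturated = [[0.4, 0.6],
             [0.7, 0.3]]

print(is_dlc(conservative))
print(is_dlc(saturated))
```

In the second matrix the off-diagonal entry 0.7 exceeds the diagonal entry 0.4 in the first column, so the check fails.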

NOTE: in the display of the lf below, the “length” parameter is not the ENS (expected number of substitutions). It is instead just a scalar.
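To illustrate why a length scalar and the ENS can differ, consider a continuous-time Markov process with rate matrix Q and starting state distribution pi(0). The expected number of substitutions over time t is the integral from 0 to t of sum_i pi_i(s) * (-Q[i][i]) ds, where pi(s) evolves as d(pi)/ds = pi(s) Q. If the process does not start at its stationary distribution, pi(s) changes along the edge and the ENS is no longer a fixed multiple of t. The sketch below assumes a toy two-state rate matrix with made-up values (not taken from the fit in this document) and uses simple Euler integration. The function name `expected_substitutions` is hypothetical, and this is not how cogent3 computes the ENS.

```python
def expected_substitutions(pi0, Q, t, steps=10_000):
    """Approximate the integral over [0, t] of sum_i pi_i(s) * (-Q[i][i])."""
    n = len(pi0)
    dt = t / steps
    pi = list(pi0)
    total = 0.0
    for _ in range(steps):
        # Rate of leaving the current state, averaged over pi(s).
        flux = -sum(pi[i] * Q[i][i] for i in range(n))
        total += flux * dt
        # Euler step for d(pi)/ds = pi(s) Q.
        pi = [pi[j] + dt * sum(pi[i] * Q[i][j] for i in range(n)) for j in range(n)]
    return total


# Toy two-state rate matrix (hypothetical values); each row sums to zero.
Q = [[-1.0, 1.0],
     [3.0, -3.0]]
t = 1.0

stationary = [0.75, 0.25]  # satisfies pi Q = 0 for this Q
skewed = [0.0, 1.0]        # far from the stationary distribution

ens_stationary = expected_substitutions(stationary, Q, t)
ens_skewed = expected_substitutions(skewed, Q, t)
print(ens_stationary, ens_skewed)
```

Both calls use the same t, yet the two starting distributions give different expected numbers of substitutions, which is why a scalar on its own cannot be read as the ENS under a non-stationary model.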

[5]:
fitted.lf
[5]:

GN

log-likelihood = -6987.8834

number of free parameters = 25

Global params
A>C     A>G     A>T     C>A     C>G     C>T     G>A     G>C     G>T     T>A     T>C
0.8700  3.6669  0.9111  1.5925  2.1264  6.0323  8.2178  1.2288  0.6294  1.2498  3.4136

Edge params
edge        parent  length
Galago      root    0.1735
HowlerMon   root    0.0450
Rhesus      edge.3  0.0215
Orangutan   edge.2  0.0078
Gorilla     edge.1  0.0025
Human       edge.0  0.0061
Chimpanzee  edge.0  0.0028
edge.0      edge.1  0.0000
edge.1      edge.2  0.0033
edge.2      edge.3  0.0121
edge.3      root    0.0077

Motif params
A       C       G       T
0.3756  0.1768  0.2078  0.2398
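The reported number of free parameters can be cross-checked against the tables above. The tally below is a sketch that assumes each displayed rate term and each edge length is one free parameter, and that the four motif probabilities contribute three because they sum to 1. The twelfth exchange, T>G, does not appear in the display; presumably it serves as the reference term, which is an assumption here rather than something the output states.

```python
# Labels transcribed from the fitted.lf display above.
rate_terms = ["A>C", "A>G", "A>T", "C>A", "C>G", "C>T",
              "G>A", "G>C", "G>T", "T>A", "T>C"]
edges = ["Galago", "HowlerMon", "Rhesus", "Orangutan", "Gorilla",
         "Human", "Chimpanzee", "edge.0", "edge.1", "edge.2", "edge.3"]
motifs = ["A", "C", "G", "T"]

n_rate = len(rate_terms)      # one parameter per displayed rate term
n_length = len(edges)         # one length per edge
n_motif = len(motifs) - 1     # probabilities sum to 1, so one is determined

nfp = n_rate + n_length + n_motif
print(nfp)  # 25
```

The total matches the "number of free parameters = 25" line in the display.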