Apply a non-stationary nucleotide model to an alignment with 3 sequences¶

We load some sample data first and select just 3 sequences, all via apps.

[1]:

from cogent3.app import io, sample

reader = io.load_aligned(format="fasta", moltype="dna")
select_seqs = sample.take_named_seqs("Human", "Rhesus", "Galago")
process = reader + select_seqs
aln = process("../data/primate_brca1.fasta")
aln.names

[1]:

['Human', 'Rhesus', 'Galago']

We analyses these using the general Markov nucleotide, GN, model. Because we analyse just 3 sequences, there is only one possible unrooted tree. It’s not required to specify the tree in this instance.

[2]:

from cogent3.app import evo

gn = evo.model("GN")
gn

[2]:

model(type='model', sm='GN', tree=None, name=None, sm_args=None, lf_args=None, time_het=None, param_rules=None, opt_args=None, split_codons=False, show_progress=False, verbose=False)

We apply this to aln.

[3]:

fitted = gn(aln)
type(fitted)

[3]:

cogent3.app.result.model_result

`model_result`¶

As the output above indicates, fitted is a model_result object.

This object provides an interface for accessing attributes of a fitted model. The representation display (below), a styled table in a jupyter notebook, presents a summary view with the log-likelihood (lnL), number of free parameters (nfp) and whether all matrices satisfied the identifiability conditions diagonal largest in column (DLC) and a unique mapping of Q to P. (For description of these quantities and why they matter see Chang 1996 and Kaehler et al.)

model_result has dictionary behaviour, hence the key column. This will be demonstrated below.

[4]:

fitted

[4]:

GN
key	lnL	nfp	DLC	unique_Q
	-5964.0129	17	True	True

More detail on the fitted model are available via attributes. For instance, display the maximum likelihood estimates via the likelihood function attribute

[5]:

fitted.lf

[5]:

GN

log-likelihood = -5964.0129

number of free parameters = 17

Global params
A>C	A>G	A>T	C>A	C>G	C>T	G>A	G>C	G>T	T>A
1.0628	3.1833	1.0207	1.7952	2.3276	5.6847	9.0910	1.1136	0.8313	1.4997

T>C
3.5575

Edge params
edge	parent	length
Human	root	0.0214
Rhesus	root	0.0208
Galago	root	0.1781

Motif params
A	C	G	T
0.3740	0.1755	0.2098	0.2408

[6]:

fitted.lnL, fitted.nfp

[6]:

(-5964.012906757179, 17)

[7]:

fitted.source

[7]:

'../data/primate_brca1.fasta'

The model_result.tree attribute is an “annotated tree”. Maximum likelihood estimates from the model have been assigned to the tree. Of particular significance, the “length” attribute corresponds to the expected number of substitutions (or ENS). For a non-stationary model, like GN, this can be different to the conventional length (Kaehler et al).

[8]:

fitted.tree, fitted.alignment

[8]:

(Tree("(Human,Rhesus,Galago)root;"),
 3 x 2814 dna alignment: Human[TGTGGCACAAA...], Rhesus[TGTGGCACAAA...], Galago[TGTGGCAAAAA...])

We can access the sum of all branch lengths. Either as “ENS” or “paralinear” using the total_length() method.

[9]:

fitted.total_length(length_as="paralinear")

[9]:

0.9280297908804234

Fitting a separate nucleotide model to each codon position¶

Controlled by setting split_codons=True.

[10]:

gn = evo.model("GN", split_codons=True)

fitted = gn(aln)
fitted

[10]:

GN
key	lnL	nfp	DLC	unique_Q
	-5866.9371	51	True	True
1	-1955.5141	17
2	-1934.2598	17
3	-1977.1632	17

The model fit statistics, lnL and nfp are now sums of the corresponding values from the fits to the individual positions. The DLC and unique_Q are also a summary across all models. These only achieve the value True when all matrices, from all models, satisfy the condition.

We get access to the likelihood functions of the individual positions via the indicated dict keys.

[11]:

fitted[3]

[11]:

3

log-likelihood = -1977.1632

number of free parameters = 17

Global params
A>C	A>G	A>T	C>A	C>G	C>T	G>A	G>C	G>T	T>A
1.7484	3.4215	2.4111	2.3729	3.4060	16.1603	16.9889	2.0343	1.7432	1.9241

T>C
5.2795

Edge params
edge	parent	length
Human	root	0.0243
Rhesus	root	0.0319
Galago	root	0.1797

Motif params
A	C	G	T
0.3375	0.1395	0.1647	0.3583

cogent3

Navigation

Related Topics

Apply a non-stationary nucleotide model to an alignment with 3 sequences¶

`model_result`¶

GN

Fitting a separate nucleotide model to each codon position¶

3

Apply a non-stationary nucleotide model to an alignment with 3 sequences¶

model_result¶

GN

Fitting a separate nucleotide model to each codon position¶

3

`model_result`¶