natsel_sitehet – a test of site heterogeneity

This app evaluates evidence for whether sites differ in their mode of natural selection (Nielsen and Yang 1998).

[1]:
from cogent3.app import io, evo

loader = io.load_aligned(format="fasta", moltype="dna")
aln = loader("../data/primate_brca1.fasta")

sites_differ = evo.natsel_sitehet("GNC",
                                  tree="../data/primate_brca1.tree",
                                  optimise_motif_probs=False)

result = sites_differ(aln)
result
[1]:
Statistics
LR df pvalue
1.4048 2 0.4954
hypothesis key lnL nfp DLC unique_Q
null 'GNC-null' -6708.3119 24 True
alt 'GNC-alt' -6707.6095 26 True

The models have been constructed such that site-class bins have names indicating the mode of natural selection: -ve is purifying (omega<1); neutral (omega=1); and +ve is positive natural selection (omega>1). The two parameters of interest relating to these are the bprobs (the maximum likelihood estimate of the frequency of the site-class) and the corresponding value of omega.

[2]:
result.alt.lf
[2]:

GNC-alt

log-likelihood = -6707.6095

number of free parameters = 26

Global params
A>C A>G A>T C>A C>G C>T G>A G>C G>T T>A
0.8530 3.5646 0.9734 1.6404 2.1801 6.3218 8.0814 1.2347 0.7829 1.2798
T>C
3.0292
Bin params
bin bprobs omega
-ve 0.1043 0.0000
neutral 0.8052 1.0000
+ve 0.0905 20.0000
Edge params
edge parent length
Galago root 0.5463
HowlerMon root 0.1364
Rhesus edge.3 0.0649
Orangutan edge.2 0.0235
Gorilla edge.1 0.0075
Human edge.0 0.0182
Chimpanzee edge.0 0.0085
edge.0 edge.1 0.0000
edge.1 edge.2 0.0099
edge.2 edge.3 0.0364
edge.3 root 0.0233
Motif params
AAA AAC AAG AAT ACA ACC ACG ACT AGA AGC
0.0556 0.0235 0.0344 0.0556 0.0228 0.0046 0.0008 0.0289 0.0231 0.0286
AGG AGT ATA ATC ATG ATT CAA CAC CAG CAT
0.0140 0.0381 0.0186 0.0070 0.0128 0.0192 0.0196 0.0052 0.0238 0.0221
CCA CCC CCG CCT CGA CGC CGG CGT CTA CTC
0.0195 0.0062 0.0006 0.0263 0.0011 0.0009 0.0023 0.0032 0.0137 0.0078
CTG CTT GAA GAC GAG GAT GCA GCC GCG GCT
0.0125 0.0105 0.0755 0.0105 0.0303 0.0315 0.0158 0.0096 0.0014 0.0137
GGA GGC GGG GGT GTA GTC GTG GTT TAC TAT
0.0161 0.0090 0.0067 0.0133 0.0148 0.0070 0.0069 0.0213 0.0023 0.0101
TCA TCC TCG TCT TGC TGG TGT TTA TTC TTG
0.0221 0.0082 0.0015 0.0251 0.0018 0.0040 0.0201 0.0212 0.0078 0.0108
TTT
0.0187

Getting the individual site posterior probabilities

I’m just displaying the posterior-probabuilities from the first 20 positions only.

[3]:
bprobs = result.alt.lf.get_bin_probs()
bprobs[:, :20]
[3]:
0 1 2 3 4 5 6 7 8 9
-ve 0.149 0.084 0.000 0.131 0.115 0.157 0.084 0.119 0.102 0.080
neutral 0.773 0.813 0.864 0.785 0.796 0.767 0.812 0.793 0.803 0.814
+ve 0.078 0.103 0.136 0.084 0.089 0.076 0.103 0.088 0.095 0.106
10 11 12 13 14 15 16 17 18 19
-ve 0.076 0.157 0.094 0.000 0.156 0.517 0.080 0.070 0.115 0.122
neutral 0.816 0.767 0.807 0.865 0.767 0.482 0.814 0.820 0.796 0.791
+ve 0.108 0.076 0.099 0.135 0.077 0.000 0.106 0.111 0.089 0.087