natsel_zhang
– a branch-site test¶
This is the hypothesis test presented in Zhang et al. This test evaluates the hypothesis that a set of sites have undergone positive natural selection on a pre-specified set of lineages.
For this model class, there are groups of branches for which all positions are evolving neutrally but some proportion of those neutrally evolving sites change to adaptively evolving on so-called foreground edges. For the current example, we’ll define the Chimpanzee and Human branches as foreground and everything else as background. The following table defines the parameter scopes.
Site Class |
Proportion |
Background Edges |
Foreground Edges |
---|---|---|---|
0 |
p0 |
0 < omega0 < 1 |
0 < omega0 < 1 |
1 |
p1 |
omega1 = 1 |
omega1 = 1 |
2a |
p2 |
0 < omega0 < 1 |
0 < omega2 > 1 |
2b |
p3 |
omega1 = 1 |
0 < omega0 < 1 |
NOTE: Our implementation is not as parametrically succinct as that of Zhang et al, we have 1 additional bin probability.
[1]:
from cogent3.app import io, evo
loader = io.load_aligned(format="fasta", moltype="dna")
aln = loader("../data/primate_brca1.fasta")
zhang_test = evo.natsel_zhang("GNC",
tree="../data/primate_brca1.tree",
optimise_motif_probs=False,
tip1="Human",
tip2="Chimpanzee")
result = zhang_test(aln)
result
[1]:
LR | df | pvalue |
---|---|---|
4.9647 | 3 | 0.1744 |
hypothesis | key | lnL | nfp | DLC | unique_Q |
---|---|---|---|---|---|
null | 'GNC-null' | -6708.3119 | 24 | True | |
alt | 'GNC-alt' | -6705.8296 | 27 | True |
[2]:
result.alt.lf
[2]:
GNC-alt
log-likelihood = -6705.8296
number of free parameters = 27
A>C | A>G | A>T | C>A | C>G | C>T | G>A | G>C | G>T | T>A |
---|---|---|---|---|---|---|---|---|---|
0.8554 | 3.5343 | 0.9744 | 1.6586 | 2.1937 | 6.2585 | 8.0104 | 1.2418 | 0.7942 | 1.2667 |
T>C |
---|
2.9645 |
bin | bprobs |
---|---|
0 | 0.0532 |
1 | 0.2655 |
2a | 0.0403 |
2b | 0.6410 |
edge | parent | length |
---|---|---|
Galago | root | 0.5419 |
HowlerMon | root | 0.1359 |
Rhesus | edge.3 | 0.0648 |
Orangutan | edge.2 | 0.0235 |
Gorilla | edge.1 | 0.0075 |
Human | edge.0 | 0.0182 |
Chimpanzee | edge.0 | 0.0085 |
edge.0 | edge.1 | 0.0000 |
edge.1 | edge.2 | 0.0099 |
edge.2 | edge.3 | 0.0365 |
edge.3 | root | 0.0234 |
edge | bin | omega |
---|---|---|
Galago | 0 | 0.0000 |
Galago | 1 | 1.0000 |
Galago | 2a | 0.0000 |
Galago | 2b | 1.0000 |
HowlerMon | 0 | 0.0000 |
HowlerMon | 1 | 1.0000 |
HowlerMon | 2a | 0.0000 |
HowlerMon | 2b | 1.0000 |
Rhesus | 0 | 0.0000 |
Rhesus | 1 | 1.0000 |
Rhesus | 2a | 0.0000 |
Rhesus | 2b | 1.0000 |
Orangutan | 0 | 0.0000 |
Orangutan | 1 | 1.0000 |
Orangutan | 2a | 0.0000 |
Orangutan | 2b | 1.0000 |
Gorilla | 0 | 0.0000 |
Gorilla | 1 | 1.0000 |
Gorilla | 2a | 0.0000 |
Gorilla | 2b | 1.0000 |
Human | 0 | 0.0000 |
Human | 1 | 1.0000 |
Human | 2a | 20.0000 |
Human | 2b | 20.0000 |
Chimpanzee | 0 | 0.0000 |
Chimpanzee | 1 | 1.0000 |
Chimpanzee | 2a | 20.0000 |
Chimpanzee | 2b | 20.0000 |
edge.0 | 0 | 0.0000 |
edge.0 | 1 | 1.0000 |
edge.0 | 2a | 0.0000 |
edge.0 | 2b | 1.0000 |
edge.1 | 0 | 0.0000 |
edge.1 | 1 | 1.0000 |
edge.1 | 2a | 0.0000 |
edge.1 | 2b | 1.0000 |
edge.2 | 0 | 0.0000 |
edge.2 | 1 | 1.0000 |
edge.2 | 2a | 0.0000 |
edge.2 | 2b | 1.0000 |
edge.3 | 0 | 0.0000 |
edge.3 | 1 | 1.0000 |
edge.3 | 2a | 0.0000 |
edge.3 | 2b | 1.0000 |
AAA | AAC | AAG | AAT | ACA | ACC | ACG | ACT | AGA | AGC |
---|---|---|---|---|---|---|---|---|---|
0.0556 | 0.0235 | 0.0344 | 0.0556 | 0.0228 | 0.0046 | 0.0008 | 0.0289 | 0.0231 | 0.0286 |
AGG | AGT | ATA | ATC | ATG | ATT | CAA | CAC | CAG | CAT |
---|---|---|---|---|---|---|---|---|---|
0.0140 | 0.0381 | 0.0186 | 0.0070 | 0.0128 | 0.0192 | 0.0196 | 0.0052 | 0.0238 | 0.0221 |
CCA | CCC | CCG | CCT | CGA | CGC | CGG | CGT | CTA | CTC |
---|---|---|---|---|---|---|---|---|---|
0.0195 | 0.0062 | 0.0006 | 0.0263 | 0.0011 | 0.0009 | 0.0023 | 0.0032 | 0.0137 | 0.0078 |
CTG | CTT | GAA | GAC | GAG | GAT | GCA | GCC | GCG | GCT |
---|---|---|---|---|---|---|---|---|---|
0.0125 | 0.0105 | 0.0755 | 0.0105 | 0.0303 | 0.0315 | 0.0158 | 0.0096 | 0.0014 | 0.0137 |
GGA | GGC | GGG | GGT | GTA | GTC | GTG | GTT | TAC | TAT |
---|---|---|---|---|---|---|---|---|---|
0.0161 | 0.0090 | 0.0067 | 0.0133 | 0.0148 | 0.0070 | 0.0069 | 0.0213 | 0.0023 | 0.0101 |
TCA | TCC | TCG | TCT | TGC | TGG | TGT | TTA | TTC | TTG |
---|---|---|---|---|---|---|---|---|---|
0.0221 | 0.0082 | 0.0015 | 0.0251 | 0.0018 | 0.0040 | 0.0201 | 0.0212 | 0.0078 | 0.0108 |
TTT |
---|
0.0187 |
Getting the posterior probabilities of site-class membership¶
[3]:
bprobs = result.alt.lf.get_bin_probs()
bprobs[:, :20]
[3]:
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.076 | 0.043 | 0.000 | 0.067 | 0.059 | 0.080 | 0.043 | 0.061 | 0.052 | 0.041 | 0.039 |
1 | 0.255 | 0.270 | 0.293 | 0.259 | 0.263 | 0.253 | 0.270 | 0.262 | 0.266 | 0.271 | 0.272 |
2a | 0.057 | 0.033 | 0.000 | 0.050 | 0.044 | 0.060 | 0.033 | 0.046 | 0.040 | 0.032 | 0.030 |
2b | 0.613 | 0.654 | 0.707 | 0.624 | 0.634 | 0.608 | 0.654 | 0.632 | 0.643 | 0.657 | 0.659 |
11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | |
---|---|---|---|---|---|---|---|---|---|
0 | 0.080 | 0.048 | 0.000 | 0.080 | 0.262 | 0.041 | 0.036 | 0.059 | 0.062 |
1 | 0.253 | 0.268 | 0.293 | 0.253 | 0.156 | 0.271 | 0.273 | 0.263 | 0.261 |
2a | 0.060 | 0.037 | 0.000 | 0.059 | 0.202 | 0.032 | 0.028 | 0.044 | 0.047 |
2b | 0.608 | 0.648 | 0.707 | 0.608 | 0.379 | 0.657 | 0.664 | 0.634 | 0.630 |
Getting all the statistics in tabular form¶
[4]:
tab = evo.tabulate_stats()
stats = tab(result.alt)
stats
[4]:
5x tabular_result('global params': Table, 'bin params': Table, 'edge params': Table, 'edge bin params': Table)
[5]:
stats["edge bin params"][:10] # truncating the table
[5]:
edge | bin | omega |
---|---|---|
Galago | 0 | 0.0000 |
Galago | 1 | 1.0000 |
Galago | 2a | 0.0000 |
Galago | 2b | 1.0000 |
HowlerMon | 0 | 0.0000 |
HowlerMon | 1 | 1.0000 |
HowlerMon | 2a | 0.0000 |
HowlerMon | 2b | 1.0000 |
Rhesus | 0 | 0.0000 |
Rhesus | 1 | 1.0000 |
10 rows x 3 columns