natsel_zhang – a branch-site test

This is the hypothesis test presented in Zhang et al. This test evaluates the hypothesis that a set of sites have undergone positive natural selection on a pre-specified set of lineages.

For this model class, there are groups of branches for which all positions are evolving neutrally but some proportion of those neutrally evolving sites change to adaptively evolving on so-called foreground edges. For the current example, we’ll define the Chimpanzee and Human branches as foreground and everything else as background. The following table defines the parameter scopes.

Site Class

Proportion

Background Edges

Foreground Edges

0

p0

0 < omega0 < 1

0 < omega0 < 1

1

p1

omega1 = 1

omega1 = 1

2a

p2

0 < omega0 < 1

0 < omega2 > 1

2b

p3

omega1 = 1

0 < omega0 < 1

NOTE: Our implementation is not as parametrically succinct as that of Zhang et al, we have 1 additional bin probability.

[1]:
from cogent3.app import io, evo

loader = io.load_aligned(format="fasta", moltype="dna")
aln = loader("../data/primate_brca1.fasta")

zhang_test = evo.natsel_zhang("GNC",
                              tree="../data/primate_brca1.tree",
                              optimise_motif_probs=False,
                              tip1="Human",
                              tip2="Chimpanzee")

result = zhang_test(aln)
result
[1]:
Statistics
LR df pvalue
4.9647 3 0.1744
hypothesis key lnL nfp DLC unique_Q
null 'GNC-null' -6708.3119 24 True
alt 'GNC-alt' -6705.8296 27 True
[2]:
result.alt.lf
[2]:

GNC-alt

log-likelihood = -6705.8296

number of free parameters = 27

Global params
A>C A>G A>T C>A C>G C>T G>A G>C G>T T>A
0.8554 3.5343 0.9744 1.6586 2.1937 6.2585 8.0104 1.2418 0.7942 1.2667
T>C
2.9645
Bin params
bin bprobs
0 0.0532
1 0.2655
2a 0.0403
2b 0.6410
Edge params
edge parent length
Galago root 0.5419
HowlerMon root 0.1359
Rhesus edge.3 0.0648
Orangutan edge.2 0.0235
Gorilla edge.1 0.0075
Human edge.0 0.0182
Chimpanzee edge.0 0.0085
edge.0 edge.1 0.0000
edge.1 edge.2 0.0099
edge.2 edge.3 0.0365
edge.3 root 0.0234
Edge bin params
edge bin omega
Galago 0 0.0000
Galago 1 1.0000
Galago 2a 0.0000
Galago 2b 1.0000
HowlerMon 0 0.0000
HowlerMon 1 1.0000
HowlerMon 2a 0.0000
HowlerMon 2b 1.0000
Rhesus 0 0.0000
Rhesus 1 1.0000
Rhesus 2a 0.0000
Rhesus 2b 1.0000
Orangutan 0 0.0000
Orangutan 1 1.0000
Orangutan 2a 0.0000
Orangutan 2b 1.0000
Gorilla 0 0.0000
Gorilla 1 1.0000
Gorilla 2a 0.0000
Gorilla 2b 1.0000
Human 0 0.0000
Human 1 1.0000
Human 2a 20.0000
Human 2b 20.0000
Chimpanzee 0 0.0000
Chimpanzee 1 1.0000
Chimpanzee 2a 20.0000
Chimpanzee 2b 20.0000
edge.0 0 0.0000
edge.0 1 1.0000
edge.0 2a 0.0000
edge.0 2b 1.0000
edge.1 0 0.0000
edge.1 1 1.0000
edge.1 2a 0.0000
edge.1 2b 1.0000
edge.2 0 0.0000
edge.2 1 1.0000
edge.2 2a 0.0000
edge.2 2b 1.0000
edge.3 0 0.0000
edge.3 1 1.0000
edge.3 2a 0.0000
edge.3 2b 1.0000
Motif params
AAA AAC AAG AAT ACA ACC ACG ACT AGA AGC
0.0556 0.0235 0.0344 0.0556 0.0228 0.0046 0.0008 0.0289 0.0231 0.0286
AGG AGT ATA ATC ATG ATT CAA CAC CAG CAT
0.0140 0.0381 0.0186 0.0070 0.0128 0.0192 0.0196 0.0052 0.0238 0.0221
CCA CCC CCG CCT CGA CGC CGG CGT CTA CTC
0.0195 0.0062 0.0006 0.0263 0.0011 0.0009 0.0023 0.0032 0.0137 0.0078
CTG CTT GAA GAC GAG GAT GCA GCC GCG GCT
0.0125 0.0105 0.0755 0.0105 0.0303 0.0315 0.0158 0.0096 0.0014 0.0137
GGA GGC GGG GGT GTA GTC GTG GTT TAC TAT
0.0161 0.0090 0.0067 0.0133 0.0148 0.0070 0.0069 0.0213 0.0023 0.0101
TCA TCC TCG TCT TGC TGG TGT TTA TTC TTG
0.0221 0.0082 0.0015 0.0251 0.0018 0.0040 0.0201 0.0212 0.0078 0.0108
TTT
0.0187

Getting the posterior probabilities of site-class membership

[3]:
bprobs = result.alt.lf.get_bin_probs()
bprobs[:, :20]
[3]:
0 1 2 3 4 5 6 7 8 9 10
0 0.076 0.043 0.000 0.067 0.059 0.080 0.043 0.061 0.052 0.041 0.039
1 0.255 0.270 0.293 0.259 0.263 0.253 0.270 0.262 0.266 0.271 0.272
2a 0.057 0.033 0.000 0.050 0.044 0.060 0.033 0.046 0.040 0.032 0.030
2b 0.613 0.654 0.707 0.624 0.634 0.608 0.654 0.632 0.643 0.657 0.659
11 12 13 14 15 16 17 18 19
0 0.080 0.048 0.000 0.080 0.262 0.041 0.036 0.059 0.062
1 0.253 0.268 0.293 0.253 0.156 0.271 0.273 0.263 0.261
2a 0.060 0.037 0.000 0.059 0.202 0.032 0.028 0.044 0.047
2b 0.608 0.648 0.707 0.608 0.379 0.657 0.664 0.634 0.630

Getting all the statistics in tabular form

[4]:
tab = evo.tabulate_stats()
stats = tab(result.alt)
stats
[4]:
5x tabular_result('global params': Table, 'bin params': Table, 'edge params': Table, 'edge bin params': Table)
[5]:
stats["edge bin params"][:10]  # truncating the table
[5]:
edge bin params
edge bin omega
Galago 0 0.0000
Galago 1 1.0000
Galago 2a 0.0000
Galago 2b 1.0000
HowlerMon 0 0.0000
HowlerMon 1 1.0000
HowlerMon 2a 0.0000
HowlerMon 2b 1.0000
Rhesus 0 0.0000
Rhesus 1 1.0000

10 rows x 3 columns