Building phylogenies
Building A Phylogenetic Tree From Pairwise Distances
Directly via alignment.quick_tree()
Both the ArrayAlignment and Alignment classes support this.
>>> from cogent3 import load_aligned_seqs
>>> aln = load_aligned_seqs('data/primate_brca1.fasta', moltype="dna")
>>> tree = aln.quick_tree(calc="TN93")
>>> tree = tree.balanced() # purely for display
>>> print(tree.ascii_art())
                    /-Rhesus
          /edge.1--|
         |         |          /-HowlerMon
         |          \edge.0--|
         |                    \-Galago
-root----|
         |--Orangutan
         |
         |          /-Chimpanzee
          \edge.2--|
                   |          /-Human
                    \edge.3--|
                              \-Gorilla
The quick_tree() method also supports non-parametric bootstrapping. The number of resampled alignments is specified using the bootstrap argument. In the following, trees are estimated from 100 resampled alignments and merged into a single consensus topology using a weighted consensus tree algorithm.
>>> tree = aln.quick_tree(calc="TN93", bootstrap=100)
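The resampling step behind the bootstrap argument can be sketched in plain Python. This is a minimal illustration of non-parametric bootstrapping, not cogent3's implementation: it assumes the alignment is a plain dict mapping sequence names to equal-length strings, and the function name bootstrap_alignments is hypothetical. Each pseudo-alignment is built by drawing alignment columns uniformly at random, with replacement, until it has as many columns as the original.

```python
import random


def bootstrap_alignments(aln, num_reps, seed=None):
    """Yield num_reps pseudo-alignments made by resampling columns with replacement.

    aln is assumed (hypothetically) to be a dict of {name: aligned sequence string}.
    """
    names = list(aln)
    length = len(aln[names[0]])
    if any(len(seq) != length for seq in aln.values()):
        raise ValueError("all sequences must have the same aligned length")
    rng = random.Random(seed)
    for _ in range(num_reps):
        # draw column indices uniformly, with replacement
        columns = [rng.randrange(length) for _ in range(length)]
        # every sequence uses the same columns, so sites stay aligned
        yield {name: "".join(aln[name][i] for i in columns) for name in names}


aln = {"Human": "ACGTTGCA", "Chimpanzee": "ACGTTGCC", "Gorilla": "ACGATGCC"}
replicates = list(bootstrap_alignments(aln, num_reps=100, seed=1))
print(len(replicates), replicates[0])
```

Each replicate would then be analysed like the original alignment and the resulting trees summarised; in quick_tree() that summary is the weighted consensus described above.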
Using the DistanceMatrix object
>>> from cogent3 import load_aligned_seqs
>>> aln = load_aligned_seqs('data/primate_brca1.fasta', moltype="dna")
>>> dists = aln.distance_matrix(calc="TN93")
>>> tree = dists.quick_tree()
>>> tree = tree.balanced() # purely for display
>>> print(tree.ascii_art())
                    /-Rhesus
          /edge.1--|
         |         |          /-HowlerMon
         |          \edge.0--|
         |                    \-Galago
-root----|
         |--Orangutan
         |
         |          /-Chimpanzee
          \edge.2--|
                   |          /-Human
                    \edge.3--|
                              \-Gorilla
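To see what a pairwise distance matrix contains, the following sketch computes a distance for every pair of sequences in plain Python. It deliberately swaps TN93 for two simpler measures, the proportion of differing sites (p-distance) and its Jukes-Cantor (JC69) correction, and it is not cogent3's DistanceMatrix code. The function names are hypothetical, and skipping columns with a gap ('-') in either sequence is an assumed gap-handling rule. The result is a dict keyed by name pairs, the same shape as the dict passed to nj.nj() in the pairwise distance dict example.

```python
import math
from itertools import combinations


def p_distance(seq1, seq2):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned (equal length)")
    sites = [(x, y) for x, y in zip(seq1, seq2) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def jc69_distance(p):
    """Jukes-Cantor correction of a p-distance; undefined for p >= 0.75."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the JC69 correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def pairwise_distances(seqs, corrected=True):
    """Return {(name1, name2): distance} for every unordered pair of named sequences."""
    dists = {}
    for (n1, s1), (n2, s2) in combinations(seqs.items(), 2):
        p = p_distance(s1, s2)
        dists[n1, n2] = jc69_distance(p) if corrected else p
    return dists


seqs = {"a": "ACGTACGTAC", "b": "ACGTACGTAA", "c": "ACGAAC-TAA"}
print(pairwise_distances(seqs, corrected=False))
```

The correction matters because the p-distance saturates: multiple substitutions at the same site are invisible, so corrected distances are always larger than the raw proportion once any differences exist.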
Explicitly via DistanceMatrix and cogent3.phylo.nj.nj()
>>> from cogent3.phylo import nj
>>> from cogent3 import load_aligned_seqs
>>> aln = load_aligned_seqs('data/primate_brca1.fasta', moltype="dna")
>>> dists = aln.distance_matrix(calc="TN93")
>>> tree = nj.nj(dists)
>>> tree = tree.balanced() # purely for display
>>> print(tree.ascii_art())
                    /-Rhesus
          /edge.1--|
         |         |          /-HowlerMon
         |          \edge.0--|
         |                    \-Galago
-root----|
         |--Orangutan
         |
         |          /-Chimpanzee
          \edge.2--|
                   |          /-Human
                    \edge.3--|
                              \-Gorilla
Directly from a pairwise distance dict
>>> from cogent3.phylo import nj
>>> dists = {('a', 'b'): 2.7, ('c', 'b'): 2.33, ('c', 'a'): 0.73}
>>> tree = nj.nj(dists)
>>> print(tree.ascii_art())
          /-a
         |
-root----|--b
         |
          \-c
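The neighbour-joining algorithm itself can be sketched in a few dozen lines. The following is a textbook version written for illustration, not cogent3's nj.nj(); the function name neighbour_joining is hypothetical. It takes a dict of the same form as above and returns an unrooted tree as a Newick string. At each step it joins the pair (i, j) minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r(i) is the summed distance from i to every other current node, and it stops when three nodes remain, which are joined at a trifurcating root as in the ascii_art() output above.

```python
def neighbour_joining(dists):
    """Return an unrooted Newick tree built by neighbour joining.

    dists maps unordered name pairs to distances, e.g. {("a", "b"): 2.7, ...};
    every pair of tips must be present.
    """
    d = {}
    for (x, y), value in dists.items():
        d[x, y] = d[y, x] = float(value)
    nodes = sorted({name for pair in dists for name in pair})
    if len(nodes) < 3:
        raise ValueError("neighbour joining needs at least three tips")
    subtree = {name: name for name in nodes}  # node -> Newick text for its clade
    joined = 0
    while len(nodes) > 3:
        n = len(nodes)
        # r(i): total distance from i to all other current nodes
        r = {i: sum(d[i, k] for k in nodes if k != i) for i in nodes}
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                i, j = nodes[x], nodes[y]
                q = (n - 2) * d[i, j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # branch lengths from the new internal node u to i and j
        li = 0.5 * d[i, j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i, j] - li
        joined += 1
        u = ("internal", joined)  # tuple key cannot collide with a tip name
        subtree[u] = f"({subtree[i]}:{li:.4g},{subtree[j]}:{lj:.4g})"
        for k in nodes:
            if k not in (i, j):
                d[u, k] = d[k, u] = 0.5 * (d[i, k] + d[j, k] - d[i, j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # join the last three nodes at a trifurcating root
    a, b, c = nodes
    la = 0.5 * (d[a, b] + d[a, c] - d[b, c])
    lb = d[a, b] - la
    lc = d[a, c] - la
    return f"({subtree[a]}:{la:.4g},{subtree[b]}:{lb:.4g},{subtree[c]}:{lc:.4g});"


dists = {("a", "b"): 2.7, ("c", "b"): 2.33, ("c", "a"): 0.73}
print(neighbour_joining(dists))  # (a:0.55,b:2.15,c:0.18);
```

For three tips no joining step is needed, so the result is the same star topology drawn above; the branch lengths are the unique values that reproduce the three distances exactly.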
By Least-squares
We illustrate phylogeny reconstruction by least squares using the F81 substitution model. We use the advanced stepwise addition algorithm to search tree space. Here a is the number of taxa for which all possible phylogenies are evaluated exhaustively. Successive taxa are then added to the top k trees (ranked by the least-squares metric), and k trees are kept at each iteration.
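Only the first a taxa are evaluated exhaustively because the number of unrooted, bifurcating topologies for n tips, (2n - 5)!!, grows explosively. The sketch below computes that number, together with a rough count of the trees scored by stepwise addition under the idealised assumption that every edge of each of the k retained trees is tried when a taxon is added. Both function names are hypothetical, and the count is an illustration of the search described above, not a figure taken from cogent3's trex implementation.

```python
def num_unrooted_trees(n):
    """Number of distinct unrooted bifurcating topologies for n >= 3 labelled tips: (2n - 5)!!."""
    if n < 3:
        raise ValueError("need at least three tips")
    count = 1
    for factor in range(3, 2 * n - 4, 2):
        count *= factor
    return count


def stepwise_candidates(n, a, k):
    """Rough number of trees scored: exhaustive over the first a tips, then k-best addition.

    Assumes each new tip is tried on every edge (2m - 3 edges for m tips) of each kept tree.
    """
    if not 3 <= a <= n:
        raise ValueError("require 3 <= a <= n")
    total = num_unrooted_trees(a)
    kept = min(k, total)
    for m in range(a, n):
        candidates = kept * (2 * m - 3)  # one candidate per edge of each kept tree
        total += candidates
        kept = min(k, candidates)
    return total


for n in (5, 10, 20):
    print(n, num_unrooted_trees(n), stepwise_candidates(n, a=5, k=5))
```

With a=5 and k=5, the values used in the trex() example, ten tips give 290 scored trees in this idealised count, against 2,027,025 topologies for an exhaustive search.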
>>> import pickle
>>> from cogent3.phylo.least_squares import WLS
>>> with open('data/dists_for_phylo.pickle', 'rb') as infile:
...     dists = pickle.load(infile)
>>> ls = WLS(dists)
>>> stat, tree = ls.trex(a=5, k=5, show_progress=False)
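The least-squares metric compares each observed pairwise distance d_ij with the path length p_ij between the same two tips on a candidate tree, and scores the tree by S = sum of w_ij (d_ij - p_ij)^2. The sketch below computes S for a tree given as a list of weighted edges. The weighting w_ij = 1/d_ij^2 (Fitch-Margoliash style) is an assumption chosen for illustration; this page does not state which weights cogent3's WLS uses. Setting power=0 gives ordinary least squares. Fitting the branch lengths, which a real implementation also does, is omitted, and both function names are hypothetical.

```python
def path_lengths(edges):
    """Return {(u, v): summed edge length on the path from u to v} for all node pairs.

    edges is a list of (node1, node2, length) tuples describing a connected, acyclic tree.
    """
    neighbours = {}
    for u, v, length in edges:
        neighbours.setdefault(u, []).append((v, length))
        neighbours.setdefault(v, []).append((u, length))
    lengths = {}
    for start in neighbours:
        # depth-first walk from start; in a tree, skipping the parent avoids revisits
        stack = [(start, None, 0.0)]
        while stack:
            node, parent, total = stack.pop()
            lengths[start, node] = total
            for nbr, length in neighbours[node]:
                if nbr != parent:
                    stack.append((nbr, node, total + length))
    return lengths


def wls_score(observed, edges, power=2):
    """Weighted sum of squared differences between observed distances and tree path lengths.

    Each squared residual is weighted by 1 / d**power (assumed weighting; power=0 is OLS).
    """
    paths = path_lengths(edges)
    score = 0.0
    for (a, b), d in observed.items():
        weight = 1.0 / d**power if d > 0 else 1.0
        score += weight * (d - paths[a, b]) ** 2
    return score


observed = {("a", "b"): 2.7, ("c", "b"): 2.33, ("c", "a"): 0.73}
star = [("root", "a", 0.55), ("root", "b", 2.15), ("root", "c", 0.18)]
print(wls_score(observed, star))
```

The star tree here reproduces all three distances exactly, so its score is zero up to floating-point rounding; lengthening any branch makes the score positive, and the search keeps the trees with the lowest scores.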
Other optional arguments to the trex method are return_all, which controls whether the k best trees at the final step are returned as a ScoredTreeCollection object, and order, a series of tip names whose order defines the sequence in which tips are added during tree building (this allows the user to randomise the input order).
By ML
We illustrate phylogeny reconstruction by maximum likelihood using the F81 substitution model. We use the advanced stepwise addition algorithm to search tree space.
>>> from cogent3 import load_aligned_seqs
>>> from cogent3.phylo.maximum_likelihood import ML
>>> from cogent3.evolve.models import F81
>>> aln = load_aligned_seqs('data/primate_brca1.fasta')
>>> ml = ML(F81(), aln)
The ML object also has the trex method, which can be used in the same way as above, i.e. ml.trex(). We don’t run it here because it is a very slow method for phylogenetic reconstruction.
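Although ml.trex() is slow, the quantity it optimises can be illustrated on the smallest possible case: two aligned sequences separated by a single branch of length t. Under F81 the substitution probability is P_xy(t) = e^(-beta t) delta_xy + (1 - e^(-beta t)) pi_y, where pi are the equilibrium base frequencies and beta = 1 / (1 - sum of pi_k^2) scales t to expected substitutions per site. The sketch below evaluates this log-likelihood and maximises it over t with a golden-section search. The function names and the choice of optimiser are assumptions for illustration, not cogent3's code. With equal base frequencies F81 reduces to the Jukes-Cantor model, so the estimate should match the JC69 distance.

```python
import math


def f81_prob(x, y, t, freqs):
    """F81 probability of base x becoming base y along a branch of length t."""
    beta = 1.0 / (1.0 - sum(p * p for p in freqs.values()))
    decay = math.exp(-beta * t)
    return (decay if x == y else 0.0) + (1.0 - decay) * freqs[y]


def pair_log_likelihood(seq1, seq2, t, freqs):
    """Log-likelihood of two aligned, gap-free sequences separated by distance t."""
    return sum(math.log(freqs[x] * f81_prob(x, y, t, freqs)) for x, y in zip(seq1, seq2))


def ml_distance(seq1, seq2, freqs, lo=1e-8, hi=5.0, tol=1e-9):
    """Maximise the pairwise log-likelihood over t by golden-section search on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if pair_log_likelihood(seq1, seq2, c, freqs) > pair_log_likelihood(seq1, seq2, d, freqs):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
    return (a + b) / 2


equal = {base: 0.25 for base in "ACGT"}
seq1 = "ACGT" * 25
# change the first 10 of 100 sites, giving a p-distance of 0.1
seq2 = "".join("C" if base == "A" else "A" for base in seq1[:10]) + seq1[10:]
print(ml_distance(seq1, seq2, equal))
```

In a full maximum-likelihood tree search, the same kind of calculation is done jointly over every branch of every candidate tree, which helps explain why ml.trex() is slow.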