Building phylogenies

Building A Phylogenetic Tree From Pairwise Distances

Directly via alignment.quick_tree()

Both the ArrayAlignment and Alignment classes support this.

>>> from cogent3 import load_aligned_seqs
>>> aln = load_aligned_seqs('data/primate_brca1.fasta', moltype="dna")
>>> tree = aln.quick_tree(calc="TN93")
>>> tree = tree.balanced()  # purely for display
>>> print(tree.ascii_art())  
                    /-Rhesus
          /edge.1--|
         |         |          /-HowlerMon
         |          \edge.0--|
         |                    \-Galago
-root----|
         |--Orangutan
         |
         |          /-Chimpanzee
          \edge.2--|
                   |          /-Human
                    \edge.3--|
                              \-Gorilla

The quick_tree() method also supports non-parametric bootstrapping. The number of resampled alignments is specified using the bootstrap argument. In the following, trees are estimated from 100 resampled alignments and merged into a single consensus topology using a weighted consensus tree algorithm.

>>> tree = aln.quick_tree(calc="TN93", bootstrap=100)

Using the DistanceMatrix object

>>> from cogent3 import load_aligned_seqs
>>> aln = load_aligned_seqs('data/primate_brca1.fasta', moltype="dna")
>>> dists = aln.distance_matrix(calc="TN93")
>>> tree = dists.quick_tree()
>>> tree = tree.balanced()  # purely for display
>>> print(tree.ascii_art())  
                    /-Rhesus
          /edge.1--|
         |         |          /-HowlerMon
         |          \edge.0--|
         |                    \-Galago
-root----|
         |--Orangutan
         |
         |          /-Chimpanzee
          \edge.2--|
                   |          /-Human
                    \edge.3--|
                              \-Gorilla

Explicitly via DistanceMatrix and cogent3.phylo.nj.nj()`

>>> from cogent3.phylo import nj
>>> from cogent3 import load_aligned_seqs
>>> aln = load_aligned_seqs('data/primate_brca1.fasta', moltype="dna")
>>> dists = aln.distance_matrix(calc="TN93")
>>> tree = nj.nj(dists)
>>> tree = tree.balanced()  # purely for display
>>> print(tree.ascii_art())  
                    /-Rhesus
          /edge.1--|
         |         |          /-HowlerMon
         |          \edge.0--|
         |                    \-Galago
-root----|
         |--Orangutan
         |
         |          /-Chimpanzee
          \edge.2--|
                   |          /-Human
                    \edge.3--|
                              \-Gorilla

Directly from a pairwise distance dict

>>> from cogent3.phylo import nj
>>> dists = {('a', 'b'): 2.7, ('c', 'b'): 2.33, ('c', 'a'): 0.73}
>>> tree = nj.nj(dists)
>>> print(tree.ascii_art())  
          /-a
         |
-root----|--b
         |
          \-c

By Least-squares

We illustrate the phylogeny reconstruction by least-squares using the F81 substitution model. We use the advanced-stepwise addition algorithm to search tree space. Here a is the number of taxa to exhaustively evaluate all possible phylogenies for. Successive taxa are added to the top k trees (measured by the least-squares metric) and k trees are kept at each iteration.

>>> import pickle
>>> from cogent3.phylo.least_squares import WLS
>>> dists = pickle.load(open('data/dists_for_phylo.pickle', 'rb'))
>>> ls = WLS(dists)
>>> stat, tree = ls.trex(a=5, k=5, show_progress=False)

Other optional arguments that can be passed to the trex method are: return_all, whether the k best trees at the final step are returned as a ScoredTreeCollection object; order, a series of tip names whose order defines the sequence in which tips will be added during tree building (this allows the user to randomise the input order).

By ML

We illustrate the phylogeny reconstruction using maximum-likelihood using the F81 substitution model. We use the advanced-stepwise addition algorithm to search tree space.

>>> from cogent3 import load_aligned_seqs
>>> from cogent3.phylo.maximum_likelihood import ML
>>> from cogent3.evolve.models import F81
>>> aln = load_aligned_seqs('data/primate_brca1.fasta')
>>> ml = ML(F81(), aln)

The ML object also has the trex method and this can be used in the same way as for above, i.e. ml.trex(). We don’t do that here because this is a very slow method for phylogenetic reconstruction.