Calculate pairwise distances between sequences
Section author: Gavin Huttley
An example of how to calculate the pairwise distances for a set of sequences.
>>> from cogent3 import load_aligned_seqs
>>> from cogent3.evolve import distance
Import a substitution model (or create your own).
>>> from cogent3.evolve.models import HKY85
Load the alignment.
>>> al = load_aligned_seqs("data/long_testseqs.fasta")
Create a pairwise distances object from the alignment and the substitution model.
>>> d = distance.EstimateDistances(al, submodel=HKY85())
Printing d before execution shows its status.
>>> print(d)
=========================================================================
Seq1 \ Seq2       Human    HowlerMon       Mouse    NineBande    DogFaced
-------------------------------------------------------------------------
      Human           *     Not Done    Not Done     Not Done    Not Done
  HowlerMon    Not Done            *    Not Done     Not Done    Not Done
      Mouse    Not Done     Not Done           *     Not Done    Not Done
  NineBande    Not Done     Not Done    Not Done            *    Not Done
   DogFaced    Not Done     Not Done    Not Done     Not Done           *
-------------------------------------------------------------------------
Here every pair is marked Not Done, because no distances have been estimated yet. Calling run performs the estimation.
>>> d.run(show_progress=False)
>>> print(d)
=====================================================================
Seq1 \ Seq2     Human    HowlerMon     Mouse    NineBande    DogFaced
---------------------------------------------------------------------
      Human         *       0.0730    0.3363       0.1804      0.1972
  HowlerMon    0.0730            *    0.3487       0.1865      0.2078
      Mouse    0.3363       0.3487         *       0.3813      0.4022
  NineBande    0.1804       0.1865    0.3813            *      0.2019
   DogFaced    0.1972       0.2078    0.4022       0.2019           *
---------------------------------------------------------------------
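To make the idea of a pairwise distance concrete, the following is a minimal sketch in plain Python. It is not what EstimateDistances does: the example above fits HKY85 by maximum likelihood, whereas this sketch swaps in the much simpler Jukes-Cantor (JC69) correction of the proportion of differing sites. The helper names (p_distance, jc69_distance) and the toy sequences are hypothetical and used for illustration only.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of comparable aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only columns where both sequences have an unambiguous nucleotide.
    pairs = [
        (a, b)
        for a, b in zip(seq1.upper(), seq2.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(1 for a, b in pairs if a != b) / len(pairs)


def jc69_distance(p):
    """Jukes-Cantor correction: d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p is too large for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Toy aligned sequences (hypothetical data, not from long_testseqs.fasta).
seqs = {"A": "ACGTACGTAC", "B": "ACGTACGTAT", "C": "ACGAACGTTT"}

# Fill a symmetric dictionary keyed by (name1, name2) pairs.
names = sorted(seqs)
dists = {}
for i, n1 in enumerate(names):
    for n2 in names[i + 1:]:
        dist = jc69_distance(p_distance(seqs[n1], seqs[n2]))
        dists[(n1, n2)] = dists[(n2, n1)] = dist

for pair in sorted(dists):
    print(pair, round(dists[pair], 4))
```

As in the table above, the result is symmetric: the distance from A to B equals the distance from B to A.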
Note that the pairwise distance calculations can be distributed across multiple CPUs. In that case, only the master CPU returns data when statistics (such as distances) are requested.
We’ll write the distances to a file as a PHYLIP-formatted distance matrix.
>>> d.write('dists_for_phylo.phylip', format="phylip")
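For readers unfamiliar with the format, here is a rough sketch of a square PHYLIP-style distance matrix built by hand from the values printed above. The layout assumed here (taxon count on the first line, then one row per taxon with the name padded to 10 characters) is a common convention; the exact spacing and precision that d.write produces may differ. The to_phylip and lookup helpers are hypothetical and are not part of cogent3.

```python
names = ["Human", "HowlerMon", "Mouse", "NineBande", "DogFaced"]

# Upper triangle of the distance table shown above.
dists = {
    ("Human", "HowlerMon"): 0.0730,
    ("Human", "Mouse"): 0.3363,
    ("Human", "NineBande"): 0.1804,
    ("Human", "DogFaced"): 0.1972,
    ("HowlerMon", "Mouse"): 0.3487,
    ("HowlerMon", "NineBande"): 0.1865,
    ("HowlerMon", "DogFaced"): 0.2078,
    ("Mouse", "NineBande"): 0.3813,
    ("Mouse", "DogFaced"): 0.4022,
    ("NineBande", "DogFaced"): 0.2019,
}


def lookup(a, b):
    """Return the distance for a pair, in either order; 0.0 on the diagonal."""
    if a == b:
        return 0.0
    return dists[(a, b)] if (a, b) in dists else dists[(b, a)]


def to_phylip(names, lookup):
    """Format a square distance matrix in an assumed PHYLIP-style layout."""
    lines = [f"{len(names):>5}"]
    for a in names:
        row = " ".join(f"{lookup(a, b):.4f}" for b in names)
        # Names are truncated and padded to 10 characters.
        lines.append(f"{a[:10]:<10} {row}")
    return "\n".join(lines) + "\n"


text = to_phylip(names, lookup)
print(text)
```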
We’ll also save the distances to file in Python’s pickle format.
>>> import pickle
>>> with open('dists_for_phylo.pickle', "wb") as f:
...     pickle.dump(d.get_pairwise_distances(), f)
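The pickled distances can be read back later with pickle.load. The sketch below is self-contained and uses only the standard library. It assumes the pickled object behaves like a plain dictionary keyed by pairs of sequence names, which may not match what get_pairwise_distances returns in every cogent3 version. The small dists dictionary is a stand-in that reuses two values from the table above, and the file is written to a temporary directory so that the example does not touch the working directory.

```python
import os
import pickle
import tempfile

# Stand-in for the distances object (assumed to be dict-like).
dists = {("Human", "HowlerMon"): 0.0730, ("Human", "Mouse"): 0.3363}

path = os.path.join(tempfile.mkdtemp(), "dists_for_phylo.pickle")

# Write in binary mode; the with block closes the file even on error.
with open(path, "wb") as out:
    pickle.dump(dists, out)

# Read the object back, again in binary mode.
with open(path, "rb") as infile:
    loaded = pickle.load(infile)

print(loaded[("Human", "Mouse")])
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.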