.. _calculating-pairwise-distances:

Calculate pairwise distances between sequences
==============================================

.. sectionauthor:: Gavin Huttley

An example of how to calculate the pairwise distances for a set of sequences.

.. doctest::

    >>> from cogent3 import load_aligned_seqs
    >>> from cogent3.evolve import distance

Import a substitution model (or create your own).

.. doctest::

    >>> from cogent3.evolve.models import HKY85

Load the alignment.

.. doctest::

    >>> al = load_aligned_seqs("data/long_testseqs.fasta")

Create a pairwise distance estimator from the alignment and the substitution model.

.. doctest::

    >>> d = distance.EstimateDistances(al, submodel=HKY85())

Printing ``d`` before running the estimation shows its status.

.. doctest::

    >>> print(d)
    =========================================================================
    Seq1 \ Seq2       Human    HowlerMon       Mouse    NineBande    DogFaced
    -------------------------------------------------------------------------
          Human           *     Not Done    Not Done     Not Done    Not Done
      HowlerMon    Not Done            *    Not Done     Not Done    Not Done
          Mouse    Not Done     Not Done           *     Not Done    Not Done
      NineBande    Not Done     Not Done    Not Done            *    Not Done
       DogFaced    Not Done     Not Done    Not Done     Not Done           *
    -------------------------------------------------------------------------

At this stage every entry reads ``Not Done``, because no distances have been estimated yet. Calling ``run()`` performs the estimation, after which printing ``d`` shows the estimated distances.

.. doctest::

    >>> d.run(show_progress=False)
    >>> print(d)
    =====================================================================
    Seq1 \ Seq2     Human    HowlerMon     Mouse    NineBande    DogFaced
    ---------------------------------------------------------------------
          Human         *       0.0730    0.3363       0.1804      0.1972
      HowlerMon    0.0730            *    0.3487       0.1865      0.2078
          Mouse    0.3363       0.3487         *       0.3813      0.4022
      NineBande    0.1804       0.1865    0.3813            *      0.2019
       DogFaced    0.1972       0.2078    0.4022       0.2019           *
    ---------------------------------------------------------------------

Note that pairwise distances can be distributed for computation across multiple CPUs.
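
The reason the calculation distributes well is that each pair of sequences is estimated independently of every other pair. The following is a minimal standard-library sketch of that idea, not cogent3's implementation: ``p_distance`` and ``pairwise_distances`` are hypothetical helpers, the proportion of differing ungapped sites stands in for the model-based HKY85 estimate, and a thread pool is used only to keep the sketch portable (for CPU-bound Python code, a process pool would be needed to actually use several CPUs).

.. code-block:: python

    from concurrent.futures import ThreadPoolExecutor
    from itertools import combinations


    def p_distance(seq1, seq2):
        """Proportion of differing sites, skipping gapped columns (not HKY85)."""
        sites = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
        if not sites:
            raise ValueError("no ungapped sites to compare")
        return sum(a != b for a, b in sites) / len(sites)


    def pairwise_distances(seqs, max_workers=4):
        """Compute every pair independently, then gather the results here.

        ``seqs`` maps names to aligned sequences of equal length.
        """
        pairs = list(combinations(seqs, 2))  # n * (n - 1) / 2 independent jobs
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            results = pool.map(lambda pair: p_distance(seqs[pair[0]], seqs[pair[1]]), pairs)
            dists = dict(zip(pairs, results))
        # Add the mirrored entries so either ordering of a pair can be looked up.
        dists.update({(b, a): value for (a, b), value in dists.items()})
        return dists


    toy = {"Human": "ACGTACGTAC", "Mouse": "ACGTTCGTAA", "Dog": "ACG-ACGTAC"}
    dists = pairwise_distances(toy)
    print(dists[("Human", "Mouse")])  # 0.2

With three toy sequences there are three independent jobs; only the calling process assembles the final dictionary, which is symmetric in the same way as the square matrix printed above.
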

When the computation is distributed, only the master CPU returns data when statistics (such as distances) are requested.

We'll write a PHYLIP-formatted distance matrix.

.. doctest::

    >>> d.write('dists_for_phylo.phylip', format="phylip")

We'll also save the distances to a file in Python's pickle format.

.. doctest::

    >>> import pickle
    >>> with open('dists_for_phylo.pickle', "wb") as f:
    ...     pickle.dump(d.get_pairwise_distances(), f)

.. clean up

.. doctest::
    :hide:

    >>> import os
    >>> for file_name in 'dists_for_phylo.phylip', 'dists_for_phylo.pickle':
    ...     os.remove(file_name)
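
Both output formats used above can be illustrated with the standard library alone. The sketch below assumes the distances are held in a dictionary keyed by ``(name1, name2)`` tuples, which may not match the type that ``get_pairwise_distances()`` returns in your version of cogent3. ``format_phylip`` is a hypothetical helper producing a square PHYLIP-style matrix (a taxon count, then one row per taxon with the name padded to 10 characters); it is not the code behind ``d.write``. The pickle round trip checks that ``pickle.load`` returns an object equal to the one that was dumped.

.. code-block:: python

    import os
    import pickle
    import tempfile


    def format_phylip(names, dists):
        """Return a square PHYLIP-style distance matrix as text (hypothetical helper).

        ``dists`` maps ``(name1, name2)`` tuples to distances; the diagonal is 0.
        Names are truncated or padded to 10 characters, as in strict PHYLIP.
        """
        lines = [f"{len(names):>5}"]
        for a in names:
            row = [0.0 if a == b else dists[(a, b)] for b in names]
            lines.append(f"{a[:10]:<10}" + "".join(f"{d:>10.4f}" for d in row))
        return "\n".join(lines) + "\n"


    # Toy values; both orderings are stored so any pair can be looked up.
    dists = {("Human", "Mouse"): 0.3363, ("Mouse", "Human"): 0.3363}
    names = ["Human", "Mouse"]
    text = format_phylip(names, dists)
    print(text, end="")

    # Pickle round trip inside a temporary directory, so no files are left behind.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "dists.pickle")
        with open(path, "wb") as out:
            pickle.dump(dists, out)
        with open(path, "rb") as infile:
            restored = pickle.load(infile)

    print(restored == dists)  # True

Using ``with`` blocks closes each file even if an exception is raised, which is why the pickle step above is written that way rather than with an explicit ``close()``.
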