Compute the effect of a nucleotide substitution on residue polarity in two different genetic codes using GeneticCode and AAIndex

Section author: Greg Caporaso

This document illustrates how to work with genetic code objects and how to compare two different genetic codes. Here we compare the change in residue polarity, as judged by the Woese Polar Requirement index (Woese 1973), that results from a nucleotide substitution when the sequence is translated with the standard nuclear genetic code versus the vertebrate mitochondrial genetic code.

First, we load the genetic code objects and look at how they differ from one another.

>>> from cogent3.core.genetic_code import GeneticCode
>>> standard_nuclear_genetic_code = GeneticCode(code)
>>> vertebrate_mitochondrial_genetic_code = GeneticCode(code)
>>> standard_nuclear_genetic_code == vertebrate_mitochondrial_genetic_code

We’ll make some synonyms for the objects for simplicity, and then look at the differences between the two codes:

>>> ngc = standard_nuclear_genetic_code
>>> mgc = vertebrate_mitochondrial_genetic_code

>>> differences = list(ngc.changes(mgc).items())
>>> differences.sort()
>>> differences
[('AGA', 'R*'), ('AGG', 'R*'), ('ATA', 'IM'), ('TGA', '*W')]
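
For readers without the library at hand, the same comparison can be sketched with the standard library alone. This is a minimal illustration, not the cogent3 implementation: it assumes the 64-character amino acid strings of NCBI translation tables 1 (standard) and 2 (vertebrate mitochondrial), with codons enumerated over the bases in T, C, A, G order, and the helpers `build_code` and `code_changes` are hypothetical names introduced here.

```python
from itertools import product

# Assumption: amino acid strings of NCBI translation tables 1 and 2,
# with codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
VERT_MITO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"


def build_code(aa_string):
    """Map each of the 64 codons to its one-letter amino acid ('*' = stop)."""
    codons = ("".join(bases) for bases in product("TCAG", repeat=3))
    return dict(zip(codons, aa_string))


def code_changes(first, second):
    """Return {codon: first_aa + second_aa} for codons translated differently."""
    return {
        codon: first[codon] + second[codon]
        for codon in first
        if first[codon] != second[codon]
    }


nuclear = build_code(STANDARD)
mitochondrial = build_code(VERT_MITO)
differences = sorted(code_changes(nuclear, mitochondrial).items())
print(differences)
```

Under these assumptions the sketch reports the same four codons as the list above, with the first letter of each pair taken from the standard code and the second from the mitochondrial code.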

Next, let’s load the Woese Polar Requirement AAIndex data, and find the effect of an ATA to ATG substitution with each of the two GeneticCode objects.

>>> from cogent3.parse.aaindex import getWoeseDistanceMatrix
>>> woese_distance_matrix = getWoeseDistanceMatrix()
>>> woese_distance_matrix[ngc['ATA']][ngc['ATG']]
>>> woese_distance_matrix[mgc['ATA']][mgc['ATG']]

This illustrates that the substitution changes residue polarity only under the standard nuclear code, where ATA to ATG translates to an isoleucine to methionine substitution. Under the vertebrate mitochondrial code, ATA and ATG both encode methionine, so the substitution is synonymous. Calculations of this type were central to the study for which these modules were originally developed [1].
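
The arithmetic behind this comparison can be sketched without the AAIndex parser. The snippet below is an illustration under stated assumptions: the polar requirement values for isoleucine (4.9) and methionine (5.3) are the figures commonly quoted from Woese's scale and should be checked against the AAIndex entry, the distance is assumed to be the absolute difference between the two values, and `polarity_change` is a hypothetical helper rather than a library function.

```python
# Assumption: polar requirement values for isoleucine and methionine as
# commonly quoted from Woese's scale; verify against the AAIndex entry.
POLAR_REQUIREMENT = {"I": 4.9, "M": 5.3}

# Only the two codons discussed above, as translated by each code.
NUCLEAR = {"ATA": "I", "ATG": "M"}
MITOCHONDRIAL = {"ATA": "M", "ATG": "M"}


def polarity_change(code, codon_from, codon_to):
    """Absolute difference in polar requirement between the encoded residues."""
    before = POLAR_REQUIREMENT[code[codon_from]]
    after = POLAR_REQUIREMENT[code[codon_to]]
    return abs(after - before)


nuclear_change = polarity_change(NUCLEAR, "ATA", "ATG")
mito_change = polarity_change(MITOCHONDRIAL, "ATA", "ATG")
print(nuclear_change, mito_change)
```

With these values the standard code gives a change of about 0.4 polar requirement units, while the mitochondrial code gives exactly zero because both codons map to the same residue.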

GeneticCode objects can also be used to translate DNA sequences (where asterisks in the results refer to stop-translation characters):

>>> ngc.translate(dna)
>>> mgc.translate(dna)
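
To make the effect of the two codes on translation concrete, here is a minimal standard-library sketch. It is not how cogent3 translates sequences: it assumes the NCBI table 1 and table 2 amino acid strings in TCAG codon order, reads frame 1 only, and silently ignores a trailing partial codon (a choice made for this sketch). The example sequence and the helpers `build_code` and `translate` are hypothetical.

```python
from itertools import product

# Assumption: amino acid strings of NCBI translation tables 1 and 2,
# with codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
VERT_MITO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"


def build_code(aa_string):
    """Map each of the 64 codons to its one-letter amino acid ('*' = stop)."""
    codons = ("".join(bases) for bases in product("TCAG", repeat=3))
    return dict(zip(codons, aa_string))


def translate(dna, code):
    """Translate complete codons in frame 1; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(code[dna[i:i + 3]] for i in range(0, usable, 3))


# Hypothetical example sequence: ATG ATA AGA TGA
example_dna = "ATGATAAGATGA"
nuclear_protein = translate(example_dna, build_code(STANDARD))
mito_protein = translate(example_dna, build_code(VERT_MITO))
print(nuclear_protein, mito_protein)
```

The example sequence contains three of the codons that differ between the two codes (ATA, AGA and TGA), so the two translations disagree at every position after the initial ATG.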

The standard nuclear genetic code can also be loaded as DEFAULT:

>>> from cogent3.core.genetic_code import DEFAULT
>>> DEFAULT == standard_nuclear_genetic_code


[1] Caporaso, Yarus, and Knight. Error minimization and coding triplet/binding site associations are independent features of the canonical genetic code. J Mol Evol, 61(5):597-607, 2005.