DNA and RNA sequences

Creating a DNA sequence from a string

All sequence and alignment objects have a molecular type, or MolType, which provides key properties for validating sequence characters. Here we use the DNA MolType to create a DNA sequence.

>>> from cogent3 import DNA
>>> my_seq = DNA.make_seq("AGTACACTGGT")
>>> my_seq
DnaSequence(AGTACAC... 11)
>>> print(my_seq)
AGTACACTGGT
>>> str(my_seq)
'AGTACACTGGT'

Creating an RNA sequence from a string

>>> from cogent3 import RNA
>>> rnaseq = RNA.make_seq('ACGUACGUACGUACGU')

Converting to FASTA format

>>> from cogent3 import DNA
>>> my_seq = DNA.make_seq('AGTACACTGGT')
>>> print(my_seq.to_fasta())
>0
AGTACACTGGT
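
To make the record layout explicit, here is a minimal plain-Python sketch of a FASTA formatter. It is not cogent3's implementation: the function name format_fasta and its line_width parameter are hypothetical, and the default name "0" simply mirrors the label shown in the output above for an unnamed sequence.

```python
def format_fasta(seq, name="0", line_width=60):
    """Return a FASTA record: a '>' header line followed by the sequence.

    The sequence is wrapped into chunks of at most line_width characters.
    (line_width is an illustrative parameter, not a cogent3 argument.)
    """
    lines = [seq[i:i + line_width] for i in range(0, len(seq), line_width)]
    return "\n".join([f">{name}"] + lines)


print(format_fasta("AGTACACTGGT", name="my_gene"))
```

The header line carries the sequence name, so naming a sequence before export is what makes the record identifiable downstream.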

Converting an RNA sequence to FASTA format

>>> from cogent3 import RNA
>>> rnaseq = RNA.make_seq('ACGUACGUACGUACGU')
>>> rnaseq.to_fasta()
'>0\nACGUACGUACGUACGU'

Creating a named sequence

>>> from cogent3 import DNA
>>> my_seq = DNA.make_seq('AGTACACTGGT','my_gene')
>>> my_seq
DnaSequence(AGTACAC... 11)
>>> type(my_seq)
<class 'cogent3.core.sequence.DnaSequence'>

Setting or changing the name of a sequence

>>> from cogent3 import DNA
>>> my_seq = DNA.make_seq('AGTACACTGGT')
>>> my_seq.name = 'my_gene'
>>> print(my_seq.to_fasta())
>my_gene
AGTACACTGGT

Complementing a DNA sequence

>>> from cogent3 import DNA
>>> my_seq = DNA.make_seq("AGTACACTGGT")
>>> print(my_seq.complement())
TCATGTGACCA

Reverse complementing a DNA sequence

>>> print(my_seq.reversecomplement())
ACCAGTGTACT

The rc method gives the same result and is easier to type

>>> print(my_seq.rc())
ACCAGTGTACT
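
For readers who want to see what these two operations compute, the following is a plain-Python sketch on ordinary strings. It assumes an unambiguous, upper-case DNA alphabet (A, C, G, T only); the function names are hypothetical and this is not how cogent3 implements the methods.

```python
# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Swap every base for its partner, keeping the original order."""
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq):
    """Complement the sequence, then read it in the opposite direction."""
    return complement(seq)[::-1]


print(complement("AGTACACTGGT"))
print(reverse_complement("AGTACACTGGT"))
```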

Translate a DnaSequence to protein

>>> from cogent3 import DNA
>>> my_seq = DNA.make_seq('GCTTGGGAAAGTCAAATGGAA','protein-X')
>>> pep = my_seq.get_translation()
>>> type(pep)
<class 'cogent3.core.sequence.ProteinSequence'>
>>> print(pep.to_fasta())
>protein-X
AWESQME
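
The translation above can be reproduced by hand with a codon lookup table. The sketch below assumes the standard genetic code and a sequence whose length is a multiple of three; the names CODON_TABLE and translate are hypothetical and the code is independent of cogent3.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons starting with T
    "LLLLPPPPHHQQRRRR"  # codons starting with C
    "IIIMTTTTNNKKSSRR"  # codons starting with A
    "VVVVAAAADDEEGGGG"  # codons starting with G
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate a DNA string codon by codon; '*' marks a stop codon."""
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


print(translate("GCTTGGGAAAGTCAAATGGAA"))
```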

Converting a DNA sequence to RNA

>>> from cogent3 import DNA
>>> my_seq = DNA.make_seq('ACGTACGTACGTACGT')
>>> print(my_seq.to_rna())
ACGUACGUACGUACGU

Convert an RNA sequence to DNA

>>> from cogent3 import RNA
>>> rnaseq = RNA.make_seq('ACGUACGUACGUACGU')
>>> print(rnaseq.to_dna())
ACGTACGTACGTACGT

Testing complementarity

>>> from cogent3 import DNA
>>> a = DNA.make_seq("AGTACACTGGT")
>>> a.can_pair(a.complement())
False
>>> a.can_pair(a.reversecomplement())
True
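
The two results above follow from the fact that paired strands run antiparallel: the first base of one strand faces the last base of the other. A minimal sketch of that check on plain strings, assuming strict Watson-Crick pairs only (the helper name strands_can_pair is hypothetical, and cogent3's own pairing rules may be broader than this):

```python
# Watson-Crick partners for an unambiguous DNA alphabet.
PARTNER = {"A": "T", "T": "A", "C": "G", "G": "C"}


def strands_can_pair(first, second):
    """True if every base of first pairs with second read in reverse."""
    if len(first) != len(second):
        return False
    return all(PARTNER[x] == y for x, y in zip(first, reversed(second)))


a = "AGTACACTGGT"
print(strands_can_pair(a, "TCATGTGACCA"))  # complement, same orientation
print(strands_can_pair(a, "ACCAGTGTACT"))  # reverse complement
```

The complement fails because it is written in the same orientation as the original strand; reversing it restores the antiparallel arrangement.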

Joining two DNA sequences

>>> from cogent3 import DNA
>>> my_seq = DNA.make_seq("AGTACACTGGT")
>>> extra_seq = DNA.make_seq("CTGAC")
>>> long_seq = my_seq + extra_seq
>>> long_seq
DnaSequence(AGTACAC... 16)
>>> str(long_seq)
'AGTACACTGGTCTGAC'

Slicing DNA sequences

>>> my_seq[1:6]
DnaSequence(GTACA)

Getting 3rd positions from codons

The easiest approach is to work with an array-based sequence, created here with make_array_seq, and slice it with a step of 3 starting at index 2, the third position of the first codon.

>>> from cogent3 import DNA
>>> seq = DNA.make_array_seq('ATGATGATGATG')
>>> pos3 = seq[2::3]
>>> assert str(pos3) == 'GGGG'

Getting 1st and 2nd positions from codons

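In this instance we can use the annotatable sequence classes. We specify the (start, end) spans covering the first two positions of each codon, add them as a sequence Feature, and use the feature's get_slice method to extract those positions.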

>>> from cogent3 import DNA
>>> seq = DNA.make_seq('ATGATGATGATG')
>>> indices = [(i, i+2) for i in range(len(seq))[::3]]
>>> pos12 = seq.add_feature('pos12', 'pos12', indices)
>>> pos12 = pos12.get_slice()
>>> assert str(pos12) == 'ATATATAT'
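
The index arithmetic behind both codon-position examples can be checked on a plain string. This is a sketch with a hypothetical helper, codon_positions, that takes 0-based offsets within each codon; it ignores any incomplete trailing codon and does not use cogent3.

```python
def codon_positions(seq, offsets):
    """Collect the given within-codon offsets (0, 1 or 2) from each codon."""
    usable = len(seq) - len(seq) % 3  # drop an incomplete trailing codon
    return "".join(
        seq[start + offset]
        for start in range(0, usable, 3)
        for offset in offsets
    )


seq = "ATGATGATGATG"
print(codon_positions(seq, [2]))     # 3rd positions
print(codon_positions(seq, [0, 1]))  # 1st and 2nd positions
```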

Return a randomized version of the sequence

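Because the order is random, the output differs between runs.

>>> from cogent3 import RNA
>>> rnaseq = RNA.make_seq('ACGUACGUACGUACGU')
>>> print(rnaseq.shuffle())
ACAACUGGCUCUGAUG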
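
When a randomized sequence needs to be reproducible, for example in a test, one option is to shuffle a plain string with a seeded generator. The sketch below uses only the standard library; the function name shuffled and its seed parameter are hypothetical and unrelated to cogent3's shuffle method.

```python
import random


def shuffled(seq, seed=None):
    """Return the characters of seq in random order.

    Passing the same seed gives the same ordering on every call.
    """
    rng = random.Random(seed)
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


print(shuffled("ACGUACGUACGUACGU", seed=1))
```

Shuffling changes only the order, so the base composition of the result always matches the input.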

Remove gaps from a sequence

>>> from cogent3 import RNA
>>> s = RNA.make_seq('--AUUAUGCUAU-UAu--')
>>> print(s.degap())
AUUAUGCUAUUAU