Overview of using apps

The general idea is that you create the flavour of function you want by constructing an instance of an existing app with the settings you need. The instance you have created is then callable. It can be applied to a single data object (like an alignment file), or to a series of them (like a directory of alignment files). Alternatively, an app can be combined with other apps to make a pipeline.

I illustrate the general approach with a simple example: extracting third codon positions.

Define the apps

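from cogent3.app import io, sample

# loads an alignment from a FASTA file, treating the sequences as DNA
reader = io.load_aligned(format="fasta", moltype="dna")
# keeps only the third codon positions
cpos3 = sample.take_codon_positions(3)
# writes sequences in FASTA format into the named zip archive
writer = io.write_seqs("path/to/write/thirdpos.zip", format="fasta")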
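
To make concrete what "third codon positions" means, here is a minimal plain-Python sketch that does not use cogent3. It assumes the sequences are in frame and start at the first codon position. The function name and the toy alignment are hypothetical, and the real app may treat gaps and ambiguous characters differently.

```python
def third_codon_positions(seq: str) -> str:
    """Return every third character of seq, starting at index 2."""
    # Codon positions are 1-based, so position 3 of the first codon is index 2.
    return seq[2::3]


# A toy alignment; the names and sequences are made up for illustration.
aln = {"seq1": "ATGGCCTTA", "seq2": "ATGGCATTG"}
just3rd = {name: third_codon_positions(seq) for name, seq in aln.items()}
print(just3rd)
```

Each nine-base sequence is reduced to three bases, one per codon.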

Using apps like functions

data = reader("some/path/to/seqs.fasta")
just3rd = cpos3(data)
m = writer(just3rd, identifier="3rdpos_data.fasta")

In the above, m is a DataStoreMember. The result is written into the zip archive that was specified when the writer was constructed.
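
For readers unfamiliar with writing named members into a zip archive, the following sketch shows the idea using only the standard library zipfile module. It is not the cogent3 writer: write_member is a hypothetical helper, and the archive is created in a temporary directory purely for illustration.

```python
import tempfile
import zipfile
from pathlib import Path


def write_member(archive_path, identifier, text):
    """Append text to the zip archive under the name identifier."""
    # Mode "a" creates the archive if it does not exist yet.
    with zipfile.ZipFile(archive_path, mode="a") as archive:
        archive.writestr(identifier, text)
    return identifier


with tempfile.TemporaryDirectory() as tmp:
    archive_path = Path(tmp) / "thirdpos.zip"
    member = write_member(archive_path, "3rdpos_data.fasta", ">seq1\nGCA\n")
    # Read the archive back to confirm what was stored.
    with zipfile.ZipFile(archive_path) as archive:
        names = archive.namelist()
        content = archive.read(member).decode()

print(names, content)
```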

Composing a multi-step process from several apps

The above can be simplified by composing the apps into a single function with the + operator. Calling the composed function with the input path generates the same output.

process = reader + cpos3 + writer
m = process("some/path/to/seqs.fasta")
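
One way to picture how + can chain callables is the sketch below. It is an illustration of the general technique (overloading __add__ to build a new callable), not a description of how cogent3 implements composition. SketchApp and the three toy steps are hypothetical.

```python
class SketchApp:
    """Hypothetical stand-in for an app: a callable that supports +."""

    def __init__(self, func):
        self.func = func

    def __call__(self, value):
        return self.func(value)

    def __add__(self, other):
        # The composed app runs self first, then feeds its result to other.
        return SketchApp(lambda value: other(self(value)))


upper = SketchApp(str.upper)                     # normalise case
third = SketchApp(lambda seq: seq[2::3])         # keep third codon positions
wrap = SketchApp(lambda seq: f">3rd\n{seq}\n")   # format as a FASTA record

process = upper + third + wrap
print(process("atggccTTA"))
```

Calling process on a value gives the same result as calling each step in turn, that is, wrap(third(upper(value))).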

Applying a process to multiple data records

We use a data store to identify all the data files in a directory that we want to analyse. process can then be applied to every record in the data store.

dstore = io.get_data_store("path/to/dir", suffix="fasta")
r = process.apply_to(dstore)

Here r is a list of all the DataStoreMember instances.
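
The pattern of selecting files by suffix and applying one callable to each can be sketched with the standard library alone. This is only an analogy for the data store and apply_to; apply_to_all and count_records are hypothetical names, and the files are created in a temporary directory so the example is self-contained.

```python
import tempfile
from pathlib import Path


def apply_to_all(process, paths):
    """Call process on each path and collect the results in order."""
    return [process(path) for path in paths]


def count_records(path):
    # Hypothetical "process": count the FASTA header lines in one file.
    lines = Path(path).read_text().splitlines()
    return sum(line.startswith(">") for line in lines)


files = {
    "a.fasta": ">s1\nATG\n>s2\nATG\n",
    "b.fasta": ">s1\nATG\n",
    "notes.txt": "not a sequence file",
}

with tempfile.TemporaryDirectory() as tmp:
    for name, text in files.items():
        (Path(tmp) / name).write_text(text)
    # Select records by suffix, sorted for a reproducible order.
    records = sorted(Path(tmp).glob("*.fasta"))
    results = apply_to_all(count_records, records)

print(results)
```

Only the two files with the fasta suffix are processed, and one result is returned per record.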

Other important features

You can track progress

process.apply_to(dstore, show_progress=True)

You can do parallel computation

process.apply_to(dstore, parallel=True)

By default, this uses all available processors on your machine. If you are running in an MPI environment, you can add the argument par_kw=dict(use_mpi=True).
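
As a rough picture of what applying one function to many records concurrently looks like, here is a sketch using a standard library thread pool. This is a swapped-in technique chosen to keep the example short and self-contained. It is not how cogent3 parallelises work, and it uses neither multiple processes nor MPI. The function and data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def third_codon_positions(seq):
    """Return every third character of seq, starting at index 2."""
    return seq[2::3]


# Toy records, made up for illustration.
seqs = ["ATGGCCTTA", "ATGGCATTG", "ATGAAACCC"]

with ThreadPoolExecutor() as pool:
    # map yields results in the same order as the inputs,
    # even if the individual calls finish out of order.
    results = list(pool.map(third_codon_positions, seqs))

print(results)
```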

You can log the settings and data analysed

process.apply_to(dstore, logger=True)

All of the above

process.apply_to(dstore, parallel=True, logger=True, show_progress=True)

If you use a JSON-based output format (either explicitly, or by using the tinydb data store type), any “failures” (see The NotCompleted object) are also written to file, and a convenient interface is provided for interrogating them.
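
The idea of recording failures alongside results, rather than raising an exception and stopping, can be sketched as follows. The Failure class here is a hypothetical stand-in and is not the NotCompleted object itself. Its attribute names, the take_third step, and the JSON layout are all assumptions made for illustration.

```python
import json


class Failure:
    """Hypothetical false-y marker recording why a step did not complete."""

    def __init__(self, origin, message, source):
        self.origin = origin    # which step produced the failure
        self.message = message  # human-readable reason
        self.source = source    # which record was being processed

    def __bool__(self):
        return False


def take_third(seq, source="unknown"):
    # Refuse sequences that cannot be split into whole codons.
    if len(seq) % 3:
        return Failure("take_third", "length is not a multiple of 3", source)
    return seq[2::3]


def run(records):
    """Process every record, separating results from failures."""
    done, failed = {}, []
    for name, seq in records.items():
        result = take_third(seq, source=name)
        # Test the type, since an empty string is also false-y.
        if isinstance(result, Failure):
            failed.append(result)
        else:
            done[name] = result
    return done, failed


records = {"ok.fasta": "ATGGCCTTA", "bad.fasta": "ATGG"}
done, failed = run(records)
# Serialise the failures so they can be inspected later.
summary = json.dumps([vars(f) for f in failed])
print(done, summary)
```

The design choice is that one bad record does not abort the run: the remaining records are still processed, and the reasons for each failure stay available for later inspection.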