Overview of using apps
The general idea is that you create the flavour of function you want from an existing app. The object you create is then callable: it can be applied to a single data object (such as an alignment file) or to a series of them (such as a directory of alignment files). Alternatively, an app can be combined with other apps to make a pipeline.
I illustrate the general approach with a simple example: extracting third codon positions.
Define the apps
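from cogent3.app import io, sample

# loads a FASTA file as a DNA alignment
reader = io.load_aligned(format="fasta", moltype="dna")
# keeps only the third position of each codon
cpos3 = sample.take_codon_positions(3)
# writes FASTA-formatted output into the named zip archive
writer = io.write_seqs("path/to/write/thirdpos.zip", format="fasta")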
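As a rough picture of what cpos3 computes, the pure-Python sketch below keeps only the third position of every codon in an alignment held as a dict that maps sequence names to aligned strings. The helper third_positions is hypothetical and not part of cogent3. It assumes every sequence starts at the first position of a codon.

```python
from __future__ import annotations


def third_positions(aligned: dict[str, str]) -> dict[str, str]:
    """Hypothetical helper: keep only 3rd codon positions of an in-frame alignment."""
    if len({len(seq) for seq in aligned.values()}) != 1:
        raise ValueError("sequences must be aligned (equal length)")
    # Codon position 3 sits at 0-based indices 2, 5, 8, ..., so slice from 2 in steps of 3.
    return {name: seq[2::3] for name, seq in aligned.items()}


aln = {"human": "ATGCGTAAA", "mouse": "ATGCGCAAG"}
print(third_positions(aln))  # {'human': 'GTA', 'mouse': 'GCG'}
```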
Using apps like functions
data = reader("some/path/to/seqs.fasta")  # load the alignment
just3rd = cpos3(data)  # extract third codon positions
m = writer(just3rd, identifier="3rdpos_data.fasta")  # write into the zip archive
In the above, m is a DataStoreMember. The result is written into the zip archive specified when constructing the writer.
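The writer stores each result as a named member of one zip archive and hands back a reference to it. That behaviour can be sketched with the standard library's zipfile module. Member and write_fasta_member are hypothetical stand-ins for DataStoreMember and the writer app, and this is not how cogent3 implements them.

```python
import os
import tempfile
import zipfile
from typing import NamedTuple


class Member(NamedTuple):
    """Hypothetical stand-in for a data store member: archive path plus member name."""
    archive: str
    name: str


def to_fasta(seqs: dict) -> str:
    # One ">name" header line followed by the sequence on the next line.
    return "".join(f">{name}\n{seq}\n" for name, seq in seqs.items())


def write_fasta_member(archive: str, seqs: dict, identifier: str) -> Member:
    # Mode "a" appends to an existing archive, or creates it if it is missing.
    with zipfile.ZipFile(archive, mode="a") as zf:
        zf.writestr(identifier, to_fasta(seqs))
    return Member(archive, identifier)


path = os.path.join(tempfile.mkdtemp(), "thirdpos.zip")
m = write_fasta_member(path, {"human": "GTA", "mouse": "GCG"}, "3rdpos_data.fasta")
with zipfile.ZipFile(m.archive) as zf:
    print(zf.read(m.name).decode())
```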
Composing a multi-step process from several apps
The above can be simplified by composing the apps into a single function. Calling it with the input path generates the same output.
process = reader + cpos3 + writer
m = process("some/path/to/seqs.fasta")
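One way to picture composition with + is operator overloading on a small callable class. Adding two steps returns a new step that passes the output of the first into the second. The Step class and the three lambda stand-ins below are hypothetical illustrations, not cogent3's implementation of composable apps.

```python
from typing import Any, Callable


class Step:
    """Hypothetical composable step wrapping a single-argument function."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def __call__(self, data: Any) -> Any:
        return self.func(data)

    def __add__(self, other: "Step") -> "Step":
        # The composed step applies self first, then passes the result to other.
        return Step(lambda data: other(self(data)))


read = Step(lambda text: text.split())                 # stand-in for a reader
third = Step(lambda seqs: [s[2::3] for s in seqs])      # stand-in for cpos3
join = Step(lambda parts: ",".join(parts))              # stand-in for a writer

process = read + third + join
print(process("ATGCGTAAA ATGCGCAAG"))  # GTA,GCG
```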
Applying a process to multiple data records
We use a data store to identify all the data files in a directory that we want to analyse. process can then be applied to all records in the data store.
dstore = io.get_data_store("path/to/dir", suffix="fasta")
r = process.apply_to(dstore)
Here r is a list of all the DataStoreMember instances.
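Conceptually, applying a process to a data store amounts to finding every file with the chosen suffix, calling the process on each one, and collecting the results in a list. The sketch below uses pathlib. find_records and apply_to are hypothetical names, and unlike cogent3 the sketch does no writing, logging or failure handling.

```python
import tempfile
from pathlib import Path
from typing import Any, Callable, List


def find_records(directory: str, suffix: str) -> List[Path]:
    # Sorted so results come back in a stable, predictable order.
    return sorted(Path(directory).glob(f"*.{suffix}"))


def apply_to(process: Callable[[Path], Any], records: List[Path]) -> List[Any]:
    return [process(path) for path in records]


tmp = Path(tempfile.mkdtemp())
(tmp / "a.fasta").write_text(">s1\nATGCGT\n")
(tmp / "b.fasta").write_text(">s1\nATGAAA\n")
(tmp / "notes.txt").write_text("ignored")  # wrong suffix, so it is skipped

records = find_records(str(tmp), "fasta")
# Hypothetical process: count the sequence records in each file.
results = apply_to(lambda p: p.read_text().count(">"), records)
print([p.name for p in records], results)  # ['a.fasta', 'b.fasta'] [1, 1]
```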
Other important features
You can track progress
process.apply_to(dstore, show_progress=True)
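A progress display only needs the total number of records and a count of those finished. The hypothetical apply_with_progress below reports both to a text stream after each record. cogent3 provides its own display, so this sketch only illustrates the bookkeeping.

```python
import io
import sys
from typing import Any, Callable, Iterable, List, TextIO


def apply_with_progress(process: Callable[[Any], Any], records: Iterable[Any],
                        stream: TextIO = sys.stderr) -> List[Any]:
    records = list(records)  # the total must be known up front
    results = []
    for done, record in enumerate(records, start=1):
        results.append(process(record))
        # "\r" returns to the start of the line so each update overwrites the last.
        stream.write(f"\r{done}/{len(records)} ({100 * done // len(records)}%)")
    stream.write("\n")
    return results


buf = io.StringIO()
print(apply_with_progress(str.upper, ["atg", "cgt", "aaa"], stream=buf))
print(repr(buf.getvalue()))
```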
You can do parallel computation
process.apply_to(dstore, parallel=True)
By default, this uses all available processors on your machine. If you are running in an MPI environment, you can add the argument par_kw=dict(use_mpi=True).
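Parallel application can be sketched with concurrent.futures. Records are spread across a pool of workers whose size defaults to the number of CPUs, and results come back in input order. apply_in_parallel is a hypothetical name. The sketch uses a thread pool so it runs anywhere; for CPU-bound pure-Python work, ProcessPoolExecutor (same interface) is the usual choice because threads share the GIL. With ProcessPoolExecutor the function must also be picklable, so a lambda will not do. The sketch does not reproduce cogent3's MPI support.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Optional, Sequence


def apply_in_parallel(process: Callable[[Any], Any], records: Sequence[Any],
                      max_workers: Optional[int] = None) -> List[Any]:
    # Default to one worker per available CPU, mirroring "use all processors".
    workers = max_workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, whichever worker finishes first.
        return list(pool.map(process, records))


print(apply_in_parallel(lambda s: s[2::3], ["ATGCGTAAA", "ATGCGCAAG"]))  # ['GTA', 'GCG']
```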
You can log the settings and data analysed
process.apply_to(dstore, logger=True)
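Logging the settings and the data analysed can be pictured with the standard logging module: record the configuration once, then write one line per input. apply_with_log and the settings dict are hypothetical. The content and format of cogent3's own log files will differ.

```python
import io
import logging
from typing import Any, Callable, Iterable, List


def apply_with_log(process: Callable[[Any], Any], records: Iterable[Any],
                   settings: dict, logger: logging.Logger) -> List[Any]:
    # Record the configuration once, then one line per input analysed.
    logger.info("settings: %s", settings)
    results = []
    for record in records:
        logger.info("input: %s", record)
        results.append(process(record))
    return results


stream = io.StringIO()
log = logging.getLogger("apply_sketch")
log.setLevel(logging.INFO)
log.propagate = False  # keep messages out of the root logger
log.addHandler(logging.StreamHandler(stream))

apply_with_log(len, ["a.fasta", "b.fasta"], {"codon_position": 3}, log)
print(stream.getvalue(), end="")
```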
All of the above
process.apply_to(dstore, parallel=True, logger=True, show_progress=True)
If you use the JSON-based output formats (either explicitly or via the tinydb data store type), any “failures” (see The NotCompleted object) are also written to file, and a convenient interface is provided for interrogating them.
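The idea of recording failures instead of raising them can be sketched as follows. Each record that fails becomes a small data object carrying its source and the error. That object serialises to JSON alongside normal results and can be tallied later. Failure, run_safely and take_third are hypothetical names, and this is not the NotCompleted class itself.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass
from typing import Any, Callable, List, Union


@dataclass
class Failure:
    """Hypothetical stand-in for a NotCompleted-style record."""
    source: str
    kind: str
    message: str


def run_safely(process: Callable[[str], Any], source: str) -> Union[Any, Failure]:
    # Return a Failure instead of raising, so one bad record does not stop the batch.
    try:
        return process(source)
    except Exception as err:
        return Failure(source, type(err).__name__, str(err))


def take_third(seq: str) -> str:
    if len(seq) % 3:
        raise ValueError("length not a multiple of 3")
    return seq[2::3]


inputs = ["ATGCGTAAA", "ATGCG", "ATG"]
outcomes = [run_safely(take_third, s) for s in inputs]
failures: List[Failure] = [o for o in outcomes if isinstance(o, Failure)]

# Failures serialise to JSON like any other result, so they can be saved and queried later.
saved = json.dumps([asdict(f) for f in failures])
print(Counter(f["kind"] for f in json.loads(saved)))  # Counter({'ValueError': 1})
```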