...
Use the module name sourcetracker2 to discover the available versions and to load the application.
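As a minimal sketch of the step above, assuming the cluster uses an Lmod-style `module` command (the `spider` subcommand is Lmod-specific and is an assumption here, not something this page states):

```shell
# Module name as given on this page.
MODULE_NAME="sourcetracker2"

# Guarded so the snippet does nothing on machines without environment modules.
if command -v module >/dev/null 2>&1; then
    # List the versions of the module that are available (Lmod syntax, assumed).
    module spider "$MODULE_NAME"
    # Load the default version into the current shell session.
    module load "$MODULE_NAME"
fi
```

To load a specific version, append it to the module name in the form reported by the first command.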
...
Beartooth
Compared to earlier versions on Teton, the gibbs command is NOT required:
Code Block |
---|
[salexan5@tlog1 ~]$ sourcetracker2 gibbs --help
Usage: sourcetracker2 gibbs [OPTIONS]
  Gibb's sampler for Bayesian estimation of microbial sample sources.
  For details, see the project README file.
Options:
  -i, --table_fp FILE
      Path to input BIOM table. [required] otu_table.biom
  -m, --mapping_fp FILE
      Path to sample metadata mapping file. [required]
  -o, --output_dir FILE
      Path to the output directory to be created. [required]
  --loo
      Classify each sample in `sources` using a leave-one-out strategy. Replicates -s option in Knights et al. sourcetracker. [default: False]
  --jobs INTEGER
      Number of processes to launch. [default: 1]
  --alpha1 FLOAT
      Prior counts of each species in the training environments. Higher values decrease the trust in the training environments, and make the source environment distrubitons over taxa smoother. By default, this is set to 0.001, which indicates reasonably high trust in all source environments, even those with few training sequences. This is useful when only a small number of biological samples are available from a source environment. A more conservative value would be 0.01. [default: 0.001]
  --alpha2 FLOAT
      Prior counts of each species in Unknown environment. Higher values make the Unknown environment smoother and less prone to overfitting given a training sample. [default: 0.001]
  --beta INTEGER
      Count to be added to each species in each environment, including `unknown`. [default: 10]
  --source_rarefaction_depth INTEGER
      Depth at which to rarify sources. If 0, no rarefaction performed. [default: 1000]
  --sink_rarefaction_depth INTEGER
      Depth at which to rarify sinks. If 0, no rarefaction performed. [default: 1000]
  --restarts INTEGER
      Number of independent Markov chains to grow. `draws_per_restart` * `restarts` gives the number of samplings of the mixing proportions that will be generated. [default: 10]
  --draws_per_restart INTEGER
      Number of times to sample the state of the Markov chain for each independent chain grown. [default: 1]
  --burnin INTEGER
      Number of passes (withdarawal and reassignment of every sequence in the sink) that will be made before a sample (draw) will be taken. Higher values allow more convergence towards the true distribtion before draws are taken. [default: 100]
  --delay INTEGER
      Number passes between each sampling (draw) of the Markov chain. Once the burnin passes have been made, a sample will be taken every `delay` number of passes. This is also known as `thinning`. Thinning helps reduce the impact of correlation between adjacent states of the Markov chain. [default: 10]
  --cluster_start_delay INTEGER
      When using multiple jobs, the script has to map.txt -o example1/ |
Teton
On Teton, the gibbs command must be used:
Code Block |
---|
[]$ sourcetracker2 --help
Usage: sourcetracker2 [OPTIONS] COMMAND [ARGS]...
Options:
  --version  Show the version and exit.
  --help     Show this message and exit.
Commands:
  gibbs  Gibb's sampler for Bayesian estimation of microbial sample sources.
[]$ sourcetracker2 gibbs --help
Usage: sourcetracker2 gibbs [OPTIONS]
  Gibb's sampler for Bayesian estimation of microbial sample sources.
  For details, see the project README file.
Options:
  -i, --table_fp FILE
      Path to input start an `ipcluster`. If ipcluster does not recognize that it has been successfully started, the jobs will not be successfully launched. If this is happening, increase this parameter. [default: 25]
  --source_sink_column TEXT
      Sample metadata column indicating which samples should be treated as sources and which as sinks. [default: SourceSink]
  --source_column_value TEXT
      Value in source_sink_column indicating which samples should be treated as sources. [default: source]
  --sink_column_value TEXT
      Value in source_sink_column indicating which samples should be treated as sinks. [default: sink]
  --source_category_column TEXT
      Sample metadata column indicating the type of each source sample. [default: Env]
  --help
      Show this message and exit. |
Parallelization
The SourceTracker2 documentation indicates that jobs can be run in parallel using the --jobs option.
At this stage ARCC is unsure whether this is actually working correctly.
Following the examples, with the following set in the submission script:
Code Block |
---|
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=5 |
The example that uses:
Code Block |
---|
[]$ sourcetracker2 gibbs -i otu_table.biom -m map.txt -o example6/ --jobs 5 |
actually runs slower than with --jobs 1: the jobs take significantly longer to complete.
This could be because of the test data being used.
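One way to check the behaviour described above on your own data is to time the same command with different --jobs values inside a single allocation. The following is only a sketch: the `time_run` helper and the `timing_jobs_N/` output directory names are hypothetical, and the input file names are taken from the example on this page.

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=5

# Hypothetical helper: run a command and print its wall-clock time in whole seconds.
time_run() {
    local start end
    start=$(date +%s)
    "$@" >&2                 # send the command's own output to stderr so only the timing is captured
    end=$(date +%s)
    echo $((end - start))
}

# Guarded so the sketch does nothing where sourcetracker2 is not on PATH.
if command -v sourcetracker2 >/dev/null 2>&1; then
    for jobs in 1 5; do
        # Each run writes to its own (hypothetical) output directory.
        elapsed=$(time_run sourcetracker2 gibbs -i otu_table.biom -m map.txt \
            -o "timing_jobs_${jobs}/" --jobs "$jobs")
        echo "--jobs ${jobs}: ${elapsed}s"
    done
fi
```

If the --jobs 5 run is consistently slower across several datasets, that would suggest the slowdown is not specific to the test data.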
Examples
...
example1/ |
Error: No --per_sink_feature_assignments option
Following the examples on the website, running the following on Teton:
Code Block |
---|
sourcetracker2 gibbs -i otu_table.biom -m map.txt -o example7/ --jobs 5 --per_sink_feature_assignments |
...
Code Block |
---|
Error: no such option: --per_sink_feature_assignments |
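Before relying on an option from the website examples, it can help to confirm that the installed version lists it in its help text. A minimal sketch, in which the `has_option` helper is hypothetical and the check is a simple text match on the help output:

```shell
# Hypothetical helper: succeed if the help text on stdin mentions option $1.
has_option() {
    grep -q -F -e "$1"
}

OPTION="--per_sink_feature_assignments"

# Guarded so the sketch does nothing where sourcetracker2 is not on PATH.
if command -v sourcetracker2 >/dev/null 2>&1; then
    if sourcetracker2 gibbs --help | has_option "$OPTION"; then
        echo "$OPTION is listed by this installation"
    else
        echo "$OPTION is not listed by this installation"
    fi
fi
```

On a system where the gibbs command is not required, the help text would come from `sourcetracker2 --help` instead.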
...
Multicore
The sourcetracker2 command can be run with multiple threads; see sourcetracker2 --help for more details on the --jobs option.
Example: Beartooth:
Code Block |
---|
#SBATCH --cpus-per-task=16 |
The example command then uses a matching --jobs value:
Code Block |
---|
sourcetracker2 -i otu_table.biom -m map.txt -o example6/ --jobs 16 |
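The two pieces above can be combined so that the --jobs value always follows the allocation. This is a sketch rather than a complete submission script: it assumes the module is already loaded, and it omits the account, partition and time directives your job will need. `SLURM_CPUS_PER_TASK` is the environment variable Slurm sets when --cpus-per-task is requested.

```shell
#!/bin/bash
#SBATCH --cpus-per-task=16

# Take the job count from Slurm so --jobs matches the allocation;
# fall back to 1 when the variable is unset (for example, outside a Slurm job).
JOBS="${SLURM_CPUS_PER_TASK:-1}"

# Guarded so the sketch does nothing where sourcetracker2 is not on PATH.
if command -v sourcetracker2 >/dev/null 2>&1; then
    sourcetracker2 -i otu_table.biom -m map.txt -o example6/ --jobs "$JOBS"
fi
```

Changing the --cpus-per-task directive is then the only edit needed to try a different level of parallelism.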