Put it into Practice Ex01: Conda and Job Submission

Goal: Work through downloading data from the Internet, creating a Conda environment to analyze that data, and running the analysis as a submitted job.


This exercise is structured into three parts:

  1. Description: This details what to do and gives results to check your work against. Try to complete the exercise from the description alone, and see how far you get, to test your current knowledge and highlight areas to review.

  2. Pointers and Guides: Use these sections to assist you if you’re unsure and would like some hints and suggestions.

  3. Answer: This will lay out one (of potentially many) approaches to perform this exercise.

    1. Please do not just jump to this section and cut and paste: what would you actually learn from doing that?

    2. To become a good HPC user, you need to engage with this exercise: work through it, apply and verify what you know, and learn from solving problems and resolving mistakes.

The Exercise Extensions section poses questions to help you make your workflow more advanced, and introduces situations that we have encountered with existing users.



Description

High Level:

  • Create a self-contained Conda environment that provides the HTSeq application, then use it in a submitted job that runs on a single node with four cores to perform some guided analysis.

  • The Conda environment needs to be created under a project and be shareable with others within the project.

  • You will be directed to where data for the analysis can be retrieved from the Internet. This data will need to be downloaded to the cluster.

  • Scripts, data, and the resulting analysis need to be stored under /project/<project-name> and be shareable.

Data: Retrieve data from the HTSeq example data folder. Specifically, you will use the following two files:

  1. bamfile_no_qualities.bam

  2. bamfile_no_qualities.gtf

Once downloaded, the two files should have the following sizes:

966147 bamfile_no_qualities.bam
282781 bamfile_no_qualities.gtf
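
One way to confirm the downloads is a small size check, sketched here in shell. This is only an illustration: `check_size` is a hypothetical helper (not part of any cluster tooling), and the expected sizes are assumed to be byte counts, as `ls -l` would report them.

```shell
#!/usr/bin/env bash
# Sketch: compare a file's size in bytes with an expected value.
# check_size is a hypothetical helper written for this exercise.
check_size() {
    local file="$1" expected="$2" actual
    if [ ! -f "$file" ]; then
        echo "MISSING  $file"
        return 1
    fi
    # wc -c reports the byte count; strip the padding some platforms add.
    actual=$(wc -c < "$file" | tr -d '[:space:]')
    if [ "$actual" -eq "$expected" ]; then
        echo "OK       $file ($actual bytes)"
    else
        echo "MISMATCH $file (expected $expected, got $actual)"
        return 1
    fi
}

# Run from the directory holding the downloads.
# Assumption: the sizes listed in the exercise are in bytes.
failures=0
check_size bamfile_no_qualities.bam 966147 || failures=$((failures + 1))
check_size bamfile_no_qualities.gtf 282781 || failures=$((failures + 1))
echo "$failures file(s) failed the size check"
```

A mismatch usually means the download was interrupted or that a web page was saved instead of the raw file, so it is worth checking before any compute time is spent.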

Results: The first ten and last ten entries of the resulting counts should be:

16S_rRNA 2
23S_rRNA 9
5S_rRNA-1 0
5S_rRNA-2 0
TK0001 8
TK0002 0
TK0003 0
TK0004 0
TK0005 0
TK0006 0

tRNA-Tyr 0
tRNA-Val-1 0
tRNA-Val-2 0
tRNA-Val-3 0
tRNA-Val-4 0
__no_feature 290
__ambiguous 270
__too_low_aQual 0
__not_aligned 0
__alignment_not_unique 0

Pointers and Guides: Initial Consideration


Data Management: Structure and Organize the Work:


Getting the Data


Creating the Conda Environment


Plan Your Workflow


Submit the Job


Analyze the Results


Answer


Setup Structure Under a Project


Get the Data


Create the Conda Environment


Submit the Job


Look at the Results


Exercise Extensions