Put it into Practice Ex01: Conda and Job Submission
Goal: Work through the steps of pulling data from the Internet, creating a Conda environment to analyze that data, and running the analysis through a job submission.
This exercise is structured into three parts:
Description: This details what to do and gives results to check your work against. Try to perform the exercise as described and see how far you get; this tests your current knowledge and highlights areas to go back and review.
Pointers and Guides: Use these sections to assist you if you’re unsure and would like some hints and suggestions.
Answer: This lays out one (of potentially many) approaches to performing this exercise.
Please do not just jump to this section and cut and paste: what would you actually learn from doing that?
To become a good HPC user you need to engage with the exercise: work through it, apply and verify what you know, and learn from solving problems and resolving mistakes.
The Exercise Extensions section provides questions to consider that could make your workflow more advanced, and introduces circumstances that we have encountered with existing users.
Description
High Level:
Create a self-contained Conda environment that provides the HTSeq application, then use it in a submitted job that runs on a single node with four cores to perform some guided analysis.
The Conda environment needs to be created under a project and be shareable with others within the project.
You will be directed to where data for the analysis can be retrieved from the Internet. This data will need to be downloaded to the cluster.
Scripts, data and resulting analysis will need to be stored within /project/<project-name> and be shareable.
Data: Retrieve data from the HTSeq example data folder. Specifically you will be using the following two files:
bamfile_no_qualities.bam
bamfile_no_qualities.gtf
Once downloaded, the two files should have the following sizes:
966147 bamfile_no_qualities.bam
282781 bamfile_no_qualities.gtf
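A quick way to compare a download against the sizes above is a small helper like the following. It is a minimal sketch that assumes the listed sizes are byte counts (as `ls -l` would report them); the function name `check_size` is hypothetical.

```shell
# check_size FILE EXPECTED_BYTES
# Prints OK when the file's byte count matches, otherwise reports the
# mismatch and returns a non-zero status.
check_size() {
    # wc -c reads the file from stdin so only the count is printed;
    # tr strips the padding that some wc implementations add.
    actual=$(wc -c < "$1" | tr -d ' ')
    if [ "$actual" -eq "$2" ]; then
        echo "OK $1 ($actual bytes)"
    else
        echo "MISMATCH $1: expected $2 bytes, got $actual"
        return 1
    fi
}
```

From the directory holding the downloads, this could be called as `check_size bamfile_no_qualities.bam 966147` and `check_size bamfile_no_qualities.gtf 282781`. A mismatch usually suggests an incomplete download or that an HTML page was saved instead of the raw file.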
Pointers and Guides: Initial Consideration
Data Management: Structure and Organize the Work:
Getting the Data
Creating the Conda Environment
Plan Your Workflow
Submit the Job
Analyze the Results
Answer
Setup Structure Under a Project
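One possible layout is sketched below; it is not necessarily the structure the workshop answer uses. `PROJECT_DIR` stands in for /project/<project-name>, and the `ex01` folder and its subfolder names are hypothetical. When `PROJECT_DIR` is not set, the sketch falls back to a temporary directory so it can be tried safely.

```shell
# PROJECT_DIR is an assumption: export it as your own project path first.
PROJECT_DIR="${PROJECT_DIR:-$(mktemp -d)}"
EX_DIR="$PROJECT_DIR/ex01"

# Keep data, scripts, results and the Conda environment in separate folders.
mkdir -p "$EX_DIR/data" "$EX_DIR/scripts" "$EX_DIR/results" "$EX_DIR/envs"

# Give the group read/write access; the capital X adds execute permission
# only to directories (and to files that are already executable).
chmod -R g+rwX "$EX_DIR"

# Set the setgid bit on directories so that new files inherit the
# directory's group rather than the creator's primary group.
find "$EX_DIR" -type d -exec chmod g+s {} +

ls -ld "$EX_DIR"/*
```

This only helps if the directory's group is the project group; `ls -ld` shows which group that is.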
Get the Data
Create the Conda Environment
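Creating the environment by path (`--prefix`) rather than by name keeps it under the project, where other members can activate it. The following is a hedged sketch: the `ENV_PREFIX` location, the `create_env` helper and the `RUN_CONDA` switch are hypothetical, and it assumes HTSeq is installed from the `bioconda` channel as the `htseq` package. It performs a dry run unless `RUN_CONDA=yes` is set.

```shell
# ENV_PREFIX is an assumption; on the cluster it would sit somewhere
# under /project/<project-name>.
ENV_PREFIX="${ENV_PREFIX:-$PWD/envs/htseq_env}"

create_env() {
    # $1 = directory in which to create the environment.
    # conda-forge is listed first so it takes priority over bioconda.
    conda create --yes --prefix "$1" \
        --channel conda-forge --channel bioconda htseq &&
        # Let other group members read and traverse the environment.
        chmod -R g+rX "$1"
}

if [ "${RUN_CONDA:-no}" = "yes" ]; then
    create_env "$ENV_PREFIX"
else
    echo "Dry run: would create a Conda environment at $ENV_PREFIX"
fi
```

An environment created this way is activated by its path, for example `conda activate "$ENV_PREFIX"`. On many clusters Conda first has to be made available, for instance by loading a module.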
Submit the Job
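The exercise does not name the scheduler, so the sketch below assumes Slurm; the job name, time limit, file names and paths are all hypothetical. It writes a batch script requesting one node and four cores, with a placeholder command that only confirms HTSeq can be imported. The guided analysis would replace that line.

```shell
# Hypothetical locations; on the cluster these would sit under
# /project/<project-name>.
WORK_DIR="${WORK_DIR:-$PWD}"
ENV_PREFIX="${ENV_PREFIX:-$WORK_DIR/envs/htseq_env}"
JOB_SCRIPT="$WORK_DIR/run_htseq.sh"

# Write the batch script. \$ keeps the command substitution literal so it
# runs inside the job, while $ENV_PREFIX is expanded now.
cat > "$JOB_SCRIPT" <<EOF
#!/bin/bash
#SBATCH --job-name=htseq_ex01
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=00:10:00
#SBATCH --output=htseq_ex01_%j.out
# Many clusters also require an account and/or partition directive.

# Make 'conda activate' usable in a non-interactive shell
# (assumes conda is already on PATH inside the job).
eval "\$(conda shell.bash hook)"
conda activate "$ENV_PREFIX"

# Placeholder for the guided analysis: confirm HTSeq is importable.
python -c 'import HTSeq; print(HTSeq.__version__)'
EOF

echo "Wrote $JOB_SCRIPT"
```

With Slurm, the script would then be submitted with `sbatch run_htseq.sh`. Requesting four cores as `--cpus-per-task=4` on a single task is one interpretation; `--ntasks=4` is another, depending on how the analysis uses the cores.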
Look at the Results
Exercise Extensions