InterProScan
Overview
InterPro is a database that integrates predictive information about protein function from a number of partner resources, giving an overview of the families a protein belongs to and the domains and sites it contains.
Users who have novel nucleotide or protein sequences that they wish to functionally characterise can use the software package InterProScan to run the InterPro scanning algorithms in an integrated way. Sequences are submitted in FASTA format. Matches are then calculated against the signatures of all the required member databases, and the results are written out in a variety of formats.
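Before spending an allocation on a run, it can help to confirm that the input really is FASTA and to see how many sequences it holds. The sketch below assumes only that each record starts with a '>' header line; count_fasta_records is a hypothetical helper, not part of InterProScan. Note that InterProScan 5 treats input as protein by default; nucleotide input is selected with -t n.

```shell
#!/bin/bash
# Hypothetical helper: sanity-check a FASTA file before running InterProScan.
# Assumption: every record begins with a '>' header line; blank lines are ignored.

# count_fasta_records FILE
# Prints the number of records. Returns non-zero (with a message on stderr)
# if the first non-blank line is not a FASTA header.
count_fasta_records() {
    local file="$1"
    local first
    # First non-blank line of the file
    first=$(grep -v '^[[:space:]]*$' "$file" | head -n 1)
    if [[ "${first:0:1}" != ">" ]]; then
        echo "error: $file does not start with a '>' header line" >&2
        return 1
    fi
    # One header line per record
    grep -c '^>' "$file"
}
```

For example, count_fasta_records ~/test_all_appl.fasta prints the record count, or an error if the file does not look like FASTA.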
Using
Use the module name interproscan to load InterProScan version 5.61-93.0.
More information on running InterProScan, along with further documentation, can be found here: https://interproscan-docs.readthedocs.io/en/5.56-89.0/HowToRun.html (this link points to the documentation for version 5.56-89.0).
Allocate at least 60 GB of memory (--mem=60GB) when running an analysis. If you receive an out-of-memory error, request more memory.
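Because a memory request with the wrong units can silently give a job far less than 60 GB, a job script can check what Slurm actually granted before starting a long run. This is a minimal sketch: it assumes the job was submitted with --mem (Slurm then exports SLURM_MEM_PER_NODE, assumed here to be an integer number of megabytes) and that 60G corresponds to 60 × 1024 = 61440 MB. Jobs using --mem-per-cpu set SLURM_MEM_PER_CPU instead and are not handled. check_mem and MIN_MEM_MB are hypothetical names.

```shell
#!/bin/bash
# Hypothetical guard for the top of a Slurm job script.
# Assumption: SLURM_MEM_PER_NODE holds the granted memory in MB (set when --mem is used).

MIN_MEM_MB=61440   # 60G in Slurm units (60 * 1024 MB)

# check_mem
# Returns 0 if enough memory was granted, 1 if too little, 2 if unknown.
check_mem() {
    local granted="${SLURM_MEM_PER_NODE:-}"
    if [[ -z "$granted" ]]; then
        echo "warning: SLURM_MEM_PER_NODE not set (not in a job, or --mem not used)" >&2
        return 2
    fi
    if (( granted < MIN_MEM_MB )); then
        echo "warning: job has ${granted} MB; at least ${MIN_MEM_MB} MB is recommended" >&2
        return 1
    fi
    return 0
}
```

Calling check_mem right after module load interproscan makes a mis-sized job fail fast (for example, check_mem || exit 1) instead of running out of memory partway through the analysis.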
Example
[]$ module load interproscan
[]$ salloc --account=<my account> --time=1:00:00 --mem=60GB
salloc: Granted job allocation
salloc: Waiting for resource configuration
salloc: Nodes are ready for job
[]$ interproscan.sh -i ~/test_all_appl.fasta -f tsv -dp
01/05/2023 08:22:41:521 Welcome to InterProScan-5.61-93.0
01/05/2023 08:22:41:523 Running InterProScan v5 in STANDALONE mode... on Linux
01/05/2023 08:22:54:075 RunID: mtest2_20230501_082253698_fi3i
01/05/2023 08:23:12:222 Loading file /test_all_appl.fasta
01/05/2023 08:23:12:234 Running the following analyses:
[AntiFam-7.0,CDD-3.20,Coils-2.2.1,FunFam-4.3.0,Gene3D-4.3.0,Hamap-2021_04,MobiDBLite-2.0,PANTHER-17.0,Pfam-35.0,PIRSF-3.10,PIRSR-2021_05,PRINTS-42.0,ProSitePatterns-2022_05,ProSiteProfiles-2022_05,SFLD-4,SMART-9.0,SUPERFAMILY-1.75,TIGRFAM-15.0]
Pre-calculated match lookup service DISABLED. Please wait for match calculations to complete...
01/05/2023 08:24:15:125 25% completed
01/05/2023 08:25:04:303 50% completed
01/05/2023 08:26:06:396 76% completed
01/05/2023 08:27:04:227 90% completed
01/05/2023 08:27:52:696 100% done: InterProScan analyses completed
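Once the run finishes, the TSV output can be summarised with standard tools. The sketch below assumes the standard InterProScan 5 TSV layout, in which column 1 is the protein accession and column 4 is the analysis (member database) name. The output file name used in the usage line is an assumption; check your working directory for the file InterProScan actually wrote. Both function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers for summarising an InterProScan TSV result file.
# Assumption: tab-separated, column 1 = protein accession, column 4 = analysis name.

# matches_per_analysis FILE
# Prints "analysis<TAB>count", most matches first.
matches_per_analysis() {
    awk -F'\t' 'NF >= 4 { count[$4]++ }
                END { for (a in count) printf "%s\t%d\n", a, count[a] }' "$1" |
        sort -t $'\t' -k2,2nr -k1,1
}

# proteins_with_matches FILE
# Prints how many distinct proteins have at least one match.
proteins_with_matches() {
    cut -f1 "$1" | sort -u | grep -c .
}
```

For example, matches_per_analysis test_all_appl.fasta.tsv lists how many matches each member database contributed.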
Below is an example of a batch file:
#!/bin/bash
#SBATCH --account=arcc
#SBATCH --nodes=1
# Request 60 GB; a bare number (e.g. --mem=60) is read as 60 MB.
#SBATCH --mem=60G
#SBATCH --time=0:30:00
module load interproscan
interproscan.sh -i ~/test_all_appl.fasta -f tsv -dp
Match Lookup Service
The pre-calculated match lookup service is disabled on Beartooth, so all matches are calculated locally. This is why the examples above pass -dp (--disable-precalc).
Parallel Jobs
InterProScan provides a cluster mode out of the box, but it does not integrate with Slurm, so InterProScan is installed on Beartooth to run in standalone mode.
The interproscan.properties file has been configured accordingly.
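Since each standalone run is confined to one node, one way to process a large input in parallel is to split the FASTA file into chunks and run one independent job per chunk with a Slurm job array. This is a sketch of that general approach, not an ARCC-provided workflow: the function names, chunk file names, and the --cpus-per-task, --time, and --cpu values are hypothetical placeholders to adjust, and <my account> must be replaced with your account.

```shell
#!/bin/bash
# Hypothetical sketch: split a FASTA file and run one InterProScan job per chunk.

# split_fasta INPUT N PREFIX
# Distributes records round-robin into PREFIX_1.fasta ... PREFIX_N.fasta,
# never splitting a record across files. Assumes INPUT starts with a '>' header.
split_fasta() {
    local input="$1" n="$2" prefix="$3"
    awk -v n="$n" -v prefix="$prefix" '
        /^>/ { idx = (rec++ % n) + 1 }          # pick the chunk for each new record
        { print > (prefix "_" idx ".fasta") }   # write header and sequence lines
    ' "$input"
}

# write_array_script N PREFIX
# Writes interproscan_array.sh, where array task i processes PREFIX_i.fasta.
# Escaped \$ variables are left for Slurm to expand at run time.
write_array_script() {
    local n="$1" prefix="$2"
    cat > interproscan_array.sh <<EOF
#!/bin/bash
#SBATCH --account=<my account>
#SBATCH --nodes=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=60G
#SBATCH --time=2:00:00
#SBATCH --array=1-${n}
module load interproscan
chunk=${prefix}_\${SLURM_ARRAY_TASK_ID}
interproscan.sh -i "\${chunk}.fasta" -f tsv -dp --cpu "\${SLURM_CPUS_PER_TASK}" -o "\${chunk}.tsv"
EOF
}
```

A typical sequence would be split_fasta big.fasta 4 chunk, then write_array_script 4 chunk, then sbatch interproscan_array.sh. Each array task requests its own 60 GB, so the number of chunks should match what your account can run at once. Assuming the TSV output has no header row, the per-chunk results can afterwards be combined with cat chunk_*.tsv > combined.tsv.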