...

Use the module name gsutil to discover versions available and to load the application.

Note: Please do not run gsutil from a login node, because by default it will use ALL the cores on the node it is running on. Please run it from an interactive salloc session instead.
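As a minimal sketch of the note above, a small shell guard can refuse to start gsutil outside a Slurm allocation. It assumes that Slurm exports SLURM_JOB_ID inside salloc sessions; the function names in_slurm_job and safe_gsutil are hypothetical helpers for illustration, not part of gsutil or the module.

```shell
# Return success only when running inside a Slurm allocation.
# Assumption: Slurm sets SLURM_JOB_ID in salloc/sbatch sessions.
in_slurm_job() {
  [ -n "${SLURM_JOB_ID:-}" ]
}

# Hypothetical wrapper: pass arguments to gsutil only from inside an allocation.
safe_gsutil() {
  if ! in_slurm_job; then
    echo "Not in a Slurm allocation; start an interactive salloc session first." >&2
    return 1
  fi
  gsutil "$@"
}
```

With these functions defined in the shell, safe_gsutil takes the same arguments as gsutil, but on a login node it prints the message and returns a non-zero status instead of starting a transfer.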

Examples

1: Find all commands

...

Code Block
[]$ gsutil cp gs://genomics-public-data/resources/broad/hg38/v0/Axiom_Exome_Plus.genotypes.all_populations.poly.hg38.vcf.gz .
Copying gs://genomics-public-data/resources/broad/hg38/v0/Axiom_Exome_Plus.genotypes.all_populations.poly.hg38.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]
Operation completed over 1 objects/2.9 MiB.

4: Copy Data from an Entire Public Bucket in Parallel

The example above copies a single file. If we want to download everything from v0, we can use the following:

Code Block
[]$ time gsutil -m cp -r gs://genomics-public-data/resources/broad/hg38/v0/ .
Copying gs://genomics-public-data/resources/broad/hg38/v0/1000G.phase3.integrated.sites_only.no_MATCHED_REV.hg38.vcf...
Copying gs://genomics-public-data/resources/broad/hg38/v0/1000G.phase3.integrated.sites_only.no_MATCHED_REV.hg38.vcf.idx...
...
Copying gs://genomics-public-data/resources/broad/hg38/v0/scattered_calling_intervals/temp_0040_of_50/scattered.interval_list...
Copying gs://genomics-public-data/resources/broad/hg38/v0/scattered_calling_intervals/temp_0007_of_50/scattered.interval_list...
- [76/76 files][ 32.3 GiB/ 32.3 GiB] 100% Done  85.8 MiB/s ETA 00:00:00

Notice that we are using the -m option, which "Causes supported operations (acl ch, acl set, cp, mv, rm, rsync, and setmeta) to run in parallel. This can significantly improve performance if you are performing operations on a large number of files over a reasonably fast network connection." More details can be found by running gsutil help options from the command line.

Please be aware that, according to the Global Command Line Options, "Using the -m option can consume a significant amount of network bandwidth and cause problems or make your performance worse if you use a slower network."
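One way to keep a parallel copy within the resources of an allocation is to cap the number of gsutil processes with the -o option and the GSUtil:parallel_process_count setting. The sketch below is illustrative only: it assumes the session was requested with --cpus-per-task so that Slurm sets SLURM_CPUS_PER_TASK, and the function names gsutil_procs and gsutil_parallel_cp are hypothetical helpers.

```shell
# Choose a process count from the Slurm allocation, falling back to 1.
# Assumption: SLURM_CPUS_PER_TASK is set only when --cpus-per-task was requested.
gsutil_procs() {
  echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Hypothetical wrapper: recursive parallel copy limited to the allocated CPUs.
# Usage: gsutil_parallel_cp SOURCE_URL DESTINATION
gsutil_parallel_cp() {
  src="$1"
  dest="$2"
  procs="$(gsutil_procs)"
  gsutil -m -o "GSUtil:parallel_process_count=${procs}" cp -r "$src" "$dest"
}
```

For the example above, this could be called as gsutil_parallel_cp gs://genomics-public-data/resources/broad/hg38/v0/ . from inside the salloc session. Threads per process are left at the gsutil default here; they can be adjusted in the same way with GSUtil:parallel_thread_count.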