...

[]$ gsutil cp gs://genomics-public-data/resources/broad/hg38/v0/Axiom_Exome_Plus.genotypes.all_populations.poly.hg38.vcf.gz .
Copying gs://genomics-public-data/resources/broad/hg38/v0/Axiom_Exome_Plus.genotypes.all_populations.poly.hg38.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]
Operation completed over 1 objects/2.9 MiB.

4: Copy an Entire Directory from a Public Bucket in Parallel

The example above copies a single file. To download everything under v0, including subdirectories such as scattered_calling_intervals/, we can copy recursively and in parallel:

[]$ time gsutil -m cp -r gs://genomics-public-data/resources/broad/hg38/v0/ GATK_bundle/hg38_Mar21_2022
Copying gs://genomics-public-data/resources/broad/hg38/v0/1000G.phase3.integrated.sites_only.no_MATCHED_REV.hg38.vcf...
Copying gs://genomics-public-data/resources/broad/hg38/v0/1000G.phase3.integrated.sites_only.no_MATCHED_REV.hg38.vcf.idx...
...
Copying gs://genomics-public-data/resources/broad/hg38/v0/scattered_calling_intervals/temp_0040_of_50/scattered.interval_list...
Copying gs://genomics-public-data/resources/broad/hg38/v0/scattered_calling_intervals/temp_0007_of_50/scattered.interval_list...
- [76/76 files][ 32.3 GiB/ 32.3 GiB] 100% Done  85.8 MiB/s ETA 00:00:00
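Once the copy finishes, a quick sanity check is to compare what landed on disk with the totals in gsutil's final progress line ([76/76 files][ 32.3 GiB/ 32.3 GiB]). The sketch below assumes a Bash shell with the standard find, wc, and du tools; check_file_count is a hypothetical helper, not part of gsutil, and the expected count is simply the number gsutil printed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of gsutil): compare the number of regular
# files under a local directory with the count gsutil reported,
# e.g. 76 from "[76/76 files]".
check_file_count() {
  local dir="$1" expected="$2" actual
  # Count regular files recursively; tr strips the padding some wc builds add.
  actual=$(find "$dir" -type f | wc -l | tr -d ' ')
  if [ "$actual" -eq "$expected" ]; then
    echo "OK: $actual files in $dir"
  else
    echo "MISMATCH: found $actual files in $dir, expected $expected" >&2
    return 1
  fi
}

# Example after the recursive copy above (assumes that destination path):
# check_file_count GATK_bundle/hg38_Mar21_2022 76
# du -sh GATK_bundle/hg38_Mar21_2022   # should roughly match the 32.3 GiB total
```

du -h reports sizes in powers of 1024, so its figure is comparable to the GiB value gsutil prints, although rounding means it will not match to the decimal.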

Notice that the command uses two options: -r copies the v0 directory recursively, and -m, which, according to the gsutil documentation, "Causes supported operations (acl ch, acl set, cp, mv, rm, rsync, and setmeta) to run in parallel. This can significantly improve performance if you are performing operations on a large number of files over a reasonably fast network connection." Prefixing the command with time reports how long the whole transfer took. For more details, run gsutil help options from the command line.
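The degree of parallelism behind -m comes from gsutil's boto configuration options parallel_process_count and parallel_thread_count, and the top-level -o option can override them for a single command. The function below is a hypothetical wrapper around the same recursive copy; the default values 4 and 8 are placeholder assumptions, not recommendations, so tune them to your CPU count and bandwidth.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of gsutil): recursive parallel copy with
# explicit process/thread counts. The defaults of 4 processes and 8 threads
# are placeholder assumptions; gsutil's own defaults apply without -o.
parallel_copy() {
  local src="$1" dest="$2" procs="${3:-4}" threads="${4:-8}"
  gsutil \
    -o "GSUtil:parallel_process_count=${procs}" \
    -o "GSUtil:parallel_thread_count=${threads}" \
    -m cp -r "$src" "$dest"
}

# Example (requires gsutil and network access):
# parallel_copy gs://genomics-public-data/resources/broad/hg38/v0/ GATK_bundle/hg38_Mar21_2022 4 8
```

Raising these values helps mainly when many files are in flight at once; on a slow or shared connection, lower values can be more reliable.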

...