...
Code Block |
---|
[]$ gsutil cp gs://genomics-public-data/resources/broad/hg38/v0/Axiom_Exome_Plus.genotypes.all_populations.poly.hg38.vcf.gz . |
Copying gs://genomics-public-data/resources/broad/hg38/v0/Axiom_Exome_Plus.genotypes.all_populations.poly.hg38.vcf.gz... |
/ [1 files][ 2.9 MiB/ 2.9 MiB] |
Operation completed over 1 objects/2.9 MiB. |
4: Copy Data from an Entire Public Bucket in Parallel
The example above copies a single file. To download everything under v0, we can use the following:
Code Block |
---|
[salexan5@wi001 gsutil]$ time gsutil -m cp -r gs://genomics-public-data/resources/broad/hg38/v0/ GATK_bundle/hg38_Mar21_2022. |
Copying gs://genomics-public-data/resources/broad/hg38/v0/1000G.phase3.integrated.sites_only.no_MATCHED_REV.hg38.vcf... |
Copying gs://genomics-public-data/resources/broad/hg38/v0/1000G.phase3.integrated.sites_only.no_MATCHED_REV.hg38.vcf.idx... |
... |
Copying gs://genomics-public-data/resources/broad/hg38/v0/scattered_calling_intervals/temp_0040_of_50/scattered.interval_list... |
Copying gs://genomics-public-data/resources/broad/hg38/v0/scattered_calling_intervals/temp_0007_of_50/scattered.interval_list... |
- [76/76 files][ 32.3 GiB/ 32.3 GiB] 100% Done 85.8 MiB/s ETA 00:00:00 |
Notice that we are using the -m option, which "Causes supported operations (acl ch, acl set, cp, mv, rm, rsync, and setmeta) to run in parallel. This can significantly improve performance if you are performing operations on a large number of files over a reasonably fast network connection." More details can be found by running gsutil help options from the command line.
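Because a whole-bucket copy can move tens of gigabytes, it can help to preview the exact command before starting it. The following is a minimal sketch, not part of the original walkthrough. The `copy_cmd` helper, the `RUN` switch, and the default `DEST` directory name are all hypothetical names chosen for illustration. Only `gsutil -m cp -r` and the bucket path come from the example above.

```shell
#!/usr/bin/env bash
# Sketch: preview the parallel copy command, and run it only when asked to.
# SRC is the public bucket path used above.
# DEST is a hypothetical local directory; change it to suit your project.
SRC="gs://genomics-public-data/resources/broad/hg38/v0/"
DEST="${DEST:-GATK_bundle/hg38}"

# Print the gsutil command for a parallel (-m), recursive (-r) copy.
# -m is a top-level gsutil option, so it goes before the cp subcommand.
copy_cmd() {
  printf 'gsutil -m cp -r %s %s\n' "$1" "$2"
}

# Preview by default; set RUN=1 to create DEST and start the download.
if [ "${RUN:-0}" = "1" ]; then
  mkdir -p "$DEST"
  gsutil -m cp -r "$SRC" "$DEST"
else
  copy_cmd "$SRC" "$DEST"
fi
```

Run as written, the script only prints the command. Setting `RUN=1` in the environment starts the actual transfer.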
...