Overview
gsutil is a Python application that lets you access Cloud Storage from the command line. You can use gsutil to do a wide range of bucket and object management tasks (sketched in the example after this list), including:
Creating and deleting buckets.
Uploading, downloading, and deleting objects.
Listing buckets and objects.
Moving, copying, and renaming objects.
Editing object and bucket ACLs.
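As a rough sketch of what these tasks look like on the command line (the bucket and file names here are hypothetical, and you need appropriate permissions on the bucket for the commands that modify it):

[]$ gsutil mb gs://my-example-bucket                    # create a bucket (hypothetical name)
[]$ gsutil cp results.txt gs://my-example-bucket/       # upload an object
[]$ gsutil ls gs://my-example-bucket/                   # list objects in the bucket
[]$ gsutil mv gs://my-example-bucket/results.txt gs://my-example-bucket/archive/results.txt   # move/rename an object
[]$ gsutil rm gs://my-example-bucket/archive/results.txt   # delete an object
[]$ gsutil rb gs://my-example-bucket                    # remove the (now empty) bucket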
gsutil performs all operations, including uploads and downloads, using HTTPS and transport-layer security (TLS).
For a list of available gsutil commands, use the built-in help facility, which also provides specific details for each command and topic (see Example 1 below).
Using
Use the module name gsutil to discover the versions available and to load the application.
Note: Please do not run gsutil from a login node, as by default it will use ALL the cores on the node it is running on. Please run it from an interactive salloc session instead.
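For example, assuming an Lmod-style module system (the salloc options shown are illustrative; request whatever resources are appropriate on your system):

[]$ module spider gsutil          # discover the versions available
[]$ salloc -n 1 -c 4 -t 1:00:00   # start an interactive session (illustrative options)
[]$ module load gsutil            # load the application inside the session
[]$ gsutil version                # confirm gsutil is available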
Examples
1: Find all commands
[]$ gsutil --help
Usage: gsutil [-D] [-DD] [-h header]... [-i service_account] [-m]
              [-o section:flag=value]... [-q] [-u user_project]
              [command [opts...] args...]
Available commands:
  acl              Get, set, or change bucket and/or object ACLs
  autoclass        Configure autoclass feature
  bucketpolicyonly Configure uniform bucket-level access
  cat              Concatenate object content to stdout
  ...
  version          Print version info about gsutil
  versioning       Enable or suspend versioning for one or more buckets
  web              Set a main page and/or error page for one or more buckets

Additional help topics:
  acls             Working With Access Control Lists
  crc32c           CRC32C and Installing crcmod
  ...
  versions         Object Versioning and Concurrency Control
  wildcards        Wildcard Names

Use gsutil help <command or topic> for detailed help.
2: Find help on a specific command
[]$ gsutil cp --help
NAME
  cp - Copy files and objects

SYNOPSIS

  gsutil cp [OPTION]... src_url dst_url
  gsutil cp [OPTION]... src_url... dst_url
  gsutil cp [OPTION]... -I dst_url

DESCRIPTION
  The ``gsutil cp`` command allows you to copy data between your local file
  system and the cloud, within the cloud, and between cloud storage
  providers. For example, to upload all text files from the local directory
  to a bucket, you can run:

    gsutil cp *.txt gs://my-bucket
...
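The built-in help facility shows the same information and also covers general topics rather than just commands; the topic name below is taken from the list in Example 1:

[]$ gsutil help cp          # same help text as 'gsutil cp --help'
[]$ gsutil help wildcards   # help on a general topic (wildcard names)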
3: Copy Data from a Public Bucket
Data can be downloaded from one of many public data resources. Using the genomics-public-data bucket as an example, the public gsutil URI for a file can be found by clicking on the three vertical dots on the right-hand side of the file to get more information.
Once you’ve selected the file you require, you can then use the URI with the gsutil command to download it. For example:
[]$ gsutil cp gs://genomics-public-data/resources/broad/hg38/v0/Axiom_Exome_Plus.genotypes.all_populations.poly.hg38.vcf.gz .
Copying gs://genomics-public-data/resources/broad/hg38/v0/Axiom_Exome_Plus.genotypes.all_populations.poly.hg38.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]
Operation completed over 1 objects/2.9 MiB.
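If you prefer to stay on the command line, you can also browse a public bucket directly with gsutil ls instead of the web console and copy the URI from its output (output abbreviated here):

[]$ gsutil ls gs://genomics-public-data/resources/broad/hg38/v0/
gs://genomics-public-data/resources/broad/hg38/v0/Axiom_Exome_Plus.genotypes.all_populations.poly.hg38.vcf.gz
...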
4: Copy Data from an Entire Public Bucket
The example above copies a single file. If we want to download everything under v0, we can use the following:
[salexan5@wi001 gsutil]$ time gsutil -m cp -r gs://genomics-public-data/resources/broad/hg38/v0/ GATK_bundle/hg38_Mar21_2022
Copying gs://genomics-public-data/resources/broad/hg38/v0/1000G.phase3.integrated.sites_only.no_MATCHED_REV.hg38.vcf...
Copying gs://genomics-public-data/resources/broad/hg38/v0/1000G.phase3.integrated.sites_only.no_MATCHED_REV.hg38.vcf.idx...
...
Copying gs://genomics-public-data/resources/broad/hg38/v0/scattered_calling_intervals/temp_0040_of_50/scattered.interval_list...
Copying gs://genomics-public-data/resources/broad/hg38/v0/scattered_calling_intervals/temp_0007_of_50/scattered.interval_list...
- [76/76 files][ 32.3 GiB/ 32.3 GiB] 100% Done  85.8 MiB/s ETA 00:00:00
Notice that we are using the -m option, which performs the copy in parallel. More details can be found by running gsutil help options from the command line.
Please be aware that, according to the Global Command Line Options documentation, "Using the -m option can consume a significant amount of network bandwidth and cause problems or make your performance worse if you use a slower network."
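Before starting a large recursive copy it can be worth checking the total size first, and if -m puts too much load on the network you can reduce the parallelism with gsutil's -o flag. The process and thread counts below are illustrative values, not recommendations; see gsutil help options for details:

[]$ gsutil du -sh gs://genomics-public-data/resources/broad/hg38/v0/    # total size of the data to be copied
[]$ gsutil -m -o "GSUtil:parallel_process_count=4" -o "GSUtil:parallel_thread_count=2" \
      cp -r gs://genomics-public-data/resources/broad/hg38/v0/ GATK_bundle/hg38_Mar21_2022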