gsutil
Overview
gsutil is a Python application that lets you access Cloud Storage from the command line. You can use gsutil to do a wide range of bucket and object management tasks, including:
Creating and deleting buckets.
Uploading, downloading, and deleting objects.
Listing buckets and objects.
Moving, copying, and renaming objects.
Editing object and bucket ACLs.
gsutil performs all operations, including uploads and downloads, using HTTPS and transport-layer security (TLS).
For a list of available gsutil commands, and specific details on each one, use the built-in help facility.
Using
Use the module name gsutil to discover the versions available and to load the application.
Note: Please do not run gsutil from a login node, as by default it will use ALL the cores on the node it is running on. Please run it from an interactive salloc session instead.
Examples
1: Find all commands
[]$ gsutil --help
Usage: gsutil [-D] [-DD] [-h header]... [-i service_account] [-m] [-o section:flag=value]... [-q] [-u user_project] [command [opts...] args...]
Available commands:
acl Get, set, or change bucket and/or object ACLs
autoclass Configure autoclass feature
bucketpolicyonly Configure uniform bucket-level access
cat Concatenate object content to stdout
...
version Print version info about gsutil
versioning Enable or suspend versioning for one or more buckets
web Set a main page and/or error page for one or more buckets
Additional help topics:
acls Working With Access Control Lists
crc32c CRC32C and Installing crcmod
...
versions Object Versioning and Concurrency Control
wildcards Wildcard Names
Use gsutil help <command or topic> for detailed help.
2: Find help on a specific command
[]$ gsutil cp --help
NAME
cp - Copy files and objects
SYNOPSIS
gsutil cp [OPTION]... src_url dst_url
gsutil cp [OPTION]... src_url... dst_url
gsutil cp [OPTION]... -I dst_url
DESCRIPTION
The ``gsutil cp`` command allows you to copy data between your local file
system and the cloud, within the cloud, and between
cloud storage providers. For example, to upload all text files from the
local directory to a bucket, you can run:
gsutil cp *.txt gs://my-bucket
...
3: Copy Data from a Public Bucket
Data can be downloaded from one of the many public data resources. Using the genomics-public-data bucket as an example, the public gsutil URI of a file can be found by clicking the three vertical dots to the right of the file to get more information.
Once you’ve selected the file you require, you can then use the URI with the gsutil command to download it. For example:
[]$ gsutil cp gs://genomics-public-data/resources/broad/hg38/v0/Axiom_Exome_Plus.genotypes.all_populations.poly.hg38.vcf.gz .
Copying gs://genomics-public-data/resources/broad/hg38/v0/Axiom_Exome_Plus.genotypes.all_populations.poly.hg38.vcf.gz...
/ [1 files][ 2.9 MiB/ 2.9 MiB]
Operation completed over 1 objects/2.9 MiB.
4: Copy Data from an Entire Public Bucket in Parallel
The example above copies a single file. To download everything under v0, run a recursive copy (gsutil cp -r) together with the top-level -m option.
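A command of this shape is a minimal sketch, assuming the v0 path from the previous example. The local destination directory hg38_v0 and the check that gsutil is available are illustrative additions, not part of the original page:

```shell
# Source prefix taken from the single-file example above.
SRC="gs://genomics-public-data/resources/broad/hg38/v0"
# Hypothetical local destination directory; change to suit.
DEST="./hg38_v0"
mkdir -p "$DEST"

# -m runs the copy in parallel; cp -r copies everything under the prefix.
if command -v gsutil >/dev/null 2>&1; then
    gsutil -m cp -r "$SRC" "$DEST"
else
    echo "gsutil not found: load the gsutil module first" >&2
fi
```

Because the destination directory already exists, the objects are placed in a v0 subdirectory beneath it.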
Note the -m option, which “Causes supported operations (acl ch, acl set, cp, mv, rm, rsync, and setmeta) to run in parallel. This can significantly improve performance if you are performing operations on a large number of files over a reasonably fast network connection.” More details can be found by running gsutil help options from the command line.
Please be aware that, according to the Global Command Line Options documentation, “Using the -m option can consume a significant amount of network bandwidth and cause problems or make your performance worse if you use a slower network.”
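One way to keep -m from using every core, and to limit its bandwidth use, is to cap the number of processes and threads with the top-level -o option shown in the usage line. The sketch below assumes the parallel_process_count and parallel_thread_count settings from the GSUtil section of the gsutil configuration. It also assumes the Slurm variable SLURM_CPUS_PER_TASK (set only when cores per task are requested explicitly) and a hypothetical destination directory. The values are illustrative, not recommendations:

```shell
# Source prefix taken from the examples above.
SRC="gs://genomics-public-data/resources/broad/hg38/v0"
# Hypothetical local destination directory; change to suit.
DEST="./hg38_v0"
# Use the cores Slurm allocated to the task, or 1 if the variable is unset.
NPROC="${SLURM_CPUS_PER_TASK:-1}"
# Illustrative number of threads per process.
NTHREADS=4
mkdir -p "$DEST"

# -o section:flag=value overrides a configuration setting for this run only.
if command -v gsutil >/dev/null 2>&1; then
    gsutil -m \
        -o "GSUtil:parallel_process_count=${NPROC}" \
        -o "GSUtil:parallel_thread_count=${NTHREADS}" \
        cp -r "$SRC" "$DEST"
else
    echo "gsutil not found: load the gsutil module first" >&2
fi
```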