Overview

gsutil is a Python application that lets you access Cloud Storage from the command line. You can use gsutil to do a wide range of bucket and object management tasks, including:

  • Creating and deleting buckets.

  • Uploading, downloading, and deleting objects.

  • Listing buckets and objects.

  • Moving, copying, and renaming objects.

  • Editing object and bucket ACLs.

gsutil performs all operations, including uploads and downloads, using HTTPS and transport-layer security (TLS).

For a list of the available gsutil commands, and for details on a specific command, use the built-in help facility.

Using

Use the module name gsutil to discover the available versions and to load the application.

Note: Please do not run gsutil from a login node, because by default it will use ALL the cores on the node it is running on. Run it from an interactive salloc session instead.
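One way to keep gsutil within the cores you were allocated is to pass the allocation size to gsutil explicitly. The sketch below assumes a Slurm cluster where requesting --cpus-per-task sets the SLURM_CPUS_PER_TASK variable; the wrapper name gsutil_capped is hypothetical (it is not part of gsutil), and the core count and time limit in the comments are placeholders. It relies on gsutil's parallel_process_count configuration setting, overridden per command with the top-level -o option.

```shell
# Illustrative only: from a login node, request an interactive allocation first, e.g.
#   salloc --cpus-per-task=4 --time=01:00:00   # placeholder values
#   module load gsutil

# Hypothetical wrapper (not part of gsutil): cap gsutil's worker processes
# at the number of cores Slurm allocated, falling back to 1 if unset.
gsutil_capped() {
  local procs="${SLURM_CPUS_PER_TASK:-1}"
  gsutil -o "GSUtil:parallel_process_count=${procs}" "$@"
}

# Example call inside the allocation:
#   gsutil_capped -m cp -r gs://my-bucket/data/ ./data
```

The number of threads per process is controlled separately by parallel_thread_count, so this caps processes rather than total concurrency.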

Examples

1: Find all commands

[]$ gsutil --help
Usage: gsutil [-D] [-DD] [-h header]... [-i service_account] [-m] [-o section:flag=value]... [-q] [-u user_project] [command [opts...] args...]
Available commands:
  acl              Get, set, or change bucket and/or object ACLs
  autoclass        Configure autoclass feature
  bucketpolicyonly Configure uniform bucket-level access
  cat              Concatenate object content to stdout
...
  version          Print version info about gsutil
  versioning       Enable or suspend versioning for one or more buckets
  web              Set a main page and/or error page for one or more buckets

Additional help topics:
  acls             Working With Access Control Lists
  crc32c           CRC32C and Installing crcmod
...
  versions         Object Versioning and Concurrency Control
  wildcards        Wildcard Names

Use gsutil help <command or topic> for detailed help.

2: Find help on a specific command

[]$ gsutil cp --help
NAME
  cp - Copy files and objects

SYNOPSIS

  gsutil cp [OPTION]... src_url dst_url
  gsutil cp [OPTION]... src_url... dst_url
  gsutil cp [OPTION]... -I dst_url

DESCRIPTION
  The ``gsutil cp`` command allows you to copy data between your local file
  system and the cloud, within the cloud, and between
  cloud storage providers. For example, to upload all text files from the
  local directory to a bucket, you can run:

    gsutil cp *.txt gs://my-bucket
...

3: Copy Data from a Public Bucket

Data can be downloaded from one of the many public data resources. Using the genomics-public-data bucket as an example, the gsutil URI for a file can be found by clicking the three vertical dots to the right of the file and viewing its details.

Once you’ve selected the file you require, you can then use the URI with the gsutil command to download it. For example:

[]$ gsutil cp gs://genomics-public-data/resources/broad/hg38/v0/Axiom_Exome_Plus.genotypes.all_populations.poly.hg38.vcf.gz .
Copying gs://genomics-public-data/resources/broad/hg38/v0/Axiom_Exome_Plus.genotypes.all_populations.poly.hg38.vcf.gz...
/ [1 files][  2.9 MiB/  2.9 MiB]
Operation completed over 1 objects/2.9 MiB.
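Before copying larger objects or whole directories, it can help to check what a URI contains and how much data it holds. The following is a minimal sketch using the gsutil ls and du commands; the helper name preview_gs_url is hypothetical and not part of gsutil.

```shell
# Hypothetical helper (not part of gsutil): show what a gs:// URL contains
# and how much data it holds before copying anything.
preview_gs_url() {
  local url="$1"
  gsutil ls -l "$url"      # long listing: size, timestamp, and name per object
  gsutil du -s -h "$url"   # summarised total size in human-readable units
}

# Example:
#   preview_gs_url gs://genomics-public-data/resources/broad/hg38/v0/
```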

4: Copy an Entire Directory from a Public Bucket

The example above copies a single file. If we want to download everything from v0, we can use the following:

[]$ time gsutil -m cp -r gs://genomics-public-data/resources/broad/hg38/v0/ GATK_bundle/hg38_Mar21_2022
Copying gs://genomics-public-data/resources/broad/hg38/v0/1000G.phase3.integrated.sites_only.no_MATCHED_REV.hg38.vcf...
Copying gs://genomics-public-data/resources/broad/hg38/v0/1000G.phase3.integrated.sites_only.no_MATCHED_REV.hg38.vcf.idx...
...
Copying gs://genomics-public-data/resources/broad/hg38/v0/scattered_calling_intervals/temp_0040_of_50/scattered.interval_list...
Copying gs://genomics-public-data/resources/broad/hg38/v0/scattered_calling_intervals/temp_0007_of_50/scattered.interval_list...
- [76/76 files][ 32.3 GiB/ 32.3 GiB] 100% Done  85.8 MiB/s ETA 00:00:00

Notice that we are using the -m option, which “Causes supported operations (acl ch, acl set, cp, mv, rm, rsync, and setmeta) to run in parallel. This can significantly improve performance if you are performing operations on a large number of files over a reasonably fast network connection.” More details can be found by running gsutil help options from the command line.

Please be aware that, according to the Global Command Line Options documentation, “Using the -m option can consume a significant amount of network bandwidth and cause problems or make your performance worse if you use a slower network.”
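If a large parallel download is interrupted, one option for picking it up again is gsutil rsync, which is among the operations that -m supports and which transfers only the files that are missing or different at the destination. The sketch below is one possible approach rather than a site recommendation; the helper name sync_from_bucket is hypothetical and not part of gsutil.

```shell
# Hypothetical helper (not part of gsutil): bring a local directory up to date
# with a gs:// prefix, transferring only objects that are missing or changed locally.
sync_from_bucket() {
  local src="$1" dst="$2"
  mkdir -p "$dst"                    # create the local destination if it is missing
  gsutil -m rsync -r "$src" "$dst"   # -r recurses into subdirectories
}

# Example:
#   sync_from_bucket gs://genomics-public-data/resources/broad/hg38/v0/ GATK_bundle/hg38_Mar21_2022
```

On a slower network, dropping -m from the rsync line runs the same synchronisation without parallelism.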
