Data Moving and Access

Overview

New HPC users are often unsure of the best ways to access their data, and frequently need to move and copy data between locations.
This page covers various methods of performing these basic file operations. Because there are several different options, this page can be thought of as an ARCC central reference for data access and transfer.

If you’re new to HPC, we recommend you begin with Graphical User Interface options as these tend to be more intuitive for new HPC users with minimal command line experience.

Graphical User Interface Options:

Mapping ARCC Storage as a network drive (Available for access to Beartooth and Alcova Data)
Access to ARCC Storage with Globus (Available for access to Beartooth, Alcova, Pathfinder, and Wildiris Data)
Access to your Beartooth Data with Southpass (Available for access to Beartooth Data; use WI Ondemand for Wildiris Data)

Command Line Tools

Here are several command line tools to make managing files easier:

scp (Secure Copy)

Secure copy or SCP is a means of securely transferring computer files between a local host and a remote host or between two remote hosts. It is based on the Secure Shell (SSH) protocol. From the command line interface, the local host is always represented by the text before the $ in your command line prompt.

Example: user@hostname:location$

This prompt shows the username you are logged in as, the hostname (the computer you are logged into), and the location (the folder you are currently in). For example, if Cowboy Joe were logged into Beartooth and in his home folder, the prompt might look like:

[cowboyjoe@blog1 ~]$

Users usually connect to Beartooth through a login node, so blog1 is the specific login node on the Beartooth HPC that Cowboy Joe was assigned when he logged into the cluster. ~ is shorthand for his home folder on the cluster, which is the folder he is currently in. The hostname from the prompt represents the local host in all of the following SCP examples.

Copying a file or folder from a remote host to your local host using SCP:

$ scp username@from_host:/path_to_file/file.txt /local/folder/
$ scp -r username@from_host:/path_to_folder/folder /local/folder

Copying a file or folder from local host to a remote host using SCP:

$ scp file.txt username@to_host:/remote/folder/

Copying a file from one remote host to another remote host using SCP:
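
A minimal sketch of this case, following the pattern of the commands above. The username, host names (from_host, to_host), and paths are placeholders, not real ARCC systems. The command is printed with echo so it can be checked first; remove the echo to perform the copy.

```shell
# Placeholder source and destination; replace the username, hosts, and paths with your own.
src="username@from_host:/path_to_file/file.txt"
dst="username@to_host:/remote/folder/"

# Dry run: print the command for review. Remove "echo" to perform the copy.
echo scp "$src" "$dst"
```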

Copying all the contents of a local host folder into a remote host folder using SCP:
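
A minimal sketch of this case, assuming a placeholder remote host (to_host) and a throwaway local folder created only so the example can run anywhere. The command is printed with echo so it can be checked first; remove the echo to perform the copy.

```shell
# Hypothetical demo folder with two files; use your own folder in practice.
local_dir="$(mktemp -d)"
touch "$local_dir/a.txt" "$local_dir/b.txt"

# Placeholder destination; replace the username, host, and path with your own.
dst="username@to_host:/remote/folder/"

# The local shell expands * to every non-hidden item in local_dir;
# -r lets scp descend into any subfolders among them.
# Dry run: print the command for review. Remove "echo" to perform the copy.
echo scp -r "$local_dir"/* "$dst"
```

Because the shell expands the wildcard, hidden files (names starting with a dot) are not included in this form.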

sftp

sftp is a command-line client for the SSH File Transfer Protocol (SFTP), which transfers files securely over an encrypted Secure Shell connection. SFTP should not be confused with running an FTP client over an SSH connection.

Start the sftp interface: sftp username@beartooth.arcc.uwyo.edu

Examples

  • This gets a txt file from Beartooth and copies it to the Local System:
    get hello_world.txt destination_directory

  • This puts a txt file from the Local System onto Beartooth:
    put source_directory/hello_world.txt destination_directory
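
The get and put commands can also be collected in a batch file and run non-interactively with sftp's -b option. The sketch below assumes a hypothetical batch file name (transfer.sftp) and that key-based login is already set up, since batch mode cannot prompt for a password. The sftp command is printed with echo so it can be checked first; remove the echo to start the transfer.

```shell
# Write the sftp commands to a batch file (the file name is arbitrary).
batch_file="transfer.sftp"
cat > "$batch_file" <<'EOF'
get hello_world.txt destination_directory
put source_directory/hello_world.txt destination_directory
bye
EOF

# Dry run: print the command for review. Remove "echo" to start the transfer.
echo sftp -b "$batch_file" username@beartooth.arcc.uwyo.edu
```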

rclone

  • rclone is a multithreaded command-line program for copying and synchronizing files, with robust capabilities. It is often described as “rsync for cloud storage”, but it is a separate tool rather than a version of the rsync utility.

    • functions as both a file synchronization and file transfer program

  • rsync, by comparison, uses a delta-encoding algorithm to minimize network usage. Zlib may be used for additional compression, and SSH or stunnel can be used for data security.

  • Both tools are typically used to synchronize files and directories between two different systems.

    • For example, if the command rsync local-file user@beartooth.arcc.uwyo.edu:remote-file is run, rsync will use SSH to connect as user to beartooth.arcc.uwyo.edu. rclone instead takes a subcommand (such as copy or sync) and refers to a remote system by the name of a remote set up with rclone config.

Examples

  • To sync the contents of dir1 to dir2 on the same system:
    rclone sync dir1/ dir2

  • To sync with a remote system, assuming an SFTP remote named beartooth (an example name) has already been set up with rclone config:
    rclone sync ~/dir1 beartooth:destination_directory
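
Before synchronizing, it can help to preview what rclone would change. The sketch below assumes a hypothetical remote named beartooth that was created beforehand with rclone config. rclone sync makes the destination match the source, which can delete files at the destination, while rclone copy only adds or updates files. The --dry-run flag reports the planned changes without making them. The commands are printed with echo so the sketch runs without rclone installed; remove the echo to run them.

```shell
# "beartooth" is a hypothetical remote name; create your own with: rclone config
remote="beartooth"

# Preview a sync: --dry-run lists what would be copied or deleted, without doing it.
echo rclone sync --dry-run "$HOME/dir1" "${remote}:destination_directory"

# A copy never deletes anything at the destination.
echo rclone copy "$HOME/dir1" "${remote}:destination_directory"
```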

FTP

FTP (File Transfer Protocol) is a network protocol for transmitting files between computers over Transmission Control Protocol/Internet Protocol (TCP/IP) connections. Within the TCP/IP suite, FTP is considered an application layer protocol.

  • Some systems have an ftp client installed, but because of security concerns and because the protocol is being deprecated, ARCC does not have a client installed.

    • As a starting point, see the Wikipedia entry on FTP, which states: “Throughout 2021, the two major web browser vendors removed this ability. Support for the FTP protocol was first disabled in Google Chrome 88 in January 2021, followed by Firefox 88.0 in April 2021. In July 2021, Firefox 90 dropped FTP entirely, and Google followed suit in October 2021, removing FTP entirely in Google Chrome 95”.

Alternatives

Where users require an FTP client, we can offer a number of alternatives:

Globus Online

Globus manages file transfers between two computer systems. It is ideal for large files and available for many institutional clusters and networks.
This document covers the basics of using Globus and provides external links to more detailed information.
If you’re a first-time user or simply need a refresher, please refer to Globus’ excellent step-by-step guide.

Using Globus in Brief

  1. Log in to the Globus web app.

    1. Click “Login”.

    2. To use UWyo’s organizational login, search for ‘University of Wyoming’.

      1. Note that this step may be skipped if Globus has cached your login.

  2. Enter or search for an endpoint or collection in the ‘Collection’ field or find recently used endpoints under ENDPOINTS in the left pane.

    1. The Teton/Alcova collection is ‘ARCC Teton’.

  3. Click ‘Transfer or Sync to…’ in the panel on the right side of the page.

  4. Enter/find a second collection.

  5. Browse to the appropriate source and destination folders in both collections.

  6. Select the files/folders you wish to copy and click ‘Start’.

  7. Click ‘Activity’ in the left pane to observe the transfer progress.

Globus file share instructions

See Globus' instructions on sharing data to learn how to create and share an upload/download repository that you can control access to.

Globus Connect Personal

Globus Connect Personal (GCP) allows you to share and transfer files to and from your laptop or desktop computer. GCP supports the three primary operating systems: Mac, Linux, and Windows. The download page defaults to Mac; click the ‘Show me other supported operating systems’ toggle to download for Linux or Windows.

Note that GCP has a “High Assurance” option for Protected Health Information or Controlled Unclassified Information.

Globus Command Line Interface (CLI)

If you need to use a command line interface, please refer to Globus’ excellent step by step guide.
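
As a rough command-line counterpart to the web app steps in "Using Globus in Brief", the sketch below assumes the Globus CLI is installed and uses placeholder collection IDs and folder paths. The real IDs are UUIDs that can be looked up with globus endpoint search, for example by searching for the ‘ARCC Teton’ collection name. The commands are printed with echo so the sketch runs without the CLI installed; remove the echo to run them.

```shell
# Placeholder collection IDs; look up the real UUIDs with:
#   globus endpoint search "ARCC Teton"
src_id="SOURCE_COLLECTION_UUID"
dst_id="DESTINATION_COLLECTION_UUID"

# Dry run: print the commands for review. Remove "echo" to run them.
echo globus login
# --recursive transfers a whole folder; the paths here are placeholders.
echo globus transfer --recursive "${src_id}:/source/folder/" "${dst_id}:/destination/folder/"
# List recent transfer tasks to check progress.
echo globus task list
```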
