Command Line Data Transfer
Transferring data through a web application such as Open OnDemand, or through a client such as Cyberduck, is easy, but these methods are difficult to automate within a compute job. In some use cases, however, people may want to transfer data, run some computation on that data, transfer it back, and so on. These types of tasks can be accomplished using the command line interface (CLI) on MedicineBow. There are many CLI options, including the previously discussed Globus CLI, scp, SFTP, and rsync, all of which work on MedicineBow. In this module we detail rclone because it is ARCC’s recommended command line tool, owing to its ability to work with desktops, HPC, and cloud storage systems and its ability to run multi-threaded for faster transfers.
scp and SFTP CLI Tools and Examples
Before diving into rclone, we’ll briefly cover some of the other command line tools and give examples of how to use them on MedicineBow.
scp - Uses SSH (Secure Shell) to authenticate, then securely transfers data. This means the connection is authenticated as the user who initiates it.
Example: from local to MedicineBow on Linux/Mac
Basic syntax of an scp command: scp <file> <username>@<server>:<directory to transfer to>
dylan@fireball:~$ scp transfer.file dperkin6@medicinebow.arcc.uwyo.edu:/project/arcc
transfer.file 100% 0 0.0KB/s 00:00
dylan@fireball:~$
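The same pattern extends to whole directories and to pulling data back from MedicineBow. The following is a minimal sketch, reusing the username and host from the example above; the directory name "results" and the PREVIEW variable are hypothetical and only for illustration. By default the commands are printed rather than run, so they can be checked first.

```shell
#!/bin/sh
# Sketch only: user and host come from the example above;
# the "results" directory is a hypothetical name.
REMOTE="dperkin6@medicinebow.arcc.uwyo.edu"

# PREVIEW=echo only prints the commands below. Set PREVIEW to an
# empty string (PREVIEW= sh yourscript.sh) to really run them.
PREVIEW="${PREVIEW-echo}"

# -r copies a directory and everything inside it.
$PREVIEW scp -r results "$REMOTE:/project/arcc/"

# Swap source and destination to download from MedicineBow into
# the current local directory (the trailing dot).
$PREVIEW scp "$REMOTE:/project/arcc/transfer.file" .
```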
SFTP - can be used on the CLI as well as through clients. Compared to the SCP protocol, which only allows file transfers, the SFTP protocol allows a wider range of operations on remote files. SFTP clients provide extra capabilities, including resuming interrupted transfers, directory listings, and remote file removal.
SFTP is generally more platform-independent than SCP.
Example of interactive use of SFTP
dylan@fireball:~$ sftp dperkin6@medicinebow.arcc.uwyo.edu
Connected to medicinebow.arcc.uwyo.edu.
sftp>
Helpful SFTP commands
? - access the help
put - upload a file
mput - upload multiple files
get - download a file or directory
mget - download multiple files
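Because an interactive prompt is hard to drive from inside a job, the same commands can also be placed in a batch file and run non-interactively with sftp's -b option. The following is a minimal sketch, assuming non-interactive authentication such as an SSH key is already set up (batch mode cannot prompt for a password). The file names upload.batch, input1.dat and input2.dat, and the PREVIEW variable, are hypothetical.

```shell
#!/bin/sh
# Sketch only: file names are hypothetical; user and host are the
# ones used in the interactive example above.
REMOTE="dperkin6@medicinebow.arcc.uwyo.edu"
BATCH="upload.batch"

# PREVIEW=echo only prints the sftp command. Set PREVIEW to an
# empty string (PREVIEW= sh yourscript.sh) to really connect.
PREVIEW="${PREVIEW-echo}"

# Write the sftp commands, one per line, into the batch file.
cat > "$BATCH" <<'EOF'
cd /project/arcc
put input1.dat
put input2.dat
ls -l
EOF

# -b reads commands from the batch file instead of the keyboard;
# sftp aborts the batch if a command such as put fails.
$PREVIEW sftp -b "$BATCH" "$REMOTE"
```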
rsync CLI Tool and Example
rsync (Remote Sync) is another very useful tool with many options. It is one of the most commonly used commands for copying and synchronizing files and directories, both remotely and locally, on Linux/Unix systems. With rsync you can copy and synchronize data across directories, disks, and networks, and perform backups and mirroring between two Linux machines.
Basic syntax of an rsync command: rsync <options> <source> <destination>
dylan@fireball:~$ rsync transfer.file dperkin6@medicinebow.arcc.uwyo.edu:/project/arcc/dperkin6
Some common options used with rsync commands
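As an illustration, the sketch below combines a few widely used, standard rsync flags. It is only one reasonable selection, not an exhaustive or official list. The local directory "results/", the destination subdirectory, and the PREVIEW variable are hypothetical; the commands are printed by default rather than run.

```shell
#!/bin/sh
# Sketch only: "results/" is a hypothetical local directory and the
# destination subdirectory is assumed for illustration.
SRC="results/"
DEST="dperkin6@medicinebow.arcc.uwyo.edu:/project/arcc/dperkin6/results/"

# PREVIEW=echo only prints the commands below. Set PREVIEW to an
# empty string (PREVIEW= sh yourscript.sh) to really run them.
PREVIEW="${PREVIEW-echo}"

# -a  archive mode: recurse and preserve permissions, times and symlinks
# -v  verbose: list each file as it is transferred
# -z  compress data in transit
# -P  keep partially transferred files and show progress
OPTS="-avzP"

# -n (--dry-run) makes rsync report what it would do without
# copying anything, so this first command is a safe rehearsal.
$PREVIEW rsync -n $OPTS "$SRC" "$DEST"

# The real transfer is the same command without -n. The trailing
# slash on the source means "the contents of results".
$PREVIEW rsync $OPTS "$SRC" "$DEST"
```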
About rclone
Rclone is a command-line program to manage files on remote storage. It is a feature-rich alternative to cloud vendors' web storage interfaces. Over 70 cloud storage products support rclone including S3 object stores, business & consumer file storage services, as well as standard transfer protocols. Rclone has powerful cloud equivalents to the unix commands rsync, cp, mv, mount, ls, ncdu, tree, rm, and cat. It is used at the command line, in scripts or via its API.
Rclone mounts any local, cloud or virtual filesystem as a disk on Windows, macOS, linux and FreeBSD, and also serves these over SFTP, HTTP, WebDAV, FTP and DLNA.
Rclone helps you:
Backup (and encrypt) files to cloud storage
Restore (and decrypt) files from cloud storage
Mirror cloud data to other cloud services or locally
Migrate data to the cloud, or between cloud storage vendors
Mount multiple, encrypted, cached or diverse cloud storage as a disk
Union file systems together to present multiple local and/or cloud file systems as one
rclone Configuration
Rclone requires some configuration for each “transfer partner”. This is a long process, but once a remote is set up it can be reused over and over. The examples here configure transfers to/from MedicineBow using SSH key authentication. Start the interactive setup by running rclone config and create a new remote named ‘medbow’.
You will then be given choices to continue configuring the ‘medbow’ remote. In our case we pick number ‘27’, SSH/SFTP Connection; the number can differ between rclone versions, so choose whichever entry is labeled SSH/SFTP.
Then you will have to give more login info. In this case we accept the defaults by pressing Enter/Return until we reach the ‘key_file’ option, where we enter the location of the SSH key file.
The next options relate to passwords and certificates. For MedicineBow, none of these apply, so we keep pressing Enter until we reach the cipher option, where we enter '1' for false. The final two steps are answering the advanced configuration prompt and then saving before exiting the configuration setup.
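Once the remote is saved, it is worth confirming that it works before relying on it in a job. The following is a minimal sketch, assuming the remote was saved under the name ‘medbow’; the path /project/arcc is reused from the earlier examples and the PREVIEW variable is hypothetical. The commands are printed by default rather than run.

```shell
#!/bin/sh
# Sketch only: assumes the remote was saved under the name "medbow".
REMOTE_NAME="medbow"

# PREVIEW=echo only prints the commands below. Set PREVIEW to an
# empty string (PREVIEW= sh yourscript.sh) to really run them.
PREVIEW="${PREVIEW-echo}"

# List the configured remotes; "medbow:" should appear.
$PREVIEW rclone listremotes

# Print the saved settings for this remote (type, key_file, and so on).
$PREVIEW rclone config show "$REMOTE_NAME"

# List directories under a path on the remote to confirm login works.
$PREVIEW rclone lsd "$REMOTE_NAME:/project/arcc"
```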
Using rclone
The basic syntax is as follows: rclone <function> <source> <destination endpoint>:<bucket>.
The basic functions are:
copy - to copy files/directories to or from somewhere
sync - (one way) to make a directory identical
move - files to cloud storage deleting the local after verification
check - for missing/extra files
mount - your cloud storage as a network disk
More information on each function can be found at https://rclone.org/#what. An example of a copy from local to MedicineBow would be:
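A minimal sketch of such a copy, assuming the remote was saved as ‘medbow’ during configuration. The destination path follows the earlier rsync example; the "results" directories and the PREVIEW variable are hypothetical. The commands are printed by default rather than run.

```shell
#!/bin/sh
# Sketch only: assumes a configured remote named "medbow"; the
# "results" and "results_back" directories are hypothetical.
REMOTE_DIR="medbow:/project/arcc/dperkin6"

# PREVIEW=echo only prints the commands below. Set PREVIEW to an
# empty string (PREVIEW= sh yourscript.sh) to really run them.
PREVIEW="${PREVIEW-echo}"

# Copy a single local file into a directory on MedicineBow.
$PREVIEW rclone copy transfer.file "$REMOTE_DIR"

# Copy a whole directory, showing progress and running up to
# 8 file transfers in parallel (rclone's default is 4).
$PREVIEW rclone copy --progress --transfers 8 results "$REMOTE_DIR/results"

# Reverse source and destination to bring data back to the local machine.
$PREVIEW rclone copy "$REMOTE_DIR/results" results_back
```

In a compute job, the first two commands would typically run before the computation and the last one after it.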