Named after one of Wyoming’s reservoirs on the North Platte River, Pathfinder is a low-cost storage solution that enables a Cloud-like presence for research data hosted by ARCC. The system is built to be expandable and provides data protection. Its core functionality is hosting onsite backups as well as enabling data sharing and collaboration.


How it Works

Pathfinder uses the Simple Storage Service (S3) protocol, originally developed by Amazon, which describes it as “storage for the Internet”. On Pathfinder, S3 requests are served from object storage provided by Ceph, a storage platform distributed by Red Hat.

Unique Characteristics of S3

ARCC's S3 presences, like Pathfinder, do not function like Windows file shares or other traditional storage systems. A few unique characteristics of S3 are listed below.

  • S3 has two primary entities called buckets and objects.

    • Buckets are the access points and objects are stored inside them.

      • Bucket names have to be globally unique irrespective of which region they are created in.

      • Because buckets can be accessed using URLs, it is recommended that bucket names follow DNS naming conventions: all letters should be lowercase, and names should not contain special characters.

    • Objects are the files stored in a bucket. Directories (folders) are expressed as prefixes in an object's name.

      • For example, to keep uploaded images separate from other files, you can create a folder for them so that the logical address of each file carries the prefix ‘pictures/’.

      • The object pictures/hello.jpg is then distinct from images/hello.jpg.

  • 'Users' are replaced with Access Keys and 'passwords' are replaced with Secret Keys.

    • Access keys consist of two parts: an access key ID (for example, AKIAIOSFODNN7EXAMPLE) and a secret access key (for example, wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY). 

    • Like a user name and password, you must use both the access key ID and secret access key together to authenticate your access to your buckets.

      • Manage your access keys as securely as you do your user name and password.

      • However, access keys are associated with a project, lab, or department and cannot be associated with a specific UWYO user.

  • Permissions are functionally limited and are only supported for basic usage.

    • For example, granular access to a single folder or directory that is possible in a traditional storage system is not well supported in S3.

    • Typically, multiple people use the same access key to reach a bucket, and nothing distinguishes one of them from another.
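The naming and prefix conventions above can be illustrated with a short Python sketch. The pattern below is an assumption modeled on common S3 guidance (3 to 63 characters, lowercase letters, digits, and hyphens); Pathfinder's actual rules may differ, and both function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed DNS-style rule: 3-63 characters, lowercase letters, digits, and
# hyphens, beginning and ending with a letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def is_dns_style_bucket_name(name):
    """Return True if the whole name fits the assumed DNS-style pattern."""
    return _BUCKET_NAME.fullmatch(name) is not None


def group_by_prefix(keys):
    """Group object keys by the text before their first '/'.

    Keys without a '/' are grouped under the empty string.
    """
    groups = defaultdict(list)
    for key in keys:
        prefix, separator, _ = key.partition("/")
        groups[prefix if separator else ""].append(key)
    return dict(groups)
```

With these helpers, pictures/hello.jpg and images/hello.jpg land in separate groups even though both objects end in hello.jpg, which is the sense in which a prefix behaves like a folder.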

Purpose of System

The Pathfinder S3-Ceph storage architecture is designed and hosted by ARCC to serve two primary purposes:

  1. The system will act as an onsite backup target for other ARCC services such as the petaLibrary or Teton.

  2. The system will act as a publicly accessible data transfer platform via the S3 protocol.

    1. Users will be able to host their own 'bucket' to share data.

    2. Users can also obtain data from external collaborators.

The system also serves a wide variety of supplementary functions:

  • Programmatic access to data hosted within S3. Code running on Teton can pull data from and push data to S3.

  • Onsite backup of important user-defined data, for users who run their own storage system but still need backups.

  • Timed/temporary download links. S3 allows users to make data publicly available through a tokenized link that expires after a specified time frame, for example to make a file temporarily available to external users.
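As a rough illustration of the timed download links described above, the following Python sketch uses boto3's presigned URL support. The endpoint URL, bucket name, and key in the commented usage lines are placeholders, not real Pathfinder values, and the helper names are hypothetical.

```python
def make_s3_client(endpoint_url, access_key, secret_key):
    """Create a boto3 S3 client that talks to a non-AWS endpoint."""
    import boto3  # third-party package, imported here so the helper below has no hard dependency

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )


def make_temporary_link(client, bucket, key, expires_seconds=3600):
    """Return a download URL for one object that expires after expires_seconds."""
    return client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_seconds,
    )


# Example usage (all values are placeholders):
# client = make_s3_client("https://pathfinder.example.edu", "ACCESS_KEY_ID", "SECRET_ACCESS_KEY")
# print(make_temporary_link(client, "my-lab-bucket", "pictures/hello.jpg", expires_seconds=86400))
```

The link is signed locally with the secret key, so anyone who receives it can download that one object until it expires without ever seeing the key pair.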

Use Cases

  • Data Transfer

Host data publicly so that end users can download it directly, or restrict downloads to users with credentials.

  • User-based Backups

Back data up to Pathfinder as a second (or third) copy of your critical research, using a wide variety of open-source tools.

This space is a stand-alone entity, and will not be mounted directly on other ARCC resources.

This system is *NOT* backed up. Data that resides on this system should also exist in at least one other location. This system is intended ONLY as a secondary backup and a temporary repository for data transfers.
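For the user-based backup case, a minimal Python sketch of copying a local directory into a bucket might look like the following. It assumes a boto3-style client object with an upload_file method; the function names and the key layout are illustrative choices, not an ARCC-provided tool.

```python
from pathlib import Path


def plan_backup(local_dir, key_prefix):
    """Yield (local_path, object_key) pairs for every file under local_dir."""
    root = Path(local_dir)
    prefix = key_prefix.strip("/")
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            # Mirror the directory layout in the object key.
            yield path, f"{prefix}/{relative}" if prefix else relative


def run_backup(client, bucket, local_dir, key_prefix):
    """Upload every planned file and return the number of objects written."""
    count = 0
    for path, key in plan_backup(local_dir, key_prefix):
        client.upload_file(str(path), bucket, key)
        count += 1
    return count
```

Separating the planning step from the upload makes it possible to review which keys would be written before any data is transferred. The original data should stay where it is, since the copy on Pathfinder is only a secondary copy.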

S3 Clients

The S3 protocol requires a client to connect to the server. A variety of Graphical User Interface (GUI) and Command Line Interface (CLI) clients can be used to connect to Pathfinder. With so many S3 clients available, ARCC has not tested them all; the ones we have tested are detailed in the table below.

| Client Name | Operating System | GUI or CLI | Free? | ARCC recommended/supported |
|---|---|---|---|---|
| MSP360 Explorer (Cloudberry) | Windows, macOS | GUI | Yes, but larger transfers will require a license | Yes |
| Cyberduck | Windows, macOS | GUI | Yes | Best Effort |
| Transmit | macOS | GUI | No | Best Effort |
| Dragon Disk | Windows, macOS, Linux | GUI | Yes | No |
| rclone | Windows, macOS, Linux | CLI | Yes | Yes |
| s3cmd | macOS, Linux | CLI | Yes | Best Effort |

Instructions for using Pathfinder with MSP360 Explorer (Cloudberry)

Instructions for using Pathfinder with rclone

Scripting/Programming Packages

Some programming languages provide software packages that can use the S3 protocol to access data. ARCC has tried a few of these; they are detailed in the table below.

| Package Name | Language | ARCC Tested |
|---|---|---|
| boto3 | Python | Yes |
| aws.s3 | R | Yes |
| AWS | C# | No |
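To give a sense of what programmatic access with boto3 can look like, here is a hedged Python sketch for pushing, pulling, and listing objects. The environment variable names are hypothetical, and the endpoint URL must be supplied by the caller because this page does not state it.

```python
import os


def connect(endpoint_url):
    """Build a boto3 S3 client from credentials held in environment variables."""
    import boto3  # third-party package, imported here so the helpers below accept any client object

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=os.environ["PATHFINDER_ACCESS_KEY"],      # hypothetical variable name
        aws_secret_access_key=os.environ["PATHFINDER_SECRET_KEY"],  # hypothetical variable name
    )


def push_file(client, bucket, local_path, key):
    """Upload one local file to the bucket under the given key."""
    client.upload_file(local_path, bucket, key)


def pull_file(client, bucket, key, local_path):
    """Download one object from the bucket to a local path."""
    client.download_file(bucket, key, local_path)


def list_keys(client, bucket, prefix=""):
    """Return all object keys in the bucket that start with prefix."""
    keys = []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for entry in page.get("Contents", []):
            keys.append(entry["Key"])
    return keys
```

Reading the key pair from the environment keeps it out of scripts and version control, in line with the advice to manage access keys as carefully as a user name and password.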

Cost

Price Structure for S3

This price structure is based on actual hardware costs and does not include personnel or infrastructure (network/datacenter) costs. Those have been subsidized by ARCC and the Office of Research and Economic Development.

  • One-time fee of $50 per access key/secret key pair

  • $35 per terabyte per year, billed monthly based on usage
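The two prices above can be combined into a rough estimate. The Python sketch below assumes that each monthly bill is one twelfth of the annual rate multiplied by the terabytes used that month; the page does not spell out the exact billing formula, so treat the result as an approximation.

```python
KEY_PAIR_FEE_USD = 50.0        # one-time fee per access key/secret key pair
RATE_USD_PER_TB_YEAR = 35.0    # storage rate per terabyte per year


def monthly_charge(used_tb):
    """Estimated charge for one month at the given usage (assumed formula)."""
    return used_tb * RATE_USD_PER_TB_YEAR / 12


def first_year_cost(monthly_usage_tb, key_pairs=1):
    """Estimated first-year cost: key pair fees plus the monthly charges.

    monthly_usage_tb is a sequence of terabytes used in each billed month.
    """
    storage = sum(monthly_charge(tb) for tb in monthly_usage_tb)
    return key_pairs * KEY_PAIR_FEE_USD + storage
```

Under these assumptions, holding 10 TB for twelve months with one key pair comes to about $400 in the first year: $50 for the key pair plus $350 for storage.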

