Pathfinder_old
Named after one of Wyoming’s reservoirs on the North Platte River, Pathfinder is a low-cost storage solution that enables a Cloud-like presence for research data hosted by ARCC. The system is built to be expandable and provides data protection. Its core functionality is hosting onsite backups as well as enabling data sharing and collaboration.
Frequently Asked Questions (FAQs)
How it Works
Pathfinder uses the Simple Storage Service (S3) protocol, originally developed by Amazon, which Amazon describes as “storage for the Internet”. On Pathfinder, the S3 interface sits on top of object storage provided by Ceph, an open-source storage platform supported by Red Hat.
Unique Characteristics of S3
ARCC's S3 services, such as Pathfinder, do not function like Windows file systems or other traditional storage systems. A few of the characteristics unique to S3 are listed below.
S3 has two primary entities called buckets and objects.
Buckets are the access points and objects are stored inside them.
Bucket names have to be globally unique irrespective of which region they are created in.
Because buckets can be accessed using URLs, bucket names should follow DNS naming conventions: use only lowercase letters and avoid special characters.
Objects are files; what looks like a directory is a prefix in the object's name.
For example, if you upload images and want to keep them separate from other files, you can store them under a folder-like prefix so that the logical address of each file begins with ‘pictures/’.
The object pictures/hello.jpg is then distinct from images/hello.jpg.
'Users' are replaced with Access Keys and 'passwords' are replaced with Secret Keys.
Access keys consist of two parts: an access key ID (for example, AKIAIOSFODNN7EXAMPLE) and a secret access key (for example, wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY).
Like a user name and password, you must use both the access key ID and secret access key together to authenticate your access to your buckets.
Manage your access keys as securely as you do your user name and password.
However, access keys are associated with a project, lab, or department and cannot be associated with a specific UWYO user.
Permissions are functionally limited and are only supported for basic usage.
For example, granular access to a single folder or directory that is possible in a traditional storage system is not well supported in S3.
Typically, multiple users share a single access key to the same buckets, and nothing distinguishes one user from another.
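The naming and prefix conventions described above can be sketched in a few lines of Python. This is only an illustration: the rule in `looks_dns_safe` is a simplified assumption (3 to 63 characters, lowercase letters, digits, and hyphens), not Pathfinder's authoritative validation, and both function names are hypothetical.

```python
import re

# Simplified, assumed rule: 3-63 characters, lowercase letters, digits and
# hyphens, starting and ending with a letter or digit. This is not an
# official Pathfinder check.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")


def looks_dns_safe(bucket_name: str) -> bool:
    """Return True if the name fits the simplified DNS-style rule above."""
    return _BUCKET_RE.fullmatch(bucket_name) is not None


def object_key(prefix: str, filename: str) -> str:
    """Join a folder-like prefix and a file name into one object key."""
    return f"{prefix.strip('/')}/{filename}"


if __name__ == "__main__":
    print(looks_dns_safe("my-lab-data"))        # True
    print(looks_dns_safe("My_Lab_Data"))        # False
    print(object_key("pictures", "hello.jpg"))  # pictures/hello.jpg
    print(object_key("images", "hello.jpg"))    # images/hello.jpg
```

The two keys printed at the end refer to different objects even though the file name is the same, because the prefix is part of the object's name.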
Purpose of System
The Pathfinder S3-Ceph storage architecture is designed and hosted by ARCC to serve two primary purposes:
The system will act as an onsite backup target for other ARCC services such as the petaLibrary or Teton.
The system will act as a publicly accessible data transfer platform via the S3 protocol.
Users will be able to host their own 'bucket' to share data.
Users can also obtain data from external collaborators.
The system also serves a wide variety of supplementary functions:
Programmatic access to data hosted within S3: code running on Teton can pull data from and push data to S3.
Onsite backup of important user-defined data, for users who run their own storage systems but still need backups.
Timed/temporary download links: S3 allows users to make data publicly available through a tokenized link that expires after a specified time frame, for example to make a file temporarily available to external users.
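A timed download link of the kind described above could be generated with boto3's `generate_presigned_url`. The sketch below is a minimal example under stated assumptions: `s3` is a boto3 S3 client already configured with the Pathfinder endpoint and your keys, the bucket and key names are placeholders, and the 7-day ceiling in `clamp_expiry` is an assumption borrowed from AWS's own limit rather than a documented Pathfinder setting.

```python
# Assumed ceiling of 7 days, mirroring the limit AWS applies to presigned
# URLs; Pathfinder's actual limit may differ.
MAX_LINK_SECONDS = 7 * 24 * 3600


def clamp_expiry(seconds: int, maximum: int = MAX_LINK_SECONDS) -> int:
    """Keep the link lifetime between 1 second and the assumed ceiling."""
    return max(1, min(seconds, maximum))


def make_temporary_link(s3, bucket: str, key: str, seconds: int = 3600) -> str:
    """Return a presigned GET URL for one object that expires after `seconds`.

    `s3` is expected to be a boto3 S3 client.
    """
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=clamp_expiry(seconds),
    )


# Usage (requires boto3, a real endpoint, and real credentials; the
# endpoint below is a placeholder, not Pathfinder's actual address):
#   import boto3
#   s3 = boto3.client("s3", endpoint_url="https://pathfinder.example.edu",
#                     aws_access_key_id="YOUR_ACCESS_KEY",
#                     aws_secret_access_key="YOUR_SECRET_KEY")
#   print(make_temporary_link(s3, "my-lab-data", "pictures/hello.jpg", 3600))
```

Anyone who holds the returned URL can download the object until the link expires, so it should be shared as carefully as the data itself.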
Use Cases
Data Transfer
Host data that end users can download either directly (publicly) or with credentials.
User-based Backups
Back data up to Pathfinder as a second (or third) copy of your critical research, using a wide variety of open-source tools.
This space is a stand-alone entity, and will not be mounted directly on other ARCC resources.
This system is *NOT* backed up. Data that reside on this system should be available in other location(s). This system is intended as a secondary backup and a temporary repository for data transfers ONLY.
S3 Clients
The S3 protocol requires a client to connect to the server. A variety of Graphical User Interface (GUI) and Command Line Interface (CLI) clients can be used to connect to Pathfinder. Because so many S3 clients are available, ARCC has not tested them all; the few that have been tested are detailed in the table below.
| Client Name | Operating System | GUI or CLI | Free? | ARCC recommended/supported |
|---|---|---|---|---|
| | Windows, macOS | GUI | Yes, but larger transfers will require a license | Yes |
| | Windows, macOS | GUI | Yes | Best Effort |
| | macOS | GUI | No | Best Effort |
| | Windows, macOS, Linux | GUI | Yes | No |
| | Windows, macOS, Linux | CLI | Yes | Yes |
| | macOS, Linux | CLI | Yes | Best Effort |
Instructions for using Pathfinder with MSP360 Explorer (Cloudberry)
Instructions for using Pathfinder with Rclone
Scripting/Programming Packages
Some programming languages provide software packages that can use the S3 protocol for accessing data. ARCC has tried a few of these, which are detailed in the table below.
Package Name | Language | ARCC Tested |
---|---|---|
boto3 | Python | Yes |
aws.s3 | R | Yes |
AWS | C# | No |
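As a rough illustration of the boto3 entry in the table, the sketch below connects to an S3-compatible endpoint, uploads a file, downloads it again, and lists the keys that share its prefix. The endpoint URL, the environment variable names, and the helper function names are all hypothetical; substitute the values ARCC provides with your keys.

```python
import os

# Hypothetical endpoint and environment variable names; replace them with
# the values ARCC provides along with your Access Key/Secret Key.
ENDPOINT_URL = "https://pathfinder.example.edu"


def connect(endpoint_url: str = ENDPOINT_URL):
    """Create a boto3 S3 client pointed at a non-AWS, S3-compatible endpoint."""
    import boto3  # third-party package: pip install boto3

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=os.environ["PATHFINDER_ACCESS_KEY"],
        aws_secret_access_key=os.environ["PATHFINDER_SECRET_KEY"],
    )


def parent_prefix(key: str) -> str:
    """Return the folder-like prefix of a key ('' if the key has none)."""
    return key.rsplit("/", 1)[0] + "/" if "/" in key else ""


def push_and_pull(s3, bucket: str, local_path: str, key: str, copy_path: str):
    """Upload a file, download it again, and list keys under its prefix."""
    s3.upload_file(local_path, bucket, key)    # local file -> object
    s3.download_file(bucket, key, copy_path)   # object -> local file
    listing = s3.list_objects_v2(Bucket=bucket, Prefix=parent_prefix(key))
    return [item["Key"] for item in listing.get("Contents", [])]
```

With the two environment variables set, a call such as `push_and_pull(connect(), "my-lab-data", "hello.jpg", "pictures/hello.jpg", "hello_copy.jpg")` would return the keys stored under `pictures/`. Note that `list_objects_v2` returns at most 1,000 keys per call, so larger prefixes need pagination.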
Cost
Price Structure for S3
This price structure is based on actual hardware costs and does not include personnel or infrastructure (network/datacenter) costs. Those have been subsidized by ARCC and the Office of Research and Economic Development.
One-time fee of $50 per Access Key/Secret Key pair
$45 per terabyte per year, billed monthly based on usage
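To get a feel for these numbers, the arithmetic can be sketched as follows. The proration rule is an assumption (each month is charged one twelfth of the yearly rate for that month's usage); ARCC's actual billing method may differ, and the function names are hypothetical.

```python
from typing import Sequence

SETUP_FEE_USD = 50.0          # one-time fee per access key/secret key pair
RATE_USD_PER_TB_YEAR = 45.0   # storage rate from the price structure above


def monthly_charge(usage_tb: float) -> float:
    """Assumed proration: one twelfth of the yearly rate for the month's usage."""
    return round(usage_tb * RATE_USD_PER_TB_YEAR / 12, 2)


def first_year_cost(monthly_usage_tb: Sequence[float]) -> float:
    """One-time fee plus the monthly charges for the months supplied."""
    return round(SETUP_FEE_USD + sum(monthly_charge(tb) for tb in monthly_usage_tb), 2)


if __name__ == "__main__":
    print(monthly_charge(10))          # 37.5
    print(first_year_cost([10] * 12))  # 500.0
```

Under this assumption, a project holding a steady 10 TB would pay $37.50 per month, or $500 in its first year including the one-time key fee.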
Requesting Access
To request access to Pathfinder and receive an Access Key/Secret Key pair, email arcc-help@uwyo.edu with the subject “Pathfinder access request”.