Goal: Introduction to UW ARCC and our services.

...

Core Service 1: High Performance Computing (HPC)

We maintain a number of clusters that allow researchers to carry out a variety of use cases (a job-submission sketch follows this list), such as running:

...
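As a hedged illustration of what submitting work to a cluster looks like: the sketch below assumes a Slurm-style scheduler, which is common on academic clusters, and uses hypothetical job, account, and script names; it is not a statement of ARCC's actual configuration.

    # Minimal sketch: write a batch script and hand it to Slurm's sbatch.
    # Assumes a Slurm-style scheduler on the cluster; the account name and
    # resource requests are hypothetical placeholders.
    import subprocess
    import textwrap

    job_script = textwrap.dedent("""\
        #!/bin/bash
        #SBATCH --job-name=demo
        #SBATCH --account=myproject    # hypothetical project/account name
        #SBATCH --time=00:10:00
        #SBATCH --ntasks=1

        echo "Hello from $(hostname)"
        """)

    with open("demo_job.sh", "w") as handle:
        handle.write(job_script)

    # sbatch prints the new job's ID on success.
    subprocess.run(["sbatch", "demo_job.sh"], check=True)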

Core Service 2: Research Data Storage 

Safe and secure storage and transfer of data that researchers can share and collaborate on with others within UW and at other institutions across the world.

  1.  Alcova:

    1. High performance data storage geared toward project-oriented data.

    2. Storage for published research data.

  2.  Pathfinder: 

    1. Low-cost storage solution that enables a Cloud-like presence for research data hosted by ARCC. 

    2. Hosting onsite backups and enabling data sharing and collaboration.

...

Core Service 2: Research Data Storage: Which One?

Neither of these options is “tied to” HPC resources. You do not NEED to perform work on our HPC clusters to use either of these storage options.
Considerations: Cost, accessibility, use cases:

Alcova: More “Traditional” Block and File Storage

  1. Cost:

    1. Free up to the default quota; beyond that, charged more per TB than Pathfinder.

  2. Accessibility:

    1. Think of it as more traditional file storage.

    2. Can be accessed over SMB (with Active Directory authentication) through a traditional file browser such as Windows File Explorer, or via Globus (a connection sketch follows this list).

  3. Permissions: follows a project model; users belong to projects and are authenticated via username.

    1. Management of permissions is tied to Active Directory.

    2. Or shared through Globus.
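For illustration only, a minimal sketch of programmatic SMB access from Python, assuming the third-party smbprotocol package (pip install smbprotocol); the server, share, and user names below are hypothetical placeholders, not ARCC's actual paths:

    # Minimal sketch: list a project share over SMB using Active Directory
    # credentials. Assumes the "smbprotocol" package; all names below are
    # hypothetical placeholders.
    import smbclient

    smbclient.register_session(
        "alcova.example.edu",    # hypothetical server name
        username="jdoe",
        password="...",
    )

    # List the contents of a hypothetical project share.
    for name in smbclient.listdir(r"\\alcova.example.edu\myproject"):
        print(name)

In practice, most users will simply map the share in Windows File Explorer or use Globus; the programmatic route is optional.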

Pathfinder: Cloud-Based, S3-Style Object Storage

  1. Cost:

    1. A less expensive storage solution. Charged per TB.

  2. Accessibility:

    1. Accessed via an S3-compatible client (e.g., CloudBerry, Cyberduck, or Globus with limited functionality) and/or programmatically (e.g., rclone) over the S3 protocol, which provides object storage via buckets (a connection sketch follows this list).

    2. Access is provided via access/secret key tokens. Tokens can be time-limited.

    3. Data can be made publicly available through a URL. URLs can be set with expiration dates.

  3. Permissions: Pathfinder does not use the notion of projects/usernames; permissions are maintained strictly through key tokens.
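As a hedged sketch of the programmatic route, assuming the boto3 library; the endpoint URL, bucket name, and credentials are hypothetical placeholders (actual values come from the access/secret key tokens issued for Pathfinder):

    # Minimal sketch: talk to S3-compatible object storage with boto3.
    # The endpoint, bucket, and keys are hypothetical placeholders,
    # supplied here only for illustration.
    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="https://pathfinder.example.edu",  # hypothetical endpoint
        aws_access_key_id="YOUR_ACCESS_KEY",
        aws_secret_access_key="YOUR_SECRET_KEY",
    )

    # Upload a file into a bucket, then list what the bucket holds.
    s3.upload_file("results.csv", "mybucket", "results.csv")
    response = s3.list_objects_v2(Bucket="mybucket")
    for obj in response.get("Contents", []):
        print(obj["Key"], obj["Size"])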

Alcova use cases:

  1. Very fast.

  2. Traditional use cases: a hierarchy of files and folders/directories.

  3. Focused on data storage for research.

  4. Shared with others explicitly. (Not easily made publicly available.)

Pathfinder use cases:

  1. Not as fast: S3 access may add some latency compared to traditional research data storage, but most users will not notice a difference unless very large datasets are stored or accessed.

  2. Access for secure data may be less familiar to most users, initially.

  3. Data is easier to share publicly over a URL/web link (a presigned-URL sketch follows this list).

  4. Flexible storage structure - you can store any kind of data (vs. traditional databases with rigid data schemas).

  5. Functionality that allows web-based applications to access large datasets at scale.
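To illustrate the time-limited public URLs mentioned above, a minimal boto3 sketch (again with hypothetical endpoint, bucket, and key names):

    # Minimal sketch: generate a presigned URL that lets anyone download
    # an object until the link expires (one hour here). Endpoint, bucket,
    # and credentials are hypothetical placeholders.
    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="https://pathfinder.example.edu",  # hypothetical endpoint
        aws_access_key_id="YOUR_ACCESS_KEY",
        aws_secret_access_key="YOUR_SECRET_KEY",
    )

    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "mybucket", "Key": "results.csv"},
        ExpiresIn=3600,  # seconds until the link stops working
    )
    print(url)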

Come and discuss what your needs and use cases are…

...

Core Service 2: Research Data Storage Changes:

  1. Data Portal:

    1. Effective June 1, 2024, ARCC introduced the ‘ARCC Data Portal’, serving the dual purpose of providing high-performance back-end storage for the MedicineBow HPC system and a data storage solution for researchers needing a centralized data repository for ongoing research projects.

    2. Data Portal storage is FREE up to the default allocation quota.

    3. ARCC’s Data Portal is composed of VAST data storage built on high-speed all-NVMe hardware, housing 3 petabytes of raw storage. VAST storage employs data de-duplication, allowing the system to logically store more than the raw 3 PB available.

    4. MedicineBow vs Alcova Spaces:

      1. Alcova storage on the ARCC Data Portal can be thought of as the “new Alcova” and will replace the prior Alcova storage space described above. This space is intended for collaborative data storage, using the SMB protocol for interactive access. It is backed up by ARCC and can only be used by researchers with a uwyo.edu account.

      2. MedBow space can be thought of as the root-level directory of the HPC system, separated into home, project, and gscratch directories, and is intended for HPC workflows where speed and minimal overhead are prioritized over backups.

        1. MedicineBow Data Storage is available upon the go-live of MedicineBow on July 15th.

– the essence of these services will remain, but the underlying systems are being updated

...

Core Service 3: End User Support

...