Goal: Introduction to UW ARCC and our services.

...

Core Service 1: High Performance Computing: HPC

We maintain a number of clusters that let researchers run a variety of workloads, such as:

...

  1. MedicineBow

    1. Released to campus beginning July 15th, 2024

    2. This is an HPC cluster with enhanced GPU offerings to expand research capabilities in AI, machine and deep learning, and enhanced modeling.

    3. Currently consists of ~40 nodes with over 4224 CPU cores, 152 GPUs, and 3 PB of storage.

    4. This is the HPC resource we’ve been performing our training on throughout the bootcamp.

    5. Users can view partition information on our Hardware Summary Table.

  2. Beartooth

    1. First released to campus January 2023

    2. An HPC cluster with ~375 nodes, over 10K CPU cores, 52 GPUs, and 1.2 PB of storage.

    3. Eventually, all Beartooth nodes and their associated hardware will be consolidated into MedicineBow. Beartooth is planned for retirement at the end of 2024.

    4. Users can view partition information on our Hardware Summary Table.

...

Exercises:

  1. Log into MedicineBow OnDemand. What do you initially see?

  2. How would you open a new SSH/shell window or connection?

  3. How would you get help?

...

Core Service 2: Research Data Storage 

Safe and secure storage and transfer of data that researchers can share and collaborate on with others within UW and at other institutions across the world.

  1.  Alcova:

    1. High performance data storage geared toward project-oriented data.

    2. Storage for published research data.

  2.  Pathfinder: 

    1. Low-cost storage solution that enables a Cloud-like presence for research data hosted by ARCC. 

    2. Hosting onsite backups and enabling data sharing and collaboration.

...

  1.  Alcova:

    1. Think of this as more traditional storage that can be accessed over SMB with AD authentication, through Windows File Explorer or Globus.

    2. Access follows the idea of a project that users are part of and authenticated via username/AD.

  2.  Pathfinder: 

    1. A cheaper storage solution that uses S3 to provide object storage via buckets. It is accessed via a client and/or programmatically.

    2. Access is granted via access/secret key tokens, which can be time based.

    3. Data can be made publicly available.

    4. It does not use the notion of projects/usernames.
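As a rough illustration of the programmatic, key-based access described for Pathfinder, the sketch below uses the third-party `boto3` library, a common S3 client for Python. The endpoint URL, bucket name, and keys shown are placeholders rather than real ARCC values, and the helper names are hypothetical; treat this as a sketch under those assumptions, not as ARCC's documented procedure.

```python
def s3_client_kwargs(endpoint_url, access_key, secret_key):
    """Collect the settings an S3 client needs for a non-AWS endpoint.

    All three values are assumptions here: use the endpoint and the
    access/secret key pair issued to you for your storage.
    """
    return {
        "service_name": "s3",
        "endpoint_url": endpoint_url,
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }


def list_bucket_keys(bucket, client_kwargs):
    """Return the object keys in a bucket (first page only, up to 1000 keys)."""
    import boto3  # third-party: pip install boto3

    client = boto3.client(**client_kwargs)
    response = client.list_objects_v2(Bucket=bucket)
    # "Contents" is absent when the bucket is empty.
    return [obj["Key"] for obj in response.get("Contents", [])]


# Placeholder values for illustration only.
kwargs = s3_client_kwargs(
    endpoint_url="https://pathfinder.example.org",
    access_key="MY_ACCESS_KEY",
    secret_key="MY_SECRET_KEY",
)
# With real credentials and a real bucket name:
# print(list_bucket_keys("my-bucket", kwargs))
```

Note that access here is tied to the key pair alone, which matches the point above that Pathfinder does not use projects or usernames.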

Come and discuss what your needs and use cases are…

...

Core Service 2: Research Data Storage Changes:

  1. Data Portal:

    1. Effective June 1, 2024, ARCC introduced the ‘ARCC Data Portal’ serving the dual purpose of providing high performance back end storage for the MedicineBow HPC system and a data storage solution for researchers needing a centralized data repository for ongoing research projects.

    2. Data Portal storage is FREE up to the default allocation quota.

    3. ARCC’s Data Portal is built on VAST data storage, which consists of high-speed all-NVMe storage housing 3 petabytes of raw capacity. VAST storage employs data de-duplication, allowing the system to logically store more than the raw 3 PB available.

    4. MedicineBow vs Alcova Spaces:

      1. Alcova storage on the ARCC Data Portal can be thought of as the “new Alcova” and will replace the prior Alcova storage space listed here. This space is intended for use as collaborative data storage, using the SMB protocol for interactive access. This space is backed up by ARCC and can only be used by researchers with a uwyo.edu account.

      2. MedBow space can be thought of as the root level directory of the HPC system, separated into home, project, and gscratch directories, intended for use with HPC workflows where speed and minimal overhead are prioritized over backups.

        1. MedicineBow Data Storage is available upon the go-live of MedicineBow on July 15th.

– the essence of these services will remain

– but the underlying systems are being updated
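
To make the de-duplication point about the Data Portal concrete, here is a toy sketch of how storing identical blocks only once lets the logical size exceed the physical size. It is not how VAST implements de-duplication: the fixed block size, SHA-256 hashing, and the `DedupStore` class are all assumptions chosen for illustration.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical blocks are kept only once."""

    def __init__(self, block_size=4):
        self.block_size = block_size
        self.blocks = {}  # digest -> block bytes (what is physically stored)
        self.files = {}   # file name -> ordered list of digests (logical view)

    def write(self, name, data):
        digests = []
        for start in range(0, len(data), self.block_size):
            block = data[start:start + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            # Store the block only if this content has not been seen before.
            self.blocks.setdefault(digest, block)
            digests.append(digest)
        self.files[name] = digests

    def read(self, name):
        return b"".join(self.blocks[d] for d in self.files[name])

    def logical_bytes(self):
        # Size as users see it: every file counted in full.
        return sum(len(self.blocks[d]) for ds in self.files.values() for d in ds)

    def physical_bytes(self):
        # Size actually consumed: each unique block counted once.
        return sum(len(block) for block in self.blocks.values())


store = DedupStore(block_size=4)
store.write("a.dat", b"AAAABBBB")
store.write("b.dat", b"AAAACCCC")  # shares the "AAAA" block with a.dat
print(store.logical_bytes(), store.physical_bytes())
```

The two files total 16 logical bytes, but the shared block is stored once, so only 12 bytes are physically held.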

...

Core Service 3: End User Support

...