
In this section of the workshop, we will discuss data management for research workflows and why it’s important, and introduce how you can use ARCC resources to manage your data. This page gives background information for future topics. If you are looking for specific examples, please head back to the main Data Management page to navigate to other pages.

...

Research data storage is a core service that ARCC provides, and our storage options will be discussed in subsequent modules. Briefly, ARCC provides three core storage systems. Each fits a different phase of the Research Data Life-cycle and fills a different role, as detailed below:

The ARCC Data Portal (Storage)

  • Free for UWyo researchers up to a default limit

  • Accessible via the UWyo network or VPN

  • Includes backups and snapshots

MedicineBow HPC system (Analysis)

  • Home (for configuration and profiles)

  • Project (for shared data during analysis)

  • gscratch (for data being actively read and written during analysis)

  • MedicineBow storage is NOT backed up, but includes snapshots

Pathfinder (Storage)

  • Cloud-like backend

  • Web-enabled S3 buckets for data storage, data transfer, etc.

  • Is NOT backed up

Transferring data to and from these systems is discussed in another workshop. Please also be aware that none of these systems meet any federal compliance requirements.

...

The Analysis Phase

The analysis phase can involve a variety of methodologies and tools. It also often includes different stages and versions of data. Here are a few questions to ask yourself before entering this phase of the Research Data Life-cycle:

  • How large are the data that I am working with?

  • Will I need a powerful system such as a High Performance Computing system to complete this work?

  • What software will I need to perform the analysis?

  • Will new data be generated as a result of this work (e.g., simulated data for model training or a summarized subset of raw data)?

  • Will this work change my raw data and do I need to keep a copy of either the raw data or results?

  • How will I manage the changes that will happen during this phase and maintain a record of them?

...

How ARCC Can Help With the Analysis Phase

High Performance Computing is another core ARCC service, and we offer an assortment of support for this type of work. In addition to administering the MedicineBow HPC system, we provide documentation, troubleshooting consultations, software management, and workshops. We also provide facilitation of and technical support for the NCAR Wyoming Supercomputing Center’s Derecho system.

If neither of these systems meets your needs for the analysis phase and you still require assistance, please reach out to us via our service portal to discuss what your requirements are and potential options.

Another ARCC service that may be useful during this phase is GitLab, for collaborative code development and version control. We also recommend maintaining a README file associated with your work to record additional metadata that will be useful in the publishing phase of the Research Data Life-cycle. Metadata and README files are discussed in the next module.

...

The Publishing Phase

This phase of the Research Data Life-cycle usually occurs after the work has been completed but before other work (such as a manuscript) is published. What exactly it involves depends on the requirements of the funding agencies and/or scientific journals you are working with. For example, if your work was funded by the NSF, the resulting data must be made publicly available, and if you want to publish in the Journal of Science, your data must be available before your manuscript will be published. Good scholarly metadata (described in the next section) will be key to completing this phase. Other key concepts in this phase are:

  • Discipline-specific data repositories

  • General or institutional data repositories

  • Digital identifiers, such as Digital Object Identifiers (DOIs)

  • Personal scholarly identifiers, such as an ORCID

...

How ARCC Can Help With the Publishing Phase

ARCC supports some of the systems used in publishing research data, along with the Data Librarians at the University of Wyoming Libraries. The Data Librarians will be the primary points of contact during this phase and can seek ARCC’s assistance if needed. Additionally, some larger datasets will require ARCC to host or move them on the researcher’s behalf. Lastly, if the data to be published are already stored on one of ARCC’s systems, ARCC can help move them to the appropriate place for publishing.

...

Next Steps


Workshop Home

Intro to Data Management

Next 

Metadata and README files