In this section of the workshop we discuss data management for research workflows, explain why it is important, and introduce how you can use ARCC resources to manage your data. This page gives background information for later topics. If you are looking for specific examples, please head back to the main Data Management page to navigate to other pages.

It is also important to note that the content of these pages consists of general suggestions and guidelines to assist in your research workflows. It is NOT a set of rules or requirements for using ARCC resources during your research project.

...


...

Research Data Life-cycle

The life-cycle of data in a research project can be broken down into multiple phases. These phases can be thought of as distinct, but they often blend into each other with little to distinguish one from the next. Below we provide details and guidelines for each phase.

 

ResearchDataLife.png

...

There are multiple types of data, and how these data are collected varies greatly depending on the kind of research being done. Below is a table of some types of data that could apply to any management scenario.

...


...

 

Please note this is not a comprehensive list; many more types of data exist.

Classic

Simulated/automated

Social

  • Text files

  • Tabular (Spreadsheets, Databases, etc.)

  • Matrices

  • Observations/field notes

  • Computer Models

  • Instruments (Microscopes, Weather Stations, Satellite Imagery, etc.)

  • Audio/video recordings

  • Surveys

  • Interviews

  • Focus groups

  • Exit Polling

During this phase, it is important to keep the data being collected organized and named with appropriate conventions, because this will assist with the next phases. Examples will be discussed in other modules.
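As a rough illustration of what a naming convention might look like in practice, the sketch below builds file names from a project name, a short description, a collection date, and a version number. The pattern `project_description_date_vNN.ext` and the function name `build_filename` are assumptions made for this example only. They are not an ARCC convention or requirement, and your discipline or lab may already have its own.

```python
import re
from datetime import date


def build_filename(project: str, description: str, collected: date,
                   version: int, extension: str) -> str:
    """Build a file name following a hypothetical, illustrative convention."""

    def slug(text: str) -> str:
        # Lowercase the text and collapse any run of characters other than
        # letters or digits into a single hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    # ISO dates (YYYY-MM-DD) sort chronologically when sorted as text,
    # and a zero-padded version keeps v02 ahead of v10.
    return (
        f"{slug(project)}_{slug(description)}_"
        f"{collected.isoformat()}_v{version:02d}.{extension.lower().lstrip('.')}"
    )


print(build_filename("SoilStudy", "Site A moisture", date(2024, 3, 5), 1, ".CSV"))
# prints: soilstudy_site-a-moisture_2024-03-05_v01.csv
```

Whatever convention you choose, applying it consistently from the start of collection matters more than the exact pattern.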

...

How ARCC Can Help With the Collection Phase

 

...


Since ARCC does not advise on how research should be done, how data are collected is not usually an area of expertise we provide. However, we can advise on how the data may be used in later phases of the Research Data Management Life-cycle, which you may want to be mindful of while you are in the collection phase. If you are unsure about anything you run into, please remember that ARCC provides the following resources that may assist you:

  • The ARCC documentation and policies can provide researchers with much of the background information required about the resources available

  • ARCC resources are described in a Facilities Statement

  • ARCC is always willing to meet with researchers to discuss any Data Management issues; meetings can be scheduled through our ticketing system

...

The Storage Phase

Once your research data are collected, you will need a place to keep them before moving on to the next phases. This phase is often the longest of the phases and sometimes overlaps many of the others. While seemingly trivial, the storage phase is vital to the Data Management Life-cycle. Before we discuss the systems and services ARCC provides to assist in this phase, here are a few questions to ask yourself when deciding where your data will be stored:

  • Do the data fall under any federal compliance or other security restrictions?

  • How are the data to be accessed and how frequently?

  • Do the data need to be backed up or version controlled?

  • Do other collaborators require access and are they local to your institution or not?

...

How ARCC Can Help With the Storage Phase

Research data storage is a core service that ARCC provides, and we have several storage options available that will be discussed in subsequent modules. Briefly, ARCC provides three core storage systems, each filling a different role in the Research Data Life-cycle, as detailed in the table below:

The ARCC Data Portal (Storage)

  • Free for UWyo researchers up to a default limit

  • Accessible via the UWyo network or VPN

  • Includes backups and snapshots

MedicineBow HPC system (Analysis)

  • Home (for configuration and profiles)

  • Project (for shared data during analysis)

  • gscratch (for actively read/write during analysis)

    • MedicineBow is NOT backed up, but includes snapshots

Pathfinder (Storage)

  • Cloud-like backend

  • Web-enabled S3 buckets for data storage, data transfer, etc.

  • Is NOT backed up

Transferring data to and from these systems is discussed in another workshop. Please also be aware that none of these systems meet any federal compliance requirements.

...

The Analysis Phase

The analysis phase can involve a variety of methodologies and tools. This phase also often includes different stages and versions of data. Here are a few questions to ask yourself before entering this phase of the Research Data Life-cycle:

  • How large are the data that I am working with?

  • Will I need a powerful system such as a High Performance Computing system to complete this work?

  • What software will I need to perform the analysis?

  • Will there be new data generated as a result of this work (simulated data for model training, a summarized subset of raw data, etc.)?

  • Will this work change my raw data and do I need to keep a copy of either the raw data or results?

  • How will I manage the changes that will happen during this phase and maintain a record of them?
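One possible way to approach the last two questions is to record a checksum for every file before analysis begins and compare against it later. The sketch below is a minimal illustration using only the Python standard library. The function names (`file_sha256`, `build_manifest`, `changed_files`) are hypothetical names chosen for this example and are not part of any ARCC tooling.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Map each file under data_dir (by relative path) to its checksum."""
    return {
        p.relative_to(data_dir).as_posix(): file_sha256(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List files that were added, removed, or modified between manifests."""
    return sorted(
        name for name in old.keys() | new.keys()
        if old.get(name) != new.get(name)
    )
```

A typical use would be to call `build_manifest` on the raw data directory before analysis, save the result (for example as JSON) somewhere outside that directory, and later pass the saved and current manifests to `changed_files`. An empty list suggests the raw data are unchanged.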

...

How ARCC Can Help With the Analysis Phase

High Performance Computing is another core ARCC service, and we offer an assortment of support for this type of work. Along with the MedicineBow HPC system, we provide documentation, troubleshooting consultations, software management, and workshops, in addition to administration of the system itself. Additionally, we provide facilitation of and technical support for the NCAR Wyoming Supercomputing Center’s Derecho system.

If neither of these systems meets your needs for the analysis phase and you still require assistance, please reach out to us via our service portal to discuss what your requirements are and potential options.

Another ARCC service that may be of use during this phase is GitLab, for collaborative code development and version control. We also recommend maintaining a README file associated with your work to record additional metadata that will be useful for the publishing phase of the Research Data Life-cycle. Metadata and README files are discussed in the next module.

...

The Publishing Phase

This phase of the Research Data Life-cycle usually occurs after the work has been completed but before other work (such as a manuscript) is published. What exactly it involves depends on the requirements of the various funding agencies and/or scientific journals that you are working with. For example, if your work was funded by the NSF, the resulting data must be made publicly available, and if you want to publish in the Journal of Science, your data have to be available before your manuscript itself will be published. Good scholarly metadata (described in the next section) will be key to completing this phase. Other key concepts in this phase are:

  • Discipline-specific data repositories

  • General or institutional data repositories

  • Digital identifiers, such as Digital Object Identifiers (DOIs)

  • Personal scholarly identifiers, such as an ORCID

...

How ARCC Can Help With the Publishing Phase


 

...

ARCC supports some of the systems used in publishing research data, along with the Data Librarians at The University of Wyoming Libraries. The Data Librarians will be the primary points of contact during this phase and can seek ARCC’s assistance if needed. Additionally, some larger datasets will require ARCC to host or move them for the researcher. Lastly, if the data to be published are already stored on one of ARCC’s systems, ARCC can assist in getting them moved to the appropriate place for publishing.

...

Next Steps