In this module we will discuss data management for research workflows, explain why it’s important, and introduce how you can use ARCC resources to manage your data. This page gives background information for future topics. If you are looking for specific examples, please head back to the main Data Management page to navigate to other pages.
It is also important to note that the content of these pages consists of general suggestions and guidelines to assist in your research workflows. The content is NOT a set of rules or requirements for using ARCC resources during your research project.
Research Data Life-cycle
The life-cycle of data in a research project can be broken down into multiple phases. These can be thought of as distinct phases, but in practice they often blend into each other with little to distinguish one from the next. Below we provide details and guidelines for each phase.
The Planning Phase
Data Management Planning is an often overlooked but critical phase of the Research Data Management Life-cycle. Not only is it useful for the execution of your research project, but a formalized plan is also often required by funding agencies such as the National Science Foundation (NSF) and the National Institutes of Health (NIH), among many others. The planning phase usually comes after a research project has been conceptualized but before the project is underway (or even funded), and it can always be revisited informally later. During this phase it is important to establish goals for your data and to consider questions such as:
What kind of data is required to answer our research question?
What file formats will be collected?
Is there a particular software needed in the other phases that requires the data to be formatted in a particular way?
Are there any federal compliance requirements?
How will the data be stored and protected prior to analysis?
Will the data be preserved or discarded after the project is complete?
How ARCC Can Help With Planning
ARCC is a good resource for many of the phases in the Research Data Management Life-cycle, but our role in the planning phase is more limited in scope. That said, there are several ways researchers can work with ARCC while forming this plan:
The ARCC documentation and policies can provide researchers with much of the background information required about the resources available
ARCC resources are described in a Facilities Statement
ARCC is always willing to meet with researchers to discuss any Data Management issues by scheduling through our ticketing system
We work closely with UWyo Libraries, who are well versed in Data Management, and we can refer you to them for more nuanced questions
They also administer the UWyo instance of a Data Management Planning Tool called DMPTool, which can be very useful for writing data management plans
They also have resources available for publishing research data, which will be discussed later in this module
The Collection Phase
There are multiple types of data, and how these data are collected varies greatly depending on the kind of research being done. Below is a table of some types of data that could apply to any management scenario. Please note that this is not a comprehensive list and that many more types of data exist.
Classic | Simulated/automated | Social |
---|---|---|
|
|
|
During this phase, it is important to keep the data being collected organized and named with appropriate conventions, since this assists with the later phases. Examples will be discussed in other modules.
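As a rough illustration of what a naming convention can look like in practice, the sketch below builds file names from a project name, collection date, short description, and version number. The pattern itself and the function name `build_filename` are assumptions made for this example. They are not an ARCC requirement or recommendation, and your own project may call for different fields.

```python
import re
from datetime import date


def build_filename(project: str, collected: date, description: str,
                   version: int, ext: str) -> str:
    """Build a name of the form <project>_<YYYY-MM-DD>_<description>_vNN.<ext>.

    This pattern is a hypothetical convention chosen for illustration only.
    """
    def slug(text: str) -> str:
        # Lowercase the text and collapse every run of characters that is
        # not a letter or digit into a single hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    extension = ext.lstrip(".").lower()
    # ISO dates (YYYY-MM-DD) sort chronologically when sorted as plain text.
    return (f"{slug(project)}_{collected.isoformat()}_"
            f"{slug(description)}_v{version:02d}.{extension}")


if __name__ == "__main__":
    print(build_filename("Soil Moisture", date(2024, 6, 3), "Plot A raw", 1, ".CSV"))
```

Using ISO-formatted dates and zero-padded version numbers keeps files in a sensible order when a directory is listed alphabetically, which is one reason conventions of this kind are common.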
How ARCC Can Help With the Collection Phase
Since ARCC does not advise on how research should be done, how data are collected is not usually an area of expertise we provide. However, we can provide advice on how the data may be used in later phases of the Research Data Management Life-cycle, which you may want to keep in mind while you are in the collection phase. If you are unsure about anything that you run into, please remember that ARCC provides the following resources that may assist you:
The ARCC documentation and policies can provide researchers with much of the background information required about the resources available
ARCC resources are described in a Facilities Statement
ARCC is always willing to meet with researchers to discuss any Data Management issues by scheduling through our ticketing system
The Storage Phase
Once your research data are collected, you will need a place to keep them before moving on to the next phases. This phase is often the longest of the phases and sometimes overlaps many of the others. While seemingly trivial, the storage phase is vital to the Data Management Life-cycle. Before we discuss the systems and services ARCC provides that can assist in this phase, it is important to ask yourself a few questions that will inform where your data should be stored:
Do the data fall under any federal compliance or other security restrictions?
How are the data to be accessed and how frequently?
Do the data need to be backed up or version controlled?
Do other collaborators require access and are they local to your institution or not?
How ARCC Can Help With the Storage Phase
Research data storage is a core service that ARCC provides, and we have several storage options available that will be discussed in subsequent modules. Briefly, ARCC provides three core storage systems that fit different phases of the Research Data Life-cycle, each filling a different role as detailed in the table below:
The ARCC Data Portal (Storage) | MedicineBow HPC system (Analysis) | Pathfinder (Storage) |
---|---|---|
|
|
|
Transferring data to and from these systems is discussed in another workshop. Please also be aware that none of these systems meet any federal compliance requirements.
The Analysis Phase
The analysis phase can involve a variety of methodologies and tools. It also often involves different stages and versions of data. Here are a few questions to ask yourself before entering this phase of the Research Data Life-cycle:
How large are the data that I am working with?
Will I need a powerful system such as a High Performance Computing system to complete this work?
What software will I need to perform the analysis?
Will there be new data generated as a result of this work (simulated data for model training, a summarized subset of raw data, etc.)?
Will this work change my raw data and do I need to keep a copy of either the raw data or results?
How will I manage the changes that will happen during this phase and maintain a record of them?
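One possible way to approach the last two questions is to record a checksum for every file before analysis begins and compare against it later. The sketch below is a minimal example under that assumption, using only the Python standard library. The function names (`file_sha256`, `build_manifest`, `diff_manifests`) are hypothetical and are not part of any ARCC tooling.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (by relative POSIX-style path) to its checksum."""
    return {
        path.relative_to(root).as_posix(): file_sha256(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff_manifests(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Report which files were added, removed, or modified between two manifests."""
    return {
        "added": sorted(set(after) - set(before)),
        "removed": sorted(set(before) - set(after)),
        "modified": sorted(
            name for name in before.keys() & after.keys()
            if before[name] != after[name]
        ),
    }
```

A manifest built before analysis and saved alongside the data (for example as JSON) can later be compared with a fresh one to confirm that the raw data were not altered, or to list exactly which files changed.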
How ARCC Can Help With the Analysis Phase
High Performance Computing is another core ARCC service, and we offer an assortment of support for this type of work. Along with the MedicineBow HPC system, we provide documentation, troubleshooting consultations, software management, and workshops, in addition to administering the system itself. Additionally, we provide facilitation of and technical support for the NCAR Wyoming Supercomputing Center’s Derecho system.
If neither of these systems meets your needs for the analysis phase and you still require assistance, please reach out to us via our service portal to discuss what your requirements are and potential options.
The Publishing Phase
How ARCC Can Help With the Publishing Phase
Next Steps