Organizing data can make research efforts more efficient and keep them logically separate. Many research programs are composed of multiple projects, investigators, and organizations working collaboratively to collect, share, analyze, or disseminate scientific results. Projects are analogous to file-storage directories but are more flexible and can hold their own metadata. Within a Program, most projects contain a collection of files that can all be described with similar data collection methods, and which typically come from the same funded effort. Smaller, focused projects may contain only a single dataset, while larger projects that collect or produce multiple types of data may contain several datasets. Identifying the specific dataset(s) that will be produced by a project is a central aim of project data management planning, and is necessary for planning an organizational structure within a project. Projects can also be used to share information that doesn’t require metadata, such as administration or outreach materials.
Project Naming
Project titles should be as concise as possible while still containing key information about the dataset. The title is often the most important piece of metadata describing a resource. It is the first thing seen by people when browsing or searching for a resource, and may be the only information used to evaluate the content of the resource.
At a minimum, project titles should contain the following information:
Location
Data type
Year (or other time unit) range
Program or institution name, if your dataset is part of a large effort
Naming Examples
Poor project titles:
Data Management
Workshop for Mike
Better project titles:
Data Management Workshop, University of Wyoming ARCC, Fall 2024
Conductivity, temperature and depth data for 12 northwestern Gulf of Mexico locations, May to July 2012
SAFARI 2000 Upper Water Column Profiles, Gulf of Alaska, 2011-2012
Project Naming on ARCC Systems
While the project titles above are very descriptive and worth recording in a README file, ARCC systems restrict which characters, and how many, can be used in a project name. This is because of how permissions work on the system: very long project names, or names with spaces or other special characters, can cause problems with system administration.
On ARCC systems, the recommendation is to use acronyms where possible or to shorten words in a logical way. The restrictions are as follows:
Lowercase letters and numbers only
Hyphens are allowed, but no other special characters such as underscores
No longer than sixteen (16) characters
For example, if this tutorial were a project on ARCC systems, we would take the long title “Data Management Workshop, University of Wyoming ARCC, Fall 2024” and shorten it to one of these suggestions:
data-mgmt-arcc24
arcc-datmgt-uw24
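The restrictions above can be checked before requesting a project. The following is a minimal sketch in Python, not an ARCC tool: the function name is_valid_project_name is hypothetical, and the pattern encodes only the three rules listed above (lowercase letters and numbers, hyphens allowed, at most 16 characters). Any additional rules ARCC may enforce, such as whether a name may begin with a hyphen, are not covered.

```python
import re

# Assumed from the listed restrictions: lowercase letters, digits, and
# hyphens only, with a total length of 1 to 16 characters.
PROJECT_NAME_PATTERN = re.compile(r"[a-z0-9-]{1,16}")


def is_valid_project_name(name: str) -> bool:
    """Return True if the name satisfies the three listed restrictions."""
    return PROJECT_NAME_PATTERN.fullmatch(name) is not None
```

Both suggested names above pass this check, while a name containing spaces, uppercase letters, or underscores does not.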
Organizing Folders within a Project
Folders are an important way to organize your project files into smaller, easier-to-manage, and identifiable units. Create a logical folder structure to help you stay organized and easily find and retrieve your stored files, and initiate it at the beginning of your project to save time and frustration.
Avoid complex, deeply hierarchical folder structures, which require extra browsing for file storage and retrieval. Try to keep the folder levels to no more than three deep. Folder structures can be simplified by including all the essential information concisely in the file name.
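One way to spot folders that exceed the three-level guideline is to walk an existing project directory and report anything nested deeper. This is a sketch under stated assumptions: the function name folders_deeper_than is hypothetical, and depth is counted from the project root, so a folder directly inside the root has depth 1.

```python
from pathlib import Path


def folders_deeper_than(project_root: Path, max_depth: int = 3) -> list[Path]:
    """Return folders nested more than max_depth levels below project_root."""
    too_deep = []
    for path in project_root.rglob("*"):
        if path.is_dir():
            # Number of path components between the root and this folder.
            depth = len(path.relative_to(project_root).parts)
            if depth > max_depth:
                too_deep.append(path)
    return sorted(too_deep)
```

Calling folders_deeper_than(Path("my-project")) on a project directory (the path here is a placeholder) returns an empty list when the structure stays within three levels.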
The following best practices are recommended for creating an effective project folder structure:
Organize folders by major project components.
Create a hierarchical system with nested subfolders (high-level folders for broad topics with more specific folders within). Examples of high-level folder topics include:
Input data files by discrete location/source/type
Metadata
Code or scripts
Results or output data
Organize the data by data type and then by research activity.
Separate preliminary and final data into different folder structures.
Be consistent with your folder organization throughout the life of your project and/or Research Campaign.
Folder Organization Example
On ARCC systems, folder/directory names do not have to comply with the project name restrictions, but it is helpful to keep them short yet descriptive, to use few special characters, and to avoid spaces.
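As an illustration of the advice above, a descriptive title can be turned into a folder name without spaces. This is only a sketch: the function name suggest_folder_name, the choice of underscores as the replacement character, and the 40-character default cap are assumptions for this example, not ARCC requirements.

```python
import re


def suggest_folder_name(title: str, max_length: int = 40) -> str:
    """Suggest a folder name with no spaces or special characters."""
    # Replace each run of characters other than letters, digits, and
    # hyphens with a single underscore.
    cleaned = re.sub(r"[^A-Za-z0-9-]+", "_", title.strip())
    # Trim underscores at the ends, cap the length, and trim again in
    # case the cut landed on an underscore.
    return cleaned.strip("_")[:max_length].rstrip("_")
```

For instance, suggest_folder_name("Prey Data 2017") gives "Prey_Data_2017". Whether to prefer underscores or hyphens is a matter of convention; what matters most is applying one choice consistently.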
Level of Granularity
It may be unrealistic to anticipate and pre-create every folder that will be needed for a project. Instead, consider the level of folder hierarchy that will provide sufficient structure for users and collaborators on your project to create their own subfolders.
A good approach is to establish the first one or two levels in the hierarchy, then let your collaborators create subfolders for lower levels as needed.
Granularity Examples
Project: Sea Monkey Forage Study
Parent folder: Prey Data
Child folder: 2017
Users can create subfolders within as needed
Child folder: 2018
Users can create subfolders within as needed
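The first two levels of a hierarchy like the one above can be created up front with a short script, leaving lower levels to collaborators. This is a minimal sketch: the function name create_skeleton and the layout dictionary are hypothetical, and the folder names used in the usage note are illustrative shortenings of the example above.

```python
from pathlib import Path


def create_skeleton(project_root: Path, layout: dict[str, list[str]]) -> list[Path]:
    """Create each parent folder and its child folders; return all of them."""
    folders = []
    for parent, children in layout.items():
        parent_dir = project_root / parent
        # parents=True also creates project_root if it is missing;
        # exist_ok=True makes the script safe to run more than once.
        parent_dir.mkdir(parents=True, exist_ok=True)
        folders.append(parent_dir)
        for child in children:
            child_dir = parent_dir / child
            child_dir.mkdir(exist_ok=True)
            folders.append(child_dir)
    return folders
```

For the example above, a call such as create_skeleton(Path("sea_monkey_forage"), {"prey_data": ["2017", "2018"]}) would create the parent folder and both year folders, and users could then add subfolders inside each year as needed.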
Folder Naming
How you name folders affects your and your collaborators’ ability to find and understand the folder contents. Naming folders consistently and descriptively will help users identify records at a glance, and will help to facilitate the storage and retrieval of data.
Folder names should adhere to the following best practices:
Rename default folder names generated by the Research Workspace with descriptive titles.
Name folders according to the areas of work to which they relate, and not after individuals. Classify file types with broad folder names.
Use folder names that are unambiguous and meaningfully describe the folder contents to you and your collaborators.
Be consistent when developing a naming scheme. Ideally, a scheme is created at the start of a project and used consistently throughout.
Avoid extra-long folder names; use information-rich file names instead (refer to File Naming).
Try to avoid duplicate folder names or paths. For example, if a folder is named “Photos” in one directory, don’t create a subfolder elsewhere named “Images”.
Examples of folder names
Poor folder names:
My Data
My Folder
Better folder names:
Processed herring acoustic summaries, 2012-2016
Raw herring acoustic data, 2012-2016