Organizing Data Logically

Organizing data logically makes research efforts more efficient and keeps separate efforts distinct. Many research programs are composed of multiple projects, investigators, and organizations working collaboratively to collect, share, analyze, or disseminate scientific results. Projects are analogous to a file-storage directory but are more flexible and can hold their own metadata. Within a program, most projects contain a collection of files that can all be described with similar data collection methods and that typically come from the same funded effort. Smaller, focused projects may contain only a single dataset, while larger projects that collect or produce multiple types of data may contain several datasets. Identifying the specific dataset(s) a project will produce is a central aim of project data management planning and is necessary for planning an organizational structure within the project. Projects can also be used to share information that doesn’t require metadata, such as administrative or outreach materials.



Project Naming

Project titles should be as concise as possible while still containing key information about the dataset. The title is often the most important piece of metadata describing a resource. It is the first thing people see when browsing or searching for a resource, and it may be the only information they use to evaluate the resource’s content.

At a minimum, project titles should contain the following information:

  • Location

  • Data type

  • Year (or other time unit) range

  • Program or institution name, if your dataset is part of a large effort

Naming Examples

Poor project titles:

  • Data Management

  • Workshop for Mike

Better project titles:

  • Data Management Workshop, University of Wyoming ARCC, Fall 2024

  • Conductivity, temperature and depth data for 12 northwestern Gulf of Mexico locations, May to July 2012

  • SAFARI 2000 Upper Water Column Profiles, Gulf of Alaska, 2011-2012


Project Naming on ARCC Systems

The project titles above are descriptive and worth recording in a README file, but ARCC systems restrict which characters, and how many, can be used in a project name. These restrictions stem from how permissions work on the system: very long project names, or names containing spaces or other special characters, can cause problems with system administration.

To stay within these limits, use acronyms where possible or shorten words in a logical way. The restrictions are as follows:

  • Lowercase letters and numbers only

  • Hyphens are allowed, but no other special characters such as underscores

  • No longer than sixteen (16) characters

For example, if this tutorial were a project on ARCC systems, we could shorten the long title “Data Management Workshop, University of Wyoming ARCC, Fall 2024” to one of these:

  • data-mgmt-arcc24

  • arcc-datmgt-uw24
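
The three restrictions above are mechanical enough to check before requesting a project. The following Python sketch encodes only those listed rules; the function names are hypothetical and this is not an ARCC-provided tool, so the system’s own checks remain authoritative.

```python
import re

# The three rules listed above: lowercase letters, digits, and hyphens only,
# and no more than 16 characters. Illustrative check only, not an ARCC tool.
MAX_PROJECT_NAME_LENGTH = 16
ALLOWED_CHAR = re.compile(r"[a-z0-9-]")


def project_name_problems(name: str) -> list[str]:
    """Return the rules a proposed project name breaks (empty list if it passes)."""
    problems = []
    if not name:
        problems.append("name is empty")
    if len(name) > MAX_PROJECT_NAME_LENGTH:
        problems.append(f"too long ({len(name)} > {MAX_PROJECT_NAME_LENGTH} characters)")
    # Collect each distinct character that is not a lowercase letter, digit, or hyphen.
    bad_chars = sorted({c for c in name if not ALLOWED_CHAR.fullmatch(c)})
    if bad_chars:
        problems.append("disallowed characters: " + " ".join(bad_chars))
    return problems


def is_valid_project_name(name: str) -> bool:
    """True when the name breaks none of the listed rules."""
    return not project_name_problems(name)


if __name__ == "__main__":
    for candidate in ["data-mgmt-arcc24", "arcc-datmgt-uw24",
                      "Data_Mgmt_2024", "data-management-workshop"]:
        print(candidate, project_name_problems(candidate) or "OK")
```

Both suggested names above pass, while a name with capitals and underscores, or one that is simply too long, is reported with the specific rule it breaks.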


Organizing Folders within a Project

Folders are an important way to organize your project files into smaller, identifiable units that are easier to manage. Set up a logical folder structure at the beginning of your project; it will help you stay organized, make stored files easier to find and retrieve, and save time and frustration later.

Avoid complex, deeply hierarchical folder structures, which require extra browsing to store and retrieve files. Try to keep folders no more than three levels deep. You can simplify a folder structure by including the essential information concisely in file names.
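
One way to audit the three-level guideline is to walk an existing project and list folders nested deeper than that. This Python sketch assumes that a folder directly inside the project counts as level 1; the function name and demo folders are hypothetical.

```python
import tempfile
from pathlib import Path

# Guideline from the text: keep folders no more than three levels deep.
# Assumption: a folder directly inside the project counts as level 1.
MAX_DEPTH = 3


def folders_too_deep(project_root, max_depth=MAX_DEPTH):
    """Return sorted paths of folders nested more than `max_depth` levels below project_root."""
    root = Path(project_root)
    too_deep = []
    for path in root.rglob("*"):
        # Depth = number of path components between the project root and this folder.
        if path.is_dir() and len(path.relative_to(root).parts) > max_depth:
            too_deep.append(path)
    return sorted(too_deep)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "input-data" / "site-a" / "2024" / "may").mkdir(parents=True)
        for folder in folders_too_deep(root):
            print("Consider flattening:", folder.relative_to(root))
```

Folders the script flags are candidates for moving information (such as a date) out of the folder path and into the file name.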

That said, a flatter structure has its own limits: technical issues can arise when a single directory holds too many files. Hundreds of files in one directory are usually fine, but several thousand can become problematic.
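
To spot directories that are growing too large, you can count the files directly inside each directory. The threshold in this Python sketch (2,000) is a hypothetical cut-off chosen for illustration, not a documented ARCC limit; adjust it to your own situation.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical threshold: the text says hundreds of files per directory are usually
# fine and several thousand can become problematic. 2000 is only an example cut-off.
FILE_COUNT_WARNING = 2000


def crowded_directories(project_root, threshold=FILE_COUNT_WARNING):
    """Return (directory, file_count) pairs, largest first, for directories whose
    immediate files (subdirectories are not counted) exceed `threshold`."""
    crowded = []
    for dirpath, _subdirs, filenames in os.walk(project_root):
        if len(filenames) > threshold:
            crowded.append((Path(dirpath), len(filenames)))
    return sorted(crowded, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        samples = Path(tmp) / "samples"
        samples.mkdir()
        for i in range(5):
            (samples / f"sample-{i:04d}.csv").touch()
        # A tiny threshold so the demo directory gets flagged.
        for directory, count in crowded_directories(tmp, threshold=3):
            print(f"{directory}: {count} files")
```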

The following best practices are recommended for creating an effective project folder structure:

  • Organize folders by major project components.

  • Create a hierarchical system with nested subfolders (high-level folders for broad topics with more specific folders within). Examples of high-level folder topics include:

    • Input data files by discrete location/source/type

    • Metadata

    • Code or scripts

    • Results or output data

  • Organize the data by data type and then by research activity.

  • Separate preliminary and final data into different folder structures.

  • Be consistent with your folder organization throughout the life of your project and/or Research Campaign.
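
A structure like this can be created in one step at the start of a project. This Python sketch assumes hypothetical folder names (input data split by site, metadata, code, and results with separate preliminary and final folders); substitute your project’s own components. It uses `exist_ok=True`, so re-running it does not disturb folders that already exist.

```python
import tempfile
from pathlib import Path

# Hypothetical high-level layout based on the practices listed above.
# Replace these names with your own project's components.
LAYOUT = [
    "input-data/site-a",      # input data split by location/source/type
    "input-data/site-b",
    "metadata",
    "code",
    "results/preliminary",    # preliminary and final outputs kept apart
    "results/final",
]


def create_layout(project_root, layout=LAYOUT):
    """Create each folder in `layout` under project_root and return their paths.
    Existing folders (and their contents) are left as they are, so it is safe to re-run."""
    root = Path(project_root)
    folders = []
    for relative in layout:
        folder = root / relative
        folder.mkdir(parents=True, exist_ok=True)
        folders.append(folder)
    return folders


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for folder in create_layout(tmp):
            print(folder.relative_to(tmp))
```

Keeping the layout in one list also gives collaborators a single place to see, and agree on, the project’s folder organization.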


Folder Organization Example

On ARCC systems, folder/directory names do not have to comply with the project name restrictions, but it is helpful to keep them short yet descriptive, with no spaces and as few special characters as possible.


Level of Granularity

It may be unrealistic to anticipate and pre-create every folder that will be needed for a project. Instead, consider the level of folder hierarchy that will provide sufficient structure for users and collaborators on your project to create their own subfolders.

A good approach is to establish the first one or two levels in the hierarchy, then let your collaborators create subfolders for lower levels as needed.

Granularity Examples

  • Project: data-mgmt-arcc24

    • Parent folder: ResearchData_Location1_05012024

      • Child folder: images

        • Users can create subfolders within as needed

      • Child folder: samples

        • Users can create subfolders within as needed
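
Setting up the granularity example above might look like the following Python sketch. It pre-creates at most two levels below the project folder and leaves anything deeper to collaborators. The `max_levels` guard and function name are illustrative assumptions, and the temporary base directory stands in for wherever the project actually lives.

```python
import tempfile
from pathlib import Path

# Folder names taken from the granularity example above.
PLANNED = [
    "ResearchData_Location1_05012024/images",
    "ResearchData_Location1_05012024/samples",
]


def create_planned_folders(project_dir, planned=PLANNED, max_levels=2):
    """Pre-create only the top `max_levels` levels; deeper subfolders are left to collaborators."""
    project_dir = Path(project_dir)
    # Check every entry first so nothing is created if one plans too deep.
    for relative in planned:
        if len(Path(relative).parts) > max_levels:
            raise ValueError(f"{relative!r} is deeper than {max_levels} planned levels")
    folders = []
    for relative in planned:
        folder = project_dir / relative
        folder.mkdir(parents=True, exist_ok=True)
        folders.append(folder)
    return folders


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        project = Path(base) / "data-mgmt-arcc24"
        for folder in create_planned_folders(project):
            print(folder.relative_to(base))
```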


Folder Naming

How you name folders affects your and your collaborators’ ability to find and understand their contents. Naming folders consistently and descriptively helps users identify records at a glance and makes storing and retrieving data easier.

Folder names should adhere to the following best practices:

  • Replace default folder names generated by the Research Workspace with descriptive titles.

  • Name folders according to the areas of work to which they relate, and not after individuals. Classify file types with broad folder names.

  • Use folder names that are unambiguous and meaningfully describe the folder contents to you and your collaborators.

  • Be consistent when developing a naming scheme. Ideally, a scheme is created at the start of a project and used consistently throughout.

  • Avoid overly long folder names; put detailed information in file names instead (refer to File Naming).

  • Avoid duplicate or overlapping folder names and paths. For example, if a folder named “Photos” exists in one directory, don’t create a folder for the same kind of content elsewhere named “Images”.
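
Inconsistent names for the same kind of content (such as “Photos” in one place and “Images” in another) are hard to spot by eye in a large project. One approach, sketched below in Python, is to agree on a canonical name for each kind of folder and scan for variants. The synonym table is hypothetical and only catches names you list in it.

```python
import os
import tempfile
from collections import defaultdict
from pathlib import Path

# Hypothetical synonym table: each lowercase variant maps to the name the
# project has agreed to use. Folder names not listed here are ignored.
CANONICAL_NAMES = {
    "images": "images",
    "photos": "images",
    "pictures": "images",
    "code": "code",
    "scripts": "code",
}


def inconsistent_folder_names(project_root, canonical=CANONICAL_NAMES):
    """Return {canonical_name: [variants in use]} wherever more than one
    spelling of the same kind of folder appears in the project."""
    variants = defaultdict(set)
    for _dirpath, dirnames, _filenames in os.walk(project_root):
        for name in dirnames:
            key = canonical.get(name.lower())
            if key is not None:
                variants[key].add(name)
    return {key: sorted(names) for key, names in variants.items() if len(names) > 1}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "site-a" / "images").mkdir(parents=True)
        (Path(tmp) / "site-b" / "Photos").mkdir(parents=True)
        print(inconsistent_folder_names(tmp))
```

Identically named folders under different parents, such as an `images` folder inside each location folder, are deliberately not flagged, since the granularity example above reuses child folder names that way.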

Examples of folder names

Poor folder names:

  • My Data

  • Data From Ben

Better folder names:

  • GPS-locations-sagebrush-study-2021

  • Raw-songbird-acoustic-data2012-2016
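
Some of the folder-naming guidance can be checked mechanically: spaces, special characters, and length. This Python sketch assumes that ASCII letters, digits, hyphens, and underscores are acceptable and uses a hypothetical 40-character limit. It cannot judge whether a name is descriptive or named after a person, so it complements rather than replaces a human review.

```python
# Hypothetical limit: the guidance only says to avoid overly long folder names.
MAX_FOLDER_NAME_LENGTH = 40


def folder_name_issues(name: str) -> list[str]:
    """Return mechanical problems with a folder name (empty list if none are found)."""
    issues = []
    if " " in name:
        issues.append("contains spaces")
    # Assumption: ASCII letters, digits, hyphens, and underscores are acceptable.
    specials = sorted({
        c for c in name
        if c != " " and not (c.isascii() and (c.isalnum() or c in "-_"))
    })
    if specials:
        issues.append("special characters: " + "".join(specials))
    if len(name) > MAX_FOLDER_NAME_LENGTH:
        issues.append(f"longer than {MAX_FOLDER_NAME_LENGTH} characters")
    return issues


if __name__ == "__main__":
    for candidate in ["My Data", "Data From Ben", "GPS-locations-sagebrush-study-2021"]:
        print(f"{candidate!r}: {folder_name_issues(candidate) or 'OK'}")
```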


Next Steps