Research Data Management

This guide contains information about research data management and best practices for faculty, researchers and graduate students.

Types of Research Data

Examples of Research Data include:

  • Documents (text, Word), spreadsheets, print outs
  • Laboratory notebooks, field notebooks, diaries
  • Questionnaires, transcripts, codebooks
  • Audio, video
  • Photographs, films, x-rays, negatives
  • Protein or genetic sequences
  • Spectra, spectroscope data
  • Test responses
  • Slides, artifacts, specimens, samples
  • Collection of digital objects acquired and generated during the process of research
  • Database contents (video, audio, text, images)
  • Models, algorithms, scripts, code, software
  • Contents of an application (input, output, logfiles for analysis software, simulation software, schemas)
  • Methodologies and workflows
  • Standard operating procedures and protocols
  • Computers and computer data storage devices
  • Synthetic compounds
  • Organisms, cell lines, viruses, cell products
  • Cloned coordinates, plants, animals

Metadata Standards

Metadata (data about data) standards help to describe data in a consistent manner. Metadata can include descriptive information, provenance, quality and access/use of data.  Here are a few standards that may be useful in describing your data for access and preservation.

Data Dictionaries

USGS defines a Data Dictionary as a repository of structured data names that define and describe a resource.

See Best Practices for Data Dictionary Definitions and Usage by Northwest Environmental Data Network

Source: USGS Data Dictionaries and Thesauri
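
As an illustration only, a small data dictionary can be kept as structured entries and used to check that every column in a data file has been defined. The column names, descriptions and units below are hypothetical and are not taken from USGS or the Northwest Environmental Data Network.

```python
# A minimal sketch of a data dictionary: each data name maps to a
# definition and description. All entries here are hypothetical examples.
DATA_DICTIONARY = {
    "site_id": {
        "description": "Identifier of the sampling site",
        "type": "string",
        "units": None,
    },
    "sample_date": {
        "description": "Date the sample was collected (YYYY-MM-DD)",
        "type": "date",
        "units": None,
    },
    "water_temp_c": {
        "description": "Water temperature at the time of sampling",
        "type": "float",
        "units": "degrees Celsius",
    },
}


def undocumented_columns(columns, dictionary):
    """Return the column names that have no entry in the data dictionary."""
    return [name for name in columns if name not in dictionary]


# Example: the header row of a hypothetical data file.
header = ["site_id", "sample_date", "water_temp_c", "ph"]
print(undocumented_columns(header, DATA_DICTIONARY))
```

Any column reported by the check would need a definition added to the dictionary before the data are shared.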

How to Document Your Data

Documenting your data includes capturing sufficient metadata (descriptive information) about your data in order to make it discoverable, identifiable and usable in the future.  Information you capture should include some, if not all, of the following elements:

Title of the dataset or research project
Creator names of individuals or institutions responsible for creating the data
Unique Identifier used to identify the data and distinguish it from other datasets
Dates: Project start and end dates, release date, any other date of importance during the length of the research study
Subject: Keywords or phrases describing the subject or content of the data
Funding Agency responsible for funding the research
Intellectual Property Rights associated with the data
Language(s) in which data is generated
Sources for data derived from other sources
Geographical location or coverage where data was collected
Methodology for data collection
Version of the dataset if updated

Using sustainable metadata standards is highly recommended to ensure that data remain accessible in the future. Such standards are open (not proprietary), widely used, uncompressed, use standard encoding and contain enough information to analyze the context, content and structure of the record.
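
The elements listed above can be recorded in a plain, open format. The following is a minimal sketch in Python that stores them as a JSON file and reports which elements are still missing. All values are made-up examples, the key names are one possible choice rather than a formal metadata standard, and the list of required elements is an assumption that each project would set for itself.

```python
import json

# Hypothetical example values; the keys mirror the elements listed above.
record = {
    "title": "Example stream temperature survey",
    "creators": ["Example Researcher", "Example University"],
    "identifier": "example-dataset-0001",
    "dates": {"project_start": "2020-01-15", "project_end": "2021-06-30"},
    "subject": ["stream temperature", "hydrology"],
    "funding_agency": "Example Funding Agency",
    "rights": "CC BY 4.0",
    "languages": ["en"],
    "sources": [],
    "geographic_coverage": "Example River basin",
    "methodology": "Hourly readings from in-stream loggers",
    "version": "1.0",
}

# Assumption: this project treats these four elements as mandatory.
REQUIRED = ["title", "creators", "identifier", "dates"]


def missing_elements(rec):
    """Return the required element names that are absent or empty."""
    return [key for key in REQUIRED if not rec.get(key)]


def to_json(rec):
    """Serialize the record as UTF-8 friendly, human-readable JSON text."""
    return json.dumps(rec, indent=2, sort_keys=True, ensure_ascii=False)


print(missing_elements(record))
print(to_json(record))
```

Saving the output next to the data files (for example as a file named metadata.json, a hypothetical name) keeps the description and the data together.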
 

Metadata schema sources

CalTech Library's File Naming Convention Worksheet
This worksheet helps researchers build file names for their own work.

Reproducibility of Data

When searching for data, whether locally on one's machine or in external repositories, one may use a variety of search terms. In addition, data are often housed in databases or clearinghouses where a query is required in order to access the data. To reproduce the search and obtain similar, if not identical, results, it is necessary to document which terms and queries were used.

  • Note the location of the originating data set
  • Document which search terms were used
  • Document any additional parameters that were used, such as any controls that were used (pull-down boxes, radio buttons, text entry forms)
  • Document the query term that was used, where possible
  • Note the database version and/or date, so you can limit any newly added data sets since the query was last performed
  • Note the name of the website and URL, if applicable
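
One way to capture the items above while the search is being conducted is to append a small record to a log file each time a query is run. This is only a sketch: the field names, the function names and the JSON Lines log format are assumptions made for illustration and are not part of the DataONE guidance.

```python
import json
from datetime import date


def make_search_record(source_location, search_terms, parameters, query,
                       database_version, site_name, url, searched_on=None):
    """Build one log entry describing a search (field names are illustrative)."""
    return {
        "source_location": source_location,    # where the originating data set lives
        "search_terms": list(search_terms),    # terms typed into the search box
        "parameters": dict(parameters),        # pull-down, radio button, form values
        "query": query,                        # the query term, where available
        "database_version": database_version,  # version and/or date of the database
        "site_name": site_name,
        "url": url,
        "searched_on": searched_on or date.today().isoformat(),
    }


def append_record(path, record):
    """Append the entry as one JSON object per line."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record, sort_keys=True) + "\n")


def read_records(path):
    """Read all logged entries back, oldest first."""
    with open(path, encoding="utf-8") as log:
        return [json.loads(line) for line in log if line.strip()]
```

Calling make_search_record and append_record immediately after each search leaves a dated trail that can be replayed later.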

Description Rationale

In order to reproduce a data set or result set, it is necessary to document which terms were originally used to capture that data. By documenting this information while the search is being conducted, one greatly enhances the chance of being able to reproduce the results at a later date.

Source: DataONE

Data Storage and Preservation

Storage

Storing data reliably is an important function of data management. There are several options for storing your data files:

  • Personal computers, external hard drives, departmental or university servers
  • Other cloud storage services that may suit your data storage/backup needs include Amazon S3, Elephant Drive, Jungle Disk, Mozy and Carbonite
  • CDs or DVDs are not recommended because they fail frequently.

Security

  • Unencrypted storage is ideal because you and others can easily read the data, but if encryption is required because the data are sensitive:
    • Keep passwords and keys on paper (2 copies) and in a PGP (Pretty Good Privacy) encrypted digital file.
    • Don’t rely on third-party encryption alone.
  • Uncompressed storage is also ideal, but if you need to compress files to conserve space, limit compression to your third backup copy.

To make sure your backup system is working properly, test your system periodically. Try to retrieve data files and make sure you can read them.
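
A periodic test can be partly automated. The sketch below, which assumes the backup is reachable as an ordinary folder, retrieves every file from the backup and compares its SHA-256 checksum with the original. The function names and folder layout are hypothetical. Matching checksums show that the copy is bit-for-bit identical; you should still open a few files in their usual software to confirm they are readable.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(original_dir, backup_dir):
    """Return relative paths of files that are missing or differ in the backup."""
    original_dir = Path(original_dir)
    backup_dir = Path(backup_dir)
    problems = []
    for source in sorted(original_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(original_dir)
        copy = backup_dir / relative
        # A file is a problem if the backup lacks it or its contents differ.
        if not copy.is_file() or sha256_of(source) != sha256_of(copy):
            problems.append(relative.as_posix())
    return problems
```

An empty list from verify_backup means every original file was found in the backup with identical contents.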

The UK Data Archive provides additional guidelines on data storage, back-up, and security.