
Data Management: Data Management Basics

Basics

Managing research data well helps ensure that your research is clear and reproducible, and that the data can be shared with researchers outside your institution. It is a different set of skills from data collection and data analysis, both of which are also important. Data management involves careful planning of data structure, naming, storage, sharing, and archiving.

Some publishers also require you to provide a Data Management Plan, which describes where and how you will make your data available, either in a repository or upon request.

Why is Data Management Important?

Source: Data Sharing and Management Snafu in 3 Short Acts by Karen Hanson, Alisa Surkis & Karen Yacobucci, New York University Health Sciences Libraries under a CC-BY license

Data Management - More Information

Research Data Management - CalTech 

Data Management for Research - Carnegie Mellon Univ 

Research Data Management - USC Library

Data Management General Guidance - DMPTool 

Data Security

Viewing and Securing Confidential Information

Because we deal with patient information, HonorHealth researchers are responsible for knowing and following HonorHealth’s Information Security and Patient Privacy policies. Refer to the Information Security and Patient Privacy Areas in the Policy Library to familiarize yourself with current policies.

Recommended Policy Reviews include:

Common Mistakes

Common Mistakes in Data Management and Sharing

The short video "Data Sharing and Management Snafu in 3 Short Acts" shows a few ways researchers can make mistakes when managing and sharing data. What seemed like a logical order and plan to the researcher did not translate well when someone else wanted to use the data.

Here are some common mistakes that can occur when working with research data:

  • Forgetting to create backups of data - make sure your data is stored in more than one secure location.
  • Not correctly naming or versioning files - data can be overwritten if file names are not accurate and systematic. You could also end up working with an older version of the data and lose time or data.
  • Not documenting the data - use metadata, or at least a readme.txt file, to provide basic information that makes your data identifiable, including contact information, dates, the project, and anything else needed to use the data.
  • Not creating a data dictionary for data tables - field names may seem logical to you at the time, but their meaning can easily be confused or lost over time or when co-authors leave the institution. A data dictionary ensures the data will be identifiable regardless of who uses it.
  • Using proprietary software or not identifying required software - some data requires special software to be readable. Not identifying the software, or using outdated, proprietary, or homegrown software, can make the data unusable in the future.
  • Collecting compound data - whenever possible, record the most basic measurements as the raw data, then derive other values from them. For example, record height and weight rather than BMI. This keeps the data flexible for future analysis.
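
As an illustration only, a few of the practices above (systematic file naming with versions, a data dictionary, and deriving BMI from raw height and weight) might be sketched in Python as follows. The project name, column names, naming pattern, and the helper functions versioned_filename and bmi are all hypothetical and not part of any HonorHealth standard. Adapt them to your own project and policies.

```python
from datetime import date


def versioned_filename(project: str, description: str, day: date,
                       version: int, ext: str = "csv") -> str:
    """Build a sortable file name: project_description_YYYY-MM-DD_vNN.ext.

    The pattern is an assumption for illustration; any consistent,
    documented convention serves the same purpose.
    """
    return f"{project}_{description}_{day.isoformat()}_v{version:02d}.{ext}"


def bmi(weight_kg: float, height_m: float) -> float:
    """Derive BMI from raw measurements: weight (kg) / height (m) squared."""
    return weight_kg / height_m ** 2


# Hypothetical data dictionary: one entry per column in the raw data table.
DATA_DICTIONARY = {
    "participant_id": "Study-assigned identifier (not a patient identifier)",
    "weight_kg": "Body weight in kilograms, measured at the visit",
    "height_m": "Standing height in meters, measured at the visit",
}

# Store the raw measurements; compute BMI only when an analysis needs it.
raw_row = {"participant_id": "P001", "weight_kg": 70.0, "height_m": 1.75}

print(versioned_filename("studyx", "vitals", date(2024, 3, 5), 2))
print(round(bmi(raw_row["weight_kg"], raw_row["height_m"]), 1))
```

Because only height and weight are stored, a later analysis can still compute BMI or any other measure based on those values, while a stored BMI alone could not be converted back into height and weight.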