
Data Management: Data Needs

Data Needs

It is important to understand your data needs before beginning a project so that you can map out your storage, naming, and sharing requirements.

  • Define your data needs before you begin your project.
    • Is there existing data which could be integrated into the project?
    • What will you be using your data to determine?
    • What specific data should you collect? Collecting component data (height and weight) allows additional analysis, whereas collecting only a derived value such as BMI limits how the data can be used.
  • Determine what type of data you will be collecting and what formats and file types you will be using.
    • The type of data and the recording method can guide which tools and formats you can use. Raw data can be in many formats.
    • What analysis will you be conducting? Do you need statistical software? Do you need the data in a specific format?
    • Try to use non-proprietary formats.
  • Develop a plan for documenting and storing your data.
    • Will your data change frequently? What level of versioning do you need to use?
    • How long will it need to be stored?
    • What is the audience for your data?
    • What form of metadata and documentation will you use to identify the data?
  • Determine if and how you will share your data.
    • What type of data is it? This can determine what repositories you can use.
    • Is the data proprietary? Is it confidential or protected by HIPAA?
    • Will you be publishing? Many publishers require you to share your data, either in a public repository or by making it available upon request.
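
The point about collecting component data rather than a derived value can be illustrated with a short Python sketch. The records below are invented purely for illustration, and the `bmi` helper is a hypothetical name; it uses the standard formula of weight in kilograms divided by height in metres squared.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


# Hypothetical records, invented for illustration only.
records = [
    {"id": "P01", "height_m": 1.60, "weight_kg": 64.0},
    {"id": "P02", "height_m": 1.80, "weight_kg": 81.0},
]

# BMI can always be derived later from the stored components.
for record in records:
    record["bmi"] = round(bmi(record["weight_kg"], record["height_m"]), 1)
    print(record["id"], record["bmi"])
```

Both hypothetical participants end up with the same BMI even though their heights and weights differ, so a dataset that stored only BMI could not be used to recover or analyze the underlying measurements.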

More information is available in the Caltech "Evaluating Data Needs" guide.

Data Types

Types of Data

  • Text: field or laboratory notes, survey responses
  • Numeric: tables, counts, measurements
  • Audiovisual: images, sound recordings, video
  • Models, computer code
  • Discipline-specific: FITS in astronomy, CIF in chemistry
  • Instrument-specific: equipment outputs

File Formats

Formats likely to be accessible in the future are:

  • Non-proprietary
  • Open, with documented standards
  • In common usage by the research community
  • Using standard character encodings (e.g., ASCII, UTF-8)
  • Uncompressed (space permitting)
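
One way to check the character-encoding criterion is to confirm that a file decodes cleanly as UTF-8 before archiving it. The following is a minimal Python sketch using only the standard library; the function names `is_utf8` and `file_is_utf8` are hypothetical helpers, not part of any tool mentioned in this guide.

```python
from pathlib import Path


def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8.

    ASCII is a subset of UTF-8, so pure ASCII content also passes.
    """
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def file_is_utf8(path: str) -> bool:
    """Read a whole file as bytes and test whether it is valid UTF-8."""
    return is_utf8(Path(path).read_bytes())
```

A file that fails this check may have been saved in a legacy encoding and could be re-saved as UTF-8 before deposit. Note that passing the check does not prove the text was authored as UTF-8; it only shows that the bytes are valid under that encoding.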

Examples of preferred format choices:

  • Image: JPEG, JPEG 2000, PNG, TIFF
  • Text: plain text (TXT), HTML, XML, PDF/A
  • Audio: AIFF, WAVE
  • Containers: TAR, GZIP, ZIP
  • Databases: prefer XML or CSV to native binary formats
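
As an illustration of preferring CSV to a native binary database format, the sketch below exports one table from a SQLite database to CSV using Python's standard library. The table name, columns, and values are invented for the example, and `table_to_csv` is a hypothetical helper; other database systems would need their own export route.

```python
import csv
import io
import sqlite3


def table_to_csv(conn: sqlite3.Connection, table: str, out) -> None:
    """Write every row of `table`, with a header row, to the text stream `out`."""
    # Only accept names of tables that actually exist, because the name
    # has to be placed directly into the SQL text.
    known = {row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    if table not in known:
        raise ValueError(f"unknown table: {table}")
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    writer = csv.writer(out)
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)


# Hypothetical in-memory database, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (id TEXT, height_m REAL, weight_kg REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)",
    [("P01", 1.6, 64.0), ("P02", 1.8, 81.0)],
)

buffer = io.StringIO()
table_to_csv(conn, "measurements", buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

When writing to disk instead of an in-memory buffer, opening the file with `open(path, "w", newline="", encoding="utf-8")` keeps the output in a standard character encoding and avoids doubled line endings.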

Data Sources

Determining what data to use is part of the data design. It bears repeating in the context of data management, however, that the type and source of your data will have a major impact on the quality of the research output.

Select data that will fit the purpose of your research. Consider whether you are working on:

  • quality improvement or experimental research,
  • local data or national data,
  • program, unit, or departmental changes, or an attempt to prove a hypothesis and generalize to the larger population.

Use sources of data that will provide the appropriate level of evidence:

  • surveys (qualitative or quantitative data)
  • patient data (EPIC, Population Builder, etc.)
  • national data (from the CDC, census data)
  • other studies (for systematic reviews)
  • bioinformatic data (NCBI, genomic, virus data, etc.) 
  • Find more statistical sources in the Statistics Guide

EPIC vs Population Builder

When should you use Population Builder vs EPIC?

| Population Builder | EPIC |
|---|---|
| Any Enterprise Data Warehouse (EDW) data table (clinical, financial, HR, supply chain, etc.) | Epic data models only |
| Has some capability to see a broader summary of the chart (good for non-Epic users, quality improvement, management, etc.) | Can directly access the chart for documentation, chart review, etc. (better for providers and primary Epic users) |
| Limited graphics; can be linked to visuals in Leading Wisely | Better capability for graphics (line charts, bar charts, etc.) |
| Can join two tables of data together | Cannot join two data models together |
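
To make the idea of joining two tables of data concrete, here is a generic sketch in Python using SQLite from the standard library. It is not Population Builder's or EPIC's actual interface; the `encounters` and `charges` tables and all of their values are hypothetical, standing in for one clinical and one financial table that share a key.

```python
import sqlite3

# Hypothetical tables and values, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE encounters (encounter_id INTEGER PRIMARY KEY, unit TEXT);
CREATE TABLE charges (encounter_id INTEGER, amount REAL);
INSERT INTO encounters VALUES (1, 'ICU'), (2, 'ED');
INSERT INTO charges VALUES (1, 100.0), (1, 50.0), (2, 75.0);
""")

# Join the two tables on their shared key, then total the charges per unit.
rows = conn.execute("""
SELECT e.unit, SUM(c.amount) AS total
FROM encounters AS e
JOIN charges AS c ON c.encounter_id = e.encounter_id
GROUP BY e.unit
ORDER BY e.unit
""").fetchall()

for unit, total in rows:
    print(unit, total)
```

The join is what lets a question span both tables, here relating each charge to the unit where the encounter took place. A tool that cannot combine sources can only answer questions within a single table or data model.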