
How to Evaluate Screening Tools: A Guide to Reliability, Validity, and Writing About Evidence

Practical Considerations When Determining Reliability and Validity

You don't have to be a statistician to determine the reliability and validity of a screening instrument.  Others will do that work for you.  You simply need to be able to find studies that show their work and to know what to look for in those studies.

Here are some practical considerations when evaluating healthcare screenings and assessments:

  1. Ensure the sample size is adequate for statistical analysis.
  2. Consider the practical aspects of administration, such as time required and ease of use.
  3. Assess whether the tool is appropriate for the target population, considering factors like language and cultural relevance.
  4. Remember that reliability and validity exist on a spectrum; no measure is perfectly reliable or valid.

Search Tips

Search for the tool's validation studies.  Use library databases such as HaPI (Health and Psychosocial Instruments), PsycINFO, MEDLINE, and CINAHL, and be sure to also check PubMed.

When you search, use the following:

  • The name of the tool

  • Plus keywords like "reliability," "validity," "psychometrics," or "validation study." For example:

    • Edinburgh Postnatal Depression Scale AND validity AND postpartum women

    • GAD-7 AND psychometric properties AND primary care

Searches tend to work better if you use the fields in the advanced search box, placing each term in its own field with Boolean operators between them.


What to Look for in the Studies

When reading articles about healthcare screenings or assessments, students should look for the following key elements to identify information on reliability and validity:

  1. Abstract: Read this first.  It may briefly state reliability coefficients or validity outcomes, giving a quick indication of the tool's quality and helping you determine whether the paper contains the reliability or validity information you're seeking.

  2. Methods section: This often contains details about how reliability and validity were assessed.  Sometimes there will even be a subsection specifically about reliability and validity.

  3. Statistical measures: Look for terms like Cronbach's alpha, intraclass correlation coefficient (ICC), or correlation coefficient (r).

  4. Test-retest reliability: Check if the study mentions administering the test multiple times to the same group.

  5. Interrater reliability: Look for information about multiple raters assessing the same subjects.

  6. Internal consistency: See if the article discusses how well different parts of a test measure the same construct.

  7. Construct validity: Check if the study compares results with other established measures or theories.

  8. Content validity: Look for mentions of expert reviews or comprehensive coverage of the topic.

  9. Criterion validity: See if the test is compared to a "gold standard" or established measure.

  10. Sample size: Note the number of participants, as larger samples generally provide more reliable results.

  11. Limitations: Authors often discuss potential issues with reliability or validity in the limitations section.

  12. Tables or figures: These may summarize reliability and validity data.

  13. Keywords: Watch for terms like "psychometric properties," "instrument development," or "scale validation."
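To make a statistic like Cronbach's alpha (item 3 above) less abstract, here is a minimal sketch in Python showing how it is computed from item-level scores.  The response data are entirely hypothetical and are included only to illustrate the calculation:

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for internal consistency.

    items: one list per questionnaire item, each containing
    one score per respondent (all lists the same length).
    """
    k = len(items)                                      # number of items
    item_variances = sum(pvariance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]    # total score per respondent
    total_variance = pvariance(totals)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Hypothetical data: 3 Likert-scale items answered by 5 respondents
items = [
    [4, 3, 5, 2, 4],
    [4, 4, 5, 2, 3],
    [3, 4, 4, 2, 4],
]
print(round(cronbach_alpha(items), 2))  # → 0.89
```

Alpha ranges from 0 to 1; values around .70 or higher are conventionally taken to indicate acceptable internal consistency, which is why published validation studies report this number.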

How to Write About Reliability and Validity

While the assignment will have specific instructions, in general, any discussion of a tool's reliability and validity within a larger assignment can be handled in a paragraph or two.

Example 1:  You are preparing a clinical pathway, and part of this task is to select a screening tool and discuss why you chose it.

Example 2:  You are proposing a group medical appointment for a chronic condition and as part of the qualifications for inclusion, participants must have undergone screening using your chosen tool.  You must justify the choice of tool for this group and condition.

In assignments like these, you discuss reliability and validity to justify your choice of screening tool.  This is different from an entire paper on the reliability and validity of a specific tool.  However, even in a paper on a single tool, you might be asked to present final conclusions or a summary of its reliability and validity, which works much the same way as the paragraph or two used to justify a choice in a larger assignment.


Here's a formula you can use to write a summary of the reliability and validity of a screening instrument:

1.  Start with a statement that makes an overall judgment about the reliability and validity of the tool.

2.  Support that judgment with one source that presents evidence of reliability.

3.  Cite a second source that demonstrates validity.

4.  If your assignment requires more than 2 sources, continue citing other sources and evidence as needed.

5.  Explicitly connect the tool to the population, condition, and care setting you intend to use it in.

Example 1:

The GAD-7 is a reliable and valid screening instrument for identifying generalized anxiety in adult women in primary care. According to Spitzer et al. (2006), it demonstrates excellent internal consistency (Cronbach’s α = .92) in medical populations. In a separate study, Löwe et al. (2008) provided evidence of construct validity, finding that GAD-7 scores strongly correlated with the Beck Anxiety Inventory and clinician-rated anxiety diagnoses. These findings support the use of the GAD-7 in integrated primary care settings for efficient and accurate screening.

Example 2:

The CRAFFT is a reliable and valid screening tool for identifying substance use risk among adolescent girls in school-based health centers. According to Knight et al. (2002), the CRAFFT shows strong test-retest reliability, with consistent results over a 2-week period. In another study, Dhalla et al. (2011) found predictive validity for future substance-related problems, as higher CRAFFT scores were associated with adverse health outcomes six months later. These findings support its use in adolescent health programs integrated with behavioral health services.

 

Note:  These examples were generated by ChatGPT and thus may reference inaccurate or fabricated information; they are provided for illustration only.