Clinical scenario: Elderly woman with possible iron deficiency anaemia.

Are the results of this study valid?

Returning to our clinical scenario from the question formulation tutorial:

You admit a 75-year-old woman with community-acquired pneumonia. She responds well to appropriate antibiotics, but her haemoglobin remains at 100 g/l with an MCV of 80 fl. Her peripheral blood smear shows hypochromia; she is otherwise well and is on no incriminating medications. You contact her family physician and find out that her haemoglobin was 105 g/l 6 months ago. She has never been investigated for anaemia. A ferritin has been ordered and comes back at 40 mmol/l. You admit to yourself that you’re unsure how to interpret a ferritin result and don’t know how precise and accurate it is.

In the tutorial on clinical questions we formulated the following question: In an elderly woman with hypochromic, microcytic anaemia, can a low ferritin diagnose iron deficiency anaemia?

Our search of the literature to answer this question retrieved an article from the American Journal of Medicine (1990;88:205-9).

How do we critically appraise this diagnosis paper? We’ll start by considering validity. The following list outlines the questions that we need to consider when deciding if a diagnosis paper is valid.

  1. Was there an independent, blind comparison with a reference (“gold”) standard of diagnosis?

    In considering this question, we need to determine whether all patients in the study underwent both the diagnostic test under evaluation (in our scenario, the serum ferritin) and the reference standard (in our scenario, bone marrow biopsy) to show that they definitely do or do not have the target disorder. We should also ensure that those investigators who are applying and interpreting the reference standard do not know the results from the diagnostic test.

    We also need to consider if the reference standard is appropriate. Sometimes a reference standard may not be clear cut (such as in the diagnosis of delirium) and in this case, we’d need to review the rationale for the choice of reference standard as outlined by the study authors.

    All patients in the study we found underwent serum ferritin testing and bone marrow biopsy.

  2. Was the diagnostic test evaluated in an appropriate spectrum of patients (like those in whom we would use it in practice)?

    The study should include both patients with common presentations of the target disorder and those with conditions that are commonly confused with it. If the study only includes patients with severe symptoms of the target disorder (who would be very obvious to diagnose), it is not likely to be useful to us. We need to find out whether patients with varying severity of the disease were included in the study and also whether it included patients with other disorders that are often confused with this one. For example, anaemic patients can be symptomatic or asymptomatic and the anaemia can result from a number of causes, so we would want to ensure that the study we retrieved included patients with a variety of presentations and symptoms.

    Reviewing the ferritin study, it included consecutive patients over the age of 65 who were admitted with anaemia to a university-affiliated hospital in Canada. It excluded patients from institutions and patients who were too ill or who had severe dementia. No details are provided on the definitions used for ‘too ill’ or ‘severe dementia’.

  3. Was the reference standard applied regardless of the diagnostic test result?

    We need to check that the study investigators performed the reference standard even if a patient’s serum ferritin was normal. Sometimes if the reference standard is invasive, it may be considered unethical to perform it on patients with a negative test result. For example, if a patient with chest pain is suspected to be at low risk of a pulmonary embolism and has a negative V/Q scan, an investigator (who is performing a study looking at the accuracy of the V/Q scan in diagnosing pulmonary embolism) may not want to subject the patient to pulmonary angiography, which is not without morbidity and mortality. Indeed, this was what the investigators did in the PIOPED study: if patients were considered to be at a low risk of a pulmonary embolism and had a negative V/Q scan, rather than undergoing a pulmonary angiogram, they were followed up clinically for several months, without receiving antithrombotic therapy, to see if an event occurred.

    In the ferritin study, all patients received both the diagnostic test and the reference standard.

  4. Was the test (or cluster of tests) validated in a second, independent group of patients?

    The tests should be assessed in an independent ‘test’ set of patients. This question is important in studies looking at multiple diagnostic elements.

If the study fails any of the above criteria, we need to consider if the flaw is significant and threatens the validity of the study. If this is the case, we’ll need to look for another study. Returning to our clinical scenario, the paper we found satisfies all of the above criteria and we will proceed to assessing it for importance.

Are the results of this study important?

Let’s begin by drawing a 2×2 table, using the results from the study that we identified:

| Diagnostic test result (serum ferritin) | Iron deficiency anaemia present | Iron deficiency anaemia absent | Totals |
|---|---|---|---|
| Test positive (≤ 45 mmol/l) | a = 70 | b = 15 | a + b = 85 |
| Test negative (> 45 mmol/l) | c = 15 | d = 135 | c + d = 150 |
| Totals | a + c = 85 | b + d = 150 | a + b + c + d = 235 |

Our patient’s serum ferritin comes back at 40 mmol/l and, looking at the Table, we can see that she fits somewhere in the top row (either cell ‘a’ or cell ‘b’). From the Table we can also see that 82% (70/85) of people who have iron deficiency anaemia have a serum ferritin in the same range as our patient; this is called the sensitivity of the test. And 10% (15/150) of people without this diagnosis have a serum ferritin in the same range as our patient; this is the complement of the specificity (1 − specificity). The specificity is the proportion of people without iron deficiency anaemia who have a negative or normal test result. We’re interested in how likely a serum ferritin of 40 mmol/l is in a patient with iron deficiency anaemia as compared to someone without this target disorder. Our patient’s serum ferritin is 8 (82%/10%) times as likely to occur in a patient with iron deficiency anaemia as in someone without it; this is called the likelihood ratio for a positive test. We can now use this likelihood ratio to calculate our patient’s posttest probability of having iron deficiency anaemia.
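The arithmetic in this paragraph can be sketched in a few lines of Python. This is only an illustration: the function name `two_by_two_summary` is hypothetical, the counts a = 70 and b = 15 are the ones quoted in the text, and c = 15 and d = 135 are inferred from the quoted column totals of 85 and 150.

```python
def two_by_two_summary(a, b, c, d):
    """Return (sensitivity, specificity, LR+) for a 2x2 diagnostic table.

    a: test positive, disorder present   b: test positive, disorder absent
    c: test negative, disorder present   d: test negative, disorder absent
    """
    sensitivity = a / (a + c)          # proportion of diseased with a positive test
    specificity = d / (b + d)          # proportion of non-diseased with a negative test
    lr_positive = sensitivity / (1 - specificity)
    return sensitivity, specificity, lr_positive


# Counts assumed from the ferritin study as described in the text
# (c and d are inferred from the column totals, not quoted directly).
sens, spec, lr_pos = two_by_two_summary(a=70, b=15, c=15, d=135)

print(f"sensitivity = {sens:.0%}")
print(f"specificity = {spec:.0%}")
print(f"LR+ = {lr_pos:.1f}")
```

The unrounded likelihood ratio is slightly above 8; the tutorial works with the rounded value.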

Our patient’s posttest probability of having iron deficiency anaemia is obtained by calculating:

$$ \text{posttest probability} = \text{posttest odds} / (\text{posttest odds} + 1) $$

$$ \text{posttest odds} = \text{pretest odds} \times \text{likelihood ratio} $$

The pretest odds are calculated as pretest probability/(1 − pretest probability). We judge our patient’s pretest probability of having iron deficiency anaemia as being similar to that of the patients in this study ((a+c)/(a+b+c+d) = 85/235 = 36%).

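$$\begin{aligned} \text{pretest odds} &= 0.36/(1-0.36)\\ &= 0.56 \end{aligned}$$

Using this we can calculate

$$\begin{aligned} \text{posttest odds} &= 0.56 \times 8\\ &= 4.5 \end{aligned}$$

And, finally,

$$\begin{aligned} \text{posttest probability} &= 4.5/5.5\\ &= 82\% \end{aligned}$$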
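The three steps above (probability to odds, multiply by the likelihood ratio, odds back to probability) can be written as one small function. This is a minimal sketch; the name `posttest_probability` is hypothetical, and the inputs are the rounded values used in the text (pretest probability 0.36, likelihood ratio 8).

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a posttest probability via odds."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (posttest_odds + 1)


# Rounded inputs taken from the worked example in the text.
p = posttest_probability(pretest_probability=0.36, likelihood_ratio=8)
print(f"posttest probability = {p:.0%}")
```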

For an even easier method to determine the posttest probability, try the stats calculator and graph the posttest probability.

With this information, we can conclude that based on our patient’s serum ferritin, it is very likely that she has iron deficiency anaemia (posttest probability > 80%) and that our posttest probability is sufficiently high that we would want to work our patient up for causes of this target disorder.

Instead of doing all of the above calculations, we could simply use the likelihood ratio nomogram. Considering that our patient’s pretest probability of iron deficiency anaemia was 36%, and that the likelihood ratio for a serum ferritin of 40 mmol/l was 8, we can see that her posttest probability of iron deficiency anaemia is just over 80%.

Multilevel tests

In the paper we found, the serum ferritin results are divided into 3 levels: ≤ 45 mmol/l, 46-100 mmol/l and > 100 mmol/l. We can see that more information about the diagnostic test is available when results are presented at multiple levels:

| Diagnostic test result (serum ferritin) | Iron deficiency anaemia present | Iron deficiency anaemia absent | Likelihood ratio |
|---|---|---|---|
| ≤ 45 mmol/l | 70/85 | 15/150 | 8 |
| > 45 to ≤ 100 mmol/l | 7/85 | 27/150 | 0.4 |
| > 100 mmol/l | 8/85 | 108/150 | 0.1 |

If our patient’s serum ferritin was 110 mmol/l (and using her pretest probability of 36% and the likelihood ratio of 0.1), her posttest probability of iron deficiency anaemia would be about 5% (posttest odds 0.56 × 0.1 = 0.056), making this diagnosis very unlikely. However, if her serum ferritin came back at 65 mmol/l (likelihood ratio 0.4), her posttest probability would be about 18% and we’d have to decide if this was sufficiently low to stop testing or if we needed to do further investigations.
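To explore how the posttest probability changes across the three ferritin levels, the same odds calculation can be applied to each likelihood ratio in turn. The sketch below assumes the rounded likelihood ratios from the multilevel table and the 36% pretest probability used throughout; the function name and dictionary labels are illustrative only.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a posttest probability via odds."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (posttest_odds + 1)


# Rounded likelihood ratios as given in the multilevel table.
likelihood_ratios = {
    "<= 45 mmol/l": 8,
    "> 45 to <= 100 mmol/l": 0.4,
    "> 100 mmol/l": 0.1,
}

pretest = 0.36  # pretest probability assumed equal to the study prevalence

results = {
    level: posttest_probability(pretest, lr)
    for level, lr in likelihood_ratios.items()
}

for level, p in results.items():
    print(f"{level}: posttest probability = {p:.0%}")
```

Because the likelihood ratios are rounded to one significant figure, the results for the two higher ferritin levels are sensitive to that rounding and should be read as approximate.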

Where to go from here?

Now that we’ve decided our article is both valid and important, we need to decide if we can apply it to our patient.

Diagnosis articles from other clinical specialties

Child Health

In young infants with projectile vomiting and no palpable pyloric tumour, what is the probability of pyloric stenosis with a negative or a positive ultrasound of the pylorus?

Neilson D, Hollman AS. The ultrasonic diagnosis of infantile hypertrophic pyloric stenosis: technique and accuracy. Clinical Radiology 1994;49:246-7.

Critical Care Medicine

In mechanically ventilated patients, can the respiratory rate to tidal volume ratio be used to predict successful extubation?

Yang KL, Tobin MJ. A prospective study of indexes predicting the outcome of trials of weaning from mechanical ventilation. N Engl J Med 1991; 324:1445-50.

EBM in Developing Countries

In a patient with suspected typhoid fever, what is the accuracy of the Typhidot test for making the diagnosis?

Phil J of Med – Infectious Dis 1997; 26:61-63.

Gastroenterology and Hepatology

In a patient with anaemia can a low serum ferritin be used to diagnose iron deficiency?

Guyatt GH, Patterson C, Ali M, et al. Diagnosis of iron-deficiency anemia in the elderly. Am J Med 1990; 88:205-9.

General Practice

What is the accuracy of microtympanometry for the diagnosis of hearing loss from a middle ear effusion in young children?

Holtby I, Forster DP. Evaluation of pure tone audiometry and impedance screening in infant schoolchildren. J Epidemiol Community Health 1992; 46:21-25.

General Surgery

In a patient with a history of breast cancer and breast conserving surgery what is the best method for detecting local recurrence?

Drew PJ et al. Routine screening for local recurrence following breast-conserving therapy for cancer with dynamic contrast-enhanced magnetic resonance imaging of the breast. Ann Surg Onc 1998;5:265-70.

Geriatric Medicine

In an elderly woman with hypochromic, microcytic anaemia, can a low ferritin diagnose iron deficiency anaemia?

Guyatt GH, Patterson C, Ali M, Singer J, Levine M, Turpie I, Meyer R. Diagnosis of iron-deficiency anemia in the elderly. Am J Med 1990; 88:205-9.

Mental Health

In female college students, are there any screening tests that reliably detect those students that have eating disorders?

Mintz et al. Questionnaire for eating disorder: reliability and validity of operationalising DSM-IV criteria into a self-report format. Journal of Counseling Psychology 44:63-79.

Neonatal Medicine

In preterm infants, what is the accuracy of the clinical examination as a diagnostic test for patent ductus arteriosus (PDA) with left to right shunting?

Davis P, Turner-Gomes S, Cunningham K, et al. Precision and accuracy of clinical and radiological signs in premature infants at risk of patent ductus arteriosus. Arch Pediatr Adolesc Med 1995; 149:1136-1141.


In patients with suspected depression what is the accuracy of a 2-question case-finding instrument for depression compared with 6 previously validated instruments?

Whooley MA, Avins AL, Miranda J, Browner WS. Case-finding instruments for depression: Two questions are as good as many. J Gen Intern Med 1997; 12:439-45.

Physiotherapy Practice

Is your clinical diagnosis of internal derangement correct when presented with a patient with the following clinical findings (history of locking, preauricular pain, reproducible and reciprocal click, reduced mouth opening (35 mm), local muscle tenderness, reduced joint play) and tomography findings (diminished anterior translation of both condyles, no osseous changes)?

Schiffman EL, Anderson GC, Fricton JR, Burton K, Schellhas KP. Diagnostic criteria for intrarticular temporomandibular disorders. Community Dent Oral Epidemiol 1989; 17:252-257.


In patients with a skin lesion, how accurate are general physicians at diagnosing skin cancer?

Whited JD, Hall RP, Simel DL, Horner RD. Primary care clinicians’ performance for detecting actinic keratoses and skin cancer. Arch Intern Med. 1997;157:985-90.

Further reading on diagnosis