Understanding Numbers in Clinical Research
A visual, plain-English journey through every statistical concept you need to read, interpret, and critically appraise clinical studies — from the very first principles.
The Core Problem: Chance
Imagine you give 10 patients a new drug and 7 improve. Has the drug worked, or did 7 patients simply get better on their own? You cannot know just by looking at the number.
Clinical research almost always involves samples — small groups chosen to represent a much larger population. Any measurement from a sample contains some degree of natural randomness, which we call sampling variability or chance error.
Statistics gives us a rigorous, objective way to ask: "How likely is it that what I observed happened purely by chance?" and "How confident can I be that my result reflects the true effect in the real population?"
Flipping a Coin
If you flip a fair coin 10 times and get 7 heads, you might wonder if the coin is biased. But 7 heads in 10 flips can happen by pure chance. Statistics tells you exactly how often that would happen with a fair coin (about 12% of the time), so you can decide whether to be suspicious.
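The "about 12%" figure comes straight from the binomial formula; a minimal check:

```python
from math import comb

# Probability of exactly 7 heads in 10 flips of a fair coin:
# C(10, 7) * (1/2)^10
p_exactly_7 = comb(10, 7) * 0.5 ** 10
print(round(p_exactly_7, 3))  # 0.117, i.e. about 12% of the time
```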
Clinical statistics works the same way — it quantifies how surprising (or unsurprising) an observed result is.
Three Concepts That Underlie Everything Else
Population vs. Sample
The population is every person who has or could have the disease. The sample is the finite group enrolled in a study. We use the sample to make inferences about the population — always with some uncertainty.
The Null Hypothesis (H₀)
Every study starts with a "boring" assumption: "There is no difference between the two treatments." Statistics then measures how strongly the data argue against this assumption. We never prove H₀ true — we only reject it or fail to reject it.
Alpha (α) — The Threshold
Before running a study, researchers set a threshold (usually α = 0.05) for how much chance they are willing to tolerate. If the probability of seeing a result this extreme by chance is less than 5%, they reject the null hypothesis and declare the result "statistically significant."
Type I Error (α)
Concluding there IS a difference when there isn't one. A "false positive." The risk equals your chosen α (typically 5%). Think: convicting an innocent person.
Type II Error (β)
Concluding there IS NO difference when there actually is one. A "false negative." Related to statistical power. Think: acquitting a guilty person.
Statistical Power (1−β)
The probability of correctly detecting a real effect if one truly exists. Well-designed studies aim for ≥80% power. Low power = high chance of missing a true benefit.
The Exact Definition (Read This Slowly)
"The p-value is the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true."
In plain language: if there were truly no difference between the two treatments, how often would I see a difference this large (or larger) just by random chance? A small p-value means the data are very surprising under the assumption of "no effect," which leads us to reject that assumption.
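This logic can be made concrete with a toy simulation. Assuming, purely for illustration, that patients improve spontaneously 50% of the time, we can ask how often 7 or more of 10 patients would improve with no drug effect at all:

```python
import random

random.seed(0)  # reproducible illustration

# Under H0 (no drug effect), assume each patient improves spontaneously
# with probability 0.5 -- an illustrative number, not from any trial.
trials = 100_000
extreme = sum(
    1 for _ in range(trials)
    if sum(random.random() < 0.5 for _ in range(10)) >= 7
)
print(f"P(>= 7 of 10 improve | H0) ~ {extreme / trials:.3f}")  # about 0.17
```

Note how "at least as extreme" (about 17%) is larger than the roughly 12% chance of exactly 7 successes: the p-value always counts the whole tail of outcomes, not a single result.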
What the P-Value IS
- A measure of how incompatible the data are with the null hypothesis
- A probability ranging from 0 to 1 (often expressed as a percentage)
- A tool to guard against being fooled by chance
- One piece of evidence — to be interpreted alongside effect size
What the P-Value is NOT
- ❌ The probability that the null hypothesis is true
- ❌ The probability that your result happened by chance
- ❌ A measure of the size or importance of the effect
- ❌ Proof that the treatment works in the real world
How to Interpret Common P-Value Ranges
| P-Value | What It Means | Decision | Caution |
|---|---|---|---|
| p < 0.001 | Less than 0.1% chance if H₀ is true | Strongly significant | Still needs clinical context |
| p = 0.01–0.05 | 1–5% chance — below the standard threshold | Significant | Effect size must be meaningful |
| p = 0.05–0.10 | 5–10% chance — "borderline" | Not significant | May still reflect real trend; study may be underpowered |
| p > 0.10 | More than 10% chance — easily explained by chance | Not significant | Does NOT mean treatments are equal |
The "p = 0.049 vs p = 0.051" Fallacy
These two values are not meaningfully different. The 0.05 cutoff is a convention, not a law of nature. Always interpret p-values alongside effect size, CI width, and clinical context.
Did it happen by chance?
A result is statistically significant when the p-value falls below the chosen threshold (usually 0.05). It simply means: the observed difference is unlikely to be a random fluke.
With a large enough sample, even a tiny, meaningless difference becomes statistically significant.
Does it actually matter to patients?
A result is clinically significant when the size of the effect is large enough to change what you would do for a patient — and to matter to that patient's life, quality of care, or outcomes.
Clinical significance is about the absolute numbers: How many patients need treatment to prevent one bad outcome?
Large Trials Can Make Tiny Differences Look Important
Imagine a study of 200,000 patients testing a cholesterol drug. After 5 years, the treated group has an event rate of 10.0% vs. 10.2% in the placebo group.
The p-value is p < 0.001 — highly significant. But the Absolute Risk Reduction = 0.2%. The NNT = 500 patients treated for 5 years to prevent one event. Statistics cannot answer whether that is worth it — clinical judgment must.
The Four Possibilities
| Scenario | Action |
|---|---|
| ✅ Stat + clinically significant | Adopt |
| ⚠️ Stat sig, NOT clinical sig | Be cautious |
| 🔍 NOT stat sig, big effect size | Underpowered — investigate |
| ❌ Neither | Abandon |
Absolute vs. Relative Measures
Absolute measures tell you the actual, tangible difference in numbers. Relative measures tell you the ratio — how much bigger or smaller one group's risk is compared to another's.
Relative measures almost always look more impressive than absolute ones, which is why pharmaceutical marketing loves them. As a clinician, always demand the absolute numbers.
The Golden Rule
Always convert a relative measure to an absolute measure before making a clinical decision. A "50% relative risk reduction" could mean going from 2% to 1% risk (NNT = 100) or from 40% to 20% risk (NNT = 5). These are vastly different clinical scenarios.
| Measure | Type | Study Type |
|---|---|---|
| ARR / ARI Absolute Risk Reduction/Increase | Absolute | RCTs, cohorts |
| RR / RRR Risk Ratio / Relative Risk Reduction | Relative | RCTs, cohorts |
| OR Odds Ratio | Relative | Case-control |
| HR Hazard Ratio | Relative | Survival / RCTs |
| NNT / NNH Number Needed to Treat/Harm | Absolute | Any |
Same Data — Completely Different Impressions
Placebo group: 10% stroke rate. Treatment group: 7% stroke rate. Look how differently this appears depending on which measure you report:
The ARR of 3% looks modest; the RRR of 30% sounds impressive — yet they describe the exact same data.
NNT from This Trial
Treat 34 patients to prevent 1 stroke. Worth it? That depends on cost, side effects, and baseline risk — not on the p-value.
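The three measures from the 10% vs. 7% stroke data can be reproduced in a few lines:

```python
import math

cer, eer = 0.10, 0.07     # placebo vs. treatment stroke rates

arr = cer - eer            # 0.03: "3 percentage points" (looks modest)
rrr = arr / cer            # 0.30: "30% relative reduction" (sounds impressive)
nnt = math.ceil(1 / arr)   # 34: treat 34 patients to prevent 1 stroke

print(f"ARR = {arr:.0%}, RRR = {rrr:.0%}, NNT = {nnt}")
```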
The Universal Template
Every study compares two groups and counts how many in each group had the outcome of interest. These four numbers — a, b, c, d — are the building blocks of all statistical measures.
| | Outcome: YES (Event occurred) | Outcome: NO (No event) | Total |
|---|---|---|---|
| Exposed / Treatment | a | b | a + b |
| Unexposed / Control | c | d | c + d |
| Total | a + c | b + d | N |
Every Formula From a, b, c, d
500 patients receive apixaban; 500 receive warfarin. After 2 years: 25 apixaban patients had stroke; 50 warfarin patients had stroke.
| Stroke (YES) | No Stroke (NO) | Total | |
|---|---|---|---|
| Apixaban (Treatment) | a = 25 | b = 475 | 500 |
| Warfarin (Control) | c = 50 | d = 450 | 500 |
| Total | 75 | 925 | 1000 |
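All the headline measures fall out of the four cell counts. A sketch using this table's numbers:

```python
import math

a, b = 25, 475   # apixaban: stroke / no stroke
c, d = 50, 450   # warfarin: stroke / no stroke

eer = a / (a + b)                # 0.05  experimental event rate
cer = c / (c + d)                # 0.10  control event rate
rr = eer / cer                   # 0.50  risk ratio
rrr = 1 - rr                     # 0.50  relative risk reduction
arr = cer - eer                  # 0.05  absolute risk reduction
nnt = math.ceil(1 / arr)         # 20    number needed to treat
odds_ratio = (a * d) / (b * c)   # ~0.47, close to RR because events are fairly rare

print(rr, rrr, arr, nnt, round(odds_ratio, 3))
```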
The Concept
The Risk Ratio asks: "How many times more likely is the event in Group A compared to Group B?" It divides the risk in the treated group by the risk in the control group.
The RR is only used in prospective studies (RCTs, cohort studies) where you follow people forward in time and directly measure incidence.
How to Interpret the RR
| RR | Direction | Example |
|---|---|---|
| RR > 1.0 | Treatment INCREASES risk (harmful exposure) | RR = 2.0: twice the risk; the treated group has 2× the event rate of controls |
| RR = 1.0 | No difference (null effect) | Both groups have identical event rates |
| RR < 1.0 | Treatment REDUCES risk (protective) | RR = 0.5: half the risk; treatment reduces events by 50% relative to control |
From RR to Relative Risk Reduction (RRR)
Using our apixaban example: RR = 0.50, so RRR = 50%. Sounds dramatic — but the ARR is only 5%, and NNT = 20.
The RRR Marketing Trap
A drug reducing events from 2% to 1% has an RRR of 50% — identical to a drug reducing events from 40% to 20%. The NNT values are 100 vs. 5. These are completely different clinical scenarios. Always calculate ARR = CER − EER.
Probability vs. Odds — The Critical Difference
A Betting Analogy
In horse racing, "5 to 1 odds" means 5 losses for every 1 win. The Odds Ratio uses this exact concept — it compares the odds of an event in two groups.
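The probability-odds relationship is a one-line formula in each direction. A sketch (the function names are just illustrative):

```python
def prob_to_odds(p):
    """odds = p / (1 - p): events per non-event."""
    return p / (1 - p)

def odds_to_prob(odds):
    """Inverse: probability = odds / (1 + odds)."""
    return odds / (1 + odds)

# "5 to 1 against" means odds of winning = 1/5, i.e. a 1-in-6 chance:
print(round(odds_to_prob(1 / 5), 3))   # 0.167

# For small probabilities, odds ~ probability (why OR ~ RR for rare events):
print(round(prob_to_odds(0.02), 4))    # 0.0204, barely different from 0.02
```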
Calculating the Odds Ratio
OR = (a × d) ÷ (b × c), the cross-product ratio. Equivalently: OR = [a/b] ÷ [c/d] = odds in exposed ÷ odds in unexposed
| OR | Interpretation |
|---|---|
| OR > 1.0 | More disease in the exposed group |
| OR = 1.0 | No association |
| OR < 1.0 | Less disease in the exposed group |
When to Use OR vs. RR — and Why It Matters
Use RR when:
You have a prospective study (RCT or cohort) where you followed people forward in time and can directly measure the incidence of events.
Use OR when:
You have a case-control study where you started with people who already have the disease and looked backward. You cannot calculate true incidence, so you must use odds instead. The OR is the only valid measure in this design.
When are OR and RR similar?
When the disease is rare (event rate <10%), the OR approximates the RR closely; this is the rare disease assumption. As events become more common, the OR drifts progressively further from the null (1.0) than the RR, exaggerating the apparent effect in either direction.
| Event Rate | OR ≈ RR? |
|---|---|
| < 5% | Yes, very close |
| 5–10% | Reasonably close |
| > 20% | No; OR substantially exaggerates the RR |
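The divergence is easy to demonstrate: hold the RR fixed at 2.0 and raise the baseline event rate (the rates here are illustrative):

```python
def rr_and_or(risk_exposed, risk_control):
    """Compute both ratios from two event rates (illustrative helper)."""
    rr = risk_exposed / risk_control
    odds_exposed = risk_exposed / (1 - risk_exposed)
    odds_control = risk_control / (1 - risk_control)
    return rr, odds_exposed / odds_control

# Hold RR fixed at 2.0 and raise the baseline event rate:
for baseline in (0.01, 0.05, 0.20):
    rr, odds_ratio = rr_and_or(2 * baseline, baseline)
    print(f"baseline {baseline:.0%}: RR = {rr:.2f}, OR = {odds_ratio:.2f}")
# baseline 1%: OR ~ 2.02; baseline 5%: OR ~ 2.11; baseline 20%: OR ~ 2.67
```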
Worked Example: Smoking and Lung Cancer (Case-Control)
| | Lung Cancer (Cases) | No Cancer (Controls) |
|---|---|---|
| Smoker | a = 160 | b = 80 |
| Non-smoker | c = 40 | d = 120 |
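Plugging this table into the cross-product formula:

```python
a, b = 160, 80    # smokers: cases / controls
c, d = 40, 120    # non-smokers: cases / controls

odds_ratio = (a * d) / (b * c)   # (160 * 120) / (80 * 40)
print(odds_ratio)  # 6.0 -> smokers have 6x the odds of lung cancer
```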
The Exact Definition
"A 95% confidence interval means: if we repeated this study 100 times with different samples, approximately 95 of those studies would produce a confidence interval that contains the true population value."
In practical terms: the CI gives you the range of values that are plausible for the true effect, given the data you observed. The point estimate (e.g., RR = 0.72) is your best single guess; the CI (e.g., 0.58–0.89) is the range of uncertainty around that guess.
Three Things the CI Tells You
Is the entire CI on the benefit side (below 1.0 for RR/OR)? Entirely harmful (above 1.0)? Or does it cross the null, suggesting uncertainty about even the direction?
If the 95% CI does not include the null value (1.0 for RR/OR; 0 for ARR), the result is statistically significant at p < 0.05. The CI tells you everything the p-value does — plus more.
A narrow CI = precise estimate (large study). A wide CI = high uncertainty — treat with caution, even if statistically significant.
The Null Value — Your Reference Point
| Measure | Null Value | CI crosses null if... |
|---|---|---|
| Risk Ratio (RR) | 1.0 | Includes 1.0 |
| Odds Ratio (OR) | 1.0 | Includes 1.0 |
| Hazard Ratio (HR) | 1.0 | Includes 1.0 |
| ARR / Difference | 0.0 | Includes 0 |
The Rule
If the CI crosses the null value, the result is NOT statistically significant. There is insufficient evidence to rule out "no difference."
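The rule reduces to a single comparison. A sketch (the helper name is just illustrative; the CIs are examples from this document):

```python
def is_significant(ci_low, ci_high, null_value=1.0):
    """A 95% CI that excludes the null value implies p < 0.05."""
    return not (ci_low <= null_value <= ci_high)

print(is_significant(0.58, 0.89))         # True:  entirely below 1.0
print(is_significant(0.95, 1.02))         # False: crosses 1.0
print(is_significant(-0.01, 0.04, 0.0))   # False: an ARR CI that includes 0
```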
90% CI
Narrower. α = 0.10. Easier to reach significance. Rare in clinical research.
95% CI ⭐
The universal standard. α = 0.05. Always assume 95% unless stated otherwise.
99% CI
Wider. α = 0.01. Used when errors are very costly. Harder to achieve significance.
Visual CI Interpretation — Six Scenarios for RR
The vertical dashed red line represents the null value (RR = 1.0). Each horizontal bar is a 95% CI; the dot is the point estimate (best guess).
Bonus: The Hazard Ratio (HR) — RR for Survival Data
When a study tracks participants over time and measures when events occur (not just whether they occur), the appropriate measure is the Hazard Ratio. It is the ratio of the rate at which events are occurring in the treatment group vs. the control group at any given moment in time.
The HR is the standard output of Kaplan-Meier survival analysis and Cox proportional hazards models — the two most common statistical methods in major cardiovascular, oncology, and infectious disease trials.
Example: ARISTOTLE Trial (Apixaban vs. Warfarin)
HR for stroke or systemic embolism = 0.79 (95% CI: 0.66–0.95; p = 0.01). Apixaban patients experienced stroke or systemic embolism at 79% of the rate of warfarin patients. The CI does not cross 1.0, confirming statistical significance.
Number Needed to Treat (NNT)
The NNT is the number of patients who must receive a treatment for one additional patient to benefit — compared to the control group. A lower NNT means a more effective treatment.
NNT is always reported for a specific time period and a specific outcome. An NNT of 20 to prevent stroke over 5 years is meaningless without that context.
Number Needed to Harm (NNH)
The NNH is the number of patients who must be treated for one additional patient to experience harm — compared to the control group. A higher NNH means a safer treatment.
Always compare NNT vs. NNH. A drug with NNT = 50 but NNH = 20 causes more harm than benefit — even if statistically significant.
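One way to see the imbalance is to express both numbers per 1,000 patients treated:

```python
nnt, nnh = 50, 20   # the example from the text

helped_per_1000 = 1000 / nnt   # 20 patients benefit
harmed_per_1000 = 1000 / nnh   # 50 patients are harmed

print(helped_per_1000, harmed_per_1000)  # 20.0 50.0 -> net harm
```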
NNT Benchmarks — A Rough Clinical Guide
From lowest NNT (strongest benefit) to highest:
- Very low NNT: e.g., antibiotics for bacterial meningitis. Strong, direct benefit.
- Low NNT: e.g., antihypertensives for high-risk patients. Generally justifiable.
- Moderate NNT: requires shared decision-making; depends on cost, burden, and patient preference.
- High NNT: rarely justifiable for primary prevention unless the intervention is very cheap and safe.
A new anticoagulant reduces DVT but increases major bleeding. Data over 6 months:
Efficacy Data (DVT Prevention)
Harm Data (Major Bleeding)
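The underlying data tables are not reproduced here, so the figures below are purely hypothetical, chosen only to show the mechanics of deriving an NNT and NNH from event counts:

```python
# Hypothetical 6-month event counts per 1,000 patients
# (illustrative only -- the original data tables are not shown here):
dvt_control, dvt_treated = 80, 40        # DVT events
bleed_control, bleed_treated = 10, 30    # major bleeding events

nnt = 1000 // (dvt_control - dvt_treated)      # 25 treated per DVT prevented
nnh = 1000 // (bleed_treated - bleed_control)  # 50 treated per extra bleed

print(f"NNT = {nnt}, NNH = {nnh}")  # NNH > NNT: benefit outweighs harm here
```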
Master Reference: All Key Statistics at a Glance
| Statistic | What It Measures | Null | Favors Treatment | Best Used In | Watch Out For |
|---|---|---|---|---|---|
| P-value | Probability of a result at least this extreme if H₀ is true | — | p < 0.05 | Any study | Says nothing about effect size |
| 95% CI | Range of plausible true values | Null excluded | CI entirely on benefit side | Any study | Wide CI = imprecise despite significance |
| ARR | Absolute difference in event rates | 0 | > 0 | RCT, cohort | Always report with NNT |
| RRR | % reduction relative to control rate | 0% | > 0% | RCT, cohort | Can mislead — always check ARR too |
| RR | Ratio of event rates between groups | 1.0 | < 1.0 | RCT, prospective cohort | Not valid for case-control designs |
| OR | Ratio of odds between groups | 1.0 | < 1.0 | Case-control studies | Overestimates RR when events are common |
| HR | Instantaneous ratio of event rates over time | 1.0 | < 1.0 | Survival analysis / landmark RCTs | Assumes proportional hazards throughout |
| NNT | Patients treated per 1 benefited | ∞ | Small (<20) | Any (derived from ARR) | Must specify time period & outcome |
| NNH | Patients treated per 1 harmed | ∞ | Large (>100) | Any (derived from ARI) | Compare against NNT for benefit-risk balance |
Frequently Confused — Quick Clarifications
Why can't I just use RR for case-control studies?
Because case-control studies recruit participants by outcome (cases vs. controls), the proportion of diseased people is fixed by the study design, not measured from a population. Without true incidence there is no "risk" to form a ratio. The OR, computed purely from the four cell counts, remains valid.

If OR ≈ RR for rare diseases, why does it matter which one I use?
The approximation only holds when events are rare (roughly <10%). For common outcomes the OR lands further from 1.0 than the RR, so reading an OR as if it were an RR overstates the effect.

A trial reports p = 0.06 and I think the treatment is still useful. Am I wrong?
Not necessarily. p = 0.06 and p = 0.05 represent nearly identical amounts of evidence; the cutoff is a convention. Look at the effect size, the CI, and whether the study was adequately powered before dismissing the treatment, but do not claim the trial "proved" a benefit either.

Can the NNT be negative? What does that mean?
Arithmetically, yes: if the ARR is negative (the treatment group fared worse), 1/ARR is negative. By convention, that result is reported as a positive NNH instead.

My CI is 0.95–1.02. The result is "not significant." But it's almost entirely below 1.0 — is there still a benefit?
Possibly, but the data cannot rule out "no effect" or even a slight harm. The point estimate hints at a small benefit; a larger study would narrow the CI and settle the question.

Why does the OR always exaggerate the RR for common outcomes?
Because odds = p ÷ (1 − p). When p is small the denominator is close to 1, so odds ≈ probability; as p grows, (1 − p) shrinks and odds inflate faster than probability, pushing the ratio of odds further from 1.0 than the ratio of risks.
When You Pick Up a Clinical Paper — Ask These Questions in Order
Step 1 — Validity
Was this an RCT (use RR/HR) or case-control (use OR)? Was randomization concealed? Was blinding adequate? Was follow-up >80%? Was an intention-to-treat analysis used?
Step 2 — Results
What is the p-value? What is the 95% CI — does it cross the null? What is the ARR (not just RRR)? What is the NNT? What is the NNH? Is there a clinically meaningful effect?
Step 3 — Applicability
Does my patient match the study population? Is this drug available locally? Do my patient's values align with the outcomes measured? Is the NNT acceptable given my patient's baseline risk?