Clinical Pharmacology & EBM · Complete Beginners' Guide

Understanding Numbers
in Clinical Research

A visual, plain-English journey through every statistical concept you need to read, interpret, and critically appraise clinical studies — from the very first principles.

P-Value Confidence Intervals Risk Ratio Odds Ratio ARR & NNT Stat vs. Clinical Significance Hazard Ratio
01
Why Do We Need Statistics at All?
Before we dive into formulas, let's understand the problem statistics exist to solve — and why intuition alone is not enough in clinical research.

The Core Problem: Chance

Imagine you give 10 patients a new drug and 7 improve. Has the drug worked, or did 7 patients simply get better on their own? You cannot know just by looking at the number.

Clinical research almost always involves samples — small groups chosen to represent a much larger population. Any measurement from a sample contains some degree of natural randomness, which we call sampling variability or chance error.

Statistics gives us a rigorous, objective way to ask: "How likely is it that what I observed happened purely by chance?" and "How confident can I be that my result reflects the true effect in the real population?"

A Simple Analogy

Flipping a Coin

If you flip a fair coin 10 times and get 7 heads, you might wonder if the coin is biased. But 7 heads in 10 flips can happen by pure chance. Statistics tells you exactly how often that would happen with a fair coin (about 12% of the time), so you can decide whether to be suspicious.

Clinical statistics works the same way — it quantifies how surprising (or unsurprising) an observed result is.
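The coin arithmetic is easy to check yourself. A minimal Python sketch (the "about 12%" in the analogy is the chance of exactly 7 heads; a p-value-style question asks for 7 *or more*, which is slightly larger):

```python
from math import comb

def binom_prob(k: int, n: int, p: float = 0.5) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Chance of exactly 7 heads in 10 fair flips
exactly_7 = binom_prob(7, 10)   # 120/1024 ≈ 0.117, i.e. about 12%

# The p-value-style question: 7 or more heads ("at least as extreme")
at_least_7 = sum(binom_prob(k, 10) for k in range(7, 11))  # 176/1024 ≈ 0.172

print(f"P(exactly 7 heads) = {exactly_7:.3f}")
print(f"P(7+ heads)        = {at_least_7:.3f}")
```

Neither number is small enough to make a fair coin implausible, which is exactly the point of the analogy.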

Key Vocabulary — Learn These First

Three Concepts That Underlie Everything Else

Population vs. Sample

The population is every person who has or could have the disease. The sample is the finite group enrolled in a study. We use the sample to make inferences about the population — always with some uncertainty.

The Null Hypothesis (H₀)

Every study starts with a "boring" assumption: "There is no difference between the two treatments." Statistics then measures how strongly the data argue against this assumption. We never prove H₀ true — we only reject it or fail to reject it.

Alpha (α) — The Threshold

Before running a study, researchers set a threshold (usually α = 0.05) for how much chance they are willing to tolerate. If the probability of seeing a result this extreme by chance is less than 5%, they reject the null hypothesis and declare the result "statistically significant."

Type I Error (α)

Concluding there IS a difference when there isn't one. A "false positive." The risk equals your chosen α (5%). Think: convicting an innocent person.

Type II Error (β)

Concluding there IS NO difference when there actually is one. A "false negative." Related to statistical power. Think: acquitting a guilty person.

Statistical Power (1−β)

The probability of correctly detecting a real effect if one truly exists. Well-designed studies aim for ≥80% power. Low power = high chance of missing a true benefit.


02
The P-Value — The Most Misunderstood Number in Medicine
The p-value is the single most cited — and most misinterpreted — statistic in all of clinical research. Let's demystify it completely.

The Exact Definition (Read This Slowly)

"The p-value is the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true."

In plain language: if there were truly no difference between the two treatments, how often would I see a difference this large (or larger) just by random chance? A small p-value means the data are very surprising under the assumption of "no effect," which leads us to reject that assumption.

What the P-Value IS

  • A measure of how incompatible the data are with the null hypothesis
  • A probability ranging from 0 to 1 (often expressed as a percentage)
  • A tool to guard against being fooled by chance
  • One piece of evidence — to be interpreted alongside effect size

What the P-Value is NOT

  • ❌ The probability that the null hypothesis is true
  • ❌ The probability that your result happened by chance
  • ❌ A measure of the size or importance of the effect
  • ❌ Proof that the treatment works in the real world
🧪
Worked Example: New Antibiotic vs. Standard Care
Understanding a p-value in context
1
The study: 200 patients with community-acquired pneumonia. 100 receive the new antibiotic, 100 receive standard care. The new antibiotic cures 78 patients; standard care cures 65.
2
The question: Is this 13-patient difference real, or could it happen by chance if the two drugs were actually equally effective?
3
The result: Statistical analysis returns p = 0.04. This means: if the two drugs were truly equal, a difference at least this large would appear by chance in only 4% of similar studies.
4
The interpretation: Since 4% is below our pre-set threshold of 5% (α = 0.05), we reject the null hypothesis and call the result statistically significant.
⚠️ But wait — is this clinically meaningful? A 13% absolute improvement may or may not matter depending on cost, side effects, and patient context. The p-value tells us the result is probably real; it says nothing about whether it is worth acting on.
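The worked example doesn't state which test was used; a two-proportion z-test is one standard choice, and for these counts it lands near the reported p = 0.04. A sketch under that assumption (the function name is ours):

```python
from math import sqrt, erfc

def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Two-sided z-test for the difference between two proportions."""
    p1, p2 = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)       # rate if H0 were true
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))                    # two-sided normal tail
    return z, p_value

# New antibiotic (78/100 cured) vs standard care (65/100 cured)
z, p = two_proportion_z_test(78, 100, 65, 100)
print(f"z = {z:.2f}, p = {p:.3f}")  # p ≈ 0.04
```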

How to Interpret Common P-Value Ranges

P-Value | What It Means | Decision | Caution
p < 0.001 | Less than 0.1% chance if H₀ is true | Strongly significant | Still needs clinical context
p = 0.01–0.05 | 1–5% chance — below the standard threshold | Significant | Effect size must be meaningful
p = 0.05–0.10 | 5–10% chance — "borderline" | Not significant | May still reflect a real trend; study may be underpowered
p > 0.10 | More than 10% chance — easily explained by chance | Not significant | Does NOT mean treatments are equal
⚠️

The "p = 0.049 vs p = 0.051" Fallacy

These two values are not meaningfully different. The 0.05 cutoff is a convention, not a law of nature. Always interpret p-values alongside effect size, CI width, and clinical context.


03
Statistical Significance vs. Clinical Significance
This is the most critical distinction in all of EBM. Confusing these two ideas has led to harmful prescribing decisions worldwide.
Statistical Significance

Did it happen by chance?

A result is statistically significant when the p-value falls below the chosen threshold (usually 0.05). It simply means: the observed difference is unlikely to be a random fluke.

With a large enough sample, even a tiny, meaningless difference becomes statistically significant.

Remember p < 0.05 = "probably real" NOT "definitely important"
Clinical Significance

Does it actually matter to patients?

A result is clinically significant when the size of the effect is large enough to change what you would do for a patient — and to matter to that patient's life, quality of care, or outcomes.

Clinical significance is about the absolute numbers: How many patients need treatment to prevent one bad outcome?

The Right Question: ARR, NNT, effect size (not just the p-value)
The Classic Illustration

Large Trials Can Make Tiny Differences Look Important

Imagine a study of 200,000 patients testing a cholesterol drug. After 5 years, the treated group has an event rate of 10.0% vs. 10.2% in the placebo group.

The p-value is p < 0.001 — highly significant. But the Absolute Risk Reduction = 0.2%. The NNT = 500 patients treated for 5 years to prevent one event. Statistics cannot answer whether that is worth it — clinical judgment must.

The Four Possibilities

Scenario | Action
✅ Statistically + clinically significant | Adopt
⚠️ Statistically significant, NOT clinically significant | Be cautious
🔍 NOT statistically significant, big effect size | Underpowered — investigate
❌ Neither | Abandon

04
Measuring the Effect — The Full Family
Once we know a result is not due to chance, we need to measure HOW LARGE the effect actually is. This chapter introduces the complete family of effect measures.

Absolute vs. Relative Measures

Absolute measures tell you the actual, tangible difference in numbers. Relative measures tell you the ratio — how much bigger or smaller one group's risk is compared to another's.

Relative measures almost always look more impressive than absolute ones, which is why pharmaceutical marketing loves them. As a clinician, always demand the absolute numbers.

🎯

The Golden Rule

Always convert a relative measure to an absolute measure before making a clinical decision. A "50% relative risk reduction" could mean going from 2% to 1% risk (NNT = 100) or from 40% to 20% risk (NNT = 5). These are vastly different clinical scenarios.
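The Golden Rule can be sketched as a tiny converter (illustrative function, not from any library), using the two scenarios just described:

```python
def absolute_numbers(cer: float, rrr: float):
    """Turn a relative risk reduction into ARR and NNT for a given baseline risk."""
    eer = cer * (1 - rrr)     # event rate in the treated group
    arr = cer - eer           # absolute risk reduction
    nnt = round(1 / arr)      # patients treated to prevent one event
    return arr, nnt

# The same "50% relative risk reduction" at two different baseline risks:
print(absolute_numbers(cer=0.02, rrr=0.50))  # ARR = 0.01 -> NNT = 100
print(absolute_numbers(cer=0.40, rrr=0.50))  # ARR = 0.20 -> NNT = 5
```

Identical RRR, twentyfold difference in how many patients must be treated.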

Measure | Type | Study Type
ARR / ARI (Absolute Risk Reduction / Increase) | Absolute | RCTs, cohorts
RR / RRR (Risk Ratio / Relative Risk Reduction) | Relative | RCTs, cohorts
OR (Odds Ratio) | Relative | Case-control
HR (Hazard Ratio) | Relative | Survival / RCTs
NNT / NNH (Number Needed to Treat / Harm) | Absolute | Any

Same Data — Completely Different Impressions

Placebo group: 10% stroke rate. Treatment group: 7% stroke rate. Look how differently this appears depending on which measure you report:

Absolute Risk Reduction (ARR): 3%
Relative Risk Reduction (RRR): 30%
Risk Ratio: RR = 0.70 (a 30% relative reduction)

The ARR of 3% looks modest; the RRR of 30% sounds impressive — yet they describe the exact same data.

NNT from This Trial

34 Number Needed to Treat

Treat 34 patients to prevent 1 stroke. Worth it? That depends on cost, side effects, and baseline risk — not on the p-value.


05
The 2×2 Contingency Table — The Mother of All Calculations
Nearly every effect measure in clinical research is derived from a simple 2×2 table. Master this table, and every formula that follows becomes intuitive.

The Universal Template

Every study compares two groups and counts how many in each group had the outcome of interest. These four numbers — a, b, c, d — are the building blocks of all statistical measures.

 | Outcome: YES (event occurred) | Outcome: NO (no event) | Total
Exposed / Treatment | a | b | a + b
Unexposed / Control | c | d | c + d
Total | a + c | b + d | N

(Note: the "true/false positive" labels sometimes attached to these cells belong to diagnostic-test 2×2 tables, not exposure–outcome tables like this one.)

Every Formula From a, b, c, d

Risk in Exposed (EER) | a ÷ (a + b) | Event rate in the treatment/exposed group
Risk in Controls (CER) | c ÷ (c + d) | Event rate in the control/unexposed group
Risk Ratio (RR) | EER ÷ CER | = [a/(a+b)] ÷ [c/(c+d)]
Odds Ratio (OR) | (a × d) ÷ (b × c) | The cross-product ratio
📊
Worked Example: Apixaban vs. Warfarin — Stroke Prevention in AF
Populating a real 2×2 table from trial data

500 patients receive apixaban; 500 receive warfarin. After 2 years: 25 apixaban patients had stroke; 50 warfarin patients had stroke.

 | Stroke (YES) | No Stroke (NO) | Total
Apixaban (Treatment) | a = 25 | b = 475 | 500
Warfarin (Control) | c = 50 | d = 450 | 500
Total | 75 | 925 | 1000
EER = 25/500 = 5%  |  CER = 50/500 = 10%  |  ARR = 5%  |  NNT = 20  |  RR = 0.50  |  OR = (25×450)÷(475×50) = 0.47
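Every number in that summary line falls out of the four cell counts. A small Python sketch (the function name is ours; NNT rounded in the usual way):

```python
def two_by_two(a, b, c, d):
    """Standard effect measures from a 2x2 table of counts a, b, c, d."""
    eer = a / (a + b)              # event rate, exposed/treatment group
    cer = c / (c + d)              # event rate, control group
    return {
        "EER": eer,
        "CER": cer,
        "ARR": cer - eer,
        "NNT": round(1 / (cer - eer)),
        "RR":  eer / cer,
        "OR":  (a * d) / (b * c),  # cross-product ratio
    }

# Apixaban vs warfarin: a = 25, b = 475, c = 50, d = 450
stats = two_by_two(25, 475, 50, 450)
for name, value in stats.items():
    print(f"{name}: {value:.3g}")
# Matches the summary line: EER 5%, CER 10%, ARR 5%, NNT 20, RR 0.50, OR ≈ 0.47
```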

06
Risk Ratio (Relative Risk) — RR
The Risk Ratio compares the probability of an event in two groups. It is the most intuitive relative measure and the standard output of RCTs and cohort studies.

The Concept

The Risk Ratio asks: "How many times more likely is the event in Group A compared to Group B?" It divides the risk in the treated group by the risk in the control group.

Risk Ratio Formula RR = EER ÷ CER EER = Event rate in Experimental group | CER = Event rate in Control group

The RR is only used in prospective studies (RCTs, cohort studies) where you follow people forward in time and directly measure incidence.

How to Interpret the RR

RR > 1

Treatment INCREASES risk
Harmful exposure
RR = 1

No difference
Null effect
RR < 1

Treatment REDUCES risk
Protective

RR = 2.0

Twice the risk. Treated group has 2× the event rate of controls.

RR = 1.0

No difference. Both groups have identical event rates.

RR = 0.5

Half the risk. Treatment reduces events by 50% relative to control.

From RR to Relative Risk Reduction (RRR)

RRR Formula RRR = (1 − RR) × 100% Also: (CER − EER) ÷ CER × 100%

Using our apixaban example: RR = 0.50, so RRR = 50%. Sounds dramatic — but the ARR is only 5%, and NNT = 20.

🚨

The RRR Marketing Trap

A drug reducing events from 2% to 1% has an RRR of 50% — identical to a drug reducing events from 40% to 20%. The NNT values are 100 vs. 5. These are completely different clinical scenarios. Always calculate ARR = CER − EER.

💊
Full Worked Example: Statin for MI Prevention
Calculating RR, RRR, ARR, and NNT together
1
Data: 5-year RCT. Statin group: 4% MI rate. Placebo group: 6% MI rate.
2
EER = 0.04  |  CER = 0.06
3
RR = 0.04 ÷ 0.06 = 0.67 — statin patients have 67% of the risk of placebo patients.
4
RRR = (1 − 0.67) × 100% = 33% — this makes the headlines.
5
ARR = 6% − 4% = 2% — this is the honest number.
6
NNT = 1 ÷ 0.02 = 50 patients treated for 5 years to prevent 1 MI.
Clinical judgment: Treating 50 patients for 5 years to prevent 1 MI is justifiable for high-risk patients but requires shared decision-making for low-risk individuals. The p-value alone cannot make this call.
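The same six-step arithmetic can be run directly from the event rates (an illustrative helper, not any trial's actual code):

```python
def from_event_rates(eer: float, cer: float):
    """RR, RRR, ARR, and NNT directly from the two event rates."""
    rr = eer / cer          # ratio of risks
    rrr = 1 - rr            # relative risk reduction
    arr = cer - eer         # absolute risk reduction
    nnt = round(1 / arr)    # patients treated per event prevented
    return rr, rrr, arr, nnt

# Statin trial: 4% MI rate on statin, 6% on placebo
rr, rrr, arr, nnt = from_event_rates(eer=0.04, cer=0.06)
print(f"RR = {rr:.2f}, RRR = {rrr:.0%}, ARR = {arr:.0%}, NNT = {nnt}")
# RR = 0.67, RRR = 33%, ARR = 2%, NNT = 50
```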

07
The Odds Ratio (OR) — The Case-Control Statistic
The OR is one of the most confusing statistics in clinical research — mainly because people confuse "odds" with "probability." Understanding the difference unlocks everything.

Probability vs. Odds — The Critical Difference

Probability P = Events ÷ Total Example: 1 stroke in 10 patients → P = 1/10 = 10%
Odds Odds = Events ÷ Non-events Example: 1 stroke in 10 patients → Odds = 1/9 = 0.111
💡

A Betting Analogy

In horse racing, "5 to 1 odds" means 5 losses for every 1 win. The Odds Ratio uses this exact concept — it compares the odds of an event in two groups.
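The probability/odds conversions above are one-liners; a small sketch (function names are ours):

```python
def prob_to_odds(p: float) -> float:
    """Probability -> odds: events per non-event."""
    return p / (1 - p)

def odds_to_prob(odds: float) -> float:
    """Odds -> probability: events per total."""
    return odds / (1 + odds)

print(prob_to_odds(0.10))   # 1 stroke per 9 without: odds ≈ 0.111
print(prob_to_odds(0.50))   # 50% probability -> odds of 1 ("even odds")
print(odds_to_prob(1 / 5))  # "5 to 1 against" -> probability ≈ 0.167
```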

Calculating the Odds Ratio

Odds Ratio Formula OR = (a × d) ÷ (b × c) The elegant cross-product formula from the 2×2 table.
Also: OR = [a/b] ÷ [c/d] = odds in exposed ÷ odds in controls
OR > 1

More disease in exposed
OR = 1

No association
OR < 1

Less disease in exposed
The Most Important Distinction

When to Use OR vs. RR — and Why It Matters

Use RR when:

You have a prospective study (RCT or cohort) where you followed people forward in time and can directly measure the incidence of events.

Use OR when:

You have a case-control study where you started with people who already have the disease and looked backward. You cannot calculate true incidence, so you must use odds instead. The OR is the only valid measure in this design.

When are OR and RR similar?

When the disease is rare (event rate <10%), the OR approximates the RR closely — this is the rare disease assumption. As events become more common, the OR exaggerates the effect: it always lies farther from 1.0 than the RR, in whichever direction the effect points.

Event Rate | OR ≈ RR?
< 5% | Yes, very close
5–10% | Reasonably close
> 20% | Substantially exaggerated
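You can watch the divergence happen by holding the RR fixed at 2.0 and raising the baseline event rate (an illustrative sketch):

```python
def or_vs_rr(p_exposed: float, p_control: float):
    """Compare RR with OR for the same pair of event probabilities."""
    rr = p_exposed / p_control
    odds = lambda p: p / (1 - p)
    odds_ratio = odds(p_exposed) / odds(p_control)
    return rr, odds_ratio

# RR held at exactly 2.0 while the control event rate rises:
for p_control in (0.01, 0.05, 0.20, 0.40):
    rr, odds_ratio = or_vs_rr(2 * p_control, p_control)
    print(f"control rate {p_control:.0%}: RR = {rr:.2f}, OR = {odds_ratio:.2f}")
# At a 1% baseline the OR is ≈ 2.02 (nearly the RR);
# at a 40% baseline it inflates to 6.0 for the same doubling of risk.
```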
🔬
Worked Example: Smoking and Lung Cancer (Case-Control)
The classic use case for the Odds Ratio
1
Study design: 200 lung cancer patients (cases) and 200 matched healthy controls. We look back at their smoking history.
2
Lung Cancer (Cases)No Cancer (Controls)
Smokera = 160b = 80
Non-smokerc = 40d = 120
3
Odds in cancer group = 160 ÷ 40 = 4.0
4
Odds in control group = 80 ÷ 120 = 0.667
5
OR = 4.0 ÷ 0.667 = 6.0 (or cross-product: (160 × 120) ÷ (80 × 40) = 6.0)
Interpretation: Smokers have 6 times the odds of developing lung cancer compared to non-smokers. Since lung cancer is relatively rare in the population (<10%), this OR closely approximates the true Risk Ratio.
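The two routes to the OR in steps 3–5 (exposure odds in each group vs. the cross-product) are algebraically identical; a quick check in Python:

```python
# Cell counts from the smoking/lung-cancer table above
a, b, c, d = 160, 80, 40, 120

odds_cases    = a / c    # 160/40 = 4.0  (smokers per non-smoker among cases)
odds_controls = b / d    # 80/120 ≈ 0.667 (same, among controls)

print(odds_cases / odds_controls)   # ratio of exposure odds ≈ 6.0
print((a * d) / (b * c))            # cross-product gives the same answer
```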

08
Confidence Intervals — The Width of Our Uncertainty
A confidence interval does something a p-value cannot: it shows you not just whether an effect is real, but HOW BIG it might plausibly be — giving you the full range of uncertainty around your estimate.

The Exact Definition

"A 95% confidence interval means: if we repeated this study 100 times with different samples, approximately 95 of those studies would produce a confidence interval that contains the true population value."

In practical terms: the CI gives you the range of values that are plausible for the true effect, given the data you observed. The point estimate (e.g., RR = 0.72) is your best single guess; the CI (e.g., 0.58–0.89) is the range of uncertainty around that guess.

Three Things the CI Tells You

1. The Direction of Effect

Is the entire CI on the benefit side (below 1.0 for RR/OR)? Entirely harmful (above 1.0)? Or does it cross the null, suggesting uncertainty about even the direction?

2. Statistical Significance

If the 95% CI does not include the null value (1.0 for RR/OR; 0 for ARR), the result is statistically significant at p < 0.05. The CI tells you everything the p-value does — plus more.

3. The Precision of the Estimate

A narrow CI = precise estimate (large study). A wide CI = high uncertainty — treat with caution, even if statistically significant.

The Null Value — Your Reference Point

Measure | Null Value | CI crosses null if...
Risk Ratio (RR) | 1.0 | Includes 1.0
Odds Ratio (OR) | 1.0 | Includes 1.0
Hazard Ratio (HR) | 1.0 | Includes 1.0
ARR / Difference | 0.0 | Includes 0
📌

The Rule

If the CI crosses the null value, the result is NOT statistically significant. There is insufficient evidence to rule out "no difference."
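The source doesn't show how a CI around an RR is computed; one standard approach is the large-sample log method (our assumption, not stated in the text), sketched here with the apixaban 2×2 counts:

```python
from math import log, exp, sqrt

def rr_confidence_interval(a, b, c, d, z=1.96):
    """Large-sample 95% CI for the risk ratio via the log method."""
    rr = (a / (a + b)) / (c / (c + d))
    se_log_rr = sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d))  # SE on the log scale
    lo = exp(log(rr) - z * se_log_rr)
    hi = exp(log(rr) + z * se_log_rr)
    return rr, (lo, hi)

# Apixaban vs warfarin table: a = 25, b = 475, c = 50, d = 450
rr, (lo, hi) = rr_confidence_interval(25, 475, 50, 450)
print(f"RR = {rr:.2f}, 95% CI {lo:.2f}-{hi:.2f}")  # CI ≈ 0.31-0.80
print("significant" if hi < 1.0 or lo > 1.0 else "crosses the null")
```

Because the whole interval sits below 1.0, the Rule above calls this result statistically significant.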

90% CI

Narrower. α = 0.10. Easier to reach significance. Rare in clinical research.

95% CI ⭐

The universal standard. α = 0.05. Always assume 95% unless stated otherwise.

99% CI

Wider. α = 0.01. Used when errors are very costly. Harder to achieve significance.

Visual CI Interpretation — Six Scenarios for RR

Six scenarios, each a point estimate (best guess) with its 95% CI, judged against the null value (RR = 1.0):

① Significant benefit, precise: RR 0.55 (0.42–0.72) ✅
② Significant benefit, imprecise (wide CI): RR 0.45 (0.22–0.94) ✅
③ Not significant, CI crosses the null: RR 0.78 (0.58–1.05) ❌
④ Very uncertain (tiny study, huge CI): RR 0.60 (0.18–2.10) ❌
⑤ Significant harm, CI entirely above the null: RR 1.65 (1.25–2.18) 🚨
⑥ Large benefit, narrow CI, the ideal result: RR 0.38 (0.28–0.55) ✅✅

Bonus: The Hazard Ratio (HR) — RR for Survival Data

When a study tracks participants over time and measures when events occur (not just whether they occur), the appropriate measure is the Hazard Ratio. It is the ratio of the rate at which events are occurring in the treatment group vs. the control group at any given moment in time.

Hazard Ratio HR = Hazard(Treatment) ÷ Hazard(Control) Interpreted identically to RR: HR < 1 = treatment reduces events

The HR is the standard output of Kaplan-Meier survival analysis and Cox proportional hazards models — the two most common statistical methods in major cardiovascular, oncology, and infectious disease trials.

📈

Example: ARISTOTLE Trial (Apixaban vs. Warfarin)

HR for stroke or systemic embolism = 0.79 (95% CI: 0.66–0.95; p = 0.01). Apixaban users experienced stroke at 79% of the rate of warfarin users. The CI does not cross 1.0, confirming statistical significance.


09
NNT & NNH — The Most Clinically Useful Numbers
If you could only remember two statistics for bedside practice, these would be the ones. NNT and NNH translate abstract percentages into numbers that directly answer the patient's question: "Will this help me?"
Benefit Side

Number Needed to Treat (NNT)

The NNT is the number of patients who must receive a treatment for one additional patient to benefit — compared to the control group. A lower NNT means a more effective treatment.

NNT Formula NNT = 1 ÷ ARR ARR = Absolute Risk Reduction (CER − EER)

NNT is always reported for a specific time period and a specific outcome. An NNT of 20 to prevent stroke over 5 years is meaningless without that context.

Harm Side

Number Needed to Harm (NNH)

The NNH is the number of patients who must be treated for one additional patient to experience harm — compared to the control group. A higher NNH means a safer treatment.

NNH Formula NNH = 1 ÷ ARI ARI = Absolute Risk Increase (EER − CER for adverse events)

Always compare NNT vs. NNH. A drug with NNT = 50 but NNH = 20 causes more harm than benefit — even if statistically significant.

NNT Benchmarks — A Rough Clinical Guide

2–5 Excellent

e.g., antibiotics for bacterial meningitis. Strong, direct benefit.

10–20 Good

e.g., antihypertensives for high-risk patients. Generally justifiable.

50–100 Modest

Requires shared decision-making. Depends on cost, burden, patient preference.

>200 Very High

Rarely justifiable for primary prevention unless intervention is very cheap and safe.

⚖️
NNT vs. NNH — The Benefit-Risk Balance
A new anticoagulant with both efficacy and harm data

A new anticoagulant reduces DVT but increases major bleeding. Data over 6 months:

Efficacy Data (DVT Prevention)

1
DVT rate in treatment group (EER): 3%
2
DVT rate in control group (CER): 8%
3
ARR = 8% − 3% = 5%
4
NNT = 1 ÷ 0.05 = 20

Harm Data (Major Bleeding)

1
Bleeding rate in treatment group (EER): 4%
2
Bleeding rate in control group (CER): 1%
3
ARI = 4% − 1% = 3%
4
NNH = 1 ÷ 0.03 = 33
Clinical Judgment: NNT = 20 to prevent a DVT; NNH = 33 to cause a major bleed. For every 20 patients treated, 1 DVT is prevented and roughly 0.6 major bleeds occur (20 ÷ 33). The benefit-risk balance depends on disease severity, baseline risk, and available monitoring. This is precisely the kind of individualized reasoning that EBM supports.
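The whole benefit-risk block can be reproduced from the four event rates (illustrative helper; NNT and NNH rounded):

```python
def benefit_risk(cer_benefit, eer_benefit, cer_harm, eer_harm):
    """NNT and NNH for one treatment, from its efficacy and harm event rates."""
    arr = cer_benefit - eer_benefit   # absolute risk reduction (benefit)
    ari = eer_harm - cer_harm         # absolute risk increase (harm)
    nnt = round(1 / arr)
    nnh = round(1 / ari)
    return nnt, nnh

# Anticoagulant example: DVT 8% -> 3%, major bleeding 1% -> 4%
nnt, nnh = benefit_risk(cer_benefit=0.08, eer_benefit=0.03,
                        cer_harm=0.01, eer_harm=0.04)
print(f"NNT = {nnt}, NNH = {nnh}")
print(f"Per {nnt} treated: 1 DVT prevented, ~{nnt / nnh:.1f} major bleeds caused")
```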

10
Putting It All Together — A Complete Appraisal Framework
Now you have all the tools. This chapter shows you how to read a paper's results section and extract every number you need in a systematic, structured way.

Master Reference: All Key Statistics at a Glance

Statistic | What It Measures | Null Value | Favors Treatment | Best Used In | Watch Out For
P-value | Probability of a result at least this extreme if H₀ is true | — | p < 0.05 | Any study | Says nothing about effect size
95% CI | Range of plausible true values | Null excluded | CI entirely on benefit side | Any study | Wide CI = imprecise despite significance
ARR | Absolute difference in event rates | 0 | > 0 | RCT, cohort | Always report with NNT
RRR | % reduction relative to control rate | 0% | > 0% | RCT, cohort | Can mislead — always check ARR too
RR | Ratio of event rates between groups | 1.0 | < 1.0 | RCT, prospective cohort | Not valid for case-control designs
OR | Ratio of odds between groups | 1.0 | < 1.0 | Case-control studies | Exaggerates the RR when events are common
HR | Instantaneous ratio of event rates over time | 1.0 | < 1.0 | Survival analysis / landmark RCTs | Assumes proportional hazards throughout
NNT | Patients treated per 1 benefited | — | Small (<20) | Any (derived from ARR) | Must specify time period & outcome
NNH | Patients treated per 1 harmed | — | Large (>100) | Any (derived from ARI) | Compare against NNT for benefit-risk balance

Frequently Confused — Quick Clarifications

Why can't I just use RR for case-control studies?
In a case-control study, the investigator selects a fixed number of cases and controls — the proportions do not reflect true population prevalence. You cannot calculate the true incidence rate in each group, which is what the RR requires. The OR is the only valid measure because it compares odds, which can be calculated regardless of how the sample was assembled.
If OR ≈ RR for rare diseases, why does it matter which one I use?
Methodological purity and transparency matter. Reporting an OR from an RCT as an "approximate RR" introduces potential confusion and inaccuracy. Additionally, the "rare disease" threshold (typically <10%) is not always clear-cut, and erring on the side of precision builds credibility in your analysis.
A trial reports p = 0.06 and I think the treatment is still useful. Am I wrong?
Not necessarily. A p-value of 0.06 means a 6% chance of seeing results at least this extreme if the null hypothesis were true — only marginally above the 0.05 threshold. If the study was underpowered, the same true effect might produce p = 0.04 in a larger study. Always look at the CI, the ARR, and the clinical context. A borderline p-value with a large, clinically meaningful effect and narrow CI deserves serious consideration.
Can the NNT be negative? What does that mean?
Yes. A negative NNT indicates the "treatment" causes more harm than benefit — the event rate in the treatment group is higher than in the control group. Seeing a negative NNT in an efficacy analysis is a red flag that the treatment may be detrimental.
My CI is 0.95–1.02. The result is "not significant." But it's almost entirely below 1.0 — is there still a benefit?
This is a critical situation. The CI crosses 1.0 (just barely), making the result technically not statistically significant. However, the fact that nearly the entire interval lies below 1.0 strongly suggests a real benefit that the study was underpowered to confirm. The appropriate response is: the study is likely underpowered; a larger trial is needed. Do not interpret this as "no benefit" — it means "insufficient evidence," which is very different.
Why does the OR always exaggerate the RR for common outcomes?
When an event is common (e.g., 40%), the denominator of the odds (non-events only) is noticeably smaller than the denominator of the probability (everyone). This makes the odds larger than the probability, amplifying the ratio between groups. If P = 0.4, then Odds = 0.4/0.6 = 0.67. But if P = 0.05, Odds = 0.05/0.95 ≈ P. At low probabilities, odds and probabilities are nearly identical; at high probabilities, they diverge substantially.
Your Bedside Appraisal Checklist

When You Pick Up a Clinical Paper — Ask These Questions in Order

Step 1 — Validity

Was this an RCT (use RR/HR) or case-control (use OR)? Was randomization concealed? Was blinding adequate? Was follow-up >80%? Was an intention-to-treat analysis used?

Step 2 — Results

What is the p-value? What is the 95% CI — does it cross the null? What is the ARR (not just RRR)? What is the NNT? What is the NNH? Is there a clinically meaningful effect?

Step 3 — Applicability

Does my patient match the study population? Is this drug available locally? Do my patient's values align with the outcomes measured? Is the NNT acceptable given my patient's baseline risk?