PRAISE FOR INTUITIVE BIOSTATISTICS
Intuitive Biostatistics is a beautiful book that has much to teach experimental biologists of all stripes. Unlike other statistics texts I have seen, it includes extensive and carefully crafted discussions of the perils of multiple comparisons, warnings about common and avoidable mistakes in data analysis, a review of the assumptions that apply to various tests, an emphasis on confidence intervals rather than P values, explanations as to why the concept of statistical significance is rarely needed in scientific work, and a clear explanation of nonlinear regression (commonly used in labs; rarely explained in statistics books).
In fact, I am so pleased with Intuitive Biostatistics that I decided to make it the reference of choice for my postdoctoral associates and graduate students, all of whom depend on statistics and most of whom need a closer awareness of precisely why. Motulsky has written thoughtfully, with compelling logic and wit. He teaches by example what one may expect of statistical methods and, perhaps just as important, what one may not expect of them. He is to be congratulated for this work, which will surely be valuable and perhaps even transformative for many of the scientists who read it.
—Bruce Beutler, 2011 Nobel Laureate, Physiology or Medicine; Director, Center for the Genetics of Host Defense, UT Southwestern Medical Center
GREAT FOR SCIENTISTS
This splendid book meets a major need in public health, medicine, and biomedical research training—a user-friendly biostatistics text for non-mathematicians that clearly explains how to make sense of statistical results, how to avoid common mistakes in data analysis, how to avoid being confused by statistical nonsense, and (new in this edition) how to make research more reproducible. You may enjoy statistics for the first time!
—Gilbert S. Omenn, Professor of Medicine, Genetics, Public Health, and Computational Medicine & Bioinformatics, University of Michigan
I am entranced by the book. Statistics is a topic that is often difficult for many scientists to fully appreciate. The writing style and explanations of Intuitive Biostatistics make the concepts accessible. I recommend this text to all researchers. Thank you for writing it.
—Tim Bushnell, Director of Shared Resource Laboratories, University of Rochester Medical Center
GREAT FOR STUDENTS
After struggling with books that weren’t right for my class, I was delighted to find Intuitive Biostatistics. It is the best starting point for undergraduate students seeking to learn the fundamental principles of statistics because of its unique presentation of the important concepts behind statistics. Lots of books give you the “recipe” approach, but only Intuitive Biostatistics explains what it all means. It meticulously goes through common mistakes and shows how to correctly choose, perform, and interpret the proper statistical test. It is accessible to new learners without being condescending.
—Beth Dawson, The University of Texas at Austin
This textbook emphasizes the thinking needed to interpret statistical analysis in published research over knowledge of the mathematical underpinnings. The basics of choosing tests and doing simpler analyses are covered very clearly and simply. The language is easy to understand yet accurate. It brings in the higher level of intuitive understanding that we hope students will have at the end of an honors undergraduate or MSc program, skipping over the mathematical details that are now handled by software anyway. It is the perfect approach and level for undergraduates beginning research.
—Janet E. Kübler, Biology Department, California State University at Northridge
I read many statistics textbooks and have come across very few that actually explain statistical concepts well. Yours is a stand-out exception. In particular, I think you’ve done an outstanding job of helping readers understand P values and confidence intervals, and yours is one of the very first introductory textbooks to discuss the crucial concept of false discovery rates. I have already recommended your text to postgraduate students and postdoctoral researchers at my own institute.
—Rob Herbert, Neuroscience Research Australia
GREAT FOR EVERYONE
I’ve read several statistics books but found that some concepts I was interested in were not mentioned and other concepts were hard to understand. You can ignore the “bio” in Intuitive Biostatistics, as it is the best applied statistics book I have come across, period. Its clear, straightforward explanations have allowed me to better understand research papers and select appropriate statistical tests. Highly recommended.
—Ariel H. Collis, Economist, Georgetown Economic Services
preface xxv
part a Introducing Statistics
1. Statistics and Probability are not Intuitive 3
2. The Complexities of Probability 14
3. From Sample to Population 24
part b Introducing Confidence Intervals
4. Confidence Interval of a Proportion 31
5. Confidence Interval of Survival Data 46
6. Confidence Interval of Counted Data (Poisson Distribution) 55
part c Continuous Variables
7. Graphing Continuous Data 63
8. Types of Variables 75
9. Quantifying Scatter 80
10. The Gaussian Distribution 89
11. The Lognormal Distribution and Geometric Mean 95
12. Confidence Interval of a Mean 101
13. The Theory of Confidence Intervals 110
14. Error Bars 118
part d P Values and Statistical Significance
15. Introducing P Values 129
16. Statistical Significance and Hypothesis Testing 145
17. Comparing Groups with Confidence Intervals and P Values 157
18. Interpreting a Result that is Statistically Significant 165
19. Interpreting a Result that is Not Statistically Significant 179
20. Statistical Power 186
21. Testing for Equivalence or Noninferiority 193
part e Challenges in Statistics
22. Multiple Comparisons Concepts 203
23. The Ubiquity of Multiple Comparisons 214
24. Normality Tests 224
25. Outliers 232
26. Choosing a Sample Size 239
part f Statistical Tests
27. Comparing Proportions 263
28. Case-Control Studies 273
29. Comparing Survival Curves 284
30. Comparing Two Means: Unpaired t Test 294
31. Comparing Two Paired Groups 306
32. Correlation 318
part g Fitting Models to Data
33. Simple Linear Regression 331
34. Introducing Models 350
35. Comparing Models 357
36. Nonlinear Regression 366
37. Multiple Regression 378
38. Logistic and Proportional Hazards Regression 395
part h The Rest of Statistics
39. Analysis of Variance 407
40. Multiple Comparison Tests After ANOVA 418
41. Nonparametric Methods 431
42. Sensitivity, Specificity, and Receiver Operating Characteristic Curves 442
43. Meta-Analysis 452
part i Putting It All Together
44. The Key Concepts of Statistics 463
45. Statistical Traps To Avoid 468
46. Capstone Example 487
47. Statistics and Reproducibility 502
48. Checklists for Reporting Statistical Methods and Results 511
part j appendices 517
references 533
index 548
CONTENTS
preface xxv
Who is this Book for?
What makes the Book Unique?
What’s New?
Which Chapters are Essential?
Who Helped?
Who Am I?
part a Introducing Statistics
1. Statistics and Probability are not Intuitive 3
We Tend to Jump to Conclusions
We Tend to Be Overconfident
We See Patterns in Random Data
We Don’t Realize that Coincidences are Common
We Don’t Expect Variability to Depend on Sample Size
We Have Incorrect Intuitive Feelings about Probability
We Find It Hard to Combine Probabilities
We Don’t Do Bayesian Calculations Intuitively
We are Fooled by Multiple Comparisons
We Tend to Ignore Alternative Explanations
We are Fooled By Regression to the Mean
We Let Our Biases Determine How We Interpret Data
We Crave Certainty, but Statistics Offers Probabilities
Chapter Summary
Term Introduced in this Chapter
2. The Complexities of Probability 14
Basics of Probability
Probability as Long-Term Frequency
Probability as Strength of Belief
Calculations With Probabilities Can be Easier If You Switch to Calculating with Whole Numbers
Common Mistakes: Probability
Lingo
Probability in Statistics
Q & A
Chapter Summary
Terms Introduced in this Chapter
3. From Sample to Population 24
Sampling from a Population
Sampling Error and Bias
Models and Parameters
Multiple Levels of Sampling
What if Your Sample is the Entire Population?
Chapter Summary
Terms Introduced in this Chapter
part b Introducing Confidence Intervals
4. Confidence Interval of a Proportion 31
Data Expressed as Proportions
The Binomial Distribution: From Population to Sample
Example: Free Throws in Basketball
Example: Polling Voters
Assumptions: Confidence Interval of a Proportion
What Does 95% Confidence Really Mean?
Are You Quantifying the Event You Care About?
Lingo
Calculating the CI of a Proportion
Ambiguity if the Proportion is 0% or 100%
An Alternative Approach: Bayesian Credible Intervals
Common Mistakes: CI of a Proportion
Q & A
Chapter Summary
Terms Introduced in this Chapter
5. Confidence Interval of Survival Data 46
Survival Data
Censored Survival Data
Calculating Percentage Survival at Various Times
Graphing Survival Curves with Confidence Bands
Summarizing Survival Curves
Assumptions: Survival Analysis
Q & A
Chapter Summary
Terms Introduced in this Chapter
6. Confidence Interval of Counted Data (Poisson Distribution) 55
The Poisson Distribution
Assumptions: Poisson Distribution
Confidence Intervals Based on Poisson Distributions
How to Calculate the Poisson CI
The Advantage of Counting for Longer Time Intervals (Or in Larger Volumes)
Q & A
Chapter Summary
Term Introduced in this Chapter
part c Continuous Variables
7. Graphing Continuous Data 63
Continuous Data
The Mean and Median
Lingo: Terms Used to Explain Variability
Percentiles
Graphing Data to Show Variation
Graphing Distributions
Beware of Data Massage
Q & A
Chapter Summary
Terms Introduced in this Chapter
8. Types of Variables 75
Continuous Variables
Discrete Variables
Why It Matters?
Not Quite as Distinct as They Seem
Q & A
Chapter Summary
Terms Introduced in this Chapter
9. Quantifying Scatter 80
Interpreting a Standard Deviation
How it Works: Calculating SD
Why n – 1?
Situations in Which n Can Seem Ambiguous
SD and Sample Size
Other Ways to Quantify and Display Variability
Q & A
Chapter Summary
Terms Introduced in this Chapter
10. The Gaussian Distribution 89
The Nature of The Gaussian Distribution
SD and the Gaussian Distribution
The Standard Normal Distribution
The Normal Distribution does not Define Normal Limits
Why The Gaussian Distribution is so Central to Statistical Theory
Q & A
Chapter Summary
Terms Introduced in this Chapter
11. The Lognormal Distribution and Geometric Mean 95
The Origin of a Lognormal Distribution
Logarithms?
Geometric Mean
Geometric SD
Common Mistakes: Lognormal Distributions
Q & A
Chapter Summary
Terms Introduced in this Chapter
12. Confidence Interval of a Mean 101
Interpreting A CI of a Mean
What Values Determine the CI of a Mean?
Assumptions: CI of a Mean
How to Calculate the CI of a Mean
More about Confidence Intervals
Q & A
Chapter Summary
Terms Introduced in this Chapter
13. The Theory of Confidence Intervals 110
CI of a Mean Via the t Distribution
CI of a Mean Via Resampling
CI of a Proportion Via Resampling
CI of a Proportion Via Binomial Distribution
Q & A
Chapter Summary
Terms Introduced in this Chapter
14. Error Bars 118
SD Versus SEM
Which Kind of Error Bar Should I Plot?
The Appearance of Error Bars
How are SD and SEM Related to Sample Size?
Geometric SD Error Bars
Common Mistakes: Error Bars
Q & A
Chapter Summary
Terms Introduced in this Chapter
part d P Values and Statistical Significance
15. Introducing P Values 129
Introducing P Values
Example 1: Coin Flipping
Example 2: Antibiotics on Surgical Wounds
Example 3: Angioplasty and Myocardial Infarction
Lingo: Null Hypothesis
Why P Values are Confusing
One- Or Two-Tailed P Value?
P Values are Not Very Reproducible
There is much more to Statistics than P Values
Common Mistakes: P Values
Q & A
Chapter Summary
Terms Introduced in this Chapter
16. Statistical Significance and Hypothesis Testing 145
Statistical Hypothesis Testing
Analogy: Innocent Until Proven Guilty
Extremely Significant? Borderline Significant?
Lingo: Type I and Type II Errors
Tradeoffs When Choosing a Significance Level
What Significance Level Should You Choose?
Interpreting A CI, A P Value, and A Hypothesis Test
Statistical Significance vs. Scientific Significance
Common Mistakes: Statistical Hypothesis Testing
Q & A
Chapter Summary
Terms Defined in this Chapter
17. Comparing Groups with Confidence Intervals and P Values 157
CIs and Statistical Hypothesis Testing are Closely Related
Four Examples with CIs, P Values, and Conclusion about Statistical Significance
Q & A
Chapter Summary
18. Interpreting a Result that is Statistically Significant 165
Seven Explanations for Results that are “Statistically Significant”
How Frequently do Type I Errors (False Positives) Occur?
The Prior Probability Influences the FPRP (A Bit of Bayes)
Bayesian Analysis
Accounting for Prior Probability Informally
The Relationship Between Sample Size and P Values
Common Mistakes
Q & A
Chapter Summary
Terms Introduced in this Chapter
19. Interpreting a Result that is not Statistically Significant 179
Five Explanations For “Not Statistically Significant” Results
“Not Significantly Different” does not Mean “No Difference”
Example: α2-Adrenergic Receptors on Platelets
Example: Fetal Ultrasounds
How to Get Narrower CIs
What if the P Value is Really High?
Q & A
Chapter Summary
20. Statistical Power 186
What is Statistical Power?
Distinguishing Power From Beta and the False Discovery Rate
An Analogy to Understand Statistical Power
Power of the Two Example Studies
When does It Make Sense to Compute Power?
Common Mistakes: Power
Q & A
Chapter Summary
Terms Introduced in this Chapter
21. Testing For Equivalence or Noninferiority 193
Equivalence must be Defined Scientifically, not Statistically
If the Mean is Within the Equivalence Zone
If the Mean is Outside of the Equivalence Zone
Applying the Usual Approach of Statistical Hypothesis Testing to Testing for Equivalence
Noninferiority Tests
Common Mistakes: Testing for Equivalence
Q & A
Chapter Summary
Terms Introduced in this Chapter
part e Challenges in Statistics
22. Multiple Comparisons Concepts 203
The Problem of Multiple Comparisons
Correcting for Multiple Comparisons is not Always Needed
The Traditional Approach to Correcting for Multiple Comparisons
Correcting for Multiple Comparisons with the False Discovery Rate
Comparing the Two Methods of Correcting for Multiple Comparisons
Q & A
Chapter Summary
Terms Introduced in this Chapter
23. The Ubiquity of Multiple Comparisons 214
Overview
Multiple Comparisons in Many Contexts
When are Multiple Comparisons Data Torture or P-Hacking?
How to Cope with Multiple Comparisons
Q & A
Chapter Summary
Terms Introduced in this Chapter
24. Normality Tests 224
The Gaussian Distribution is an Unreachable Ideal
What A Gaussian Distribution Really Looks Like
QQ Plots
Testing for Normality
Alternatives to Assuming a Gaussian Distribution
Common Mistakes: Normality Tests
Q & A
Chapter Summary
Terms Introduced in this Chapter
25. Outliers 232
How do Outliers Arise?
The Need for Outlier Tests
Five Questions to Ask before Testing for Outliers
Outlier Tests
Is It Legitimate to Remove Outliers?
An Alternative: Robust Statistical Tests
Lingo: Outlier
Common Mistakes: Outlier Tests
Q & A
Chapter Summary
Terms Introduced in this Chapter
26. Choosing a Sample Size 239
Sample Size Principles
An Alternative Way to think about Sample Size Calculations
Interpreting a Sample Size Statement
Lingo: Power
Calculating the Predicted FPRP as Part of Interpreting a Sample Size Statement
Complexities when Computing Sample Size
Examples
Other Approaches to Choosing Sample Size
Common Mistakes: Sample Size
Q & A
Chapter Summary
Terms Introduced in this Chapter
part f Statistical Tests
27. Comparing Proportions 263
Example: Apixaban for Treatment of Thromboembolism
Assumptions
Comparing Observed and Expected Proportions
Common Mistakes: Comparing Proportions
Q & A
Chapter Summary
Terms Introduced in this Chapter
28. Case-Control Studies 273
Example: Does a Cholera Vaccine Work?
Example: Isotretinoin and Bowel Disease
Example: Genome-Wide Association Studies
How are Controls Defined?
How are Cases Defined?
Epidemiology Lingo
Common Mistakes: Case-Control Studies
Q & A
Chapter Summary
Terms Introduced in this Chapter
29. Comparing Survival Curves 284
Example Survival Data
Assumptions when Comparing Survival Curves
Comparing Two Survival Curves
Why Not Just Compare Mean or Median Survival Times or Five-Year Survival?
Intention to Treat
Q & A
Chapter Summary
Terms Introduced in this Chapter
30. Comparing Two Means: Unpaired t Test 294
Interpreting Results from an Unpaired t Test
Assumptions: Unpaired t Test
The Assumption of Equal Variances
Overlapping Error Bars and the t Test
How It Works: Unpaired t Test
Common Mistakes: Unpaired t Test
Q & A
Chapter Summary
Terms Introduced in this Chapter
31. Comparing Two Paired Groups 306
When to Use Special Tests for Paired Data
Example of Paired t Test
Interpreting Results from a Paired t Test
The Ratio Paired t Test
McNemar’s Test for a Paired Case-Control Study
Common Mistakes: Paired t Test
Q & A
Chapter Summary
Terms Introduced in this Chapter
32. Correlation 318
Introducing the Correlation Coefficient
Assumptions: Correlation
Lingo: Correlation
How It Works: Calculating the Correlation Coefficient
Common Mistakes: Correlation
Q & A
Chapter Summary
Terms Introduced in this Chapter
part g Fitting Models to Data
33. Simple Linear Regression 331
The Goals of Linear Regression
Linear Regression Results
Assumptions: Linear Regression
Comparison of Linear Regression and Correlation
Lingo: Linear Regression
Common Mistakes: Linear Regression
Q & A
Chapter Summary
Terms Introduced in this Chapter
34. Introducing Models 350
Lingo: Models, Parameters, and Variables
The Simplest Model
The Linear Regression Model
Why Least Squares?
Other Models and other Kinds of Regression
Common Mistakes: Models
Chapter Summary
Terms Introduced in this Chapter
35. Comparing Models 357
Comparing Models is a Major Part of Statistics
Linear Regression as a Comparison of Models
Unpaired t Test Recast as Comparing the Fit of Two Models
Common Mistakes: Comparing Models
Q & A
Chapter Summary
Terms Introduced in this Chapter
36. Nonlinear Regression 366
Introducing Nonlinear Regression
An Example of Nonlinear Regression
Nonlinear Regression Results
How Nonlinear Regression Works
Assumptions: Nonlinear Regression
Comparing Two Models
Tips for Understanding Models
Learn More About Nonlinear Regression
Common Mistakes: Nonlinear Regression
Q & A
Chapter Summary
Terms Introduced in this Chapter
37. Multiple Regression 378
Goals of Multivariable Regression
Lingo
An Example of Multiple Linear Regression
Assumptions
Automatic Variable Selection
Sample Size for Multiple Regression
More Advanced Issues with Multiple Regression
Common Mistakes: Multiple Regression
Q & A
Chapter Summary
Terms Introduced in this Chapter
38. Logistic and Proportional Hazards Regression 395
Logistic Regression
Proportional Hazards Regression
Assumptions: Logistic Regression
Common Mistakes: Logistic Regression
Q & A
Chapter Summary
Terms Introduced in this Chapter
part h The Rest of Statistics
39. Analysis of Variance 407
Comparing the Means of Three or More Groups
Assumptions: One-Way ANOVA
How It Works: One-Way ANOVA
Repeated-Measures One-Way ANOVA
An Example of Two-Way ANOVA
How Two-Way ANOVA Works
Repeated-Measures Two-Way ANOVA
Common Mistakes: ANOVA
Q & A
Chapter Summary
Terms Introduced in this Chapter
40. Multiple Comparison Tests after ANOVA 418
Multiple Comparison Tests for the Example Data
The Logic Of Multiple Comparisons Tests
Other Multiple Comparisons Tests
How It Works: Multiple Comparisons Tests
When Are Multiple Comparisons Tests Not Needed?
Common Mistakes: Multiple Comparisons
Q & A
Chapter Summary
Terms Introduced in this Chapter
41. Nonparametric Methods 431
Nonparametric Tests Based on Ranks
The Advantages and Disadvantages of Nonparametric Tests
Choosing Between Parametric and Nonparametric Tests: Does It Matter?
Sample Size for Nonparametric Tests
Nonparametric Tests that Analyze Values (Not Ranks)
Common Mistakes: Nonparametric Tests
Q & A
Chapter Summary
Terms Introduced in this Chapter
42. Sensitivity, Specificity, and Receiver Operating Characteristic Curves 442
Definitions of Sensitivity and Specificity
The Predictive Value of a Test
Receiver-Operating Characteristic (ROC) Curves
Bayes Revisited
Common Mistakes
Q & A
Chapter Summary
Terms Introduced in this Chapter
43. Meta-Analysis 452
Introducing Meta-Analysis
Publication Bias
Results from a Meta-Analysis
Meta-Analysis of Individual Participant Data
Assumptions of Meta-Analysis
Common Mistakes: Meta-Analysis
Q & A
Chapter Summary
Terms Introduced in this Chapter
part i Putting It All Together
44. The Key Concepts of Statistics 463
Term Introduced in this Chapter
45. Statistical Traps to Avoid 468
Trap #1: Focusing on P Values and Statistical Significance Rather than Effect Size
Trap #2: Testing Hypotheses Suggested by the Data
Trap #3: Analyzing Without a Plan—“P-Hacking”
Trap #4: Making a Conclusion about Causation When the Data Only Show Correlation
Trap #5: Overinterpreting Studies that Measure a Proxy or Surrogate Outcome
Trap #6: Overinterpreting Data from an Observational Study
Trap #7: Making Conclusions about Individuals when the Data Were only Collected for Groups
Trap #8: Focusing Only on Means Without asking about Variability or Unusual Values
Trap #9: Comparing Statistically Significant with Not Statistically Significant
Trap #10: Missing Important Findings Because Data Combine Populations
Trap #11: Invalid Multiple Regression Analyses as a Result of an Omitted Variable
Trap #12: Overfitting
Trap #13: Mixing Up the Significance Level with the FPRP
Trap #14: Not Recognizing How Common False Positive Findings are
Trap #15: Not Realizing How Likely it is that a “Significant” Conclusion From a Speculative Experiment is a False Positive
Trap #16: Not Realizing That Many Published Studies have Little Statistical Power
Trap #17: Trying to Detect Small Signals When there is Lots of Noise
Trap #18: Unnecessary Dichotomizing
Trap #19: Inflating Sample Size by Pseudoreplication
Chapter Summary
Terms Introduced in this Chapter
46. Capstone Example 487
The Case of the Eight Naked IC50s
Look Behind the Data
Statistical Significance by Cheating
Using a t Test That Doesn’t Assume Equal SDs
Unpaired t Test as Linear or Nonlinear Regression
Nonparametric Mann–Whitney Test
Just Report the Last Confirmatory Experiment?
Increase Sample Size?
Comparing the Logarithms of IC50 Values
Sample Size Calculations Revisited
Is it OK to Switch Analysis Methods?
The Usefulness of Simulations
Chapter Summary
47. Statistics and Reproducibility 502
The Reproducibility Crisis
Many Analyses are Biased to Inflate the Effect Size
Even Perfectly Performed Experiments are Less Reproducible than Most Expect
Summary
48. Checklists for Reporting Statistical Methods and Results 511
Reporting Methods Used for Data Analysis
Graphing Data
Reporting Statistical Results
part j appendices 517
Appendix A: Statistics with GraphPad
Appendix B: Statistics with Excel
Appendix C: Statistics with R
Appendix D: Values of the t Distribution Needed to Compute CIs
Appendix E: A Review of Logarithms
Appendix F: Choosing a Statistical Test
Appendix G: Problems and Answers
references 533
index 548
PREFACE
My approach in this book is informal and brisk (at least I hope it is), not ceremonious and plodding (at least I hope it isn’t).
JOHN ALLEN PAULOS (2008)
Intuitive Biostatistics provides a comprehensive overview of statistics without getting bogged down in the mathematical details. I’ve been gratified to learn that many people have found my approach refreshing and useful. Some scientists have told me that statistics had always been baffling until they read Intuitive Biostatistics. This enthusiasm encouraged me to write this fourth edition.
WHO IS THIS BOOK FOR?
I wrote Intuitive Biostatistics for three main audiences:
• Medical (and other) professionals who want to understand the statistical portions of journals they read. These readers don’t need to analyze any data, but they do need to understand analyses published by others and beware of common statistical mistakes. I’ve tried to explain the big picture without getting bogged down in too many details.
• Undergraduate and graduate students, postdocs, and researchers who analyze data. This book explains general principles of data analysis, but it won’t teach you how to do statistical calculations or how to use any particular statistical program. It makes a great companion to the more traditional statistics texts and to the documentation of statistical software.
• Scientists who consult with statisticians. Statistics often seems like a foreign language, and this text can serve as a phrase book to bridge the gap between scientists and statisticians. Sprinkled throughout the book are “Lingo” sections that explain statistical terminology and point out when ordinary words are given very specialized meanings (the source of much confusion).
I wrote Intuitive Biostatistics to be a guidebook, not a cookbook. The focus is on how to interpret statistical results, rather than how to analyze data. This book presents few details of statistical methods and only a few tables required to complete the calculations.