# Statistics for Health Professions

MA3010 – Statistics for Health Professions

Discussion 08.1: Student t-Test

Researchers conducted a study to determine whether magnets are effective in treating back pain. The results are shown in the table for the treatment (with magnets) group and the sham (or placebo) group. The results are a measure of reduction in back pain. Assume that the two samples are independent simple random samples selected from normally distributed populations, and do not assume that the population standard deviations are equal.

1. Identify the worksheet (tab) that matches the first letter of your LAST name (e.g., if your last name were “Fudd,” you would use the data from the “F” tab). This will be the source data you use to answer the remaining questions for this initial post.

2. What are the null hypothesis and the alternative hypothesis?

3. Identify the test statistic and p-value from your source data.

4. Would you reject or fail to reject your null hypothesis? Explain how you came to this conclusion (i.e., use either the test statistic or the p-value to support your claim).

5. Would there be sufficient evidence (based upon your source data) to support the claim that those treated with magnets have a greater mean reduction in pain than those given a sham treatment? Explain.
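The scenario above (independent samples, population standard deviations not assumed equal) corresponds to the unpooled two-sample t-test, often called Welch's t-test. The following is a minimal sketch of how the test statistic and the Welch-Satterthwaite degrees of freedom could be computed from summary statistics. The function name `welch_t` is hypothetical, and the numbers passed to it are placeholders chosen only to exercise the arithmetic; they are not taken from any worksheet tab. The p-value is not computed here and would still come from a t table or statistical software.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for an unpooled two-sample t-test from summary statistics.

    Group 1 is the treatment group and group 2 is the sham group, so a
    positive t points toward a greater mean reduction under treatment.
    """
    v1 = sd1 ** 2 / n1  # estimated variance of the first sample mean
    v2 = sd2 ** 2 / n2  # estimated variance of the second sample mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Placeholder summary statistics (NOT real study data).
t_stat, df = welch_t(mean1=0.5, sd1=1.0, n1=20, mean2=0.4, sd2=1.0, n2=20)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

For the one-sided claim in question 5, a right-tailed p-value for this `t` and `df` would be compared with the chosen significance level; the null hypothesis is rejected only if the p-value is at or below that level.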

BIOSTATISTICS FOR THE BIOLOGICAL AND HEALTH SCIENCES

MARC M. TRIOLA, MD, FACP New York University School of Medicine

MARIO F. TRIOLA Dutchess Community College

JASON ROY, PHD University of Pennsylvania Perelman School of Medicine

SECOND EDITION

To Ginny Dushana and Marisa Trevor and Mitchell


Names: Triola, Marc M. | Triola, Mario F. | Roy, Jason (Jason Allen)

Title: Biostatistics for the biological and health sciences.

Description: Second edition / Marc M. Triola, New York University,

Identifiers: LCCN 2016016759 | ISBN 9780134039015 (hardcover) | ISBN 0134039017 (hardcover)

Subjects: LCSH: Biometry. | Medical statistics.

Marc Triola, MD, FACP is the Associate Dean for Educational Informatics at NYU School of Medicine, the founding director of the NYU Langone Medical Center Institute for Innovations in Medical Education (IIME), and an Associate Professor of Medicine. Dr. Triola’s research experience and expertise focus on the disruptive effects of the present revolution in education, driven by technological advances, big data, and learning analytics. Dr. Triola has worked to create a “learning ecosystem” that includes interconnected computer-based e-learning tools and new ways to effectively integrate growing amounts of electronic data in educational research. Dr. Triola and IIME have been funded by the National Institutes of Health, the Integrated Advanced Information Management Systems program, the National Science Foundation Advanced Learning Technologies program, the Josiah Macy, Jr. Foundation, the U.S. Department of Education, and the American Medical Association Accelerating Change in Medical Education program. He chairs numerous committees at the state and national levels focused on the future of health professions educational technology development and research.

Mario F. Triola is a Professor Emeritus of Mathematics at Dutchess Community College, where he has taught statistics for over 30 years. Marty is the author of Elementary Statistics, 13th edition, Essentials of Statistics, 5th edition, Elementary Statistics Using Excel, 6th edition, and Elementary Statistics Using the TI-83/84 Plus Calculator, 4th edition, and he is a co-author of Statistical Reasoning for Everyday Life, 5th edition. Elementary Statistics is currently available as an International Edition, and it has been translated into several foreign languages. Marty designed the original Statdisk statistical software, and he has written several manuals and workbooks for technology supporting statistics education.

He has been a speaker at many conferences and colleges. Marty’s consulting work includes the design of casino slot machines and the design of fishing rods. He has worked with attorneys in determining probabilities in paternity lawsuits, analyzing data in medical malpractice lawsuits, identifying salary inequities based on gender, and analyzing disputed election results. He has also used statistical methods in analyzing medical school surveys and in analyzing survey results for the New York City Transit Authority. Marty has testified as an expert witness in the New York State Supreme Court.

Jason Roy, PhD, is Associate Professor of Biostatistics in the Department of Biostatistics and Epidemiology, Perelman School of Medicine, University of Pennsylvania. He received his PhD in Biostatistics in 2000 from the University of Michigan. He was the recipient of the 2002 David P. Byar Young Investigator Award from the American Statistical Association Biometrics Section. His statistical research interests are in the areas of causal inference, missing data, and prediction modeling. He is especially interested in the statistical challenges with analyzing data from large health care databases. He collaborates in many different disease areas, including chronic kidney disease, cardiovascular disease, and liver diseases. Dr. Roy is Associate Editor of Biometrics, Journal of the American Statistical Association, and Pharmacoepidemiology & Drug Safety, and has over 90 peer-reviewed publications.

CONTENTS

1 INTRODUCTION TO STATISTICS: 1-1 Statistical and Critical Thinking; 1-2 Types of Data; 1-3 Collecting Sample Data

2 EXPLORING DATA WITH TABLES AND GRAPHS: 2-1 Frequency Distributions for Organizing and Summarizing Data; 2-2 Histograms; 2-3 Graphs That Enlighten and Graphs That Deceive; 2-4 Scatterplots, Correlation, and Regression

3 DESCRIBING, EXPLORING, AND COMPARING DATA: 3-1 Measures of Center; 3-2 Measures of Variation; 3-3 Measures of Relative Standing and Boxplots

4 PROBABILITY: 4-1 Basic Concepts of Probability; 4-2 Addition Rule and Multiplication Rule; 4-3 Complements, Conditional Probability, and Bayes’ Theorem; 4-4 Risks and Odds; 4-5 Rates of Mortality, Fertility, and Morbidity; 4-6 Counting

5 DISCRETE PROBABILITY DISTRIBUTIONS: 5-1 Probability Distributions; 5-2 Binomial Probability Distributions; 5-3 Poisson Probability Distributions

6 NORMAL PROBABILITY DISTRIBUTIONS: 6-1 The Standard Normal Distribution; 6-2 Real Applications of Normal Distributions; 6-3 Sampling Distributions and Estimators; 6-4 The Central Limit Theorem; 6-5 Assessing Normality; 6-6 Normal as Approximation to Binomial

7 ESTIMATING PARAMETERS AND DETERMINING SAMPLE SIZES: 7-1 Estimating a Population Proportion; 7-2 Estimating a Population Mean; 7-3 Estimating a Population Standard Deviation or Variance; 7-4 Bootstrapping: Using Technology for Estimates

8 HYPOTHESIS TESTING: 8-1 Basics of Hypothesis Testing; 8-2 Testing a Claim About a Proportion; 8-3 Testing a Claim About a Mean; 8-4 Testing a Claim About a Standard Deviation or Variance

9 INFERENCES FROM TWO SAMPLES: 9-1 Two Proportions; 9-2 Two Means: Independent Samples; 9-3 Two Dependent Samples (Matched Pairs); 9-4 Two Variances or Standard Deviations

10 CORRELATION AND REGRESSION: 10-1 Correlation; 10-2 Regression; 10-3 Prediction Intervals and Variation; 10-4 Multiple Regression; 10-5 Dummy Variables and Logistic Regression

11 GOODNESS-OF-FIT AND CONTINGENCY TABLES: 11-1 Goodness-of-Fit; 11-2 Contingency Tables

12 ANALYSIS OF VARIANCE: 12-1 One-Way ANOVA; 12-2 Two-Way ANOVA

13 NONPARAMETRIC TESTS: 13-1 Basics of Nonparametric Tests; 13-2 Sign Test; 13-3 Wilcoxon Signed-Ranks Test for Matched Pairs; 13-4 Wilcoxon Rank-Sum Test for Two Independent Samples; 13-5 Kruskal-Wallis Test for Three or More Samples; 13-6 Rank Correlation

14 SURVIVAL ANALYSIS: 14-1 Life Tables; 14-2 Kaplan-Meier Survival Analysis

APPENDIX A TABLES

APPENDIX B DATA SETS

APPENDIX C WEBSITES AND BIBLIOGRAPHY OF BOOKS

APPENDIX D ANSWERS TO ODD-NUMBERED SECTION EXERCISES (and all Quick Quizzes, all Review Exercises, and all Cumulative Review Exercises)

Credits

Index

PREFACE

Statistics permeates nearly every aspect of our lives, and its role has become particularly important in the biological, life, medical, and health sciences. From opinion polls to clinical trials in medicine and analysis of big data from health applications, statistics influences and shapes the world around us. Biostatistics for the Biological and Health Sciences forges the relationship between statistics and our world through extensive use of a wide variety of real applications that bring life to theory and methods.

Goals of This Second Edition

■ Incorporate the latest and best methods used by professional statisticians.

■ Include features that address all of the recommendations included in the Guidelines for Assessment and Instruction in Statistics Education (GAISE) as recommended by the American Statistical Association.

■ Provide an abundance of new and interesting data sets, examples, and exercises.

■ Foster personal growth of students through critical thinking, use of technology, collaborative work, and development of communication skills.

■ Enhance teaching and learning with the most extensive and best set of supplements and digital resources.

Audience/Prerequisites

Biostatistics for the Biological and Health Sciences is written for students majoring in the biological and health sciences, and it is designed for a wide variety of students taking their first statistics course. Algebra is used minimally, and calculus is not required. It is recommended that students have completed at least an elementary algebra course or that students learn the relevant algebra components through an integrated or co-requisite course. In many cases, underlying theory is included, but this book does not require the mathematical rigor more appropriate for mathematics majors.

Hallmark Features

Great care has been taken to ensure that each chapter of Biostatistics for the Biological and Health Sciences will help students understand the concepts presented. The following features are designed to help meet that objective.

Real Data

Hundreds of hours have been devoted to finding data that are real, meaningful, and interesting to students. Fully 87% of the examples are based on real data, and 89% of the exercises are based on real data. Some exercises refer to the 18 data sets listed in Appendix B, and 12 of those data sets are new to this edition. Exercises requiring use of the Appendix B data sets are located toward the end of each exercise set and are marked with a special data set icon.

Real data sets are included throughout the book to provide relevant and interesting real-world statistical applications, including biometric security, body measurements, brain sizes and IQ scores, and data from births. Appendix B includes descriptions of the 18 data sets that can be downloaded from the companion website www.pearsonhighered.com/triola, the author-maintained www.TriolaStats.com, and MyStatLab.

TriolaStats.com includes downloadable data sets in formats for technologies including Excel, Minitab, JMP, SPSS, and TI-83/84 Plus calculators. The data sets are also included in the free Statdisk software, which is also available on the website.

Great care, enthusiasm, and passion have been devoted to creating a book that is readable, understandable, interesting, and relevant. Students pursuing any major in the biological, life, medical, or health fields are sure to find applications related to their future work.

Website

This textbook is supported by www.TriolaStats.com and www.pearsonhighered.com/triola, which are continually updated to provide the latest digital resources, including:

■ Statdisk: A free, robust statistical software package designed for this book.

■ Downloadable Appendix B data sets in a variety of technology formats.

■ Downloadable textbook supplements including Glossary of Statistical Terms and Formulas and Tables.

■ Online instructional videos created specifically for this book that provide step-by-step technology instructions.

■ Triola Blog, which highlights current applications of statistics, statistics in the news, and online resources.

Chapter Features

Chapter Opening Features

■ Chapters begin with a Chapter Problem that uses real data and motivates the chapter material.

■ Chapter Objectives provide a summary of key learning goals for each section in the chapter.

Exercises

Many exercises require the interpretation of results. Great care has been taken to ensure their usefulness, relevance, and accuracy. Exercises are arranged in order of increasing difficulty, and they begin with Basic Skills and Concepts. Most sections include additional Beyond the Basics exercises that address more difficult concepts or require a stronger mathematical background. In a few cases, these exercises introduce a new concept.

End-of-Chapter Features

■ Chapter Quick Quiz provides review questions that require brief answers.

■ Review Exercises offer practice on the chapter concepts and procedures.

■ Cumulative Review Exercises reinforce earlier material.

■ Technology Project provides an activity that can be used with a variety of technologies.

■ From Data to Decision is a capstone problem that requires critical thinking and writing.

■ Cooperative Group Activities encourage active learning in groups.

Other Features

Margin Essays There are 57 margin essays designed to highlight real-world topics and foster student interest.

Flowcharts The text includes flowcharts that simplify and clarify more complex concepts and procedures. Animated versions of the text’s flowcharts are available within MyStatLab and MathXL.

Quick-Reference Endpapers Tables A-2 and A-3 (the normal and t distributions) are reproduced on the rear inside cover pages.

Detachable Formula and Table Card This insert, organized by chapter, gives students a quick reference for studying, or for use when taking tests (if allowed by the instructor). It also includes the most commonly used tables. This is also available for download at www.TriolaStats.com, www.pearsonhighered.com/triola, and in MyStatLab.

Technology Integration

As in the preceding edition, there are many displays of screens from technology throughout the book, and some exercises are based on displayed results from technology. Where appropriate, sections include a reference to an online Tech Center subsection that includes detailed instructions for Statdisk, Minitab®, Excel®, StatCrunch, or a TI-83/84 Plus® calculator. (Throughout this text, “TI-83/84 Plus” is used to identify a TI-83 Plus or TI-84 Plus calculator.) The end-of-chapter features include a Technology Project.

The Statdisk statistical software package is designed specifically for this textbook and contains all Appendix B data sets. Statdisk is free to users of this book, and it can be downloaded at www.statdisk.org.

Changes in This Edition

New Features

Chapter Objectives provide a summary of key learning goals for each section in the chapter.

Larger Data Sets: Some of the data sets in Appendix B are much larger than in the previous edition. It is no longer practical to print all of the Appendix B data sets in this book, so the data sets are described in Appendix B, and they can be downloaded at www.TriolaStats.com, www.pearsonhighered.com/triola, and MyStatLab.

New Content: New examples, new exercises, and Chapter Problems provide relevant and interesting real-world statistical applications, including biometric security, drug testing, gender selection, and analyzing ultrasound images.

| | Number | New to This Edition | Use Real Data |
| --- | --- | --- | --- |
| Exercises | 1600 | 85% | 89% |
| Examples | 200 | 84% | 87% |

Major Organization Changes

All Chapters

■ New Chapter Objectives: All chapters now begin with a list of key learning goals for that chapter. Chapter Objectives replaces the former Overview numbered sections. The first numbered section of each chapter now covers a major topic.

Chapter 1

■ New Section 1-1: Statistical and Critical Thinking

■ New Subsection 1-3, Part 2: Big Data and Missing Data: Too Much and Not Enough


Chapters 2 and 3

■ Chapter Partitioned: Chapter 2 (Describing, Exploring, and Comparing Data) from the first edition has been partitioned into Chapter 2 (Summarizing and Graphing) and Chapter 3 (Statistics for Describing, Exploring, and Comparing Data).

■ New Section 2-4: Scatterplots, Correlation, and Regression. This new section includes scatterplots in Part 1, the linear correlation coefficient r in Part 2, and linear regression in Part 3. These additions are intended to greatly facilitate coverage for those professors who prefer some early coverage of correlation and regression concepts. Chapter 10 includes these topics discussed in much greater detail.

Chapter 4

■ Combined Sections: Section 3-3 (Addition Rule) and Section 3-4 (Multiplication Rule) from the first edition are now combined into one section: 4-2 (Addition Rule and Multiplication Rule).

■ New Subsection 4-3, Part 3: Bayes’ Theorem

Chapter 5

■ Combined Sections: Section 4-3 (Binomial Probability Distributions) and Section 4-4 (Mean, Variance, and Standard Deviation for the Binomial Distribution) from the first edition are now combined into one section: 5-2 (Binomial Probability Distributions).

Chapter 6

■ Switched Sections: Section 6-5 (Assessing Normality) now precedes Section 6-6 (Normal as Approximation to Binomial).

Chapter 7

■ Combined Sections: Sections 6-4 (Estimating a Population Mean: σ Known) and 6-5 (Estimating a Population Mean: σ Not Known) from the first edition have been combined into one section: 7-2 (Estimating a Population Mean). The coverage of the σ known case has been substantially reduced and it is now limited to Part 2 of Section 7-2.

■ New Section 7-4: Bootstrapping: Using Technology for Estimates

Chapter 8

■ Combined Sections: Sections 7-4 (Testing a Claim About a Population Mean: σ Known) and 7-5 (Testing a Claim About a Population Mean: σ Not Known) from the first edition have been combined into one section: 8-3 (Testing a Claim About a Mean). Coverage of the σ known case has been substantially reduced and it is now limited to Part 2 of Section 8-3.

Chapter 10

■ New Section: 10-5 Dummy Variables and Logistic Regression

Chapter 11

■ New Subsection: Section 11-2, Part 2 Test of Homogeneity, Fisher’s Exact Test, and McNemar’s Test for Matched Pairs

Chapter 14

■ Combined Sections: Section 13-2 (Elements of a Life Table) and Section 13-3 (Applications of Life Tables) from the first edition have been combined into Section 14-1 (Life Tables).

■ New Section: 14-2 Kaplan-Meier Survival Analysis

Flexible Syllabus

This book’s organization reflects the preferences of most statistics instructors, but there are two common variations:

■ Early Coverage of Correlation and Regression: Some instructors prefer to cover the basics of correlation and regression early in the course. Section 2-4 now includes basic concepts of scatterplots, correlation, and regression without the use of formulas and greater depth found in Sections 10-1 (Correlation) and 10-2 (Regression).

■ Minimum Probability: Some instructors prefer extensive coverage of probability, while others prefer to include only basic concepts. Instructors preferring minimum coverage can include Section 4-1 while skipping the remaining sections of Chapter 4, as they are not essential for the chapters that follow. Many instructors prefer to cover the fundamentals of probability along with the basics of the addition rule and multiplication rule (Section 4-2).

GAISE

This book reflects recommendations from the American Statistical Association and its Guidelines for Assessment and Instruction in Statistics Education (GAISE). Those guidelines suggest the following objectives and strategies.

1. Emphasize statistical literacy and develop statistical thinking: Each section exercise set begins with Statistical Literacy and Critical Thinking exercises. Many of the book’s exercises are designed to encourage statistical thinking rather than the blind use of mechanical procedures.

2. Use real data: 87% of the examples and 89% of the exercises use real data.

3. Stress conceptual understanding rather than mere knowledge of procedures: Instead of seeking simple numerical answers, most exercises and examples involve conceptual understanding through questions that encourage practical interpretations of results. Also, each chapter includes a From Data to Decision project.

4. Foster active learning in the classroom: Each chapter ends with several Cooperative Group Activities.

5. Use technology for developing conceptual understanding and analyzing data: Computer software displays are included throughout the book. Special Tech Center subsections are available online, and they include instruction for using the software. Each chapter includes a Technology Project. When there are discrepancies between answers based on tables and answers based on technology, Appendix D provides both answers. The websites www.TriolaStats.com and www.pearsonhighered.com/triola as well as MyStatLab include free text-specific software (Statdisk), data sets formatted for several different technologies, and instructional videos for technologies.

6. Use assessments to improve and evaluate student learning: Assessment tools include an abundance of section exercises, Chapter Quick Quizzes, Review Exercises, Cumulative Review Exercises, Technology Projects, From Data to Decision projects, and Cooperative Group Activities.


Acknowledgments

We would like to thank the many statistics professors and students who have contributed to the success of this book. We thank the reviewers for their suggestions for this second edition:

James Baldone, Virginia College; Naomi Brownstein, Florida State University; Christina Caruso, University of Guelph; Erica A. Corbett, Southeastern Oklahoma State University; Xiangming Fang, East Carolina University; Phil Gona, UMASS Boston; Sharon Homan, University of North Texas; Jackie Milton, Boston University; Joe Pick, Palm Beach State College; Steve Rigdon, St. Louis University; Brian Smith, Black Hills State University; Mahbobeh Vezvaei, Kent State University; David Zeitler, Grand Valley State University

We also thank Paul Lorczak, Joseph Pick and Erica Corbett for their help in checking the accuracy of the text and answers.

Marc Triola, Mario Triola, Jason Roy

September 2016

MyStatLab® Online Course for Biostatistics: For the Biological and Health Sciences, 2e by Marc M. Triola, Mario F. Triola and Jason Roy (access code required)

MyStatLab is available to accompany Pearson’s market-leading text offerings. To give students a consistent tone, voice, and teaching method, each text’s flavor and approach is tightly integrated throughout the accompanying MyStatLab course, making learning the material as seamless as possible.

Real-World Data Examples – Help students understand how statistics applies to everyday life through the extensive current, real-world data examples and exercises provided throughout the text.

MathXL coverage – MathXL is a market-leading text-specific autograded homework system built to improve student learning outcomes.

Enhanced video program to meet Introductory Statistics needs:

• New! Tech-Specific Video Tutorials – These short, topical videos address how to use varying technologies to complete exercises.

• Updated! Section Lecture Videos – Watch author Marty Triola work through examples and elaborate on key objectives of the chapter.


