
The New Statistics with R: An Introduction for Biologists

Second Edition

ANDY HECTOR

Department of Plant Sciences and Linacre College, University of Oxford, UK

Great Clarendon Street, Oxford, OX2 6DP, United Kingdom

Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trademark of Oxford University Press in the UK and in certain other countries

© Andy Hector 2021

The moral rights of the author have been asserted

First Edition published in 2015

Impression: 1

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by licence or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above

You must not circulate this work in any other form and you must impose this same condition on any acquirer

Published in the United States of America by Oxford University Press, 198 Madison Avenue, New York, NY 10016, United States of America

British Library Cataloguing in Publication Data

Data available

Library of Congress Control Number: 2021931174

ISBN 978–0–19–879817–0 (hbk.)

ISBN 978–0–19–879818–7 (pbk.)

DOI: 10.1093/oso/9780198798170.001.0001

Printed in Great Britain by Bell & Bain Ltd., Glasgow

Links to third party websites are provided by Oxford in good faith and for information only. Oxford disclaims any responsibility for the materials contained in any third party website referenced in this work.

I dedicated the first edition of this book to the memory of Christine Müller. This new edition is dedicated to Lindsay and Rowan.

Acknowledgements

The original version of this book was begun at the end of 2011 while I was on sabbatical as a visiting researcher in the computational ecology group at Microsoft Research in Cambridge; my thanks to Drew Purves and colleagues for their support. This second edition was partly written during my sabbatical in 2019/20, sadly largely under COVID-19 restrictions. However, before lockdown I made some important progress during stays at Obertschappina (thanks Roland and Petra) and on a visit to the Cedar Creek Ecosystem Science Reserve, for which I thank Forest Isbell, Dave Tilman, and the amazing group of ecologists at the University of Minnesota.

Several people were instrumental in helping cultivate my initial interest in statistical analysis. I was first introduced to experiments during my final-year project with Phil Grime and colleagues at the UCPE at Sheffield University. Shortly afterwards, one of the most rewarding parts of my PhD at Imperial College was learning statistics (and the GLIM software) from Mick Crawley. Bernhard Schmid shared this interest and enthusiasm and taught me a lot while I was a postdoc on the BIODEPTH project and, later, when we worked together at the Institute for Environmental Sciences at the University of Zurich (sorry for forsaking Genstat for R, Bernhard!). Here in Oxford I have continued to discuss and learn about statistics partly through the generosity of Geoff Nicholls.

I have also benefited from sometimes brief but important discussions with several other statisticians during training courses, after visiting talks, and the like, including Douglas Bates, Andrew Gelman (over a game of Quincunx), Martin Maechler, Peter McCullagh, John Nelder, José Pinheiro, Bill Venables, and Hadley Wickham. My apologies to them for any misunderstandings that make it into this book.

Many group members helped me delve further into statistics with R, including some of the material covered in this book. I would like to thank all current and past group members, but particularly Robi Bagchi, Stefanie von Felten, Yann Hautier, Charlie Marsh, Chris Philipson, Matteo Tanadini, Sean Tuck, Maja Weilenmann, and Mikey O’Brien. I have also learned a lot from collaborating on papers on statistics with several colleagues, including Tom Bell, Jarrett Byrnes, John Connelly, Laura Dee, Forest Isbell, Marc Kéry, Michel Loreau, and Alain Zuur.

The content of this book is based on teaching materials developed over the last two decades at Imperial College, the University of Zurich, and here at Oxford, where I teach statistics at the Bachelor, Masters, and PhD levels. Thanks to everyone involved, particularly the many demonstrators (TAs).

Many people helped find errors in the first edition of this book. I have tried to correct them and acknowledge the spotters at the R café website (no doubt there will be more to add for this second edition). In particular, my thanks to Ben Bolker for his constructive criticism of the first edition of this book.

At OUP, thanks go to Ian, Lucy, Bethany, and Charlie for making this book and this second edition possible. Also, thanks to Douglas Meekison, who skilfully copyedited the manuscript, and to Sumintra Gaur, who has been project manager for this book.

Finally, thank you (and sorry) to anyone who has slipped my mind as I rush again to meet the book deadline!

Andy Hector, Oxford, October 2020.

Chapter 9: Testing

Chapter 16: GLMs for Count Data

Chapter 17: Binomial GLMs

17.1 Binomial counts and proportion data

17.4 Alternative link functions

17.5 Summary: Statistics

Chapter 18: GLMs for Binary Data

Chapter 19: Conclusions

Chapter 20: A Very Short Introduction to R

1 Introduction

1.1 Introduction to the second edition

Back in 2015, I opened the introduction to the first edition of this book as follows:

Unlikely as it may seem, statistics is currently a sexy subject. Nate Silver’s success in out-predicting the political pundits in the last US election drew high-profile press coverage across the globe (and his book many readers). Statistics may not remain sexy but it will always be useful. It is a key component in the scientific toolbox and one of the main ways we have of describing the natural world and of finding out how it works. In most areas of science, statistics is essential.

So much has changed over the last five years. Initially, I thought this introduction to the second edition would discuss the subsequent failure of statistics to predict the Brexit referendum and Trump election results. However, I ended up working on this second edition under lockdown due to the COVID-19 pandemic. I’m not sure if statistics is still ‘sexy’ but it is certainly still prominent in our lives. Modelling, much of it statistical, provides predictions of the spread of COVID-19, and sampling is key to estimating fundamental parameters like the reproductive number, denoted (coincidentally) R: the number of people each person with COVID-19 in turn infects.

1.2 The aim of this book

This book is intended to introduce one of the most useful types of statistical analysis to researchers, particularly in the life and environmental sciences: linear models and their generalized-linear-model (GLM) extensions. My aim is to get across the essence of the statistical ideas necessary to intelligently apply and interpret these models in a contemporary (‘new’) way. I hope it will be of use to students at both undergraduate and postgraduate levels and to researchers interested in learning more about statistics (or in switching to the software packages used here, R and RStudio). The approach is therefore not primarily mathematical, and makes limited use of equations; they are easily found in numerous statistics textbooks and on the internet if you want them. I have also kept citations to a minimum and give them at the end of the most relevant chapter (there is no overall bibliography). The approach is to learn by doing, through the analysis of real datasets. That means using a statistical software package, in this case the R programming language for statistics and graphics (for the reasons given below). It also requires data. In fact, most scientists only start to take an interest in statistics once they have their own data. In most science degrees that comes late in the day, making the teaching of introductory statistics more challenging. Students studying for research degrees (Masters and PhDs) are generally much more motivated to learn statistics since they know it will be essential for the analysis of their data. The next best thing to working with our own data is to work with some carefully selected examples from the literature. I have used some data from my own research but I have mainly tried to find small, relevant datasets that have been analysed in an interesting way (preferably by a qualified statistician). Most of them are from the life and environmental sciences. I am very grateful to all of the people who have helped collect these data and develop the analyses (they are named in the appropriate chapters as the data and examples are introduced). For convenience, I have tried to use datasets that are available within the R software.

1.3 Changes in the second edition

The first edition of this book was written following the standard procedure of supplying a Word document of the text of each chapter plus files of any figures. This proved an inefficient and error-prone method, with all the copy-pasting between R scripts and the word-processing file. This second edition has been entirely rewritten using the R Markdown package to produce a PDF file of each chapter along with the TeX file that generates it (as I understand it, subcontractors will then use LaTeX to apply the book format). Writing the second edition like this should be a smarter, more efficient, and hopefully less error-prone way to work. In the process, the book has changed in many ways. Based on my experience in teaching the Quantitative Methods for Biology course at Oxford, the content has been divided up into a greater number of bite-size topics that will hopefully prove more digestible for students and more useful to teachers. In part because the book was written using the R Markdown package, I now drive R using the RStudio software (it also provides a standard interface on all platforms and lots of other great support materials, like the R cheat sheets). Every chapter has been rewritten, and there are also entirely new chapters: one giving an opening motivational example, one on reproducible research (using the R Markdown package), and another on some of the complexities of linear-model analysis that I skipped over in the first edition. There are now separate chapters on GLMs for the analysis of different types of non-normal data. The first edition also contained chapters on mixed-effects and generalized linear mixed-effects models (GLMMs). These have been dropped from the second edition, partly due to the space limits but also because some reviewers and readers felt that one chapter was just not enough even for a short introduction to mixed-effects models. Furthermore, the example GLMM no longer ran using later versions of the software.

1.4 The R programming language for statistics and graphics

R is now the principal software for statistics, graphics, and programming in many areas of science, both within academia and outside (many large companies use R). There are several reasons for this, including:

• R is a product of the statistical community: it is written by the experts.

• R is free: it costs nothing to download and use, facilitating collaboration.

• R is multiplatform: versions exist for Windows, Mac, and Linux.

• R is open-source software that can be easily extended by the R community.

• R is statistical software, a graphics package, and a programming language all in one (as we’ll see, you can now even produce books, blogs, and websites from R).

1.5 Scope

Statistics can sometimes seem like a huge, bewildering, and intimidating collection of tests. To avoid this I have chosen to focus on the linear-model framework as probably the single most useful part of statistics (at least for researchers in the environmental and life sciences). The book starts by introducing several different variations of the basic linear-model analysis (analysis of variance, linear regression, analysis of covariance, etc.). I then introduce an extension: generalized linear models for data with non-normal distributions. The advantage of following the linear-model approach is that a wide range of different types of data and experimental designs can be analysed with very similar approaches. In particular, all of the analyses covered in this book can be performed in R using only two main functions, one for linear models (the lm() function) and one for GLMs (the glm() function), together with a set of generic functions that extract different aspects of the results (confidence intervals etc.).
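As a minimal sketch of this two-function idea, the R code below fits one linear model with lm() and one GLM with glm(), then applies the same generic extractor functions to each. The built-in cars and InsectSprays datasets are stand-ins chosen for illustration (they are not among the book's worked examples), and treating the insect counts as Poisson is an assumption made for the sketch.

```r
# Linear model: stopping distance (ft) against speed (mph), built-in cars data
fit_lm <- lm(dist ~ speed, data = cars)
coef(fit_lm)      # intercept and slope estimates
confint(fit_lm)   # 95% confidence intervals for those estimates

# GLM: insect counts under six insecticide sprays, built-in InsectSprays data,
# modelled (an assumption for this sketch) with a Poisson distribution
fit_glm <- glm(count ~ spray, family = poisson, data = InsectSprays)
coef(fit_glm)             # estimates on the log scale (log is the Poisson default link)
confint.default(fit_glm)  # Wald-type 95% intervals, also on the log scale
```

The same generic functions (coef(), confint(), summary(), anova(), and so on) accept both kinds of fitted object, which is what keeps the toolkit small.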

1.6 What is not covered

This book is primarily about statistics (linear models), not the R software. For that, OUP offers introductory volumes by Beckerman et al. (2017) and Petchey et al. (2021). Statistics is a huge subject, so the limited size of the book precluded the inclusion of many topics, and the coverage is limited to linear models and GLMs. There was no space for non-linear regression approaches or generalized additive models (GAMs). Because of the focus on an estimation-based approach, I have not included non-parametric statistics. Experimental design is covered briefly and integrated into the relevant chapters. The use of information criteria and multimodel inference are briefly introduced. Bayesian statistics is also left out: even its basics are a book-length project in their own right (e.g. Korner-Nievergelt et al. 2017).

1.7 The approach

There are several different general approaches within statistics (frequentist, Bayesian, information theory, etc.) and there are many subspecies within these schools of thought. Most of the methods included in this book are usually described as belonging to ‘classical frequentist statistics’. However, this approach, and the probability values that are so widely used within it, has come under increasing criticism. In particular, statisticians often accuse scientists of focusing far too much on P-values and not enough on effect sizes. This is strange, as the effect sizes (the estimates and intervals) are directly related to what we measure during our research. I don’t know any scientists who study P-values! For that reason, I have tried to take an estimation-based approach that focuses on estimates and confidence intervals wherever possible. Styles of analysis vary (and fashions change over time). Because of this, I have tried to be frank about some of my personal preferences used in this book. In addition to making wide use of estimates and intervals, I have also tried to emphasize the use of graphs for exploring data and presenting results. I have tried to encourage the use of a priori contrasts (comparisons that were planned in advance) and I advocate avoiding the inappropriate overuse of multiple testing in favour of a more focused, planned approach. Finally, at the end of each chapter I try to summarize both the statistical approach and what it has enabled us to learn about the science of each example. It is easy to get lost in statistics, but for non-statisticians the analysis should not become an end in its own right, only a method to help advance our science.
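To make the estimation-based style concrete, here is a minimal sketch using R's built-in PlantGrowth data (an assumption for illustration; it is not one of the book's worked examples at this point). Under R's default treatment contrasts, each treatment coefficient is the difference from the control group, so the fitted model answers a planned "each treatment versus control" question directly with estimates and intervals rather than with a battery of pairwise tests.

```r
# Built-in PlantGrowth data: dried plant weight for a control (ctrl)
# and two treatments (trt1, trt2); ctrl is the first factor level
fit <- lm(weight ~ group, data = PlantGrowth)

# With default treatment contrasts, the intercept is the control mean and the
# other coefficients are the planned differences trt1 - ctrl and trt2 - ctrl
est <- cbind(estimate = coef(fit), confint(fit))
round(est, 3)
```

Reading the table row by row gives each effect size together with its 95% interval, which is the quantity the text argues we should focus on.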

1.8 The new statistics?

What is the ‘new statistics’ of the title? The term is not clearly defined but it appears to be used to cover a combination of new techniques (particularly meta-analysis) with a back-to-basics focus on estimation-based analysis using confidence intervals (Cumming 2012). Meta-analysis is beyond the scope of this edition; I recommend the book by Koricheva et al. (2013). In this book, the ‘new statistics’ refers to a focus on estimation-based analysis, together with the use of modern maximum-likelihood-based analysis (including information criteria) plus methods for reproducible research. I have also tried to take account of the recent criticisms of the overuse of P-values and statistical significance (although this is an area of ongoing debate).
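Information criteria can be previewed with a short sketch. The two candidate models for R's built-in cars data (a straight line and a quadratic) are hypothetical choices for illustration only. AIC() is computed from the maximized log-likelihood as AIC = −2 log L + 2k, where k is the number of estimated parameters, and the last lines check that formula by hand.

```r
# Two candidate models for stopping distance (illustrative choices only)
fit1 <- lm(dist ~ speed, data = cars)               # straight line
fit2 <- lm(dist ~ speed + I(speed^2), data = cars)  # adds a quadratic term

# AIC for both models; lower values indicate a better trade-off
# between goodness of fit and model complexity
AIC(fit1, fit2)

# Reproduce AIC for the first model from its maximized log-likelihood:
# AIC = -2 * logLik + 2 * k, where k counts all estimated parameters
# (intercept, slope, and the residual standard deviation, so k = 3)
ll1 <- logLik(fit1)
k1 <- attr(ll1, "df")
aic_by_hand <- -2 * as.numeric(ll1) + 2 * k1
all.equal(aic_by_hand, AIC(fit1))
```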

1.9 Getting started

To allow a learning-by-doing approach, the R code necessary to perform the basic analysis is embedded in the text along with the key output from R (files of the R code will be available as support material from the R café at http://www.plantecol.org/). Some readers may be completely new to R, but many will have some familiarity with it. Rather than start with an introduction to R, we will dive straight into the example analyses. However, a brief introduction to R is provided at the end of the book, and newcomers to the software will need to start there.

1.10 References

Beckerman, A.P., Childs, D.Z., & Petchey, O.L. (2017) Getting Started with R. Oxford University Press.

Cumming, G. (2012) Understanding the New Statistics. Taylor and Francis.

Koricheva, J., Gurevitch, J., & Mengersen, K. (2013) Handbook of Meta-analysis in Ecology and Evolution. Princeton University Press.

Korner-Nievergelt, F., Roth, T., von Felten, S., Guélat, J., Almasi, B., & Korner-Nievergelt, P. (2017) Bayesian Data Analysis in Ecology Using Linear Models with R, BUGS, and Stan. Academic Press.

Petchey, O.L., Beckerman, A.P., Cooper, N., & Childs, D.Z. (2021) Insights from Data with R: An Introduction for the Life and Environmental Sciences. Oxford University Press.

Silver, N. (2012) The Signal and the Noise. Penguin.

2 Motivation

2.1 A matter of life and death

The Space Shuttle Challenger (Fig. 2.1) was one of the most advanced spacecraft ever built, but the first version lacked ejection seats for its crew. This was particularly relevant in January 1986, when the expected temperature at launch was below freezing, much colder (around 30 °F, roughly −1 °C) than on any previous mission, raising concerns over safety. The shuttle (Challenger was one of only five put into service) was designed to be largely reusable. The orbiter (the main shuttle craft) was propelled out of the atmosphere by its own engines (supplied by a large external liquid-fuel tank that could be jettisoned when empty) and with the help of two booster rockets that disengaged when spent and fell back, to be recovered from the sea for reuse. The booster rockets were constructed in cylindrical sections and the joints sealed with huge circular washers called O-rings. It was these O-rings that were of particular concern, as their ability to prevent fuel leaks (‘blowby’ in NASA jargon) depended on their flexibility and plasticity, which decreased as temperatures fell.

Figure 2.1 A schematic diagram of the space shuttle, showing the orbiter with external liquid-fuel tank and reusable booster rockets. Copyrighted free use, https://commons.wikimedia.org/w/index.php?curid=554970

Because the boosters were recovered and refurbished for reuse, it was possible to infer from scorch marks whether fuel leak damage had occurred during each launch. The resulting data (called orings) are available as part of an R package called faraway (think of packages as add-on apps that you can download to extend the ‘base’ version of R; this package accompanies the book by Faraway (2014)). The ‘chunk’ of R code below activates the faraway package (it must already be installed; see Chapter 20) and displays the orings data (the head() function shows only the first six rows to save space; note that the lines of R output are prefixed with two hashes to distinguish them):

library(faraway)
head(orings)

##   temp damage
## 1   53      5
## 2   57      1
## 3   58      1
## 4   63      1
## 5   66      0
## 6   67      0
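The chunk above assumes faraway is already installed. A minimal sketch of one way to make the step self-contained is to install the package only when it is missing; this needs an internet connection, and the CRAN mirror URL is an assumption (any CRAN mirror would do).

```r
# Install faraway only if it is not already available (requires internet access)
if (!requireNamespace("faraway", quietly = TRUE)) {
  install.packages("faraway", repos = "https://cloud.r-project.org")
}
library(faraway)

# Inspect the data: launch temperature and the number of damage incidents
str(orings)
head(orings)
```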

A graph of the number of leaks (from launches prior to Challenger’s 1986 mission) as a function of launch temperature looks like that shown in Fig. 2.2 (the library() function loads the ggplot2 package so that its qplot() function can use the orings data to draw a scatterplot with temperature on the x-axis and damage on the y-axis, saving the graph as ‘Fig2_2’):

library(ggplot2)
Fig2_2 <- qplot(data = orings, x = temp, y = damage)
Fig2_2
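qplot() still works but has been deprecated since ggplot2 version 3.4.0, so newer versions print a warning when it is called. A sketch of the same scatterplot written with ggplot() and geom_point() follows; the axis labels are additions for illustration and are not part of the original figure code.

```r
library(ggplot2)
library(faraway)  # provides the orings data

# Same scatterplot as Fig. 2.2, using the ggplot() + geom interface
Fig2_2 <- ggplot(data = orings, aes(x = temp, y = damage)) +
  geom_point() +
  labs(x = "Launch temperature (degrees F)",
       y = "Number of fuel leaks (damage)")
Fig2_2
```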

Do you think the number of fuel leaks is related to temperature? A teleconference was held between NASA and the booster rocket manufacturer on the eve of the launch and, after prolonged discussion, the decision was made to proceed. Tragically, shortly after lift-off a fuel leak

Figure 2.2 The relationship between the number of fuel leaks (‘damage’) and launch temperature for shuttle launches prior to Challenger’s 1986 mission.
