The New Statistics with R
An Introduction for Biologists
Second Edition
ANDY HECTOR
Department of Plant Sciences and Linacre College, University of Oxford, UK
Great Clarendon Street, Oxford, OX2 6DP, United Kingdom
Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trademark of Oxford University Press in the UK and in certain other countries
© Andy Hector 2021
The moral rights of the author have been asserted
First Edition published in 2015
Impression: 1
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by licence or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above
You must not circulate this work in any other form and you must impose this same condition on any acquirer
Published in the United States of America by Oxford University Press, 198 Madison Avenue, New York, NY 10016, United States of America
British Library Cataloguing in Publication Data
Data available
Library of Congress Control Number: 2021931174
ISBN 978-0-19-879817-0 (hbk.)
ISBN 978-0-19-879818-7 (pbk.)
DOI: 10.1093/oso/9780198798170.001.0001
Printed in Great Britain by Bell & Bain Ltd., Glasgow
Links to third party websites are provided by Oxford in good faith and for information only. Oxford disclaims any responsibility for the materials contained in any third party website referenced in this work.
I dedicated the first edition of this book to the memory of Christine Müller. This new edition is dedicated to Lindsay and Rowan.
Acknowledgements
The original version of this book was begun at the end of 2011 while I was on sabbatical as a visiting researcher in the computational ecology group at Microsoft Research in Cambridge; my thanks to Drew Purves and colleagues for their support. This second edition was partly written during my sabbatical in 2019/20, sadly largely under COVID-19 restrictions. However, before lockdown I made some important progress during stays at Obertschappina (thanks Roland and Petra) and on a visit to the Cedar Creek Ecosystem Science Reserve, for which I thank Forest Isbell, Dave Tilman, and the amazing group of ecologists at the University of Minnesota.
Several people were instrumental in helping cultivate my initial interest in statistical analysis. I was first introduced to experiments during my final-year project with Phil Grime and colleagues at the UCPE at Sheffield University. Shortly afterwards, one of the most rewarding parts of my PhD at Imperial College was learning statistics (and the GLIM software) from Mick Crawley. Bernhard Schmid shared this interest and enthusiasm and taught me a lot while I was a postdoc on the BIODEPTH project and, later, when we worked together at the Institute for Environmental Sciences at the University of Zurich (sorry for forsaking Genstat for R, Bernhard!). Here in Oxford I have continued to discuss and learn about statistics partly through the generosity of Geoff Nicholls.
I have also benefited from sometimes brief but important discussions with several other statisticians during training courses, after visiting talks, and the like, including Douglas Bates, Andrew Gelman (over a game of Quincunx), Martin Maechler, Peter McCullagh, John Nelder, José Pinheiro, Bill Venables, and Hadley Wickham. My apologies to them for any misunderstandings that make it into this book.
Many group members helped me delve further into statistics with R, including some of the material covered in this book. I would like to thank all current and past group members, but particularly Robi Bagchi, Stefanie von Felten, Yann Hautier, Charlie Marsh, Chris Philipson, Matteo Tanadini, Sean Tuck, Maja Weilenmann, and Mikey O’Brien. I have also learned a lot from collaborating on papers on statistics with several colleagues, including Tom Bell, Jarrett Byrnes, John Connelly, Laura Dee, Forest Isbell, Marc Kéry, Michel Loreau, and Alain Zuur.
The content of this book is based on teaching materials developed over the last two decades at Imperial College, the University of Zurich, and here at Oxford, where I teach statistics at the Bachelor, Masters, and PhD levels. Thanks to everyone involved, particularly the many demonstrators (TAs).
Many people helped find errors in the first edition of this book; I have tried to correct them and acknowledge the spotters at the R café website (no doubt there will be more to add for this second edition). In particular, my thanks to Ben Bolker for his constructive criticism of the first edition of this book.
At OUP, thanks go to Ian, Lucy, Bethany, and Charlie for making this book and this second edition possible. Thanks also to Douglas Meekison, who skilfully copyedited the manuscript, and to Sumintra Gaur, who was project manager for this book.
Finally, thank you (and sorry) to anyone who has slipped my mind as I rush again to meet the book deadline!
Andy Hector, Oxford, October 2020.
Chapter 9: Testing
Chapter 16: GLMs for Count Data
Chapter 17: Binomial GLMs
17.1 Binomial counts and proportion data
17.4 Alternative link functions
17.5 Summary: Statistics
Chapter 18: GLMs for Binary Data
Chapter 19: Conclusions
Chapter 20: A Very Short Introduction to R
1 Introduction
1.1 Introduction to the second edition
Back in 2015, I opened the introduction to the first edition of this book as follows:
Unlikely as it may seem, statistics is currently a sexy subject. Nate Silver’s success in out-predicting the political pundits in the last US election drew high-profile press coverage across the globe (and his book many readers). Statistics may not remain sexy but it will always be useful. It is a key component in the scientific toolbox and one of the main ways we have of describing the natural world and of finding out how it works. In most areas of science, statistics is essential.
So much has changed over the last five years. Initially, I thought this introduction to the second edition would discuss the subsequent failure of statistics to predict the Brexit referendum and Trump election results. However, I ended up working on this second edition under lockdown due to the COVID-19 pandemic. I’m not sure if statistics is still ‘sexy’ but it is certainly still prominent in our lives. Modelling, much of it statistical, provides predictions of the spread of COVID-19, and sampling is key to estimating fundamental parameters like the reproductive number, denoted (coincidentally) R: the number of people each person with COVID-19 in turn infects.
1.2 The aim of this book
This book is intended to introduce one of the most useful types of statistical analysis to researchers, particularly in the life and environmental sciences: linear models and their generalized-linear-model (GLM) extensions. My aim is to get across the essence of the statistical ideas necessary to intelligently apply and interpret these models in a contemporary (‘new’) way. I hope it will be of use to students at both undergraduate and postgraduate levels and to researchers interested in learning more about statistics (or in switching to the software packages used here, R and RStudio). The approach is therefore not primarily mathematical, and makes limited use of equations; they are easily found in numerous statistics textbooks and on the internet if you want them. I have also kept citations to a minimum and give them at the end of the most relevant chapter (there is no overall bibliography).
The approach is to learn by doing, through the analysis of real datasets. That means using a statistical software package, in this case the R programming language for statistics and graphics (for the reasons given below). It also requires data. In fact, most scientists only start to take an interest in statistics once they have their own data. In most science degrees that comes late in the day, making the teaching of introductory statistics more challenging. Students studying for research degrees (Masters and PhDs) are generally much more motivated to learn statistics since they know it will be essential for the analysis of their data. The next best thing to working with our own data is to work with some carefully selected examples from the literature. I have used some data from my own research but I have mainly tried to find small, relevant datasets that have been analysed in an interesting way (preferably by a qualified statistician). Most of them are from the life and environmental sciences. I am very grateful to all of the people who have helped collect these data and developed the analyses (they are named in the appropriate chapters as the data and example are introduced). For convenience, I have tried to use datasets that are available within the R software.
1.3 Changes in the second edition
The first edition of this book was written following standard procedure to supply a Word document of the text of each chapter plus files of any figures. This proved an inefficient and error-prone method with all the copy-pasting between R scripts and the word processing file. This second edition has been entirely rewritten using the R Markdown package to produce a PDF file of each chapter along with the TeX file that generates it (as I understand it, subcontractors will then use LaTeX to apply the book format). Writing the second edition like this should be a smarter, more efficient, and hopefully less error-prone way to work. In the process, the book has changed in many ways. Based on my experience in teaching the Quantitative Methods for Biology course at Oxford, the content has been divided up into a greater number of bite-size topics that will hopefully prove more digestible for students and more useful to teachers. In part because the book was written using the R Markdown package, I now drive R using the RStudio software (it also provides a standard interface on all platforms and lots of other great support materials, like the R cheat sheets).
Every chapter has been rewritten but there are also entirely new chapters, one giving an opening motivational example, one on reproducible research (using the R Markdown package), and another on some of the complexities of linear-model analysis that I skipped over in the first edition. There are now separate chapters on GLMs for the analysis of different types of non-normal data. The first edition also contained chapters on mixed-effects and generalized linear mixed-effects models (GLMMs). These have been dropped from the second edition, partly due to the space limits but also because some reviewers and readers felt that one chapter was just not enough even for a short introduction to mixed-effects models. Furthermore, the example GLMM no longer ran using later versions of the software.
1.4 The R programming language for statistics and graphics
R is now the principal software for statistics, graphics, and programming in many areas of science, both within academia and outside (many large companies use R). There are several reasons for this, including:
• R is a product of the statistical community: it is written by the experts.
• R is free: it costs nothing to download and use, facilitating collaboration.
• R is multiplatform: versions exist for Windows, Mac, and Linux.
• R is open-source software that can be easily extended by the R community.
• R is statistical software, a graphics package, and a programming language all in one (as we’ll see, you can now even produce books, blogs, and websites from R).
1.5 Scope
Statistics can sometimes seem like a huge, bewildering, and intimidating collection of tests. To avoid this I have chosen to focus on the linear-model framework as probably the single most useful part of statistics (at least for researchers in the environmental and life sciences). The book starts by introducing several different variations of the basic linear-model analysis (analysis of variance, linear regression, analysis of covariance, etc.). I then introduce an extension: generalized linear models for data with non-normal distributions. The advantage of following the linear-model approach is that a wide range of different types of data and experimental designs can be analysed with very similar approaches. In particular, all of the analyses covered in this book can be performed in R using only two main functions, one for linear models (the lm() function) and one for GLMs (the glm() function), together with a set of generic functions that extract different aspects of the results (confidence intervals etc.).
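To make the two-function workflow concrete, here is a minimal sketch. It is not taken from the book's own examples: it assumes the cars and InsectSprays datasets that ship with base R, and the object names fit_lm and fit_glm are purely illustrative. It fits one linear model and one Poisson GLM, then applies the same generic extractor functions to both.

```r
# Illustrative sketch only: 'cars' and 'InsectSprays' are built-in R datasets,
# chosen here for convenience; they are not the examples analysed in the book.

# Linear model: stopping distance as a function of speed
fit_lm <- lm(dist ~ speed, data = cars)
coef(fit_lm)     # point estimates (intercept and slope)
confint(fit_lm)  # 95% confidence intervals for the estimates

# GLM: insect counts by spray type, assuming a Poisson distribution
fit_glm <- glm(count ~ spray, family = poisson, data = InsectSprays)
coef(fit_glm)    # estimates on the scale of the (log) link function
confint(fit_glm) # 95% confidence intervals, same generic function as above
```

The point of the sketch is the symmetry: the model-fitting call changes, but coef() and confint() are used in the same way on both fitted objects.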
1.6 What is not covered
This book is primarily about statistics (linear models), not the R software. For that, OUP offers introductory volumes by Beckerman et al. (2017) and Petchey et al. (2021). Statistics is a huge subject, so the limited size of the book precluded the inclusion of many topics, and the coverage is limited to linear models and GLMs. There was no space for non-linear regression approaches or generalized additive models (GAMs). Because of the focus on an estimation-based approach, I have not included non-parametric statistics. Experimental design is covered briefly and integrated into the relevant chapters. Information criteria and multimodel inference are briefly introduced. The basics of Bayesian statistics are also a book-length project in their own right (e.g. Korner-Nievergelt et al. 2017).
1.7 The approach
There are several different general approaches within statistics (frequentist, Bayesian, information theory, etc.) and there are many subspecies within these schools of thought. Most of the methods included in this book are usually described as belonging to ‘classical frequentist statistics’. However, this approach, and the probability values that are so widely used within it, has come under increasing criticism. In particular, statisticians often accuse scientists of focusing far too much on P-values and not enough on effect sizes. This is strange, as the effect sizes (the estimates and intervals) are directly related to what we measure during our research. I don’t know any scientists who study P-values! For that reason, I have tried to take an estimation-based approach that focuses on estimates and confidence intervals wherever possible. Styles of analysis vary (and fashions change over time). Because of this, I have tried to be frank about some of my personal preferences used in this book. In addition to making wide use of estimates and intervals, I have also tried to emphasize the use of graphs for exploring data and presenting results. I have tried to encourage the use of a priori contrasts (comparisons that were planned in advance) and I advocate avoiding the inappropriate overuse of multiple testing in favour of a more focused, planned approach. Finally, at the end of each chapter I try to summarize both the statistical approach and what it has enabled us to learn about the science of each example. It is easy to get lost in statistics, but for non-statisticians the analysis should not become an end in its own right, only a method to help advance our science.
1.8 The new statistics?
What is the ‘new statistics’ of the title? The term is not clearly defined but it appears to be used to cover a combination of new techniques (particularly meta-analysis) with a back-to-basics focus on estimation-based analysis using confidence intervals (Cumming 2012). Meta-analysis is beyond the scope of this edition; I recommend the book by Koricheva et al. (2013). In this book, the ‘new statistics’ refers to a focus on estimation-based analysis, together with the use of modern maximum-likelihood-based analysis (including information criteria) plus methods for reproducible research. I have also tried to take account of the recent criticisms of the overuse of P-values and statistical significance (although this is an area of ongoing debate).
1.9 Getting started
To allow a learning-by-doing approach, the R code necessary to perform the basic analysis is embedded in the text along with the key output from R (files of the R code will be available as support material from the R café at http://www.plantecol.org/). Some readers may be completely new to R, but many will have some familiarity with it. Rather than start with an introduction to R, we will dive straight into the example analyses. However, a brief introduction to R is provided at the end of the book, and newcomers to the software will need to start there.
1.10 References
Beckerman, A.P., Childs, D.Z., & Petchey, O.L. (2017) Getting Started with R. Oxford University Press.
Cumming, G. (2012) Understanding the New Statistics. Taylor and Francis.
Koricheva, J., Gurevitch, J., & Mengersen, K. (2013) Handbook of Meta-analysis in Ecology and Evolution. Princeton University Press.
Korner-Nievergelt, F., Roth, T., von Felten, S., Guélat, J., Almasi, B., & Korner-Nievergelt, P. (2017) Bayesian Data Analysis in Ecology Using Linear Models with R, BUGS, and Stan. Academic Press.
Petchey, O.L., Beckerman, A.P., Cooper, N., & Childs, D.Z. (2021) Insights from Data with R: An Introduction for the Life and Environmental Sciences. Oxford University Press.
Silver, N. (2012) The Signal and the Noise. Penguin.
2 Motivation
2.1 A matter of life and death
The Space Shuttle Challenger (Fig. 2.1) was one of the most advanced spacecraft ever built, but the first version lacked ejection seats for its crew. This was particularly relevant in January 1986, when the expected temperature at launch was below freezing, much colder (around 30 °F) than on any previous mission, raising concerns over safety. The shuttle (Challenger was one of only five put into service) was designed to be largely reusable. The orbiter, the main shuttle craft, was propelled out of the atmosphere by its own engines (supplied by a large external liquid-fuel tank that could be jettisoned when empty) and with the help of two booster rockets that disengaged when spent and fell back, to be recovered from the sea for reuse. The booster rockets were constructed in cylindrical sections and the joints sealed with huge circular washers called O-rings. It was these O-rings that were of particular concern, as their ability to prevent fuel leaks (‘blowby’ in NASA jargon) depended on their flexibility and plasticity, which decreased as temperatures fell.

Figure 2.1 A schematic diagram of the space shuttle, showing the orbiter with external liquid-fuel tank and reusable booster rockets. Copyrighted free use, https://commons.wikimedia.org/w/index.php?curid=554970
Because the boosters were recovered and refurbished for reuse, it was possible to infer from scorch marks whether fuel leak damage had occurred during each launch. The resulting data (called orings) is available as part of an R package called faraway (think of packages as add-on apps that you can download to extend the ‘base’ version of R; this package accompanies the book by Faraway (2014)). The ‘chunk’ of R code below activates the faraway package (it must already be installed; see Chapter 20) and displays the orings data (the head() function shows only the first several rows to save space; note that the lines of R output are prefixed with two hashes to distinguish them):
library(faraway)
head(orings)
##   temp damage
## 1   53      5
## 2   57      1
## 3   58      1
## 4   63      1
## 5   66      0
## 6   67      0
A graph of the number of leaks (from launches prior to Challenger’s 1986 mission) as a function of launch temperature looks like that shown in Fig. 2.2 (the library() function loads the ggplot2 package so that its quick-plot function, qplot(), can use the orings data to draw a scatterplot with temperature on the x-axis and damage on the y, saving the graph as ‘Fig2_2’):
library(ggplot2)
Fig2_2 <- qplot(data = orings, x = temp, y = damage)
Fig2_2
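The same kind of scatterplot can also be drawn with the full ggplot() interface, which may be worth knowing because qplot() is deprecated in recent ggplot2 releases (3.4.0 onwards). The sketch below is illustrative rather than the book's own code. To stay self-contained it types in only the six rows printed by head(orings) above, under the hypothetical name orings_head, so it shows a subset of the launches; with the faraway package loaded you would pass orings itself.

```r
library(ggplot2)

# Assumption: only the six rows shown by head(orings) are reproduced here.
# 'orings_head' is a hypothetical name, not an object from the faraway package.
orings_head <- data.frame(
  temp   = c(53, 57, 58, 63, 66, 67),
  damage = c(5, 1, 1, 1, 0, 0)
)

# Scatterplot with temperature on the x-axis and damage on the y-axis
fig_sketch <- ggplot(orings_head, aes(x = temp, y = damage)) +
  geom_point() +
  labs(x = "Launch temperature (degrees F)", y = "Damage (number of fuel leaks)")
fig_sketch
```

Building the plot from ggplot(), aes(), and geom_point() takes a little more typing than qplot(), but each layer is explicit, which makes it easier to add further layers later.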
Do you think the number of fuel leaks is related to temperature?
A teleconference was held between NASA and the booster rocket manufacturer on the eve of the launch and, after prolonged discussion, the decision was made to proceed. Tragically, shortly after lift-off a fuel leak
Figure 2.2 The relationship between the number of fuel leaks (‘damage’) and launch temperature for shuttle launches prior to Challenger’s 1986 mission.