Statistical Modeling with R
A dual frequentist and Bayesian approach for life scientists

PABLO INCHAUSTI
Centro Universitario Regional del Este, Universidad de la República, Uruguay

Great Clarendon Street, Oxford, OX2 6DP, United Kingdom

Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries

© Pablo Inchausti 2023

The moral rights of the author have been asserted

Impression: 1

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by licence or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above

You must not circulate this work in any other form and you must impose this same condition on any acquirer

Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America

British Library Cataloguing in Publication Data
Data available

Library of Congress Control Number: 2022937827

ISBN 978-0-19-285901-3 (hbk)
ISBN 978-0-19-285902-0 (pbk)

DOI: 10.1093/oso/9780192859013.001.0001

Printed and bound by CPI Group (UK) Ltd, Croydon, CR0 4YY

Cover image: John Lund/Getty Images.

Links to third party websites are provided by Oxford in good faith and for information only. Oxford disclaims any responsibility for the materials contained in any third party website referenced in this work.
For Joana, because two is so much more than one
Preface

A preface is the closest an author may have to the “letter of marque” used by European governments in the seventeenth century to authorize piracy with the sovereign’s tacit consent. An author can write just about anything in the preface with the resigned permission of the editor.

Having finished the book, I hereby give myself permission to write in the first person singular. I have spent many thousands of hours endlessly writing and rewriting this book over the last 24 months. It has been a challenge, a pleasure, a thrill, and a burden at the same time.

At this point, I have the following unsettling mixture of feelings:

* Relief and joy: Finally it is over.
* Pride and accomplishment: I have done it! I have done it! I have done it!
* Uncertainty: Was it worth the effort? Will it be well/badly received?
* Insecurity: Did I double-check everything? Does it have any embarrassing errors?

It is now time for the book to sink or swim on its own at the hands of its readers.
A long time ago, my undergraduate adviser sent me to fetch some rainfall data from the Venezuelan Ministry of the Environment. I reluctantly went, got lost inside the ugly building, failed to gather the data, but stumbled by chance into a nearly empty and shabby Ministry bookstore. There I found a gem: the original 1975 Spanish edition of the book Areografía: estrategias geográficas de las especies by Eduardo Rapoport. He was a clever and original Argentinian ecologist who anticipated what later became known as macroecology. The Pergamon Press translation (Areography: The Geographic Strategies of Species) missed the preface of the Spanish original, easily the best preface of a scientific book I have ever read. I read Rapoport’s preface at the bookstore, and right away I bought the book using all the money I had, including the return bus fare. I walked the 4.5 km home, reading the book between traffic lights. Recognizing the absence of rules for what the preface of a scientific book should contain, Rapoport aimed to bring a humanized depiction of his CV, so that “science books would leave home and be taken to the dentist waiting room,” and so that you would know what to say if you ever met the author. Following the master Eduardo Rapoport, here is my attempt.
Things that I love

* The dazzling imaginations of Gabriel García Márquez, Julio Cortázar, and Alejo Carpentier, the dignity of Primo Levi, the wisdom of Umberto Eco, the humanity of Italo Calvino, The Magus by John Fowles.
* The music and shining smile of Louis Armstrong, the saxophone of John Coltrane (his mission: “a masterpiece by midnight”), and Art Tatum playing the piano; the Beatles, Eric Clapton, Sting, and Mark Knopfler; the blues of Taj Mahal, Keb’ Mo’, and Buddy Guy; the stirring voices of Annie Lennox, Ute Lemper, and Madeleine Peyroux; the lyrics of Leonard Cohen and Bob Dylan.
* Two scientific heroes: John (JBS) Haldane and Richard Feynman.
* All the Monty Python films and the original BBC series.
* All Picasso, except the pink period; Gustav Klimt and Vincent van Gogh.
* Let’s share a full Uruguayan barbecue, including mollejas (thymus), kidneys, and sweet blood sausages. It would be glorious to enjoy some French goat cheese, a fresh green salad with endives and cherry tomatoes, red wine of course, and a mango or passion fruit mousse to bring my soul closer to earthly paradise.
* And, above all, let’s talk, exchanging stories, books, and anecdotes. The folders of my memory hold countless megabytes of historical, literary, and scientific information, some of which may even interest or entertain you for a while. And I can swiftly change my opinion on any issue under the sun as many times as you can manage to convince me with good arguments, sensible evidence, and a modicum of straight reasoning.
Things I hate

* Social injustice in any shape or form.
* The stupidity of the military and all its cheerleaders.
* Social, racial, and sexual discrimination under any disguise or shade.
* All totalitarian ideologies and forms of thought.
* The stiffness, conservatism, intolerance, and backwardness of the traditional Catholic Church and of many recently created protestant churches.
* Pineapple on pizza: a horrendous mix that spoils two great things.
My story

My own biography is pretty ordinary. I will recall just a few events that might perhaps inspire others to believe in themselves. I was born in Uruguay, a small country that lies sandwiched between Argentina and Brazil at the bottom left of the world map. After starting public primary school there, I followed my father to Venezuela. I started university wishing to become an electrical engineer, but finally managed to graduate in biology at the tardy age of 26.

I desperately wanted to study more and become a scientist. With my then partner, we managed to get admitted to the State University of New York at Stony Brook (now Stony Brook University) by sheer luck. In September 1992, we gathered all our money and belongings, got some family loans, and traveled to New York. We landed there perfectly unaware of everything, including how to get to the university from the airport. We had expected to pay the first year of university fees, betting that our good background would allow us to obtain good grades that might lead to some financial support. But it turned out that we did not have to pay any university fees at all!

And even better, one day before the start of my first semester, another graduate student chose to take care of her ill grandmother and declined her teaching assistant position. I was offered it, and of course took it: $750 a month minus taxes amounted to touching heaven. The next day I went to teach some very basic biology to 25 puzzled American students. During my fourth day in an English-speaking country, I barely understood 40% of what the students said. But I had an inspired idea that saved me. I shamelessly told them of a hearing disability that required them to speak slowly and very loudly for me to understand them. And they did it to such an extent that my strange disability miraculously disappeared after a few weeks. We quickly bought a TV to help train my wooden ears. At first, the only program that I could understand was the (British) Prime Minister’s Questions that was broadcast on C-SPAN very late at night just for the (dis)pleasure of insomniacs. This very theatrical, ceremonial, and mostly pointless weekly exercise of British politics was my door to understanding spoken English, and the start of an anglophilia that only Brexit recently and definitely managed to cure.
At Stony Brook, I met Lev Ginzburg by chance while eating sandwiches at the Department of Ecology and Evolution. This very intelligent and witty Russian mathematician became my PhD supervisor. At first, it was nearly impossible to understand what this man was talking about so quickly. I used to share a 6 m² office in front of his. Лев often called me into his office using mundane excuses to spend many hours talking and teaching me on a one-to-one basis as if I were a medieval apprentice. These interactions over the years shaped me into a scientist, and affected my brain more than anything since gastrulation. The wide range of topics of these conversations included mathematics, ecology, classical physics, dynamical systems, risk analysis, philosophy of science, the latest books we were reading, and who knows what else. I still vividly recall two entire Friday afternoons that Лев devoted to teaching me the puzzling basics of quantum mechanics (including the Schrödinger wave equation) using a small green blackboard and white chalk. It was an indescribable pleasure to have received such a gift of human knowledge from you, мой дорогой друг и наставник (my dear friend and mentor).
I graduated in 1998, and my Italian passport (life lesson for the young: you can never have too many passports; acquire as many as possible since some may open unexpected doors) got me an EU fellowship for a postdoc with John Lawton at Imperial College, UK. I later moved to France where I lived and worked for nine years. Other moves following a non-traditional and hardly straight path took me back to Uruguay, where I live now.

I will not bother the reader with further details of my academic past. There is hardly any merit involved in it. Like you, I have 23 pairs of chromosomes in every cell, blood the same color as yours, and a genome that differs from yours and from Mandela’s, Einstein’s, Himmler’s, and Stalin’s by about six million DNA bases (~0.06%, an irrelevant difference since only about 2% of our DNA is translated into proteins). Therefore, rest assured that there is nothing special, unique, or even good about me. You can easily do better than me if you wish.

Just trust me on this one. Most people who succeed in life are those that seriously apply their heart and mind and energy long enough to pursue their dreams with stubborn determination. I am convinced that life (or the universe, or the gods) rewards persistence and single-mindedness over apparent leaps of inspired genius. However, for that you first need to hold dreams and ambitions for yourself. Nobody can teach you to dream and aspire to a higher future than your present. Dreaming turns out to be a spontaneous and personal affair. I gleaned the next quote (out of context, and oddly enough due to Lenin) from a Julio Cortázar book that summarizes well what I wish to convey:

The rift between dreams and reality causes no harm if only the person dreaming believes seriously in his dream, if he attentively observes life, compares his observations with his castles in the air, and if, generally speaking, he works conscientiously for the achievement of his fantasies. If there is some connection between dreams and life then all is well.
I have been helped beyond the call of duty by the staff of Oxford University Press. Ian Sherman, senior editor of Life Sciences, incredibly remembered me after a 19-year hiatus and, even more surprisingly, believed in and liked the idea of this book. He variously guided, prompted, kept quiet, and encouraged me, and I cannot thank him enough for all this and more. I must also thank Katie Lakina for putting the production of this book back on track, Karen Moore for her diligent and dedicated work during the transformation of many files into a finished book, and Richard Hutchinson for his attentive and careful copyediting that greatly improved the quality of the text that you are reading.

The free and open software R and the many packages used in this book stem from the fantastic and creative work of many generous scientists and programmers around the world. Their incredible work has created the collective property of statistical knowledge that made this book possible. While I lack the means to thank you all, let me at least raise a glass to toast you with endless gratitude. If there is any informatics god, its blessings should also extend to the creators and maintainers of Linux Ubuntu and LibreOffice.

Sebastián Aguiar, Marc Kéry, Enrique Lessa, Daniel Naya, and Matías Schrauf kindly read, commented on, and corrected different chapters of this book. Their input and feedback prompted changes that led to improvements and hopefully fewer embarrassing mistakes. The stubborn errors, plain inconsistencies, and straight omissions that might remain are, of course, mine only. Melina Aranda, Javier García, Daniel Naya, Alicia Ponce, and Agustín Sáez kindly provided data from their published papers that are used as either case studies or problems at the end of some chapters. I thank Alexandra Elbakyan for allowing me to access an enormous amount of essential information that I could not otherwise have ever dreamed to read and use in this book.

I refuse to indulge in the tacky final sentences that end the prefaces of many scientific books: “Last but not least, I want to thank ... for their patience and ... for the many hours I spent ...” Oh no, please not that again! But I will say this: over the last 12 years, I have been blessed beyond deserving by the earthly gods to share my life with Joana Gagliardi. She is my magnificent partner, my passionate lover, my close friend, and a truly great and beautiful woman with a shiny soul enveloped by a large smile and almond-shaped eyes. I have also had the privilege to share these years with Fiamma (24) and Iahel (20), Joana’s bright daughter and son, whom I have seen grow into two beautiful adults who are the better angels of my soul.

This is enough now. You did not buy the book to read this babble. You want some stats, and that is what you will find starting on the next page. Should you have any comments, complaints, remarks, or suggestions, or have spotted any small or large errors, I want to hear from you, so please write to pablo.inchausti.f@gmail.com.

With warm regards,
Pablo
5 The General Linear Model II: Categorical explanatory
5.8 A posteriori tests in frequentist models
6.5 Analysis of covariance: Mixing continuous and categorical explanatory variables
6.6 Analysis of covariance: Frequentist fitting
6.7 Analysis of covariance: Bayesian fitting
7 Model Selection: One, two, and more models fitted to the
7.1 Introduction
7.2 The problem of model selection: Parsimony in statistics
7.3 Model selection criteria in the frequentist framework: AIC
7.4 Model selection criteria in the Bayesian framework: DIC and WAIC
7.5 The posterior predictive distribution and posterior predictive checks
7.6 Now back to the WAIC and LOO-CV
7.7 Prior predictive distributions: A relatively “new” kid on the block
8 The Generalized Linear Model
8.1 Introduction
8.2 What are GLMs made of?
8.3 Fitting GLMs
8.4 Goodness of fit in GLMs
9 When the Response Variable is Binary
9.1 Introduction
9.2 Key concepts for binary GLMs: Odds, log odds, and additional link functions
9.3 Fitting binary GLMs
9.4 Ungrouped binary GLM: Frequentist fitting
9.5 Further issues about validating binary GLMs
9.6 Ungrouped binary GLMs: Bayesian fitting
9.7 Grouped binary GLMs
9.8 Problems
10 When the Response Variable is a Count, Often with Many Zeros
10.1 Introduction
10.2 Over-dispersion: A common problem with many causes and some solutions
10.3 Plant species richness and geographical variables
10.4 Modeling of counts with an excess of zeros: Zero-inflated and hurdle models
10.4.1 Frequentist fitting of a zero-inflated model
13.4 Problems and inconsistencies with the definition of random effects
13.5 Population-level and group-level effects in Bayesian hierarchical models
13.6 Fitting mixed models in the frequentist framework
13.7 Statistical significance and model selection in frequentist mixed models
13.8 The shrinkage or borrowing strength effect in mixed models
13.9 Fitting mixed models in the Bayesian framework
14.4.2 Randomized block design
14.4.3 Split-plot design
14.4.4 Nested design
14.4.5 Repeated measures design
15 Mixed Hierarchical Models and Experimental Design Data
15.2.1 Binary GLMM with a randomized block design: Frequentist models
15.2.2 Binary GLMM with a randomized block design: Bayesian models
15.3 Gaussian GLMM with a repeated measures design
15.3.1 Gaussian GLMM with a repeated measures design: Frequentist models
15.3.2 Gaussian GLMM with a repeated measures design: Bayesian models
15.4 Beta GLMM with a split-plot design
15.4.1 Beta GLMM with a split-plot design: Frequentist model
15.4.2 Beta GLMM with a split-plot design: Bayesian model
15.5 Problems
Afterword
Appendix A: List of R Packages Used in This Book
Appendix B: Exploring and Describing the Evidence in Graphics (only available online at www.oup.com/companion/InchaustiSMWR)
Appendix C: Using R and RStudio: The Bare-Bones Basics (only available online at www.oup.com/companion/InchaustiSMWR)
Index
PART I
The Conceptual Basis for Fitting Statistical Models

CHAPTER 1
General Introduction

1.1 The purpose of statistics

The first article of the first issue of Annual Review of Statistics was entitled “What is statistics?” (Fienberg 2014). It started by listing eight different and only partly overlapping definitions. It is hard to imagine that chemists or physicists would provide as much variety when defining their own trades. The American Statistical Association offers a very inclusive definition: “Statistics is the science of learning from data, and of measuring, controlling and communicating uncertainty” (https://www.amstat.org/asa-newsroom). While not every statistician would agree with this, it serves to highlight that statistics is a kind of meta-discipline aiming to extract real-world insights from data gathered within other realms of knowledge (Wild et al. 2011). Statistics is a meta-discipline because, in dealing with the fuzziness, imprecision, and vagaries of real-world data, it pushes its practitioners to formulate “theoretical scaffolds” that can be used in other areas of knowledge.

Obtaining insights from statistics involves specifying hypotheses, gathering data relevant to a problem, modeling data with quantitative methods, and interpreting quantitative findings within the specific context of the scientific hypotheses that motivated the research. These activities do not, and cannot, take place as an intellectual abstraction aiming to solve problems within the clearly defined boundaries of applied mathematics where statistics is sometimes placed. Mathematicians often need to (over-)simplify the context of the initial problem to better define a narrower, more interesting, and hopefully solvable research question. In contrast, in statistics the context is the key to interpreting the findings of computer printouts of tables and graphs and to transforming data into insights in terms of the research problem and hypotheses that motivated the gathering of evidence. The practice of statistics is (or rather should be) something far more subtle and interesting than a quasi-mechanical quest to contrast and reject hypotheses whenever p < 0.05, as you might have learned in undergraduate courses.

“Statisticians are engaged in an exhausting but exhilarating struggle with the biggest challenge that philosophy makes to science: how do we translate information into knowledge?” (Senn 2003, p. 3). Taken at face value, how can this last statement fail to excite you? Statisticians deal with the excruciating messiness of real-world data. By that we mean the uncertainty in the measurements of variables, the pervasive variability of the world, and the often foggy relations between the variables that we aim to uncover in order to claim empirical support for a scientific hypothesis. Statistics has to tackle the chance and contingency that lie entangled within real-world data, and whose influence can be as pervasive as that of the signal related to the main patterns that we wish to reliably retrieve. The statistical holy grail is to uncover an approximate statistical model that could have plausibly generated (and hence fits acceptably well) the available evidence. But this is not all. The magnitudes of the estimated parameters of such a well-fitting model should allow the evaluation of a statistical hypothesis and have a tangible, real-world interpretation in the research context that prompted the design of the experiment, the gathering of data, and its analysis.
1.2 Statistics in a schizophrenic state?

Over the last century, statistics has fully developed two theoretical frameworks (frequentist and Bayesian, to be explained in Chapters 2 and 3) that have contended to become “the right and appropriate” way of analyzing data. You will not find practitioners in other scientific disciplines spilling so many barrels of ink fighting each other without ever achieving complete victory. These two frameworks largely stem from two different views of probability that have coexisted since the seventeenth century, and their proponents and defenders have engaged in acrimonious and protracted disputes during most of the twentieth century. The currently dominant frequentist framework is an incoherent blend that arose from the protracted clash between R. Fisher on one side and J. Neyman and E. Pearson on the other. It is likely that Fisher and Neyman/Pearson only agreed on their strong dislike and distrust of the use of prior information (again, to be explained in Chapter 2) as a subjective and arbitrary component of the Bayesian framework that they wanted uprooted from statistics. Aiming for objectivity and conclusions that are independent of whoever analyzes the data, most of the practice of statistics championed under the frequentist framework has turned into a quasi-mechanized procedure aiming to reject statistical hypotheses.

It is currently fair to say that a clear majority of scientists have been educated in courses based on (and hence only use) frequentist methods. However, being in a (rapidly growing) minority does not suggest, much less prove, that the champions of the Bayesian framework are “wrong” by any stretch of the imagination. The struggle for primacy between proponents of these two statistical frameworks has been largely inconclusive thus far. At present, scientists have a more ecumenical or pragmatic view of using what seems appropriate, and what they know best, to solve the problem at hand. Scientists needing to employ the other framework almost need to relearn statistics from scratch. This book explains, discusses, and applies both the frequentist and Bayesian statistical frameworks to analyze the different types of data that are commonly gathered by research scientists and students.

The book in your hands aims to present material in an informal, approachable, and progressive manner suitable for research scientists and graduate students with a modicum of previous training. The book covers all the material in a theoretically rigorous manner, focusing on the practical applications of all methods to actual research data. It aims to provide just enough theoretical background for you to understand the basic underpinnings of the statistical models explained here. Every important formula will be “translated” into words to provide a clear, non-intimidating description to readers with only a basic background in mathematics and inferential statistics. In contrast to books laden with more theory, this is a “how-to” book. It emphasizes teaching by learning to compute using R, and to thoroughly interpret the results from the viewpoint and needs of research scientists and students.
1.3 How is this book organized?

It is unthinkable to carry out statistical analysis of meaningful amounts of data of even moderate complexity without a computer. This book will make extensive use of the R programming environment (http://www.r-project.org/). This is an open-source (one can access and edit the code of all the R functions and save a revised version in one’s computer), interpreted (it does not require compilation to be executed) programming language environment for statistical computing and graphics. R runs on Linux, Windows, and macOS, among others, and is the brainchild of its creators Ross Ihaka and Robert Gentleman. It is now supported by the R Foundation for Statistical Computing (Thieme 2018). R has experienced phenomenal growth since August 1993 to become one of the most popular and fastest growing programs for statistical analysis and graphics worldwide. Being a programming language, R can be easily extended by writing functions and extensions. There is a growing and very active R community creating packages (more than 17,500 packages in April 2021) and providing answers in terms of code and explanations in many active and fast-reacting mailing lists. R code is mostly written in the R language itself, although advanced users can link it to other computer languages such as C, C++, FORTRAN, Java, and Python using specific commands to assist in the execution of computer-intensive tasks.

Most statistics books using R aim for standalone use by providing brief (and by necessity incomplete) introductory chapters about the installation and basic use of R, including the basic commands to generate graphics. This introductory material about R can take up several chapters, often 10 to 20 percent of the overall length of many statistical textbooks. There are many books and companion websites that cover both the basic steps for using R and producing graphs: see Beckerman et al. (2017), Lander (2017), Petchey et al. (2021), and Teetor (2017) for the basics of R; Horton and Kleinman (2011) and Kabacoff (2011) for simple graphics; and Abedin and Mittal (2015), Chang (2012), and Teutonico (2015) for ggplot2 graphics. We felt it unwise to provide the same material in print yet again. The companion website (www.oup.com/companion/InchaustiSMWR) contains detailed information about the installation of R in Windows, macOS, and Linux, along with the basic syntax for using and manipulating R objects. The website also provides detailed explanations for making basic plots in R using the package ggplot2 (Wickham 2016), which is rapidly becoming the dominant approach to producing graphics in R. From here on, all R code in the book will be shown in this font and highlighted in gray. While the code necessary for each statistical analysis will be thoroughly explained in each chapter, the code used to make all the figures can be found on the companion website to avoid distracting you from understanding the main ideas. You will also find all the data sets and scripts (i.e., text files with commands) for each chapter in the companion website.

R has a rather minimalist interface in which the user types commands and obtains statistical and graphical results. RStudio (https://rstudio.org) has become a very popular graphical interface that manages the interaction between the user and R with great flexibility. The installation and basic use of this free graphical interface are also explained on the companion website. Nonetheless, all statistical and graphical analyses described in this book are independent of whether one uses a graphical interface such as RStudio.
This book is organized in three parts. Part I will provide the fundamental definitions of probability that underlie the frequentist and Bayesian frameworks, and develop the notion of parameter estimation as the main goal of statistical inference (Chapter 2). Chapter 3 then covers the basic underpinnings of the frequentist and Bayesian methods of parameter estimation (i.e., maximum likelihood, and the Markov chain and Hamiltonian Monte Carlo algorithms) that will be used in the data analyses of all the chapters of Parts II and III.

Part II represents the bulk of this book. It covers the analysis of the main types of data gathered in social and natural sciences from both frequentist and Bayesian perspectives. Each data set will be analyzed with both frameworks. Readers may choose to focus on separate, largely self-contained chapters depending on the type of response variable. However, the single effects of numerical and categorical explanatory variables (Chapters 4 to 6) should be examined as basic foundational aspects. Chapter 7 covers the theoretical basis of model selection (and a few other things), again for both frequentist and Bayesian frameworks. Chapter 8 reviews the conceptual basis of the generalized linear models that allow viewing most of the analyses explained in separate chapters of Part II as special cases. The assessment of statistical significance of parameter estimates, the calculation of confidence intervals, and the assessment of model goodness of fit are also covered. The rest of Part II covers, in separate chapters, the analysis of different types of data commonly encountered in scientific research involving binary, count, proportion, and other real-valued outcome variables. The quality of fit of all the statistical models to the data will be assessed with residual analysis and related methods, all of which will be explained in detail.

Part III builds on the understanding gained in Part II to incorporate random or population-level effects (Chapter 13). This enables the incorporation of structure in the data imposed by experimental and survey designs (Chapter 14). It is at this point that the book reaches its highest level of complexity, generality, and usefulness. As in all chapters of Part II, the emphasis is placed on formulating the starting statistical model, fitting the model using either the frequentist or Bayesian framework, interpreting and understanding the model outputs, assessing the goodness of fit to the data, and translating into words and figures the statistical findings for interpretation.

The book was structured and written assuming an imaginary reader interested in acquiring a broad and comprehensive understanding of univariate statistical analysis after a basic undergraduate course as taught in most engineering and science faculties around the world. These single-semester courses provide a basic understanding of descriptive statistics (mean, variance, quartiles), the basic notions of probability theory, a working knowledge of some probability distributions (e.g., normal, binomial), how to calculate the confidence intervals of at least the population mean, the basis (i.e., types of statistical errors, the notion of statistical significance) for testing statistical hypotheses about the differences between two means, and hopefully simple linear regression. The book starts slowly to progressively build a basic understanding of the main concepts and ideas that will be used in subsequent chapters.
1.4 How to use this book

In 1963 the Argentinian writer Julio Cortázar published the remarkable book Hopscotch (or Rayuela for those who can read it in the Spanish original). This novel has 155 mostly short chapters, 99 of which were considered “expendable” by its author. Even more, Julio Cortázar proposed several alternative ways in which his book could be read as if the chapters were pieces of many different possible puzzles to be assembled at will by its readers. Following Cortázar’s lead, here are a few suggested paths for using this book:

• If you lack a reasonable knowledge of R and how to make graphics, you should definitely start by reading the introductory material about R and R graphics on the companion website.
• Should you not be interested in the historical roots and the conceptual basis of the frequentist and Bayesian frameworks over which statisticians have spilled so much ink, you may skip Chapters 2 and 3. However, please have a look at the final table of Chapter 3 highlighting the main differences between the Bayesian and frequentist approaches that are worth knowing even if just for basic statistical literacy.
• If you are just interested in a specific data analysis (say, logistic regression, factorial analysis of variance, count regression), Table 2.1 points to the chapters you need depending on the probability distribution appropriate for modeling each type of response variable. Beware that you may need to have a look at parts of Chapter 8 to understand certain key features of the generalized linear models such as the link function. The main aspects of incorporating numerical and/or categorical explanatory variables in models are covered in Chapters 4 to 6, and they are valid for all models covered in this book.
• If you wish to learn only frequentist or only Bayesian statistics, you may read just the relevant parts of each chapter and simply skip the other half. But again, at this point in the twenty-first century it is becoming essential for scientists to possess at least a broad understanding of the theoretical/conceptual basis of both frequentist and Bayesian frameworks as discussed in Chapter 3. You will need the basics just to avoid getting lost and being fooled while reading papers.
• Readers only interested in Bayesian statistics may find it frustrating that there is no single chapter devoted to priors, the perennially debated feature of this framework. Starting in Chapter 4, the setting of priors is progressively built up in complexity in different chapters. There is a summary of the many non-exclusive steps or approaches to defining priors in the different chapters on page 323.
• Should you be interested in model selection in either the frequentist or Bayesian framework, you need to read parts of Chapter 7 to acquire at least a flavor of how it is done in either framework. Please read this chapter before doing any model selection with your specific data type, as unwritten and oral traditions have plagued too much of statistical model selection carried out by life scientists. Although the book has limited emphasis on model selection issues, there are specific examples in Chapters 11 and 12.
• Readers with data stemming from specific experimental designs should first read the chapter dealing with the type of data in Part II, then have at least a quick read on the theoretical basis of the mixed models (Chapter 13), and then carry out the data analysis perhaps inspired by one of the several examples given in the chapters of Part III.
• Finally, for readers wishing to acquire a broad and reasonably exhaustive overview of univariate statistics, the author suggests starting with Chapters 4 to 6, jumping to Chapter 8 to cover the basic theory of generalized linear models, and then going straight to the chapter(s) dealing with the types of data according to Table 2.1.

Whichever of the suggested (or other) paths you take through this book, it is very likely that you will have to flip back and forth to improve or check your understanding of a concept, an idea, or the interpretation of model results, or simply the code for an analysis or a figure. In this regard, while each chapter is self-contained, the book is heavily cross-referenced to allow you to find your way back and forth between chapters as needed.
CHAPTER 2
Statistical Modeling
A short historical background

2.1 What is a statistical model?

Using data to test statistical hypotheses, to fit empirical relations, or to explore suggestive patterns requires formulating statistical models. All statistical tests of hypotheses and statistical estimators of parameters are derived from statistical models. In very general terms, a statistical model can be defined as one or more mathematical equations having at least one variable that exhibits stochastic (i.e., probabilistic) variation to represent the inherent uncertainty of observing its potential values.
The statistical models considered in this book contain a single response variable Y reflecting the effect of, or the variation associated with, the explanatory variables X. The latter can be any number of numerical variables, categorical variables denoting groups, or combinations thereof (i.e., interactions between explanatory variables). In all the models considered in this book, the response variable is a random variable with an associated probability distribution whose parameters embody both the effect of the explanatory variables and the variability of its potential values. Statistical models are thus equations that can be seen as data-generating mechanisms. They contain explicit assumptions that may reproduce the data for some combination of their parameters and values of the explanatory variables.
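To make the idea of a model as a data-generating mechanism concrete, here is a minimal sketch in Python using only the standard library. It assumes, purely for illustration, a normally distributed response whose mean is a straight-line function of a single numerical explanatory variable. The function name simulate_response, the parameter names intercept, slope, and sigma, and all numerical values are hypothetical and do not come from the book.

```python
import random


def simulate_response(x_values, intercept, slope, sigma, seed=None):
    """Draw one response value per x from Normal(intercept + slope * x, sigma).

    Illustrative assumption: a normal response with a linear mean.
    The models in the book may use other distributions and structures.
    """
    rng = random.Random(seed)  # private generator, so results are reproducible
    return [rng.gauss(intercept + slope * x, sigma) for x in x_values]


# Hypothetical values of the explanatory variable X
x = [0.0, 1.0, 2.0, 3.0, 4.0]

# Same parameters, different seeds: two different data sets from one model
y_first = simulate_response(x, intercept=1.0, slope=2.0, sigma=0.5, seed=1)
y_second = simulate_response(x, intercept=1.0, slope=2.0, sigma=0.5, seed=2)

print(y_first)
print(y_second)
```

The two printed lists differ even though the parameters are identical, which is the sense in which the response is a random variable: the parameters fix the distribution of Y at each value of X, not the observed values themselves.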
You might recall from previous introductory courses the existence of probability mass functions (PMFs) and probability density functions (PDFs) that are associated with discrete and continuous random variables, respectively. PMFs and PDFs are collectively also termed "probability distributions," and sometimes both are also subsumed under the term PDF. The names of some probability distributions that may spring to mind are binomial, Poisson, normal, and perhaps others. Which probability distribution could or should be used for each statistical model essentially depends on the main attributes of its response variable. Rather than showing a bestiary of the probability distributions that will be considered in this book along with their equations and their different shapes according to particular parameter values, we simply list them in relation to the type of data to which they apply (i.e., the domain of the response variable) in Table 2.1, and defer further details to the respective chapters where the analysis of each data type is explained. In addition, you can find such bestiaries of probability distributions in almost any statistics book on the shelf of the library of your institute, as well as on the internet.
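As a reminder of the practical difference between a PMF and a PDF, the following sketch in Python (standard library only) evaluates the textbook formulas for the binomial and Poisson PMFs and the normal PDF. The function names and the parameter values are illustrative assumptions, not taken from the book. A PMF returns probabilities that sum to one over the discrete support, whereas a PDF returns densities, which may exceed one and only integrate to one.

```python
import math


def binomial_pmf(k, n, p):
    """P(Y = k) for k successes in n independent trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(Y = k) for a count with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def normal_pdf(y, mu, sigma):
    """Density (not a probability) of a normal variable at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# PMFs: the probabilities over the support add up to one
binomial_total = sum(binomial_pmf(k, 10, 0.3) for k in range(11))
poisson_total = sum(poisson_pmf(k, 2.5) for k in range(100))  # truncated support

# PDF: a density can be larger than one when sigma is small ...
peak_density = normal_pdf(0.0, 0.0, 0.1)

# ... yet the area under the curve is one (crude Riemann sum over [-1, 1])
step = 0.001
normal_area = sum(normal_pdf(-1.0 + i * step, 0.0, 0.1) * step for i in range(2001))

print(binomial_total, poisson_total, peak_density, normal_area)
```

The domains differ as well: the binomial is defined on the integers 0 to n, the Poisson on the non-negative integers, and the normal on the whole real line, which is why the type of response variable guides the choice of distribution.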
Yet, why must the response variable Y of all statistical models be a random variable? There are several lines of argumentation for this (Blitzstein and Hwang 2014). One line of reasoning is that the randomness of the outcome variables results from the epistemic uncertainty (a fancy way of saying limited knowledge), and the measurement errors