Statistical Modeling With R

A dual frequentist and Bayesian approach for life scientists

PABLO INCHAUSTI

Centro Universitario Regional del Este, Universidad de la República, Uruguay

Great Clarendon Street, Oxford, OX2 6DP, United Kingdom

Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries

© Pablo Inchausti 2023

The moral rights of the author have been asserted

Impression: 1

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by licence or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above

You must not circulate this work in any other form and you must impose this same condition on any acquirer

Published in the United States of America by Oxford University Press, 198 Madison Avenue, New York, NY 10016, United States of America

British Library Cataloguing in Publication Data

Data available

Library of Congress Control Number: 2022937827

ISBN 978–0–19–285901–3 (hbk)

ISBN 978–0–19–285902–0 (pbk)

DOI: 10.1093/oso/9780192859013.001.0001

Printed and bound by CPI Group (UK) Ltd, Croydon, CR0 4YY

Cover image: John Lund/Getty Images.

Links to third party websites are provided by Oxford in good faith and for information only. Oxford disclaims any responsibility for the materials contained in any third party website referenced in this work.

For Joana, because two is so much more than one

Preface

A preface is the closest an author may have to the “letter of marque” used by European governments in the seventeenth century to authorize piracy with the sovereign’s tacit consent. An author can write just about anything in the preface with the resigned permission of the editor.

Having finished the book, I hereby give myself permission to write in the first person singular. I have spent many thousands of hours endlessly writing and rewriting this book over the last 24 months. It has been a challenge, a pleasure, a thrill, and a burden at the same time.

At this point, I have the following unsettling mixture of feelings:

* Relief and joy: Finally it is over.

* Pride and accomplishment: I have done it! I have done it! I have done it!

* Uncertainty: Was it worth the effort? Will it be well/badly received?

* Insecurity: Did I double-check everything? Does it have any embarrassing errors?

It is now time for the book to sink or swim on its own at the hands of its readers.

A long time ago, my undergraduate adviser sent me to fetch some rainfall data from the Venezuelan Ministry of the Environment. I reluctantly went, got lost inside the ugly building, failed to gather the data, but stumbled by chance into a nearly empty and shabby Ministry bookstore. There I found a gem: the original 1975 Spanish edition of the book Areografía: estrategias geográficas de las especies by Eduardo Rapoport. He was a clever and original Argentinian ecologist who anticipated what later became known as macroecology. The Pergamon Press translation (Areography: The Geographic Strategies of Species) missed the preface of the Spanish original, easily the best preface of a scientific book I have ever read. I read Rapoport’s preface at the bookstore, and right away I bought the book using all the money I had, including the return bus fare. I walked the 4.5 km home, reading the book between traffic lights. Recognizing the absence of rules for what the preface of a scientific book should contain, Rapoport aimed to bring a humanized depiction of his CV, so that “science books would leave home and be taken to the dentist waiting room,” and you knew what to say if you ever meet the author. Following the master Eduardo Rapoport, here is my attempt.

Things that I love

* The dazzling imaginations of Gabriel García Márquez, Julio Cortázar, and Alejo Carpentier, the dignity of Primo Levi, the wisdom of Umberto Eco, the humanity of Italo Calvino, The Magus by John Fowles.

* The music and shining smile of Louis Armstrong, the saxophone of John Coltrane (his mission: “a masterpiece by midnight”), and Art Tatum playing the piano; the Beatles, Eric Clapton, Sting, and Mark Knopfler; the blues of Taj Mahal, Keb’ Mo’, and Buddy Guy; the stirring voices of Annie Lennox, Ute Lemper, and Madeleine Peyroux; the lyrics of Leonard Cohen and Bob Dylan.

* Two scientific heroes: John (JBS) Haldane and Richard Feynman.

* All the Monty Python films and the original BBC series.

* All Picasso, except the pink period; Gustav Klimt and Vincent van Gogh.

* Let’s share a full Uruguayan barbecue, including mollejas (thymus), kidneys, and sweet blood sausages. It would be glorious to enjoy some French goat cheese, a fresh green salad with endives and cherry tomatoes, red wine of course, and a mango or passion fruit mousse to bring my soul closer to earthly paradise.

* And, above all, let’s talk, exchanging stories, books, and anecdotes. The folders of my memory hold countless megabytes of historical, literary, and scientific information, some of which may even interest or entertain you for a while. And I can swiftly change my opinion on any issue under the sun as many times as you can manage to convince me with good arguments, sensible evidence, and a modicum of straight reasoning.

Things I hate

* Social injustice in any shape or form.

* The stupidity of the military and all its cheerleaders.

* Social, racial, and sexual discrimination under any disguise or shade.

* All totalitarian ideologies and forms of thought.

* The stiffness, conservatism, intolerance, and backwardness of the traditional Catholic Church and of many recently created protestant churches.

* Pineapple on pizza: a horrendous mix that spoils two great things.

My story

My own biography is pretty ordinary. I will recall just a few events that might perhaps inspire others to believe in themselves. I was born in Uruguay, a small country that lies sandwiched between Argentina and Brazil at the bottom left of the world map. After starting public primary school there, I followed my father to Venezuela. I started university wishing to become an electrical engineer, but finally managed to graduate in biology at the tardy age of 26.

I desperately wanted to study more and become a scientist. With my then partner, we managed to get admitted to the State University of New York at Stony Brook (now Stony Brook University) by sheer luck. In September 1992, we gathered all our money and belongings, got some family loans, and traveled to New York. We landed there perfectly unaware of everything, including how to get to the university from the airport. We had expected to pay the first year of university fees, betting that our good background would allow us to obtain good grades that might lead to some financial support. But it turned out that we did not have to pay any university fees at all!

And even better, one day before the start of my first semester, another graduate student chose to take care of her ill grandmother and declined her teaching assistant position. I was offered it, and of course took it: $750 a month minus taxes amounted to touching heaven. The next day I went to teach some very basic biology to 25 puzzled American students. During my fourth day in an English-speaking country, I barely understood 40% of what the students said. But I had an inspired idea that saved me. I shamelessly told them of a hearing disability that required them to speak slowly and very loudly for me to understand them. And they did it to such an extent that my strange disability miraculously disappeared after a few weeks. We quickly bought a TV to help train my wooden ears. At first, the only program that I could understand was the (British) Prime Minister’s Questions that was broadcast on C-SPAN very late at night just for the (dis)pleasure of insomniacs. This very theatrical, ceremonial, and mostly pointless weekly exercise of British politics was my door to understanding spoken English, and the start of an anglophilia that only Brexit recently and definitively managed to cure.

At Stony Brook, I met Lev Ginzburg by chance while eating sandwiches at the Department of Ecology and Evolution. This very intelligent and witty Russian mathematician became my PhD supervisor. At first, it was nearly impossible to understand what this man was talking about so quickly. I used to share a 6 m² office in front of his. Лев often called me into his office using mundane excuses to spend many hours talking and teaching me on a one-to-one basis as if I were a medieval apprentice. These interactions over the years shaped me into a scientist, and affected my brain more than anything since gastrulation. The wide range of topics of these conversations included mathematics, ecology, classical physics, dynamical systems, risk analysis, philosophy of science, the latest books we were reading, and who knows what else. I still vividly recall two entire Friday afternoons that Лев devoted to teaching me the puzzling basics of quantum mechanics (including the Schrödinger wave equation) using a small green blackboard and white chalk. It was an indescribable pleasure to have received such a gift of human knowledge from you, мой дорогой друг и наставник.

I graduated in 1998, and my Italian passport (life lesson for the young: you can never have too many passports; acquire as many as possible since some may open unexpected doors) got me an EU fellowship for a postdoc with John Lawton at Imperial College, UK. I later moved to France where I lived and worked for nine years. Other moves following a non-traditional and hardly straight path took me back to Uruguay, where I live now.

I will not bother the reader with further details of my academic past. There is hardly any merit involved in it. Like you, I have 23 pairs of chromosomes in every cell, blood the same color as yours, and a genome that differs from yours and from Mandela’s, Einstein’s, Himmler’s, and Stalin’s by about six million DNA bases (~0.06%, an irrelevant difference since only about 2% of our DNA is translated into proteins). Therefore, rest assured that there is nothing special, unique, or even good about me. You can easily do better than me if you wish.

Just trust me on this one. Most people who succeed in life are those that seriously apply their heart and mind and energy long enough to pursue their dreams with stubborn determination. I am convinced that life (or the universe, or the gods) rewards persistence and single-mindedness over apparent leaps of inspired genius. However, for that you first need to hold dreams and ambitions for yourself. Nobody can teach you to dream and aspire to a higher future than your present. Dreaming turns out to be a spontaneous and personal affair. I gleaned the next quote (out of context, and oddly enough due to Lenin) from a Julio Cortázar book that summarizes well what I wish to convey:

The rift between dreams and reality causes no harm if only the person dreaming believes seriously in his dream, if he attentively observes life, compares his observations with his castles in the air, and if, generally speaking, he works conscientiously for the achievement of his fantasies. If there is some connection between dreams and life then all is well.

I have been helped beyond the call of duty by the staff of Oxford University Press. Ian Sherman, senior editor of Life Sciences, incredibly remembered me after a 19-year hiatus and, even more surprisingly, believed in and liked the idea of this book. He variously guided, prompted, kept quiet, and encouraged me, and I cannot thank him enough for all this and more. I must also thank Katie Lakina for putting the production of this book back on track, Karen Moore for her diligent and dedicated work during the transformation of many files into a finished book, and Richard Hutchinson for his attentive and careful copyediting that greatly improved the quality of the text that you are reading.

The free and open software R and the many packages used in this book stem from the fantastic and creative work of many generous scientists and programmers around the world. Their incredible work has created the collective property of statistical knowledge that made this book possible. While I lack the means to thank you all, let me at least raise a glass to toast you with endless gratitude. If there is any informatics god, its blessings should also extend to the creators and maintainers of Linux Ubuntu and LibreOffice.

Sebastián Aguiar, Marc Kéry, Enrique Lessa, Daniel Naya, and Matías Schrauf kindly read, commented on, and corrected different chapters of this book. Their input and feedback prompted changes that led to improvements and hopefully fewer embarrassing mistakes. The stubborn errors, plain inconsistencies, and straight omissions that might remain are, of course, mine only. Melina Aranda, Javier García, Daniel Naya, Alicia Ponce, and Agustín Sáez kindly provided data from their published papers that are used as either case studies or problems at the end of some chapters. I thank Alexandra Elbakyan for allowing me to access an enormous amount of essential information that I could not otherwise have ever dreamed to read and use in this book.

I refuse to indulge in the tacky final sentences that end the prefaces of many scientific books: “Last but not least, I want to thank ... for their patience and ... for the many hours I spent ...” Oh no, please not that again! But I will say this: over the last 12 years, I have been blessed beyond deserving by the earthly gods to share my life with Joana Gagliardi. She is my magnificent partner, my passionate lover, my close friend, and a truly great and beautiful woman with a shiny soul enveloped by a large smile and almond-shaped eyes. I have also had the privilege to share these years with Fiamma (24) and Iahel (20), Joana’s bright daughter and son, whom I have seen grow into two beautiful adults who are the better angels of my soul.

This is enough now. You did not buy the book to read this babble. You want some stats, and that is what you will find starting on the next page. Should you have any comments, complaints, remarks, or suggestions, or have spotted any small or large errors, I want to hear from you, so please write to pablo.inchausti.f@gmail.com

With warm regards,
Pablo

5 The General Linear Model II: Categorical explanatory

5.8 A posteriori tests in frequentist models

6.5 Analysis of covariance: Mixing continuous and categorical explanatory variables

6.6 Analysis of covariance: Frequentist fitting

6.7 Analysis of covariance: Bayesian fitting

7 Model Selection: One, two, and more models fitted to the

7.1 Introduction

7.2 The problem of model selection: Parsimony in statistics

7.3 Model selection criteria in the frequentist framework: AIC

7.4 Model selection criteria in the Bayesian framework: DIC and WAIC

7.5 The posterior predictive distribution and posterior predictive checks

7.6 Now back to the WAIC and LOO-CV

7.7 Prior predictive distributions: A relatively “new” kid on the block

8 The Generalized Linear Model

8.1 Introduction

8.2 What are GLMs made of?

8.3 Fitting GLMs

8.4 Goodness of fit in GLMs

9 When the Response Variable is Binary

9.1 Introduction

9.2 Key concepts for binary GLMs: Odds, log odds, and additional link functions

9.3 Fitting binary GLMs

9.4 Ungrouped binary GLM: Frequentist fitting

9.5 Further issues about validating binary GLMs

9.6 Ungrouped binary GLMs: Bayesian fitting

9.7 Grouped binary GLMs

9.8 Problems

10 When the Response Variable is a Count, Often with Many Zeros

10.1 Introduction

10.2 Over-dispersion: A common problem with many causes and some solutions

10.3 Plant species richness and geographical variables

10.4 Modeling of counts with an excess of zeros: Zero-inflated and hurdle models

10.4.1 Frequentist fitting of a zero-inflated model

13.4 Problems and inconsistencies with the definition of random effects

13.5 Population-level and group-level effects in Bayesian hierarchical models

13.6 Fitting mixed models in the frequentist framework

13.7 Statistical significance and model selection in frequentist mixed models

13.8 The shrinkage or borrowing strength effect in mixed models

13.9 Fitting mixed models in the Bayesian framework

14.4.2 Randomized block design

14.4.3 Split-plot design

14.4.4 Nested design

14.4.5 Repeated measures design

15 Mixed Hierarchical Models and Experimental Design Data

15.2.1 Binary GLMM with a randomized block design: Frequentist models

15.2.2 Binary GLMM with a randomized block design: Bayesian models

15.3 Gaussian GLMM with a repeated measures design

15.3.1 Gaussian GLMM with a repeated measures design: Frequentist models

15.3.2 Gaussian GLMM with a repeated measures design: Bayesian models

15.4 Beta GLMM with a split-plot design

15.4.1 Beta GLMM with a split-plot design: Frequentist model

15.4.2 Beta GLMM with a split-plot design: Bayesian model

15.5 Problems

Afterword

Appendix A: List of R Packages Used in This Book

Appendix B: Exploring and Describing the Evidence in Graphics (only available online at www.oup.com/companion/InchaustiSMWR)

Appendix C: Using R and RStudio: The Bare-Bones Basics (only available online at www.oup.com/companion/InchaustiSMWR)

Index

PART I The Conceptual Basis for Fitting Statistical Models

CHAPTER 1

General Introduction

1.1 The purpose of statistics

The first article of the first issue of Annual Review of Statistics was entitled “What is statistics?” (Fienberg 2014). It started by listing eight different and only partly overlapping definitions. It is hard to imagine that chemists or physicists would provide as much variety when defining their own trades. The American Statistical Association offers a very inclusive definition: “Statistics is the science of learning from data, and of measuring, controlling and communicating uncertainty” (https://www.amstat.org/asa-newsroom). While not every statistician would agree with this, it serves to highlight that statistics is a kind of meta-discipline aiming to extract real-world insights from data gathered within other realms of knowledge (Wild et al. 2011). Statistics is a meta-discipline because, in dealing with the fuzziness, imprecision, and vagaries of real-world data, it pushes its practitioners to formulate “theoretical scaffolds” that can be used on other areas of knowledge.

Obtaining insights from statistics involves specifying hypotheses, gathering data relevant to a problem, modeling data with quantitative methods, and interpreting quantitative findings within the specific context of the scientific hypotheses that motivated the research. These activities do not, and cannot, take place as an intellectual abstraction aiming to solve problems within the clearly defined boundaries of applied mathematics where statistics is sometimes placed. Mathematicians often need to (over-)simplify the context of the initial problem to better define a narrower, more interesting, and hopefully solvable research question. In contrast, in statistics the context is the key to interpreting the findings of computer printouts of tables and graphs and to transforming data into insights in terms of the research problem and hypotheses that motivated the gathering of evidence. The practice of statistics is (or rather should be) something far more subtle and interesting than a quasi-mechanical quest to contrast and reject hypotheses whenever p < 0.05, as you might have learned in undergraduate courses.

“Statisticians are engaged in an exhausting but exhilarating struggle with the biggest challenge that philosophy makes to science: how do we translate information into knowledge?” (Senn 2003 p. 3). Taken at face value, how can this last statement fail to excite you? Statisticians deal with the excruciating messiness of real-world data. By that we mean the uncertainty in the measurements of variables, the pervasive variability of the world, and the often foggy relations between the variables that we aim to uncover in order to claim empirical support for a scientific hypothesis. Statistics has to tackle the chance and contingency that lie entangled within real-world data, and whose influence can be as pervasive as that of the signal related to the main patterns that we wish to reliably retrieve. The statistical holy grail is to uncover an approximate statistical model that could have plausibly generated (and hence fits acceptably well) the available evidence. But this is not all. The magnitudes of the estimated parameters of such a well-fitting model should allow the evaluation of a statistical hypothesis and have a tangible, real-world interpretation in the research context that prompted the design of the experiment, the gathering of data, and its analysis.
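To make the idea of a model that could have plausibly generated the data more concrete, here is a minimal sketch in R. It is not taken from the book: the sample size, the parameter values, and all variable names are assumptions chosen purely for illustration. The sketch simulates data from a known straight-line model with Gaussian noise, fits the same model with lm(), and reports the estimated parameters together with their 95% confidence intervals.

```r
# Illustrative sketch only: all values below are assumed, not from the book.
set.seed(42)                            # make the simulation reproducible

n     <- 100                            # number of observations (assumed)
x     <- runif(n, min = 0, max = 10)    # explanatory variable
alpha <- 2.0                            # "true" intercept (assumed)
beta  <- 0.5                            # "true" slope (assumed)
sigma <- 1.0                            # residual standard deviation (assumed)

# Data-generating model: a straight line plus Gaussian noise
y <- alpha + beta * x + rnorm(n, mean = 0, sd = sigma)

fit <- lm(y ~ x)                        # least-squares fit of the same model
est <- coef(fit)                        # point estimates of intercept and slope
ci  <- confint(fit, level = 0.95)       # 95% confidence intervals

print(est)
print(ci)
```

Because the data were simulated, the estimates can be compared with the values used to generate them. With real data there is no such luxury, and the magnitudes and intervals are what must be interpreted in the research context.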

1.2 Statistics in a schizophrenic state?

Over the last century, statistics has fully developed two theoretical frameworks (frequentist and Bayesian, to be explained in Chapters 2 and 3) that have contended to become “the right and appropriate” way of analyzing data. You will not find practitioners in other scientific disciplines spilling so many barrels of ink fighting each other without ever achieving complete victory. These two frameworks largely stem from two different views of probability that have coexisted since the seventeenth century, and their proponents and defenders have engaged in acrimonious and protracted disputes during most of the twentieth century. The currently dominant frequentist framework is an incoherent blend that arose from the protracted clash between R. Fisher on one side and J. Neyman and E. Pearson on the other. It is likely that Fisher and Neyman/Pearson only agreed on their strong dislike and distrust of the use of prior information (again, to be explained in Chapter 2) as a subjective and arbitrary component of the Bayesian framework that they wanted uprooted from statistics. Aiming for objectivity and conclusions that are independent of whoever analyzes the data, most of the practice of statistics championed under the frequentist framework has turned into a quasi-mechanized procedure aiming to reject statistical hypotheses.

It is currently fair to say that a clear majority of scientists have been educated in courses based on (and hence only use) frequentist methods. However, being in a (rapidly growing) minority does not suggest, much less prove, that the champions of the Bayesian framework are “wrong” by any stretch of the imagination. The struggle for primacy between proponents of these two statistical frameworks has been largely inconclusive thus far. At present, scientists have a more ecumenical or pragmatic view of using what seems appropriate, and what they know best, to solve the problem at hand. Scientists needing to employ the other framework almost need to relearn statistics from scratch. This book explains, discusses, and applies both the frequentist and Bayesian statistical frameworks to analyze the different types of data that are commonly gathered by research scientists and students.

The book in your hands aims to present material in an informal, approachable, and progressive manner suitable for research scientists and graduate students with a modicum of previous training. The book covers all the material in a theoretically rigorous manner, focusing on the practical applications of all methods to actual research data. It aims to provide just enough theoretical background for you to understand the basic underpinnings of the statistical models explained here. Every important formula will be “translated” into words to provide a clear, non-intimidating description to readers with only a basic background in mathematics and inferential statistics. In contrast to books laden with more theory, this is a “how-to” book. It emphasizes teaching by learning to compute using R, and to thoroughly interpret the results from the viewpoint and needs of research scientists and students.

1.3 How is this book organized?

It is unthinkable to carry out statistical analysis of meaningful amounts of data of even moderate complexity without a computer. This book will make extensive use of the R programming environment (http://www.r-project.org/). This is an open-source (one can access and edit the code of all the R functions and save a revised version on one’s computer), interpreted (it does not require compilation to be executed) programming language environment for statistical computing and graphics. R runs on Linux, Windows, and macOS, among others, and is the brainchild of its creators Ross Ihaka and Robert Gentleman. It is now supported by the R Foundation for Statistical Computing (Thieme 2018). R has experienced phenomenal growth since August 1993 to become one of the most popular and fastest growing programs for statistical analysis and graphics worldwide. Being a programming language, R can be easily extended by writing functions and extensions. There is a growing and very active R community creating packages (more than 17,500 packages in April 2021) and providing answers in terms of code and explanations in many active and fast-reacting mailing lists. R code is mostly written in the R language itself, although advanced users can link it to other computer languages such as C, C++, FORTRAN, Java, and Python using specific commands to assist in the execution of computer-intensive tasks.
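As a small illustration of what extending R by writing functions can look like, the following sketch defines a new function and applies it to a made-up vector. The function name std_error and the data are hypothetical and are not part of R or of this book; only base R functions (is.na, sd, sqrt, length) are used.

```r
# Hypothetical helper: standard error of the mean, ignoring missing values.
std_error <- function(x) {
  x <- x[!is.na(x)]            # drop missing observations
  sd(x) / sqrt(length(x))      # sample SD divided by the square root of n
}

heights <- c(1.62, 1.75, NA, 1.80, 1.68)   # made-up measurements
print(std_error(heights))
```

Once defined, such a function can be used like any built-in one, and a collection of related functions is the usual starting point for an R package.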

Most statistics books using R aim for standalone use by providing brief (and by necessity incomplete) introductory chapters about the installation and basic use of R, including the basic commands to generate graphics. This introductory material about R can take up several chapters, often 10 to 20 percent of the overall length of many statistical textbooks. There are many books and companion websites that cover both the basic steps for using R and producing graphs: see Beckerman et al. (2017), Lander (2017), Petchey et al. (2021), and Teetor (2017) for the basics of R; Horton and Kleinman (2011) and Kabacoff (2011) for simple graphics; and Abedin and Mittal (2015), Chang (2012), and Teutonico (2015) for ggplot2 graphics. We felt it unwise to provide the same material in print yet again. The companion website (www.oup.com/companion/InchaustiSMWR) contains detailed information about the installation of R in Windows, macOS, and Linux, along with the basic syntax for using and manipulating R objects. The website also provides detailed explanations for making basic plots in R using the package ggplot2 (Wickham 2016), which is rapidly becoming the dominant approach to producing graphics in R. From here on, all R code in the book will be shown in this font and highlighted in gray. While the code necessary for each statistical analysis will be thoroughly explained in each chapter, the code used to make all the figures can be found on the companion website to avoid distracting you from understanding the main ideas. You will also find all the datasets and scripts (i.e., text files with commands) for each chapter on the companion website.

R has a rather minimalist interface in which the user types commands and obtains statistical and graphical results. RStudio (https://rstudio.org) has become a very popular graphical interface that manages the interaction between the user and R with great flexibility. The installation and basic use of this free graphical interface are also explained on the companion website. Nonetheless, all statistical and graphical analyses described in this book are independent of whether one uses a graphical interface such as RStudio.

This book is organized in three parts. Part I will provide the fundamental definitions of probability that underlie the frequentist and Bayesian frameworks, and develop the notion of parameter estimation as the main goal of statistical inference (Chapter 2). Chapter 3 then covers the basic underpinnings of the frequentist and Bayesian methods of parameter estimation (i.e., maximum likelihood, and the Markov chain and Hamiltonian Monte Carlo algorithms) that will be used in the data analyses of all the chapters of Parts II and III.

Part II represents the bulk of this book. It covers the analysis of the main types of data gathered in social and natural sciences from both frequentist and Bayesian perspectives. Each dataset will be analyzed with both frameworks. Readers may choose to focus on separate, largely self-contained chapters depending on the type of response variable. However, the single effects of numerical and categorical explanatory variables (Chapters 4 to 6) should be examined as basic foundational aspects. Chapter 7 covers the theoretical basis of model selection (and a few other things), again for both frequentist and Bayesian frameworks. Chapter 8 reviews the conceptual basis of the generalized linear models that allow viewing most of the analyses explained in separate chapters of Part II as special cases. The assessment of statistical significance of parameter estimates, the calculation of confidence intervals, and the assessment of model goodness of fit are also covered. The rest of Part II covers, in separate chapters, the analysis of different types of data commonly encountered in scientific research involving binary, count, proportions, and other real-valued outcome variables. The quality of fit of all the statistical models to the data will be assessed with residual analysis and related methods, all of which will be explained in detail.

Part III builds on the understanding gained in Part II to incorporate random or group-level effects (Chapter 13). This enables the incorporation of structure in the data imposed by experimental and survey designs (Chapter 14). It is at this point that the book reaches its highest level of complexity, generality, and usefulness. As in all chapters of Part II, the emphasis is placed on formulating the starting statistical model, fitting the model using either the frequentist or Bayesian framework, interpreting and understanding the model outputs, assessing the goodness of fit to the data, and translating into words and figures the statistical findings for interpretation.

The book was structured and written assuming an imaginary reader interested in acquiring a broad and comprehensive understanding of univariate statistical analysis after a basic undergraduate course as taught in most engineering and science faculties around the world. These single-semester courses provide a basic understanding of descriptive statistics (mean, variance, quartiles), the basic notions of probability theory, a working knowledge of some probability distributions (e.g., normal, binomial), how to calculate the confidence intervals of at least the population mean, the basis (i.e., types of statistical errors, the notion of statistical significance) for testing statistical hypotheses about the differences between two means, and hopefully simple linear regression. The book starts slowly to progressively build a basic understanding of the main concepts and ideas that will be used in subsequent chapters.

1.4 How to use this book

In 1963 the Argentinian writer Julio Cortázar published the remarkable book Hopscotch (or Rayuela for those who can read it in the Spanish original). This novel has 155 mostly short chapters, 99 of which were considered “expendable” by its author. Even more, Julio Cortázar proposed several alternative ways in which his book could be read as if the chapters were pieces of many different possible puzzles to be assembled at will by its readers. Following Cortázar’s lead, here are a few suggested paths for using this book:

• If you lack a reasonable knowledge of R and how to make graphics, you should definitely start by reading the introductory material about R and R graphics on the companion website.

• Should you not be interested in the historical roots and the conceptual basis of the frequentist and Bayesian frameworks over which statisticians have spilled so much ink, you may skip Chapters 2 and 3. However, please have a look at the final table of Chapter 3 highlighting the main differences between the Bayesian and frequentist approaches that are worth knowing even if just for basic statistical literacy.

• If you are just interested in a specific data analysis (say, logistic regression, factorial analysis of variance, count regression), Table 2.1 points to the chapters you need depending on the probability distribution appropriate for modeling each type of response variable. Beware that you may need to have a look at parts of Chapter 8 to understand certain key features of the generalized linear models such as the link function. The main aspects of incorporating numerical and/or categorical explanatory variables in models are covered in Chapters 4 to 6, and they are valid for all models covered in this book.

• If you wish to learn either frequentist or Bayesian statistics, you may only read selected parts of specific chapters and simply dismiss the other half. But again, at this point in the twenty-first century it is becoming essential for scientists to possess at least a broad understanding of the theoretical/conceptual basis of both frequentist and Bayesian frameworks as discussed in Chapter 3. You will need the basics just to avoid getting lost and being fooled while reading papers.

• Readers only interested in Bayesian statistics may find it frustrating that there is no single chapter devoted to priors, the perennially debated feature of this framework. Starting in Chapter 4, the setting of priors is progressively built up in complexity in different chapters. There is a summary of the many non-exclusive steps or approaches to defining priors in the different chapters on page 323.

• ShouldyoubeinterestedinmodelselectionineitherthefrequentistorBayesianframework,youneedtoreadpartsofChapter 7 toacquireatleastaflavorofhowitisdonein eitherframework.Pleasereadthischapterbeforedoinganymodelselectionwithyour specificdatatype,asunwrittenandoraltraditionshaveplaguedtoomuchofstatistical modelselectioncarriedoutbylifescientists.Althoughthebookhaslimitedemphasis onmodelselectionissues,therearespecificexamplesinChapters 11 and 12.

• Readers with data stemming from specific experimental designs should first read the chapter dealing with the type of data in Part II, then have at least a quick read on the theoretical basis of the mixed models (Chapter 13), and then carry out the data analysis perhaps inspired by one of the several examples given in the chapters of Part III.

• Finally, for readers wishing to acquire a broad and reasonably exhaustive overview of univariate statistics, the author suggests starting with Chapters 4 to 6, jumping to Chapter 8 to cover the basic theory of generalized linear models, and then going straight to the chapter(s) dealing with the types of data according to Table 2.1.

Whichever of the suggested (or other) paths you take through this book, it is very likely that you will have to flip back and forth to improve or check your understanding of a concept, an idea, or the interpretation of model results, or simply the code for an analysis or a figure. In this regard, while each chapter is self-contained, the book is heavily cross-referenced to allow you to find your way back and forth between chapters as needed.


CHAPTER 2 Statistical Modeling

A short historical background

2.1 What is a statistical model?

Using data to test statistical hypotheses, to fit empirical relations, or to explore suggestive patterns requires formulating statistical models. All statistical tests of hypotheses and statistical estimators of parameters are derived from statistical models. In very general terms, a statistical model can be defined as one or more mathematical equations having at least one variable that exhibits stochastic (i.e., probabilistic) variation to represent the inherent uncertainty of observing its potential values.

The statistical models considered in this book contain a single response variable Y reflecting the effect of, or the variation associated with, the explanatory variables X. The latter can be any number of numerical variables, categorical variables denoting groups, or combinations thereof (i.e., interactions between explanatory variables). In all the models considered in this book, the response variable is a random variable with an associated probability distribution whose parameters embody both the effect of the explanatory variables and the variability of its potential values. Statistical models are thus equations that can be seen as data-generating mechanisms. They contain explicit assumptions that may reproduce the data for some combination of their parameters and values of the explanatory variables.
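To make the idea of a data-generating mechanism concrete, the following is a minimal sketch in R and not an example taken from the book. It assumes a normal response whose mean depends linearly on one numerical explanatory variable. The parameter names and values (beta0, beta1, sigma) and the sample size are arbitrary assumptions chosen for illustration.

```r
# Hypothetical example: a normal linear model used as a data-generating mechanism.
# All parameter values below are arbitrary assumptions, not values from the book.
set.seed(1)                              # make the simulation reproducible

n     <- 200                             # number of observations
beta0 <- 2.0                             # intercept (assumed)
beta1 <- 0.5                             # slope of the explanatory variable (assumed)
sigma <- 1.0                             # standard deviation of Y around its mean (assumed)

x  <- runif(n, min = 0, max = 10)        # numerical explanatory variable X
mu <- beta0 + beta1 * x                  # effect of X on the mean of Y
y  <- rnorm(n, mean = mu, sd = sigma)    # response Y: random draws around mu

# Fitting the same model to the simulated data should return estimates
# in the neighborhood of the values used to generate them.
fit <- lm(y ~ x)
coef(fit)
```

Changing the assumed parameter values or the range of x and rerunning the block shows how different parameter combinations generate different data sets from the same equation.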

You might recall from previous introductory courses the existence of probability mass functions (PMFs) and probability density functions (PDFs) that are associated with discrete and continuous random variables, respectively. PMFs and PDFs are collectively also termed “probability distributions,” and sometimes both are also subsumed under the term PDF. The names of some probability distributions that may spring to mind are binomial, Poisson, normal, and perhaps others. Which probability distribution could or should be used for each statistical model essentially depends on the main attributes of its response variable. Rather than showing a bestiary of the probability distributions that will be considered in this book along with their equations and their different shapes according to particular parameter values, we simply list them in relation to the type of data to which they apply (i.e., the domain of the response variable) in Table 2.1, and defer further details to the respective chapters where the analysis of each data type is explained. In addition, you can find such bestiaries of probability distributions in almost any statistics book on the shelf of the library of your institute, as well as on the internet.
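As a brief reminder of the difference between a PMF and a PDF, here is a small sketch using base R's distribution functions. It is offered as an illustration only, and the distributions and parameter values are arbitrary assumptions rather than choices made in the book.

```r
# PMF of a discrete variable: binomial with 10 trials and success probability 0.3
# (both values are arbitrary assumptions for illustration).
k   <- 0:10
pmf <- dbinom(k, size = 10, prob = 0.3)  # P(Y = k) for every possible value k
sum(pmf)                                 # probabilities over the whole domain add up to 1

# PDF of a continuous variable: normal with mean 0 and standard deviation 0.1.
# The returned value is a density, not a probability, and here it exceeds 1.
dnorm(0, mean = 0, sd = 0.1)

# For a continuous variable, probabilities are areas under the density curve.
# Standard normal: probability of falling between -1.96 and 1.96 (about 0.95).
p <- integrate(dnorm, lower = -1.96, upper = 1.96)
p$value
pnorm(1.96) - pnorm(-1.96)               # same probability from the cumulative distribution
```

The d-prefixed functions return a probability for discrete distributions and a density for continuous ones, which is why only the former are bounded by 1.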

Yet, why must the response variable Y of all statistical models be a random variable? There are several lines of argumentation for this (Blitzstein and Hwang 2014). One line of reasoning is that the randomness of the outcome variables results from the epistemic uncertainty (a fancy way of saying limited knowledge), and the measurement errors
