
Using R for Data Analysis in Social Sciences


A Research Project-Oriented Approach

QUAN LI

Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trademark of Oxford University Press in the UK and in certain other countries.

Published in the United States of America by Oxford University Press, 198 Madison Avenue, New York, NY 10016, United States of America.

© Oxford University Press 2018

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by license or under terms agreed with the appropriate reproduction rights organization. Inquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above.

You must not circulate this work in any other form and you must impose this same condition on any acquirer.

Library of Congress Cataloging-in-Publication Data

Names: Li, Quan, 1966– author.

Title: Using R for data analysis in social sciences : a research project-oriented approach / Quan Li.

Description: New York, NY : Oxford University Press, [2018]

Identifiers: LCCN 2017010031 | ISBN 9780190656225 (pbk.) | ISBN 9780190656218 (hardcover) | ISBN 9780190656232 (updf) | ISBN 9780190656249 (epub)

Subjects: LCSH: Social sciences–Research–Data processing. | Social sciences–Statistical methods. | R (Computer program language)

Classification: LCC H61.3 .L52 2018 | DDC 330.285/5133–dc23

LC record available at https://lccn.loc.gov/2017010031

1 3 5 7 9 8 6 4 2

Paperback printed by WebCom, Inc., Canada

Hardback printed by Bridgeport National Bindery, Inc., United States of America

CONTENTS

List of Figures

List of Tables

Acknowledgments

Introduction

1. Learn about R and Write First Toy Programs

WHEN TO USE R IN A RESEARCH PROJECT

ESSENTIALS ABOUT R

HOW TO START A PROJECT FOLDER AND WRITE OUR FIRST R PROGRAM

CREATE, DESCRIBE, AND GRAPH A VECTOR: A SIMPLE TOY EXAMPLE

SIMPLE REAL-WORLD EXAMPLE: DATA FROM IVERSEN AND SOSKICE (2006)

CHAPTER 1: R PROGRAM CODE

TROUBLESHOOT AND GET HELP

IMPORTANT REFERENCE INFORMATION: SYMBOLS, OPERATORS, AND FUNCTIONS

SUMMARY

MISCELLANEOUS Q&AS FOR AMBITIOUS READERS

EXERCISES

2. Get Data Ready: Import, Inspect, and Prepare Data

PREPARATION

IMPORT PENN WORLD TABLE 7.0 DATASET

INSPECT IMPORTED DATA

PREPARE DATA I: VARIABLE TYPES AND INDEXING

PREPARE DATA II: MANAGE DATASETS

PREPARE DATA III: MANAGE OBSERVATIONS

PREPARE DATA IV: MANAGE VARIABLES

CHAPTER 2 PROGRAM CODE

SUMMARY

MISCELLANEOUS Q&AS FOR AMBITIOUS READERS

EXERCISES

3. One-Sample and Difference-of-Means Tests

CONCEPTUAL PREPARATION

DATA PREPARATION

WHAT IS THE AVERAGE ECONOMIC GROWTH RATE IN THE WORLD ECONOMY?

DID THE WORLD ECONOMY GROW MORE QUICKLY IN 1990 THAN IN 1960?

CHAPTER 3 PROGRAM CODE

SUMMARY

MISCELLANEOUS Q&AS FOR AMBITIOUS READERS

EXERCISES

4. Covariance and Correlation

DATA AND SOFTWARE PREPARATIONS

VISUALIZE THE RELATIONSHIP BETWEEN TRADE AND GROWTH USING SCATTERPLOT

ARE TRADE OPENNESS AND ECONOMIC GROWTH CORRELATED?

DOES THE CORRELATION BETWEEN TRADE AND GROWTH CHANGE OVER TIME?

CHAPTER 4 PROGRAM CODE

SUMMARY

MISCELLANEOUS Q&AS FOR AMBITIOUS READERS

EXERCISES

5. Regression Analysis

CONCEPTUAL PREPARATION: HOW TO UNDERSTAND REGRESSION ANALYSIS

DATA PREPARATION

VISUALIZE AND INSPECT DATA

HOW TO ESTIMATE AND INTERPRET OLS MODEL COEFFICIENTS

HOW TO ESTIMATE STANDARD ERROR OF COEFFICIENT

HOW TO MAKE AN INFERENCE ABOUT THE POPULATION PARAMETER OF INTEREST

HOW TO INTERPRET OVERALL MODEL FIT

HOW TO PRESENT STATISTICAL RESULTS

CHAPTER 5 PROGRAM CODE

SUMMARY

MISCELLANEOUS Q&AS FOR AMBITIOUS READERS

EXERCISES

6. Regression Diagnostics and Sensitivity Analysis

WHY ARE OLS ASSUMPTIONS AND DIAGNOSTICS IMPORTANT?

DATA PREPARATION

LINEARITY AND MODEL SPECIFICATION

PERFECT AND HIGH MULTICOLLINEARITY

CONSTANT ERROR VARIANCE

INDEPENDENCE OF ERROR TERM OBSERVATIONS

INFLUENTIAL OBSERVATIONS

NORMALITY TEST

REPORT FINDINGS

CHAPTER 6 PROGRAM CODE

SUMMARY

MISCELLANEOUS Q&AS FOR AMBITIOUS READERS

EXERCISES

7. Replication of Findings in Published Analyses

WHAT EXPLAINS THE GEOGRAPHIC SPREAD OF MILITARIZED INTERSTATE DISPUTES? REPLICATION AND DIAGNOSTICS OF BRAITHWAITE (2006)

DOES RELIGIOSITY INFLUENCE INDIVIDUAL ATTITUDES TOWARD INNOVATION? REPLICATION OF BÉNABOU ET AL. (2015)

CHAPTER 7 PROGRAM CODE

SUMMARY

8. Appendix: A Brief Introduction to Analyzing Categorical Data and Finding More Data

OBJECTIVE

GETTING DATA READY

DO MEN AND WOMEN DIFFER IN SELF-REPORTED HAPPINESS?

DO BELIEVERS IN GOD AND NON-BELIEVERS DIFFER IN SELF-REPORTED HAPPINESS?

SOURCES OF SELF-REPORTED HAPPINESS: LOGISTIC REGRESSION

WHERE TO FIND MORE DATA

References and Readings

Index

LIST OF FIGURES

1.1 How to Write First Toy Program in R

1.2 How to Install Add-on Package

1.3 Distribution of Discrete Variable vd$v1: Bar Chart

1.4 Distribution of Continuous Variable vd$v1: Boxplot and Histogram

1.5 Distribution of Wage Inequality from Iversen and Soskice (2006)

1.6 Distribution of PR and Majoritarian Systems from Iversen and Soskice (2006)

1.7 RStudio Screenshot

2.1 Using View() Function to View Raw Data

2.2 Distribution of Variable rgdpl

3.1 Types of Errors and Alternative Sampling Distributions

3.2 Histogram for Growth

3.3 Mean and 95% Confidence Interval for Growth

3.4 Mean and 95% Confidence Interval for Growth: 1960 and 1990

4.1 Simulated Positive Correlations of Two Random Variables

4.2 Scatter Plot of Trade Openness and Economic Growth

4.3 Correlation between Trade and Growth over Time

4.4 P Value of Correlation between Trade and Growth over Time

4.5 Anscombe Quartet Scatter Plot

5.1 Original Statistical Results from Frankel and Romer (1999)

5.2 Comparing Unlogged and Logged Income per Person

5.3 Trade Openness and Log of Income per Person

5.4 Coefficients Plot for Model 1

5.5 Partial Regression Plot

5.6 Explore Pairwise Relationships among Variables

6.1 Anscombe Quartet Regressions

6.2 Anscombe Quartet Residuals versus Fitted Values Plots

6.3 Diagnostic Plots for a Well-Behaved Regression

6.4 Residuals versus Fitted Values: Linearity

6.5 Residuals versus Independent Variables: Linearity

6.6 Trade Openness and Log of Income per Person

6.7 Distribution of Residuals by Region

6.8 Scatter Plot of Trade and Income by Region

6.9 Estimated Effect of Trade on Income by Region

6.10 Influence Plot of Influential Observations

6.11 Influential Observations Above Cook’s D Threshold

6.12 Normality Assumption Diagnostic Plot

7.1 Regression Diagnostic Plot: Residuals versus Fitted Values

7.2 Diagnostic Plot for Influential Observations: Cook’s D

7.3 Normality Assumption Diagnostic Plot

8.1 Sample Page from World Values Survey Codebook

LIST OF TABLES

1.1 Country Means for Variables Used in Regression Analysis (from Iversen and Soskice, 2006)

1.2 Statistics of Imported Data from Iversen and Soskice (2006)

1.3 Important Symbols in R

1.4 Arithmetic Operators

1.5 Logical Operators

1.6 Common Statistical and Mathematical Functions

2.1 List of Data Preparation Tasks and Related R Functions

3.1 Logic of Statistical Inference

3.2 Two-Sample Difference-of-Means Tests

5.1 Coefficient Interpretation in Log or Unlogged Models

5.2 Descriptive Statistics of Final Dataset

5.3 Effect of Trade Openness on Real Income per Person

6.1 Regression Results Using Anscombe’s Quartet

6.2 Effect of Trade on Income: Robustness Checks Part I

6.3 Effect of Trade on Income: Robustness Checks Part II

7.1 Variable Measures and Expected Effects

7.2 OLS Regression of Dispute Dispersion (Original Statistical Results Table from Braithwaite, 2006)

7.3 Original Descriptive Statistics Table in Braithwaite (2006)

7.4 Causes of Spread of Military Disputes: Replication and Robustness Tests

7.5 Most Important Qualities for Children to Have (from Bénabou et al., 2015)

7.6 Variable Labels for Dataset in Bénabou et al. (2015)

7.7 Replicating Table 2 in Bénabou et al. (2015)

ACKNOWLEDGMENTS

Five original tables from four different journal articles are reprinted in the book for replication exercises. The articles include (1) Iversen, Torben, and David Soskice, 2006, “Electoral Institutions and the Politics of Coalitions: Why Some Democracies Redistribute More Than Others,” American Political Science Review 100(2): 165–81, Table A1. Copyright: Cambridge University Press. (2) Frankel, Jeffrey A., and David Romer, 1999, “Does Trade Cause Growth?” American Economic Review 89(3): 379–99, Table 3. Copyright: American Economic Association. (3) Braithwaite, Alex, 2006, “The Geographic Spread of Militarized Disputes,” Journal of Peace Research 43(5): 507–22, Table I and Table II. Copyright: SAGE Publications. (4) Bénabou, Roland, Davide Ticchi, and Andrea Vindigni, 2015, “Religion and ‘Innovation,’” American Economic Review 105(5): 346–51, Table 2. Copyright: American Economic Association. Permissions to reprint the relevant tables in Iversen and Soskice (2006) and Braithwaite (2006) have been acquired and licensed from Cambridge University Press and SAGE Publications.

Jeffrey Frankel, Roland Bénabou, and the American Economic Association deserve special thanks for graciously granting me permission to reprint the relevant tables in their articles for free.

Figures 1 through 4 in F. J. Anscombe’s “Graphs in Statistical Analysis,” published in 1973 in The American Statistician 27(1): 17–21, have been adapted and used with permission of the publisher, Taylor & Francis Ltd, http://www.tandfonline.com.

This book would not have been possible without the encouragement, help, and support of many students, colleagues, and friends. My undergraduate students in Polimetrics and Senior Research Seminar at Texas A&M University gave me the first impetus to write this book. Many students taking those two courses, especially Jacob King and Alex Goodman, caught typos and mistakes in earlier drafts. During the summer of 2016, Scarlet Amo, Corbin Cali, Chandler Dawson, and Elizabeth Gohmert experimented with using an earlier version of the manuscript to self-study R for data analysis. They provided detailed reports on each chapter and completed independent application papers. Their input has dramatically changed and improved how various materials in the book are now presented and structured. I thank them for their extraordinary work and effort. My graduate assistants, Molly Berkemeier, Kelly McCaskey, and Austin Johnson, provided excellent editorial assistance. My colleagues and friends, Tiyi Feng, Ren Mu, Erica Owen, and Carlisle Rainey, read parts of an earlier draft and provided valuable feedback and suggestions.

Many people at Oxford University Press have helped to make this manuscript possible and better. Scott Parris, who was the editor for my first book, published by Cambridge University Press, had been patiently encouraging and prodding me to finish this book until his retirement from Oxford. Happy retirement, Scott! Before retiring from Oxford, Scott handed my case to Anne Dellinger. Anne’s enthusiasm and encouragement were the main reason that I decided to stay with Oxford. After Anne departed from Oxford, David Pervin became my editor and offered sound advice. Scott’s assistant Cathryn Vaulman and David’s assistants Emily Mackenzie and Hayley Singer took care of many of the logistical issues in the process. Debbie Ruel corrected many errors and did a great job during copyediting, and Lincy Priya patiently dealt with my requests and smoothly handled the production of my book. Xun Pang and Jude Hays provided valuable comments and suggestions that helped to make the book enormously better.

Finally, my greatest debt of gratitude is owed to my wife, Liu, and my two children, Ellen and Andrew. Without their unyielding support, constant inquiry, and even reading parts of the book and checking my R code, I would not have finished the project. This book is dedicated to them!

INTRODUCTION

This book seeks to teach senior undergraduate and beginning graduate students in the social sciences how to use R to manage, visualize, and analyze data in order to answer substantive research questions and to reproduce the statistical analyses in published journal articles. Over the past several decades, training in statistical analysis has become increasingly important for undergraduate and graduate students in many disciplines within the social and behavioral sciences, such as economics, political science, public administration, business, public health, anthropology, psychology, sociology, education, and communication. With rapid progress in statistical computing, proficiency in using statistical software has become an almost universal requirement, albeit to varying degrees, in statistical methods courses. Popular software choices include SAS, SPSS, Stata, and R. While SAS, SPSS, and Stata all have accessible introductory textbooks targeting students in the social sciences, such textbooks on R are rare.

Compared with commercial packages like SAS, SPSS, and Stata, R has at least three strengths. It is a well-thought-out, coherent system that comes with a suite of software facilities for data management, visualization, and analysis. In addition, to meet emerging needs, a large community of R users constantly develops new open source add-on packages, which already number over 10,000. Finally, perhaps the greatest perk of the software is that it is free. This financial benefit cannot be overemphasized. Cash-strapped college students often find themselves relying on lab computers for access to SAS, SPSS, and Stata, or constrained by the limitations of the student versions of those commercial packages. Even after graduation, many find it difficult to convince their employers to purchase the particular commercial package they know for their everyday use.

There are many reasons why R is preferred to other statistical software packages in higher education. But the greatest obstacle to R’s widespread use in the social sciences is its steep learning curve. While the market has produced numerous books on R at various levels, introductory textbooks that focus on the needs of students in the social sciences are not easy to find. This book seeks to fill this void.

This book distinguishes itself from other introductory R or statistics books in three important ways. First, it is intended as an introductory text on using R for data analysis projects, targeting an audience rarely exposed to statistical programming. The rationale for emphasizing the introductory nature of this book is simple: it is driven by the needs and heterogeneity of the student body we often encounter in classroom teaching in social science departments. Unlike students in math and statistics, many student users of R in the social sciences have no experience with any computing language or programming software, and many will never go beyond the level of programming necessary for their everyday use of R. However, students in the social sciences will find that opportunities to use R for data manipulation, visualization, and analysis frequently present themselves in various courses and future careers. Hence, they need to become proficient at accomplishing common tasks in data manipulation, visualization, and analysis using R, without getting overly technical. In this respect, existing introductory texts on R programming that do not involve statistics tend to be overly comprehensive in coverage and are often geared toward students in math, statistics, sciences, and engineering, thus intimidating most social science students. Alain Zuur, Elena Ieno, and Erik Meesters’ A Beginner’s Guide to R and Philip Spector’s Data Manipulation with R are good examples. Their target audiences are often students majoring in math, statistics, sciences, and engineering who have more programming experience than their classmates in the social sciences.

This book, in contrast, adopts a minimalist approach to teaching R. It covers only the most important features and functions in R that one will need for conducting reproducible research projects, with other materials moved to chapter appendices or removed from consideration completely. R is extremely flexible, almost always allowing multiple solutions to one programming task. While this is a strength, it does challenge beginning R users rarely exposed to computer programming. The minimalist approach adopted here typically presents one way to deal with a task in the main part of a chapter, leaving alternatives to a section called “Miscellaneous Q&As for Ambitious Readers.” As a result, the minimalist approach should flatten the steep learning curve, a commonly noted disadvantage of R, thereby improving the software’s accessibility to undergraduates and similar audiences. Organizationally, this book breaks down chapters into small sections that mimic lab sessions for students. Each chapter focuses on only the essential R functions one needs to know in order to manipulate, visualize, and analyze data to accomplish some primary statistical analysis tasks. In the end, through this minimalist approach, the reader will accumulate enough R knowledge and skills to complete a course research project and to self-study more advanced R materials if necessary.

A second unique feature of this book is its emphasis on meeting the practical needs of students using R to conduct statistical analysis for research projects driven by substantive questions in the social sciences. In addition to homework assignments and problem sets, statistical methods courses in the social sciences often require the completion of a full-blown, substantively motivated research project. Such training is critical if statistical knowledge is to prove to be of any value and relevance to substantive courses and students’ future careers. Ideally, students can use completed statistical analysis papers as writing samples to showcase their quantitative skills in their graduate school or job applications.

In practice, to accomplish such a project on a substantive question, a student has to collect, clean, and manipulate data; visualize and analyze data systematically to address the question asked; and report findings in an organized manner. Many R books for introductory statistics tend to emphasize the R code for statistical techniques, giving insufficient attention to the pre-analysis needs of users as well as to the process of completing a research project. For example, John Verzani’s Using R for Introductory Statistics and Michael Crawley’s Statistics: An Introduction Using R are two popular texts in this category. In such texts, data preparation is not linked to particular research projects that address substantive questions.

In contrast, this book is written under the premise that the reader uses R primarily to address some substantive question of interest. This leads to several notable differences from other introductory statistics books using R. This book begins with the use of R to get an original raw dataset into a condition appropriate for statistical analysis, thus emphasizing how to deal with various issues that arise in such a process. Next, instead of starting with the interactive use of R, which is typical in other textbooks, this book gives exclusive attention to writing and executing R programs. This approach allows easy verification, recollection, and replication of analysis, and it is almost always how things are done in actual reproducible research. Students following this approach will write many well-documented R programs that address a variety of practical issues, and they can save those programs for future reference. Last but not least, the use of R in this book is closely integrated into a prototypical process that consists of a sequence of elements: a substantive question to be answered, a hypothesis that answers the question, the logic of statistical inference behind the empirical test of the hypothesis, the test statistic for statistical inference represented in mathematical notation and implemented computationally in R, and the presentation of findings in an organized manner. The emphasis is on an in-depth understanding of why we do statistical analysis and how R fits into actual empirical research. Hence, this research process- or project-oriented approach ought to significantly increase the likelihood that students will actually use R to solve problems in their future courses and careers.

A third unique feature of this book is its emphasis on teaching students how to replicate statistical analyses in published journal articles. Scientific progress requires that previous findings be replicable and replicated; scientific education, as in physics and chemistry, always includes lab exercises that replicate previous experiments. As social scientific knowledge becomes increasingly evidence-based and relies on extensive data analysis, learning to replicate published results is a necessary step for undergraduates and first-year graduate students learning to conduct social scientific research. Such training has become feasible because of the availability of powerful free software and a wide range of datasets in the public domain. Many journals now require authors to submit and deposit replication datasets. Many original data from surveys and archival research are downloadable from the internet. Students no longer have to be just passive consumers of social scientific research but instead can actively scrutinize published research, play with the data, and reproduce, or fail to reproduce, previous findings. This will convert students from passive consumers into active learners. As reproducing research findings becomes the norm rather than the exception, it will empower students, lower the barrier to their entry into the academic community, and challenge professors and other knowledge producers. The wide gap between teaching and research commonly observed in undergraduate courses in the social sciences will be narrowed. Such changes are likely to make teaching more interesting for professors, render learning more fruitful for students, and enable both parties to become more successful in their endeavors.

This book consists of eight chapters. Chapter 1 introduces R, illustrating how to write and execute programs using the software. Chapter 2 goes through the process of, and various main tasks in, getting data ready for analysis in R. Chapter 3 provides a conceptual background on the logic of statistical inference and then demonstrates how to make statistical inferences with respect to one continuous outcome variable using one- and two-sample t-tests. Chapter 4 moves on to analyzing the relationship between two continuous variables, focusing on covariance and correlation. Chapter 5 introduces regression analysis, covering its conceptual foundation, model specification, estimation, interpretation, and inference. Chapter 6 continues with regression analysis, delving into various diagnostics and sensitivity analyses. Chapters 4 through 6 follow the same approach, integrating conceptual and mathematical foundations, data preparation, statistical analysis, and results reporting within each chapter. Chapter 7 walks readers through the process of replicating two published analyses. Finally, Chapter 8, as an appendix, provides a brief introduction to analyzing categorical data, demonstrating the chi-squared test of independence and logistic regression.

No textbook can be perfect; this one is no exception. The minimalist approach, emphasizing the accessibility of R, comes at a price. Many commonly used functions and features of R, such as writing functions and loops, are not covered. Similarly, by focusing on teaching the research process of using R to address substantive questions, this book primarily covers how to explain a single continuous outcome variable and the relevant statistical techniques, such as the mean, difference of means, covariance, correlation, and cross-sectional regression. Hence, comprehensiveness in both programming and statistics is sacrificed, on purpose, for greater accessibility, clarity, and depth. The goal is to make this book accessible and useful for novices in both programming and data analysis.

In sum, this book integrates R programming, the logic and steps of statistical inference, and the process of empirical social scientific research in a highly accessible and structured fashion. It emphasizes learning to use R for essential data management, visualization, and analysis, and for replicating published research findings. By the end of this book, students will have learned how to do the following: (1) use R to import data, inspect data, identify dataset attributes, and manage observations, variables, and datasets; (2) use R to graph simple histograms, boxplots, scatterplots, and research findings; (3) use R to summarize data, conduct a one-sample t-test, test the difference of means between groups, compute covariance and correlation, estimate and interpret ordinary least squares (OLS) regression, and diagnose and correct violations of regression assumptions; and (4) replicate research findings in published journal articles. The principle behind this book is to teach students to learn as little R as possible but to do as much substantively driven data analysis at the beginner or intermediate level as possible. The minimalist approach should dramatically reduce the learning cost but still prove adequate for meeting the practical research needs of senior undergraduate and beginning graduate students in the social sciences. Having completed this book, students can competently use R and statistical analysis to answer substantive questions regarding some substantively interesting continuous outcome variable in a cross-sectional design. It is my hope that the newly acquired competence will motivate students to want to, rather than be forced to, learn more about R and statistics.
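To make item (3) of the list above concrete, here is a minimal sketch of those core analyses in base R. It is an illustration under stated assumptions, not the book’s own code: the data are simulated, and the variable names trade, growth, and period are hypothetical stand-ins for the real datasets used in later chapters.

```r
# Sketch only: simulated data; 'trade', 'growth', and 'period' are hypothetical
# stand-ins for the real datasets (e.g., the Penn World Table) used in the book.
set.seed(2018)                                   # make the random draws reproducible
n      <- 100
trade  <- rnorm(n, mean = 60, sd = 20)           # hypothetical trade openness
growth <- 1 + 0.02 * trade + rnorm(n, sd = 1)    # hypothetical growth rate
period <- rep(c("early", "late"), each = n / 2)  # hypothetical two-group split

summary(growth)             # summarize the outcome variable
t.test(growth, mu = 0)      # one-sample t-test of H0: mean growth = 0
t.test(growth ~ period)     # difference-of-means (Welch) test between the two groups
cov(trade, growth)          # covariance
cor.test(trade, growth)     # Pearson correlation with a significance test

fit <- lm(growth ~ trade)   # ordinary least squares (OLS) regression
summary(fit)                # coefficients, standard errors, R-squared
```

Each line maps onto one of the skills listed; the formula interface (growth ~ period, growth ~ trade) is the same notation that later regression models build on.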


1. Learn about R and Write First Toy Programs

Chapter Objectives

In this first chapter, we aim to achieve the following objectives:

1. Understand when to use R in a research project.

2. Learn about the basic background of R, software installation, and getting help.

3. Learn to set up a project folder for R programs and data files.

4. Learn to write and execute simple toy programs.

5. Learn to find and set the working directory for a project in R.

6. Learn to create a data vector.

7. Learn to calculate descriptive statistics and handle missing values.

8. Learn to convert a data vector into a data frame.

9. Learn to refer to a variable within a data frame.

10. Learn to install an add-on package, "stargazer," load it into R, and use it to get a descriptive statistics table.

11. Learn to graph the distribution of a variable.

12. Apply all the lessons learned to a real-world data example.

13. Learn about common coding errors and how to get help.
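As a rough preview of objectives 6 through 11, the sketch below creates a vector with a missing value, summarizes it, stores it in a data frame, and graphs it. The names vd and v1 echo the figure titles in the List of Figures; the values themselves are invented for illustration, and the stargazer step assumes the package has already been installed with install.packages("stargazer"), so it is skipped otherwise.

```r
# Sketch only: the values are hypothetical; 'vd' and 'v1' mirror the figure titles.
v1 <- c(3, 5, 2, NA, 4, 5, 1)   # objective 6: a data vector with one missing value

mean(v1)                        # objective 7: returns NA because of the missing value
mean(v1, na.rm = TRUE)          # drop the NA before averaging
sd(v1, na.rm = TRUE)            # standard deviation, also ignoring the NA
summary(v1)                     # quartiles, mean, and a count of NAs

vd <- data.frame(v1)            # objective 8: convert the vector into a data frame
vd$v1                           # objective 9: refer to variable v1 inside vd

# Objective 10: runs only if the add-on package stargazer is installed.
if (requireNamespace("stargazer", quietly = TRUE)) {
  stargazer::stargazer(vd, type = "text")   # plain-text descriptive statistics table
}

# Objective 11: graph the distribution of the variable.
barplot(table(vd$v1))           # bar chart, treating v1 as discrete
boxplot(vd$v1)                  # boxplot, treating v1 as continuous
hist(vd$v1)                     # histogram, treating v1 as continuous
```

The na.rm = TRUE argument is the usual way base R summary functions are told to skip missing values; without it, a single NA propagates into the result.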

Covering the materials in this chapter takes about an hour and a half in a lab for a class of about 10 students, including brief lecturing and hands-on practice. Larger classes or self-study could take longer.

When to Use R in a Research Project

Completing an empirical research project involves several stages, often starting with the identification of a research problem and ending with the report of findings and implications:

1. Identify a research problem

2. Survey the literature (find out what is known about the problem)

3. Formulate a theoretical argument and testable hypotheses

4. Measure concepts

5. Collect data

6. Prepare data

7. Analyze data

8. Report findings and implications

The tasks of identifying a significant and interesting research problem, surveying the extant literature, formulating a coherent theoretical argument and testable hypotheses that explain the research puzzle, measuring the concepts in the theory empirically, and collecting data for the empirical indicators of those concepts (tasks (1) to (5)) are generally dealt with in substantive and research design courses in a field. Those topics are beyond the scope of this little R book. Yet tasks (6) to (8) may all involve R as a research instrument. Specifically, in actual research projects, R is used to analyze particular research problems, such as evaluating the impact of a policy or testing the impact of a causal factor (or independent variable) on an outcome (or dependent variable) of interest, as postulated by prespecified theoretical expectations. How to accomplish tasks (6) to (8) will be illustrated in the following chapters.

A research project of this type presents at least two challenges, for which R will be useful. First, in practice, such a project involves a range of tasks, such as importing data into software, merging different datasets together, verifying data, creating new variables, recoding and renaming variables, visualizing data, running statistical estimation procedures, carrying out diagnostic tests, and so on. Second, an analyst needs to be able to reproduce his or her own analysis, including dataset construction and estimation results, even years later. The first challenge concerns the efficiency of an analysis, whereas the second concerns the reproducibility and integrity of the analysis.

To achieve both efficiency and reproducibility, experienced analysts always write their code in one or more programs so that the code can be submitted, revised, and resubmitted to reproduce an analysis quickly whenever necessary. Hence, in this book, we will focus on how to write and submit R programs for specific tasks in a program editor, rather than on the interactive use or menu-driven interface of R. For all practical purposes, the programming approach is much more efficient and consistent than the interactive or menu-driven approach.
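The reproducibility argument above can be illustrated with a small sketch: a program saved to a file and resubmitted unchanged yields the identical result. In real work the program would be a .R file in the project folder run with source(); the file name analysis.R mentioned in the comments is hypothetical, and a temporary file stands in for it so the sketch runs anywhere.

```r
# Sketch only: a temporary file stands in for a hypothetical 'analysis.R'
# saved in the project folder and run with source("analysis.R").
program <- c(
  "# Toy program: every step is written down, so it can be rerun exactly",
  "set.seed(1)",
  "x <- rnorm(50, mean = 10)",
  "result <- mean(x)"
)
path <- tempfile(fileext = ".R")
writeLines(program, path)     # save the program to disk

source(path)                  # first submission
first <- result
source(path)                  # resubmit the unchanged program
second <- result

identical(first, second)      # TRUE: rerunning the saved program reproduces the result
getwd()                       # the working directory R is currently using
```

Because set.seed() is part of the saved program, even the simulated data are regenerated identically on every run; an analysis typed interactively leaves no such record.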

Before we step into how to use R, we will need to clarify some related organizational and housekeeping issues. In this chapter, we will first offer a very brief introduction to R, then demonstrate how to install R, write and execute R programs, install and load add-on packages, and produce graphical and numerical output, and then turn to essential reference information about important symbols and common coding errors. Notably, each line of R code will likely appear three times: presented as a stand-alone command line preceded or followed by an explanation of its purpose and function, listed together with the output from its execution, and collated with all other program code in the chapter for the sake of convenient reference. We will end the chapter with a section about miscellaneous issues of interest to ambitious readers and a section on exercises.

Essentials about R

A One-Paragraph Introduction to R

R is a computer language and an environment for statistical computing and graphics with important advantages. Started by Robert Gentleman and Ross Ihaka of the University of Auckland and released as free software in 1995, it is now maintained by the R core development team of volunteer developers. R is referred to as a computer language because, as a dialect of the S language developed at AT&T’s Bell Laboratories, R allows users to follow the algorithms, define and add new functions, and write new analytic methods, rather than merely supplying canned routines. R is also a coherent system that provides an environment with an integrated suite of software facilities for data storage, manipulation, analysis, and visualization. In addition, R is flexible. It runs on Windows, UNIX, and Mac OS X. It can easily be extended with new functions and state-of-the-art statistical methods; the more than 10,000 add-on packages available through the CRAN family of internet sites by the end of January 2017 testify to this fact. Last but not least, R is free, as are its numerous add-on packages. Hence, R is popular among practitioners in many fields and scholars in many disciplines, including the social sciences.

Installation

As open source software for statistical computing, R can be easily downloaded from the following site: http://www.r-project.org/. We may simply click on the highlighted download R link to reach a list of CRAN mirror sites. Clicking on any site we prefer directs us to the page for downloading the software for three different platforms: Linux, Windows, and Mac. R works slightly differently across
