Introduction to Python for Econometrics, Statistics and Data Analysis

3rd Edition, 1st Revision

Kevin Sheppard
University of Oxford

Monday 9th September, 2019

Changes since the Third Edition

• Verified that all code and examples work correctly against 2019 versions of modules. The notable packages and their versions are:

– Python 3.7 (Preferred version)

– NumPy: 1.16

– SciPy: 1.3

– pandas: 0.25

– matplotlib: 3.1

• Python 2.7 support has been officially dropped, although most examples continue to work with 2.7. Do not use Python 2.7 in 2019 for numerical code.

• Small typo fixes, thanks to Marton Huebler.

• Fixed direct download of FRED data due to API changes, thanks to Jesper Termansen.

• Thanks to Bill Tubbs for a detailed read and multiple typo reports.

• Updated to changes in line_profiler (see Ch. 24).

• Updated deprecations in pandas.

• Removed hold from plotting chapter since this is no longer required.

• Thanks to Gen Li for multiple typo reports.

• Tested all code on Python 3.6. Code has been tested against the current set of modules installed by conda as of February 2018. The notable packages and their versions are:

– NumPy: 1.13

– pandas: 0.22

Notes to the 3rd Edition

This edition includes the following changes from the second edition (August 2014):

• Rewritten installation section focused exclusively on using Continuum’s Anaconda.

• Python 3.5 is the default version of Python instead of 2.7. Python 3.5 (or newer) is well supported by the Python packages required to analyze data and perform statistical analysis, and brings some new useful features, such as a new operator for matrix multiplication (@).

• Removed distinction between integers and longs in built-in data types chapter. This distinction is only relevant for Python 2.7.

• dot has been removed from most examples and replaced with @ to produce more readable code.

• Split Cython and Numba into separate chapters to highlight the improved capabilities of Numba.

• Verified all code working on current versions of core libraries using Python 3.5.

• pandas

– Updated syntax of pandas functions such as resample.

– Added pandas Categorical.

– Expanded coverage of pandas groupby.

– Expanded coverage of date and time data types and functions.

• New chapter introducing statsmodels, a package that facilitates statistical analysis of data. statsmodels includes regression analysis, Generalized Linear Models (GLM) and time-series analysis using ARIMA models.
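As a brief illustration of the matrix multiplication operator mentioned in the list above, the following sketch compares @ with dot on two small arrays. The array values are arbitrary and chosen only for illustration; they do not come from these notes.

```python
import numpy as np

# Two small, arbitrary matrices used only for illustration
a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[0.0, 1.0], [1.0, 0.0]])

# The same matrix product written with the function and with the operator
c_dot = np.dot(a, b)
c_at = a @ b

# Both forms produce the same array
print(np.array_equal(c_dot, c_at))
```

For longer expressions the operator form avoids nested function calls, which is the readability gain the change is aiming at.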

Changes since the Second Edition

• Fixed typos reported by a reader – thanks to Ilya Sorvachev.

• Code verified against Anaconda 2.0.1.

• Added diagnostic tools and a simple method to use external code in the Cython section.

• Updated the Numba section to reflect recent changes.

• Fixed some typos in the chapter on Performance and Optimization.

• Added examples of joblib and IPython’s cluster to the chapter on running code in parallel.

• New chapter introducing object-oriented programming as a method to provide structure and organization to related code.

• Added seaborn to the recommended package list, and have included it by default in the graphics chapter.

• Based on experience teaching Python to economics students, the recommended installation has been simplified by removing the suggestion to use virtual environments. The discussion of virtual environments has been moved to the appendix.

• Rewrote parts of the pandas chapter.

• Changed the Anaconda install to use both create and install, which shows how to install additional packages.

• Fixed some missing packages in the direct install.

• Changed the configuration of IPython to reflect best practices.

• Added subsection covering IPython profiles.

• Small section about Spyder as a good starting IDE.

Notes to the 2nd Edition

This edition includes the following changes from the first edition (March 2012):

• The preferred installation method is now Continuum Analytics’ Anaconda. Anaconda is a complete scientific stack and is available for all major platforms.

• New chapter on pandas. pandas provides a simple but powerful tool to manage data and perform preliminary analysis. It also greatly simplifies importing and exporting data.

• New chapter on advanced selection of elements from an array.

• Numba provides just-in-time compilation for numeric Python code which often produces large performance gains when pure NumPy solutions are not available (e.g. looping code).

• Dictionary, set and tuple comprehensions

• Numerous typos

• All code has been verified working against Anaconda 1.7.0.

Chapter 1 Introduction

1.1 Background

These notes are designed for someone new to statistical computing wishing to develop a set of skills necessary to perform original research using Python. They should also be useful for students, researchers or practitioners who require a versatile platform for econometrics, statistics or general numerical analysis (e.g. numeric solutions to economic models or model simulation).

Python is a popular general-purpose programming language that is well suited to a wide range of problems.[1] Recent developments have extended Python’s range of applicability to econometrics, statistics, and general numerical analysis. Python – with the right set of add-ons – is comparable to domain-specific languages such as R, MATLAB or Julia. If you are wondering whether you should bother with Python (or another language), an incomplete list of considerations includes:

You might want to consider R if:

• You want to apply statistical methods. The statistics library of R is second to none, and R is clearly at the forefront of new statistical algorithm development – meaning you are most likely to find that new(ish) procedure in R.

• Performance is of secondary importance.

• Free is important.

You might want to consider MATLAB if:

• Commercial support and a clear channel to report issues is important.

• Documentation and organization of modules are more important than the breadth of algorithms available.

• Performance is an important concern. MATLAB has optimizations, such as Just-in-Time (JIT) compilation of loops, which is not automatically available in most other packages.

[1] According to the ranking on http://www.tiobe.com/tiobe-index/, Python is the 5th most popular language. http://langpop.corger.nl/ ranks Python as 4th or 5th.

You might want to consider Julia if:

• Performance in an interactive language is your most important concern.

• You don’t mind learning enough Python to interface with Python packages. The Julia ecosystem is in its infancy and a bridge to Python is used to provide important missing features.

• You like living on the bleeding edge and aren’t worried about code breaking across new versions of Julia.

• You like to do most things yourself.

Having read the reasons to choose another package, you may wonder why you should consider Python.

• You need a language which can act as an end-to-end solution that allows access to web-based services, database servers, data management and processing and statistical computation. Python can even be used to write server-side apps such as a dynamic website (see e.g. http://stackoverflow.com), apps for desktop-class operating systems with graphical user interfaces, or apps for tablets and phones (iOS and Android).

• Data handling and manipulation – especially cleaning and reformatting – is an important concern. Python is substantially more capable at data set construction than either R or MATLAB.

• Performance is a concern, but not at the top of the list.[2]

• Free is an important consideration – Python can be freely deployed, even to 100s of servers in a cloud-based cluster (e.g. Amazon Web Services, Google Compute or Azure).

• Knowledge of Python, as a general purpose language, is complementary to R/MATLAB/Julia/Ox/GAUSS/Stata.

[2] Python performance can be made arbitrarily close to C using a variety of methods, including Numba (pure Python), Cython (C/Python creole language) or directly calling C code. Moreover, recent advances have substantially closed the gap with respect to other Just-in-Time compiled languages such as MATLAB.

1.2 Conventions

These notes will follow two conventions.

1. Code blocks will be used throughout.

"""A docstring"""

# Comments appear in a different color

# Reserved keywords are highlighted
and as assert break class continue def del elif else
except exec finally for from global if import in is
lambda not or pass print raise return try while
with yield

# Common functions and classes are highlighted in a
# different color. Note that these are not reserved,
# and can be used although best practice would be
# to avoid them if possible
array matrix range list True False None

# Long lines are indented
some_text = 'This is a very, very, very, very, very, very, very, very, very, very, very, very long line '

2. When a code block contains >>>, this indicates that the command is running an interactive IPython session. Output will often appear after the console command, and will not be preceded by a command indicator.

>>> x = 1.0
>>> x + 2
3.0

If the code block does not contain the console session indicator, the code contained in the block is intended to be executed in a standalone Python file.

import numpy as np

x = np.array([1, 2, 3, 4])
y = np.sum(x)
print(x)
print(y)

1.3 Important Components of the Python Scientific Stack

1.3.1 Python

Python 3.5 (or later) is required. This provides the core Python interpreter. Most of the examples should work with the latest version of Python 2.7 as well.
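A script can check the interpreter version before running any numerical code. The following is a minimal sketch that uses only the standard library; the helper name meets_requirement is hypothetical and not part of these notes.

```python
import sys

# Minimum version stated in these notes
REQUIRED = (3, 5)


def meets_requirement(version_info=sys.version_info, required=REQUIRED):
    """Return True if the interpreter is at least the required version."""
    return tuple(version_info[:2]) >= required


if not meets_requirement():
    print("Python %d.%d or later is required" % REQUIRED)
else:
    print("Python version is sufficient: " + sys.version.split()[0])
```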

1.3.2 NumPy

NumPy provides a set of array and matrix data types which are essential for statistics, econometrics and data analysis.

1.3.3 SciPy

SciPy contains a large number of routines needed for analysis of data. The most important include a wide range of random number generators, linear algebra routines, and optimizers. SciPy depends on NumPy.

1.3.4 Jupyter and IPython

IPython provides an interactive Python environment which enhances productivity when developing code or performing interactive data analysis. Jupyter provides a generic set of infrastructure that enables IPython to be run in a variety of settings including an improved console (QtConsole) or in an interactive web-browser based notebook.

1.3.5 matplotlib and seaborn

matplotlib provides a plotting environment for 2D plots, with limited support for 3D plotting. seaborn is a Python package that improves the default appearance of matplotlib plots without any additional code.

1.3.6 pandas

pandas provides high-performance data structures.

1.3.7 statsmodels

statsmodels is pandas-aware and provides models used in the statistical analysis of data including linear regression, Generalized Linear Models (GLMs), and time-series models (e.g., ARIMA).

1.3.8 Performance Modules

A number of modules are available to help with performance. These include Cython and Numba. Cython is a Python module which facilitates using a simple Python-derived creole to write functions that can be compiled to native (C code) Python extensions. Numba uses a method of just-in-time compilation to translate a subset of Python to native code using the Low-Level Virtual Machine (LLVM).

1.4 Setup

The recommended method to install the Python scientific stack is to use Continuum Analytics’ Anaconda. Appendix 1.A.3 describes a more complex installation procedure with instructions for directly installing Python and the required modules when it is not possible to install Anaconda.

Continuum Analytics’ Anaconda

Anaconda, a free product of Continuum Analytics (www.continuum.io), is a virtually complete scientific stack for Python. It includes both the core Python interpreter and standard libraries as well as most modules required for data analysis. Anaconda is free to use and modules for accelerating the performance of linear algebra on Intel processors using the Math Kernel Library (MKL) are provided. Continuum Analytics also provides other high-performance modules for reading large data files or using the GPU to further accelerate performance for an additional, modest charge. Most importantly, installation is extraordinarily easy on Windows, Linux, and OS X. Anaconda is also simple to update to the latest version using

conda update conda
conda update anaconda

Windows

Installation on Windows requires downloading the installer and running it. Anaconda comes in both Python 2.7 and 3.x flavors, and the latest Python 3.x is the preferred choice. These instructions use ANACONDA to indicate the Anaconda installation directory (e.g. the default is C:\Anaconda). Once the setup has completed, open a command prompt (cmd.exe) and run

cd ANACONDA\Scripts
conda update conda
conda update anaconda
conda install html5lib seaborn

which will first ensure that Anaconda is up-to-date. conda install can be used later to install other packages that may be of interest. Note that if Anaconda is installed into a directory other than the default, the full path should not contain Unicode characters or spaces.

Notes

The recommended settings for installing Anaconda on Windows are:

• Install for all users, which requires admin privileges. If these are not available, then choose the “Just for me” option, but be wary of installing on a path that contains non-ASCII characters, which can cause issues.

• Add Anaconda to the System PATH - This is important to ensure that Anaconda commands can be run from the command prompt.

• Register Anaconda as the system Python unless you have a specific reason not to (unlikely).

If Anaconda is not added to the system path, it is necessary to add the ANACONDA and ANACONDA\Scripts directories to the PATH using

set PATH=ANACONDA;ANACONDA\Scripts;%PATH%

before running Python programs.
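One way to confirm that the path has been set is to ask Python where it finds each command. This is a minimal sketch using the standard library function shutil.which; the helper name find_on_path is hypothetical, and the result depends entirely on the local installation.

```python
import shutil


def find_on_path(command):
    """Return the full path of command if it is on PATH, otherwise None."""
    return shutil.which(command)


# Report where (or whether) each command is found in the current session
for name in ("conda", "python"):
    location = find_on_path(name)
    if location is None:
        print(name + " was not found on PATH")
    else:
        print(name + " found at " + location)
```

If conda is reported as missing, the PATH has most likely not been updated in the current command prompt.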

Linux and OS X

Installation on Linux requires executing

bash Anaconda3-x.y.z-Linux-ISA.sh

where x.y.z will depend on the version being installed and ISA will be either x86 or more likely x86_64. Anaconda comes in both Python 2.7 and 3.x flavors, and the latest Python 3.x is the preferred choice. The OS X installer is available either as a GUI installer (pkg format) or as a bash installer which is installed in an identical manner to the Linux installation. It is strongly recommended that anaconda/bin is prepended to the path. This can be performed on a session-by-session basis by entering

export PATH=ANACONDA/bin:$PATH

On Linux this change can be made permanent by entering this line in .bashrc, which is a hidden file located in ~/. On OS X, this line can be added to .bash_profile, which is located in the home directory (~/).[3] After installation completes, execute

conda update conda
conda update anaconda
conda install html5lib seaborn

which will first ensure that Anaconda is up-to-date and then install the Intel Math Kernel library-linked modules, which provide substantial performance improvements – this package requires a license which is free to academic users and low cost to others. If acquiring a license is not possible, omit this line. conda install can be used later to install other packages that may be of interest.

[3] Use the appropriate settings file if using a different shell (e.g. .zshrc for zsh).

Notes

All instructions for OS X and Linux assume that ANACONDA/bin has been added to the path. If this is not the case, it is necessary to run

cd ANACONDA
cd bin

and then all commands must be prefixed with ./ as in

./conda update conda
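The separator between entries in PATH differs across platforms: a semicolon on Windows and a colon on Linux and OS X. The sketch below builds a PATH string in a platform-independent way using os.pathsep. The helper prepend_to_path is hypothetical, ANACONDA is the same placeholder used in these notes rather than a real directory, and the code only constructs a string without changing the environment.

```python
import os


def prepend_to_path(directory, current_path):
    """Return a PATH string with directory placed first."""
    if not current_path:
        return directory
    return directory + os.pathsep + current_path


# ANACONDA/bin is a placeholder, not a real directory
anaconda_bin = os.path.join("ANACONDA", "bin")
new_path = prepend_to_path(anaconda_bin, os.environ.get("PATH", ""))

# The first entry is the one searched first when a command is run
print(new_path.split(os.pathsep)[0])
```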

1.5 Using Python

Python can be programmed using an interactive session using IPython or by directly executing Python scripts – text files that end with the extension .py – using the Python interpreter.

1.5.1 Python and IPython

Most of this introduction focuses on interactive programming, which has some distinct advantages when learning a language. The standard Python interactive console is very basic and does not support useful features such as tab completion. IPython, and especially the QtConsole version of IPython, transforms the console into a highly productive environment which supports a number of useful features:

• Tab completion - After entering 1 or more characters, pressing the tab button will bring up a list of functions, packages, and variables which match the typed text. If the list of matches is large, pressing tab again allows the arrow keys to be used to browse and select a completion.

• “Magic” functions which make tasks such as navigating the local file system (using %cd ~/directory/ or just cd ~/directory/ assuming that %automagic is on) or running other Python programs (using run program.py) simple. Entering %magic inside an IPython session will produce a detailed description of the available functions. Alternatively, %lsmagic produces a succinct list of available magic commands. The most useful magic functions are

– cd - change directory

– edit filename - launch an editor to edit filename

– ls or ls pattern - list the contents of a directory

– run filename - run the Python file filename

– timeit - time the execution of a piece of code or function

• Integrated help - When using the QtConsole, calling a function provides a view of the top of the help function. For example, entering mean( will produce a view of the top 20 lines of its help text.

• Inline figures - Both the QtConsole and the notebook can also display figures inline, which produces a tidy, self-contained environment. This can be enabled by entering %matplotlib inline in an IPython session.

• The special variable _ contains the last result in the console, and so the most recent result can be saved to a new variable using the syntax x = _

• Support for profiles, which provide further customization of sessions.
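The timeit magic listed above is only available inside IPython. In a standalone Python file, a comparable measurement can be made with the standard library timeit module. The following is a minimal sketch; the function sum_of_squares and the repetition counts are arbitrary choices made for illustration.

```python
import timeit


def sum_of_squares(n):
    """Sum the squares of the integers 0, 1, ..., n - 1."""
    return sum(i * i for i in range(n))


# Run the call 1000 times in each of 3 repetitions and keep the fastest one
times = timeit.repeat(lambda: sum_of_squares(100), number=1000, repeat=3)
best = min(times) / 1000
print("Best time per call: %.2e seconds" % best)
```

Taking the minimum over repetitions reduces the influence of other processes running on the machine.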

1.5.2 Launching IPython

OS X and Linux

IPython can be started by running

ipython

in the terminal. IPython using the QtConsole can be started using

jupyter qtconsole

A single line launcher on OS X or Linux can be constructed using

bash -c "jupyter qtconsole"

This single line launcher can be saved as filename.command where filename is a meaningful name (e.g. IPython-Terminal) to create a launcher on OS X by entering the command

chmod 755 /FULL/PATH/TO/filename.command

The same command can be used to create a Desktop launcher on Ubuntu by running

sudo apt-get install --no-install-recommends gnome-panel
gnome-desktop-item-edit ~/Desktop/ --create-new

and then using the command as the Command in the dialog that appears.

Windows (Anaconda)

To run IPython, open cmd and enter IPython, or choose IPython in the start menu. Starting IPython using the QtConsole is similar and is simply called QtConsole in the start menu. Launching IPython from the start menu should create a window similar to that in figure 1.1.

Figure 1.1: IPython running in the standard Windows console (cmd.exe).

Next, run

jupyter qtconsole --generate-config

in the terminal or command prompt to generate a file named jupyter_qtconsole_config.py. This file contains settings that are useful for customizing the QtConsole window. A few recommended modifications are

c.IPythonWidget.font_size = 11
c.IPythonWidget.font_family = "Bitstream Vera Sans Mono"
c.JupyterWidget.syntax_style = 'monokai'

These commands assume that the Bitstream Vera fonts have been locally installed, which are available from http://ftp.gnome.org/pub/GNOME/sources/ttf-bitstream-vera/1.10/. Opening QtConsole should create a window similar to that in figure 1.2 (although the appearance might differ if you did not use the recommended configuration).

1.5.3 Getting Help

Help is available in IPython sessions using help(function). Some functions (and modules) have very long help files. When using IPython, these can be paged using the command ?function or function? so that the text can be scrolled using page up and down and q to quit. ??function or function?? can be used to display the entire function including both the docstring and the code.
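The ? and ?? forms work only in IPython. In a plain Python script, roughly the same information can be retrieved with the standard library inspect module. The sketch below uses json.dumps purely as an example of a function written in Python; functions implemented in C have no Python source to show.

```python
import inspect
import json

# Roughly what function? shows: the docstring
doc = inspect.getdoc(json.dumps)
print(doc.splitlines()[0])

# Roughly what function?? shows: the source code as well
source = inspect.getsource(json.dumps)
print(source.splitlines()[0])
```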

1.5.4 Running Python programs

While interactive programming is useful for learning a language or quickly developing some simple code, complex projects require the use of complete programs. Programs can be run either using the IPython magic word %run program.py or by directly launching the Python program using the standard interpreter using python program.py. The advantage of using the IPython environment is that the variables used in the program can be inspected after the program run has completed. Directly calling Python will run the program and then terminate, and so it is necessary to output any important results to a file so that they can be viewed later.[4]

[4] Programs can also be run in the standard Python interpreter using the command:

exec(compile(open('filename.py').read(), 'filename.py', 'exec'))
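Outside IPython, a rough analogue of inspecting variables after a run is the standard library function runpy.run_path, which executes a file and returns its global variables as a dictionary. The following is a minimal sketch; the program contents and the file name program.py are stand-ins written to a temporary directory for illustration.

```python
import os
import runpy
import tempfile

# Write a tiny stand-in program to a temporary file (hypothetical contents)
program = "x = [1, 2, 3, 4]\ny = sum(x)\n"
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "program.py")
    with open(path, "w") as handle:
        handle.write(program)
    # run_path executes the file and returns its global variables as a dict
    variables = runpy.run_path(path)

# The variables created by the program remain available for inspection
print(variables["y"])
```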
