3rd Edition, 1st Revision
Kevin Sheppard, University of Oxford
Monday 9th September, 2019
Changes since the Third Edition
• Verified that all code and examples work correctly against 2019 versions of modules. The notable packages and their versions are:
– Python 3.7 (Preferred version)
– NumPy: 1.16
– SciPy: 1.3
– pandas: 0.25
– matplotlib: 3.1
• Python 2.7 support has been officially dropped, although most examples continue to work with 2.7. Do not use Python 2.7 in 2019 for numerical code.
• Small typo fixes, thanks to Marton Huebler.
• Fixed direct download of FRED data due to API changes, thanks to Jesper Termansen.
• Thanks to Bill Tubbs for a detailed read and multiple typo reports.
• Updated to changes in line profiler (see Ch. 24).
• Updated deprecations in pandas.
• Removed hold from plotting chapter since this is no longer required.
• Thanks to Gen Li for multiple typo reports.
• Tested all code on Python 3.6. Code has been tested against the current set of modules installed by conda as of February 2018. The notable packages and their versions are:
– NumPy: 1.13
– Pandas: 0.22
Notes to the 3rd Edition
This edition includes the following changes from the second edition (August 2014):
• Rewritten installation section focused exclusively on using Continuum's Anaconda.
• Python 3.5 is the default version of Python instead of 2.7. Python 3.5 (or newer) is well supported by the Python packages required to analyze data and perform statistical analysis, and brings some new useful features, such as a new operator for matrix multiplication (@).
• Removed distinction between integers and longs in built-in data types chapter. This distinction is only relevant for Python 2.7.
• dot has been removed from most examples and replaced with @ to produce more readable code.
• Split Cython and Numba into separate chapters to highlight the improved capabilities of Numba.
• Verified all code working on current versions of core libraries using Python 3.5.
• pandas
– Updated syntax of pandas functions such as resample.
– Added pandas Categorical.
– Expanded coverage of pandas groupby.
– Expanded coverage of date and time data types and functions.
• New chapter introducing statsmodels, a package that facilitates statistical analysis of data. statsmodels includes regression analysis, Generalized Linear Models (GLM) and time-series analysis using ARIMA models.
Changes since the Second Edition
• Fixed typos reported by a reader, thanks to Ilya Sorvachev.
• Code verified against Anaconda 2.0.1.
• Added diagnostic tools and a simple method to use external code in the Cython section.
• Updated the Numba section to reflect recent changes.
• Fixed some typos in the chapter on Performance and Optimization.
• Added examples of joblib and IPython's cluster to the chapter on running code in parallel.
• New chapter introducing object-oriented programming as a method to provide structure and organization to related code.
• Added seaborn to the recommended package list, and have included it by default in the graphics chapter.
• Based on experience teaching Python to economics students, the recommended installation has been simplified by removing the suggestion to use virtual environments. The discussion of virtual environments has been moved to the appendix.
• Rewrote parts of the pandas chapter.
• Changed the Anaconda install to use both create and install, which shows how to install additional packages.
• Fixed some missing packages in the direct install.
• Changed the configuration of IPython to reflect best practices.
• Added subsection covering IPython profiles.
• Small section about Spyder as a good starting IDE.
Notes to the 2nd Edition
This edition includes the following changes from the first edition (March 2012):
• The preferred installation method is now Continuum Analytics' Anaconda. Anaconda is a complete scientific stack and is available for all major platforms.
• New chapter on pandas. pandas provides a simple but powerful tool to manage data and perform preliminary analysis. It also greatly simplifies importing and exporting data.
• New chapter on advanced selection of elements from an array.
• Numba provides just-in-time compilation for numeric Python code, which often produces large performance gains when pure NumPy solutions are not available (e.g. looping code).
• Dictionary, set and tuple comprehensions
• Numerous typos
• All code has been verified working against Anaconda 1.7.0.
Chapter 1
Introduction
1.1 Background
These notes are designed for someone new to statistical computing wishing to develop a set of skills necessary to perform original research using Python. They should also be useful for students, researchers or practitioners who require a versatile platform for econometrics, statistics or general numerical analysis (e.g. numeric solutions to economic models or model simulation).
Python is a popular general-purpose programming language that is well suited to a wide range of problems.1 Recent developments have extended Python's range of applicability to econometrics, statistics, and general numerical analysis. Python, with the right set of add-ons, is comparable to domain-specific languages such as R, MATLAB or Julia. If you are wondering whether you should bother with Python (or another language), an incomplete list of considerations includes:
You might want to consider R if:
• You want to apply statistical methods. The statistics library of R is second to none, and R is clearly at the forefront of new statistical algorithm development, meaning you are most likely to find that new(ish) procedure in R.
• Performance is of secondary importance.
• Free is important.
You might want to consider MATLAB if:
• Commercial support and a clear channel to report issues is important.
• Documentation and organization of modules are more important than the breadth of algorithms available.
• Performance is an important concern. MATLAB has optimizations, such as Just-in-Time (JIT) compilation of loops, which are not automatically available in most other packages.
You might want to consider Julia if:
• Performance in an interactive language is your most important concern.
• You don't mind learning enough Python to interface with Python packages. The Julia ecosystem is in its infancy and a bridge to Python is used to provide important missing features.
• You like living on the bleeding edge and aren't worried about code breaking across new versions of Julia.
• You like to do most things yourself.
1 According to the ranking on http://www.tiobe.com/tiobe-index/, Python is the 5th most popular language. http://langpop.corger.nl/ ranks Python as 4th or 5th.
Having read the reasons to choose another package, you may wonder why you should consider Python.
• You need a language which can act as an end-to-end solution that allows access to web-based services, database servers, data management and processing, and statistical computation. Python can even be used to write server-side apps such as a dynamic website, apps for desktop-class operating systems with graphical user interfaces, or apps for tablets and phones (iOS and Android).
• Data handling and manipulation, especially cleaning and reformatting, is an important concern. Python is substantially more capable at data set construction than either R or MATLAB.
• Performance is a concern, but not at the top of the list.2
• Free is an important consideration. Python can be freely deployed, even to 100s of servers in a cloud-based cluster (e.g. Amazon Web Services, Google Compute or Azure).
• Knowledge of Python, as a general purpose language, is complementary to R/MATLAB/Julia/Ox/GAUSS/Stata.
2 Python performance can be made arbitrarily close to C using a variety of methods, including Numba (pure Python), Cython (C/Python creole language) or directly calling C code. Moreover, recent advances have substantially closed the gap with respect to other Just-in-Time compiled languages such as MATLAB.
1.2 Conventions
These notes will follow two conventions.
1. Code blocks will be used throughout.
"""A docstring
"""

# Comments appear in a different color

# Reserved keywords are highlighted
and as assert async await break class continue def del elif else except
False finally for from global if import in is lambda None nonlocal not or
pass raise return True try while with yield

# Common functions and classes are highlighted in a
# different color. Note that these are not reserved,
# and can be used although best practice would be
# to avoid them if possible
array matrix range list

# Long lines are indented
some_text = 'This is a very, very, very, very, very, very, very, very, very, very, very, very long line'
2. When a code block contains >>>, this indicates that the command is running in an interactive IPython session. Output will often appear after the console command, and will not be preceded by a command indicator.
>>> x = 1.0
>>> x + 2
3.0
If the code block does not contain the console session indicator, the code contained in the block is intended to be executed in a standalone Python file.
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.sum(x)
print(x)
print(y)
1.3 Important Components of the Python Scientific Stack
1.3.1 Python
Python 3.5 (or later) is required. This provides the core Python interpreter. Most of the examples should work with the latest version of Python 2.7 as well.
1.3.2 NumPy
NumPy provides a set of array and matrix data types which are essential for statistics, econometrics and data analysis.
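A minimal sketch of the array type in use. The values are arbitrary and chosen only for illustration. It uses the @ operator for matrix multiplication (Python 3.5 or later), which these notes prefer over dot, and contrasts it with *, which multiplies element by element.

```python
import numpy as np

# A 2-by-2 matrix and a 2-element vector (arbitrary illustrative values)
a = np.array([[1.0, 2.0],
              [3.0, 4.0]])
b = np.array([1.0, 1.0])

c = a @ b    # matrix-vector product: [1 + 2, 3 + 4]
d = a * a    # element-by-element product, not matrix multiplication
e = a.T @ a  # cross-product matrix a'a

print(c)
print(a.shape, c.shape)
```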
1.3.3 SciPy
SciPy contains a large number of routines needed for analysis of data. The most important include a wide range of random number generators, linear algebra routines, and optimizers. SciPy depends on NumPy.
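The three groups of routines named above can be sketched briefly. The linear system and the quadratic objective are arbitrary illustrations. scipy.stats, scipy.linalg and scipy.optimize are the relevant SciPy subpackages, although the book's later chapters may use different functions.

```python
import numpy as np
from scipy import linalg, optimize, stats

# Random numbers: 1,000 draws from a standard normal (seeded for reproducibility)
draws = stats.norm.rvs(size=1000, random_state=0)

# Linear algebra: solve the system A x = b
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = linalg.solve(A, b)

# Optimization: minimize a simple quadratic whose minimum is at (1, -2)
def objective(theta):
    return (theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2

result = optimize.minimize(objective, x0=np.zeros(2))
print(x)
print(result.x)
```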
1.3.4 Jupyter and IPython
IPython provides an interactive Python environment which enhances productivity when developing code or performing interactive data analysis. Jupyter provides a generic set of infrastructure that enables IPython to be run in a variety of settings, including an improved console (QtConsole) or an interactive web-browser-based notebook.
1.3.5 matplotlib and seaborn
matplotlib provides a plotting environment for 2D plots, with limited support for 3D plotting. seaborn is a Python package that improves the default appearance of matplotlib plots with minimal additional code.
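A minimal matplotlib sketch. It assumes a script run without a display, so the non-interactive Agg backend is selected and the figure is written to a PNG in a temporary directory. The sine curve and the file name sine.png are arbitrary. seaborn is not used here.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()

# Save to a temporary directory; in practice this would be a chosen file name
path = os.path.join(tempfile.mkdtemp(), "sine.png")
fig.savefig(path)
plt.close(fig)
```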
1.3.6 pandas
pandas provides high-performance data structures.
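A small illustration of the DataFrame structure and of groupby, which the notes cover later. The data set is made up purely for illustration.

```python
import pandas as pd

# A small illustrative data set (values are made up)
df = pd.DataFrame({
    "country": ["A", "A", "B", "B", "B"],
    "year": [2017, 2018, 2017, 2018, 2019],
    "gdp_growth": [1.5, 2.0, 3.0, 2.5, 2.0],
})

# Group rows by country and compute the mean growth rate within each group
mean_growth = df.groupby("country")["gdp_growth"].mean()
print(mean_growth)
```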
1.3.7 statsmodels
statsmodels is pandas-aware and provides models used in the statistical analysis of data, including linear regression, Generalized Linear Models (GLMs), and time-series models (e.g., ARIMA).
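As a sketch of what "pandas-aware" means in practice, the formula interface below refers to DataFrame columns by name. The data are simulated from an assumed model, y = 1 + 2x + e, chosen only for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data from y = 1 + 2 x + e (parameters chosen for illustration)
rs = np.random.RandomState(0)
x = rs.standard_normal(500)
y = 1.0 + 2.0 * x + rs.standard_normal(500)
data = pd.DataFrame({"x": x, "y": y})

# Formula interface: the pandas column names are used directly in the model
res = smf.ols("y ~ x", data=data).fit()
print(res.params)
print(res.summary())
```

The estimated coefficients are returned as a pandas Series labeled Intercept and x.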
1.3.8 Performance Modules
A number of modules are available to help with performance. These include Cython and Numba. Cython is a Python module which facilitates using a simple Python-derived creole to write functions that can be compiled to native (C code) Python extensions. Numba uses a method of just-in-time compilation to translate a subset of Python to native code using the Low-Level Virtual Machine (LLVM).
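A minimal Numba sketch. It assumes the kind of loop-heavy code where Numba helps most. The AR(1) simulation function is a hypothetical example, and the try/except fallback is added here only so the sketch still runs, uncompiled, if Numba is not installed. Cython is not shown because it requires a separate compilation step.

```python
import numpy as np

try:
    from numba import jit
except ImportError:  # fallback: a no-op decorator so the sketch runs without Numba
    def jit(*args, **kwargs):
        def decorator(func):
            return func
        return decorator

@jit(nopython=True)
def ar1_simulate(e, rho):
    """Simulate y[t] = rho * y[t-1] + e[t] with an explicit loop (hypothetical example)."""
    y = np.empty_like(e)
    y[0] = e[0]
    for t in range(1, e.shape[0]):
        y[t] = rho * y[t - 1] + e[t]
    return y

print(ar1_simulate(np.ones(5), 0.5))
```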
1.4 Setup
The recommended method to install the Python scientific stack is to use Continuum Analytics' Anaconda. Appendix 1.A.3 describes a more complex installation procedure with instructions for directly installing Python and the required modules when it is not possible to install Anaconda.
Continuum Analytics' Anaconda
Anaconda, a free product of Continuum Analytics (www.continuum.io), is a virtually complete scientific stack for Python. It includes both the core Python interpreter and standard libraries as well as most modules required for data analysis. Anaconda is free to use, and modules for accelerating the performance of linear algebra on Intel processors using the Math Kernel Library (MKL) are provided. Continuum Analytics also provides other high-performance modules for reading large data files or using the GPU to further accelerate performance for an additional, modest charge. Most importantly, installation is extraordinarily easy on Windows, Linux, and OS X. Anaconda is also simple to update to the latest version using
conda update conda
conda update anaconda
Windows
Installation on Windows requires downloading the installer and running it. Anaconda comes in both Python 2.7 and 3.x flavors, and the latest Python 3.x is the preferred choice. These instructions use ANACONDA to indicate the Anaconda installation directory (e.g. the default is C:\Anaconda). Once the setup has completed, open a command prompt (cmd.exe) and run
cd ANACONDA\Scripts
conda update conda
conda update anaconda
conda install html5lib seaborn
which will first ensure that Anaconda is up-to-date and then install html5lib and seaborn. conda install can be used later to install other packages that may be of interest. Note that if Anaconda is installed into a directory other than the default, the full path should not contain non-ASCII characters or spaces.
Notes
The recommended settings for installing Anaconda on Windows are:
• Install for all users, which requires admin privileges. If these are not available, then choose the "Just for me" option, but avoid installing on a path that contains non-ASCII characters, which can cause issues.
• Add Anaconda to the System PATH. This is important to ensure that Anaconda commands can be run from the command prompt.
• Register Anaconda as the system Python unless you have a specific reason not to (unlikely).
If Anaconda is not added to the system path, it is necessary to add the ANACONDA and ANACONDA\Scripts directories to the PATH using
set PATH=ANACONDA;ANACONDA\Scripts;%PATH%
before running Python programs.
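The PATH setting can be checked from Python itself. The following is a minimal sketch that uses only the standard library. conda_on_path is a hypothetical helper name and is not part of any package. The sketch works the same way on Windows, Linux and OS X.

```python
import os
import shutil
import sys

def conda_on_path():
    """Return the full path to the conda executable if it is on the PATH, else None."""
    return shutil.which("conda")

print("Python interpreter:", sys.executable)
print("conda found at:", conda_on_path())

# List PATH entries one per line; os.pathsep is ';' on Windows and ':' elsewhere
for entry in os.environ.get("PATH", "").split(os.pathsep):
    print(entry)
```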
Linux and OS X
Installation on Linux requires executing
bash Anaconda3-x.y.z-Linux-ISA.sh
where x.y.z will depend on the version being installed and ISA will be either x86 or, more likely, x86_64. Anaconda comes in both Python 2.7 and 3.x flavors, and the latest Python 3.x is the preferred choice. The OS X installer is available either as a GUI installer (pkg format) or as a bash installer, which is installed in an identical manner to the Linux installation. It is strongly recommended that ANACONDA/bin is prepended to the path. This can be performed on a session-by-session basis by entering
export PATH=ANACONDA/bin:$PATH
On Linux this change can be made permanent by entering this line in .bashrc, which is a hidden file located in ~/. On OS X, this line can be added to .bash_profile, which is located in the home directory (~/).3 After installation completes, execute
conda update conda
conda update anaconda
conda install html5lib seaborn
which will first ensure that Anaconda is up-to-date and then install html5lib and seaborn. conda install can be used later to install other packages that may be of interest.
3 Use the appropriate settings file if using a different shell (e.g. .zshrc for zsh).
Notes
All instructions for OS X and Linux assume that ANACONDA/bin has been added to the path. If this is not the case, it is necessary to run
cd ANACONDA
cd bin
and then all commands must be prepended by ./ as in
./conda update conda
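After installing or updating, a quick way to confirm which versions of the core packages Python actually sees is to import them and read their version strings. This is a minimal sketch. package_versions is a hypothetical helper, and the package list simply mirrors the modules named in this chapter.

```python
import importlib

def package_versions(names):
    """Map each package name to its version string, or None if it cannot be imported."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions

for name, version in package_versions(
        ["numpy", "scipy", "pandas", "matplotlib", "statsmodels"]).items():
    print(name, version)
```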
1.5 Using Python
Python can be programmed either in an interactive session using IPython or by directly executing Python scripts (text files that end with the extension .py) using the Python interpreter.
1.5.1 Python and IPython
Most of this introduction focuses on interactive programming, which has some distinct advantages when learning a language. The standard Python interactive console is very basic and does not support useful features such as tab completion. IPython, and especially the QtConsole version of IPython, transforms the console into a highly productive environment which supports a number of useful features:
• Tab completion - After entering 1 or more characters, pressing the tab button will bring up a list of functions, packages, and variables which match the typed text. If the list of matches is large, pressing tab again allows the arrow keys to be used to browse and select a completion.
• "Magic" functions which make tasks such as navigating the local file system (using %cd ~/directory/ or just cd ~/directory/ assuming that %automagic is on) or running other Python programs (using run program.py) simple. Entering %magic inside an IPython session will produce a detailed description of the available functions. Alternatively, %lsmagic produces a succinct list of available magic commands. The most useful magic functions are
– cd - change directory
– edit filename - launch an editor to edit filename
– ls or ls pattern - list the contents of a directory
– run filename - run the Python file filename
– timeit - time the execution of a piece of code or function
• Integrated help - When using the QtConsole, calling a function provides a view of the top of its help text. For example, entering mean( will produce a view of the top 20 lines of its help text.
• Inline figures - Both the QtConsole and the notebook can display figures inline, which produces a tidy, self-contained environment. This can be enabled by entering %matplotlib inline in an IPython session.
• The special variable _ contains the last result in the console, and so the most recent result can be saved to a new variable using the syntax x = _.
• Supportforprofiles,whichprovidefurthercustomizationofsessions.
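The %timeit magic is only available inside IPython. In a standalone script, the standard library's timeit module gives a comparable measurement. This is a minimal sketch. sum_of_squares is an arbitrary example function, and the repeat counts are illustrative choices.

```python
import timeit

def sum_of_squares(n):
    """Arbitrary example function to time."""
    return sum(i * i for i in range(n))

# Run the call 100 times per trial, repeat 5 trials, and keep the best per-call time
timings = timeit.repeat(lambda: sum_of_squares(1000), number=100, repeat=5)
best = min(timings) / 100
print("best time per call: {:.2e} s".format(best))
```

Taking the minimum over trials is a common convention because it is least affected by other activity on the machine.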
1.5.2 Launching IPython
OS X and Linux
IPython can be started by running ipython in the terminal. IPython using the QtConsole can be started using
jupyter qtconsole
A single line launcher on OS X or Linux can be constructed using
bash -c "jupyter qtconsole"
This single line launcher can be saved as filename.command, where filename is a meaningful name (e.g. IPython-Terminal), to create a launcher on OS X by entering the command
chmod 755 /FULL/PATH/TO/filename.command
The same command can be used to create a Desktop launcher on Ubuntu by running
sudo apt-get install --no-install-recommends gnome-panel
gnome-desktop-item-edit ~/Desktop/ --create-new
and then using the command as the Command in the dialog that appears.
Windows(Anaconda)
To run IPython, open cmd and enter ipython, or launch IPython from the start menu. Starting IPython using the QtConsole is similar and is simply called QtConsole in the start menu. Launching IPython from the start menu should create a window similar to that in figure 1.1.
Figure 1.1: IPython running in the standard Windows console (cmd.exe).
Next, run
jupyter qtconsole --generate-config
in the terminal or command prompt to generate a file named jupyter_qtconsole_config.py. This file contains settings that are useful for customizing the QtConsole window. A few recommended modifications are
c.JupyterWidget.font_size = 11
c.JupyterWidget.font_family = "Bitstream Vera Sans Mono"
c.JupyterWidget.syntax_style = 'monokai'
These commands assume that the Bitstream Vera fonts have been locally installed. The fonts are available from http://ftp.gnome.org/pub/GNOME/sources/ttf-bitstream-vera/1.10/. Opening QtConsole should create a window similar to that in figure 1.2 (although the appearance might differ if you did not use the recommended configuration).
1.5.3 Getting Help
Help is available in IPython sessions using help(function). Some functions (and modules) have very long help files. When using IPython, these can be paged using the command ?function or function? so that the text can be scrolled using page up and down, and q to quit. ??function or function?? can be used to display the entire function, including both the docstring and the code.
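Outside IPython, the standard library's inspect module can retrieve roughly the same information that ?function and ??function display. This is a minimal sketch. statistics.mean is used only because it is a pure-Python function, and source code cannot be retrieved for functions implemented in C.

```python
import inspect
import statistics

# Docstring only, roughly what ?function shows
doc = inspect.getdoc(statistics.mean)
print(doc.splitlines()[0])

# Docstring plus source code, roughly what ??function shows
# (works for functions written in Python; built-in C functions have no retrievable source)
source = inspect.getsource(statistics.mean)
print(source[:200])
```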
1.5.4 Running Python programs
While interactive programming is useful for learning a language or quickly developing some simple code, complex projects require the use of complete programs. Programs can be run either using the IPython magic command %run program.py or by directly launching the Python program with the standard interpreter using python program.py. The advantage of using the IPython environment is that the variables used in the program can be inspected after the program has completed. Directly calling Python will run the program and then terminate, and so it is necessary to output any important results to a file so that they can be viewed later.4
4 Programs can also be run in the standard Python interpreter using the command: exec(compile(open('filename.py').read(), 'filename.py', 'exec'))
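Both points above can be sketched with the standard library. runpy.run_path executes a .py file and returns the program's global variables as a dictionary, which is similar to inspecting variables after %run. Results worth keeping can then be written to a file, here as JSON. The script contents and the file names program.py and results.json are illustrative assumptions.

```python
import json
import os
import runpy
import tempfile

workdir = tempfile.mkdtemp()

# Write a tiny example program to disk (stands in for program.py)
script = os.path.join(workdir, "program.py")
with open(script, "w") as f:
    f.write("x = [1, 2, 3, 4]\n"
            "total = sum(x)\n")

# Run the file; the returned dict holds the program's global variables
namespace = runpy.run_path(script)
print(namespace["total"])

# Save the important results so they can be viewed after the program ends
results_file = os.path.join(workdir, "results.json")
with open(results_file, "w") as f:
    json.dump({"total": namespace["total"]}, f)
```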