To our students
Preface
When we began writing this book, longer ago than we care to admit, our goal was to write a text that could be used for the second and third semesters of a typical graduate sequence in econometrics. We perceived a lack of any textbook that addressed the needs of students trying to acquire a solid understanding of what we think of as the "modern" approach to econometrics. By this we mean an approach that goes beyond the well-known linear regression model, stresses the essential similarities of all the leading estimation methods, and puts as much emphasis on testing and model specification as on estimation.

We soon realized that this plan had a fatal flaw. In order to write a book for the second course in econometrics, one must be confident of what will have been covered in the first course. Since there was not then, and is not now, a widely accepted standard syllabus for the first course in econometrics, we decided that we would have to start there. We therefore changed our plan, and this book now attempts to develop econometric theory from the ground up. Readers are of course expected to have some acquaintance with elementary econometrics before starting, but no more than would be part of a typical undergraduate curriculum. They are also expected to have the mathematical maturity appropriate to graduate students in economics, although we do provide two appendices that cover the mathematical and statistical prerequisites for understanding the material.

Almost all of the econometric theory we present is asymptotic, which means that it is exactly true only in the limit as the sample size tends to infinity, but is thought (or rather hoped) to be approximately true in finite samples. In recent years, researchers have found it increasingly necessary to go beyond the confines of the standard linear regression model, in which restrictive classical assumptions lead to exact results about the distributions of the ordinary least squares estimator and of the familiar t and F statistics. Greater generality in model specification, however, carries the price that exact finite-sample results are very rarely available. Happily, asymptotic econometric theory is now at a mature stage of development, and it provides the main theoretical foundation for the present book.
Our first chapter does not really discuss econometrics at all. Instead, it presents those aspects of the geometry of least squares that are needed in the rest of the book. A key result in this context is the theorem that we have dubbed the Frisch-Waugh-Lovell Theorem. We have found that getting students to understand this theorem, although often a rather challenging task, does a great deal to develop intuition about estimation and testing in econometrics. A particular application of the theorem, which we present in Chapter 1, is to the question of leverage and influence in regression models. Existing treatments of this important topic have typically been algebraically difficult and unintuitive. Use of the FWL Theorem makes it possible to develop a much simpler treatment. Chapter 1 also briefly discusses the computation of ordinary least squares estimates, a subject about which too many students of econometrics are completely ignorant.
One of our aims in this book is to emphasize nonlinear estimation. In Chapters 2 and 3, we therefore plunge directly into a treatment of the nonlinear regression model. It turns out that it is scarcely any more difficult to develop the essential notions of least-squares estimation, and of statistical inference based on such estimation, in a nonlinear context than it is in the more usual linear one. In fact, the essential notions are often easier to come to grips with if one is not distracted by the great wealth of detailed but special results that enrich the linear theory.

After the largely intuitive treatment of Chapters 2 and 3, we provide in Chapters 4 and 5 a fuller and more rigorous account of the asymptotic theory that underlies the nonlinear regression model. Just how far to go in the quest for rigor has been a thorny problem at many points. Much of the recent literature in theoretical econometrics appears to be inaccessible to many students, in large part, we believe, because rigor has taken precedence over the communication of fundamental ideas. We have therefore deliberately not aimed at the same standards of rigor. On the other hand, some rigor is needed in any account that is not merely anecdotal. It is in Chapters 4 and 5, and later in Chapter 8, which lays the foundations of the theory of maximum likelihood, that we have gone as far as we felt we could in the direction of a formal rigorous treatment. At times we even adopt a "theorem-proof" format, something that we have generally avoided in the book. Many instructors will prefer to skim these chapters, especially in a first course, although we hope that most will choose not to omit them entirely.

Although we stress nonlinear models throughout the book, we also emphasize another point that has emerged in the last fifteen years and that has been a central focus of much of our own research over that period. In order to perform statistical inference on the results of a nonlinear estimation procedure, it is almost always possible to make use of artificial linear regressions for the purposes of computing test statistics. Chapter 6 is the first chapter in which we discuss an artificial linear regression, and it is a key chapter for understanding much subsequent material. We show how the so-called Gauss-Newton regression can be used for a variety of purposes, most notably the calculation of Lagrange multiplier tests and related test statistics, the computation of nonlinear least squares estimates, and the computation of one-step efficient estimates. The use of artificial regressions for doing diagnostic tests of model specification is emphasized. Other artificial regressions are introduced later in the book for use in contexts more general than that of nonlinear regression models, but the intuition is always the same.

Our treatment of the linear simultaneous equations model begins in Chapter 7, where we discuss single-equation instrumental variables estimation. In line with our emphasis on nonlinear models, we do not stick with linear instrumental variables models only, but also treat the estimation of nonlinear models by instrumental variables and show how the Gauss-Newton regression generalizes to such models. We also introduce the important idea of tests of overidentifying restrictions. However, we do not attempt a full treatment of the linear simultaneous equations model at this point. We have deliberately left this topic, often thought of as the centerpiece of econometric theory, until very late in the book. It is our feeling that modern theory and practice are drifting away from the linear simultaneous equations model, in favor of a more flexible approach in which instrumental variables continue to play a large role but in a much more general context.

The presentation of standard maximum likelihood theory in Chapter 8 relies as much as possible on insights developed earlier for the nonlinear regression model. The basic concepts of consistency and asymptotic normality are already available and can therefore be dealt with quite swiftly. New concepts arise in connection with the information matrix equality and the Cramér-Rao lower bound. In Chapter 9, maximum likelihood methods find their first major application as we develop the methods of generalized least squares. These methods lead naturally to a discussion of multivariate, but not simultaneous, models. We also devote a section of this chapter to problems particular to the analysis of panel data.

Chapter 10 deals with a topic of great concern to all econometricians who work with time series: serial correlation. Few topics in econometrics have been the subject of so vast a literature, much of which is now somewhat outdated. Although we make no attempt to give a complete account of this literature, this chapter is nevertheless one of the longest. It provides a first treatment of time-series methods, since it is here that we describe autoregressive and moving average processes. Methods of testing for the presence of these processes in the error terms of regression equations, and performing estimation in their presence, are discussed. Again, we highlight the possibility of using artificial linear regressions for these purposes. One section is devoted to the important, and in many texts surprisingly neglected, subject of common factor restrictions.

Hypothesis testing and diagnostic testing, always a primary concern, take center stage again in Chapter 11, which discusses tests based on the Gauss-Newton regression. Nonnested hypothesis testing is discussed here, and the principle of Durbin-Wu-Hausman tests, introduced earlier in Chapter 7, is taken up more fully. In addition, a heteroskedasticity-robust version of the Gauss-Newton regression is developed, providing a first look at issues that will be taken up in much more detail in Chapters 16 and 17.
Chapter 12 contains material not found in any other textbook treatment, to our knowledge. Here, in the simple context of the regression model, we discuss the determinants of test power. We show how tests often have power to reject false hypotheses or ill-specified models even when the alternative hypothesis underlying the test is also wrongly specified. The unifying concept is that of a drifting DGP, a generalization of the Pitman drift of standard statistical analysis. This concept makes it possible to develop an asymptotic theory of test power, based on asymptotic noncentrality parameters. The asymptotic power of a test is shown to depend on just two things: its noncentrality parameter and its number of degrees of freedom. We also devote a section to the inverse power function, which has recently been proposed as a useful and powerful tool for the interpretation of test results. We suspect that some instructors will choose to skip this chapter, but we feel strongly that any student who aims to be a specialist in econometrics should be familiar with this material.
In Chapter 13, we turn again to maximum likelihood estimation and develop, rather formally, the theory of the classical hypothesis tests, relying for intuition on some of the material of the preceding two chapters. We treat not only the well-known trio of the likelihood ratio, Lagrange multiplier, and Wald tests, but also the C(α) test of Neyman, which is now emerging from some decades of neglect. The latter test turns out to be particularly easy to implement by means of artificial regressions. It is in this chapter that the well-known OPG regression is introduced.

From Chapter 14 until the end of the book, most chapters constitute relatively self-contained units. In these chapters, we try to discuss many of the topics of importance in modern econometrics. It is here that some readers may well feel that we have been hopelessly misguided in our selection and have left out the one thing that all econometricians must know. In a field as rapidly growing as econometrics is at the moment, they may well be right. We have been guided largely by our own interests and tastes, which are inevitably fallible. Two topics that we could well have discussed if space had permitted are nonparametric and semiparametric techniques and Bayesian methods. We apologize to specialists in these fields, offering only the lame excuse that we are not ourselves specialists in them, and would no doubt have failed to do them justice.

Chapters 14 and 15 deal respectively with models involving transformations of the dependent variable and models involving qualitative and limited dependent variables. Both chapters rely heavily on the theory of estimation and testing for models estimated by maximum likelihood. Courses with an applied orientation might want to emphasize these chapters, and theoretical courses might omit them entirely in favor of more advanced topics.
Chapter 16 deals with a variety of topics, including heteroskedasticity, skewness and kurtosis, conditional moment tests, and information matrix tests. Many relatively recent developments are discussed in this chapter, which leads naturally to Chapter 17, on the generalized method of moments, or GMM. This important estimation technique has not, to our knowledge, been discussed in any detail in previous textbooks. Our treatment depends heavily on earlier results for instrumental variables and generalized least squares. It contains both general results for models estimated by means of any set of moment conditions, and specific results for linear regression models. For the latter, we present estimators that are more efficient than ordinary and two-stage least squares in the presence of heteroskedasticity of unknown form.
A full treatment of the linear simultaneous equations model does not occur until Chapter 18. One advantage of leaving it until late in the book is that previous results on instrumental variables, maximum likelihood, and the generalized method of moments are then available. Thus, in Chapter 18, we are able to provide reasonably advanced discussions of LIML, FIML, and 3SLS estimation as applications of general techniques that students have already learned. The GMM framework also allows us to introduce a variant of 3SLS that is efficient in the presence of heteroskedasticity of unknown form.

Chapters 19 and 20 complete our discussion of time-series issues. The first deals with a number of topics that are important for applied work, including spurious regressions, dynamic models, and seasonality. The second deals with two related topics of substantial current interest that have not to our knowledge been treated in previous textbooks, namely, unit roots and cointegration. These chapters could be covered immediately after Chapter 10 in a course oriented toward applications, although they do make use of results from some intervening chapters.

Finally, Chapter 21 provides a reasonably detailed introduction to Monte Carlo methods in econometrics. These methods are already widely used, and we believe that their use will increase greatly over the next few years as computers become cheaper and more powerful.

One possible way in which this book can be used is to start at the beginning and continue until the end. If three semesters are available, such an approach is not only possible but desirable. If less time is available, however, there are many possible options. One alternative would be to go only as far as Chapter 13 and then, if time remains, select a few chapters or topics from the remainder of the book. Depending on the focus of the course, it is also possible to skip some earlier chapters, such as Chapters 10 and 12, along with parts of Chapters 9, 11, and 13.
In some courses, it may be preferable to skip much of the theoretical material entirely and concentrate on the techniques for estimation and inference, without the underlying theory. In that event, we would recommend that Chapters 4, 5, and 8 be covered lightly, and that Chapter 13 be skipped entirely. For Chapter 4, the notions of consistency and asymptotic normality would need to be treated at some level, but it is possible to be content with simple definitions. A good deal of conceptual material without much mathematical formalism can be found in Section 4.4, in which the key idea of a data-generating process is defined and discussed. For Chapter 5, the results on the consistency and asymptotic normality of the nonlinear least squares estimator should be stated and discussed but need not be proved. The Gauss-Markov Theorem could also be discussed. In Chapter 8, the first two sections contain the material necessary for later chapters and are not at all formal in content. The next six sections could then be skipped. Section 8.9, on testing, could serve as a simpler replacement for the whole of Chapter 13. Finally, Section 8.10 forges the link between maximum likelihood theory and the previously covered material on the nonlinear regression model.
One of us teaches in France, where for several years he has used material from this book as the basis for a series of courses at the upper undergraduate and master's levels. The students have already taken basic courses in mathematics and statistics when they enter the program. In the first year, they are presented with material from the first three chapters and a brief discussion of the main issues of Chapters 4 and 5, followed by Chapters 6 and 7, and accompanied by problem sets to be worked out on the computer. The second year embarks on maximum likelihood theory from Chapters 8 and 9, skips most of Chapter 10 (although the model with AR(1) errors is used as an important example of the uses of the Gauss-Newton regression), and takes up the testing material of Chapters 11, 12, and 13, with relatively little emphasis placed on the last of these. Numerous problem sets accompany the material of these chapters. The third-year course, which is shorter and is joined by students from other programs, varies more in content, although Chapter 13 is always used as a focus for presentation and revision of maximum likelihood methods and testing procedures. Recently, in fact, the first chapter to be discussed was the last, Chapter 21, on Monte Carlo methods.

It is our hope that this book will be useful, not only to students, but also to established researchers in econometrics as a work of reference. Many of the techniques we describe, especially those based on artificial regressions, are difficult to find in the literature or can be found only in exceedingly technical articles. We would especially like to draw attention to Chapter 12, in which we discuss the determinants of test power and the correct interpretation of test statistics; Chapter 17, which is one of very few textbook treatments of the generalized method of moments; and Chapter 21, on Monte Carlo experiments. In these chapters, we think that the book makes a unique contribution. Much of the material in the rest of the book, notably Chapters 6, 11, 16, and 20, is also not to be found in other texts. Even when the material we cover is relatively familiar, we believe that our way of treating it is often novel enough to be enlightening.
One advantage of a book over the research literature is that a coherent approach and, perhaps of even greater importance, a coherent notation can be developed. Thus readers can more readily perceive the relations and similarities between seemingly disparate techniques and arguments. We will not pretend either that our notation is always absolutely consistent or that it was easy to make it even as consistent as it is. For example, the study of time series has for a long time generated a literature distinctly separate from the mainstream of econometrics, and within this literature notational habits have evolved that are incompatible with those that most econometricians are used to. Many people, however, would be taken aback if time series results were presented in a notation too markedly different from that used in the time series literature. We have tried very hard to use notation that is at once consistent and intuitive. The reader will be the judge of the extent to which we have succeeded.
It is inconceivable that a book as long and technical as this one should be free from errors. All the corrections incorporated in this printing and ones discovered later are available in electronic form via the Internet; see page 875. There would have been far more errors if we had not had the help of a great many people in reading preliminary drafts. They pointed out a disconcertingly large number of mistakes, most merely typographical, but some quite serious. We are indebted to our students, in both Canada and France, in this respect. We thank especially Dirk Eddelbüttel, Niels Hansen, Doug Tattrie, Colin Telmer, and John Touchie for the many hours they devoted to going through chapter after chapter with a fine-tooth comb. Many of our colleagues have made extremely valuable suggestions to us. Some suggested topics that we might otherwise have left out, and others were good enough to provide us with detailed comments on our preliminary efforts. Our thanks go to Richard Blundell, Colin Cameron, Gordon Fisher, John Galbraith, Bill Greene, Allan Gregory, Mark Kamstra, Peter Sephton, Gregor Smith, Thanasis Stengos, Timo Teräsvirta, and Diana Whistler. We are also indebted to an anonymous reader, who urged us to refocus the book when our original plan proved infeasible.

It is customary for authors to thank their secretaries for unflagging support, both technical and moral, in the preparation of their manuscript. This custom imposes on us the pleasant duty of thanking each other, since the manuscript was prepared, in TeX, by our own unaided efforts. At times, it seemed that the intricacies of this peerless computer program would take us more time to master than the whole of econometrics itself. We owe a debt of gratitude to Donald Knuth, the original author of TeX, and to the many other people who have contributed to its development.

Finally, we must give thanks where it is due for a great deal of moral support, and for much more besides, during the long period when we talked book, more book, and yet more book. It is with much gratitude that we record our thanks to our wives, Pamela and Susan.
Contents
1 The Geometry of Least Squares 3
1.1 Introduction 3
1.2 The Geometry of Least Squares 4
1.3 Restrictions and Reparametrizations 16
1.4 The Frisch-Waugh-Lovell Theorem 19
1.5 Computing OLS Estimates 25
1.6 Influential Observations and Leverage 32
1.7 Further Reading and Conclusion 39
2 Nonlinear Regression Models and Nonlinear Least Squares 41
2.1 Introduction 41
2.2 The Geometry of Nonlinear Least Squares 43
2.3 Identification in Nonlinear Regression Models 48
2.4 Models and Data-Generating Processes 51
2.5 Linear and Nonlinear Regression Functions 55
2.6 Error Terms 58
2.7 Conclusion 64
3 Inference in Nonlinear Regression Models 66
3.1 Introduction 66
3.2 Covariance Matrix Estimation 67
3.3 Confidence Intervals and Confidence Regions 71
3.4 Hypothesis Testing: Introduction 78
3.5 Hypothesis Testing in Linear Regression Models 81
3.6 Hypothesis Testing in Nonlinear Regression Models 88
3.7 Restrictions and Pretest Estimators 94
3.8 Conclusion 98
4 Introduction to Asymptotic Theory and Methods 99
4.1 Introduction 99
4.2 Sequences, Limits, and Convergence 100
4.3 Rates of Convergence 108
4.4 Data-Generating Processes and Asymptotic Theory 113
4.5 Consistency and Laws of Large Numbers 118
4.6 Asymptotic Normality and Central Limit Theorems 125
4.7 Some Useful Results 130
4.8 Conclusion 137
5 Asymptotic Methods and Nonlinear Least Squares 139
5.1 Introduction 139
5.2 Asymptotic Identifiability 139
5.3 Consistency of the NLS Estimator 145
5.4 Asymptotic Normality of the NLS Estimator 153
5.5 Asymptotic Efficiency of Nonlinear Least Squares 157
5.6 Properties of Nonlinear Least Squares Residuals 162
5.7 Test Statistics Based on NLS Estimates 168
5.8 Further Reading and Conclusion 174
6 The Gauss-Newton Regression 176
6.1 Introduction 176
6.2 Computing Covariance Matrices 179
6.3 Collinearity in Nonlinear Regression Models 181
6.4 Testing Restrictions 186
6.5 Diagnostic Tests for Linear Regression Models 193
6.6 One-Step Efficient Estimation 196
6.7 Hypothesis Tests Using Any Consistent Estimates 199
6.8 Nonlinear Estimation Using the GNR 201
6.9 Further Reading 207
7 Instrumental Variables 209
7.1 Introduction 209
7.2 Errors in Variables 210
7.3 Simultaneous Equations 211
7.4 Instrumental Variables: The Linear Case 215
7.5 Two-Stage Least Squares 220
7.6 Instrumental Variables: The Nonlinear Case 224
7.7 Hypothesis Tests Based on the GNR 226
7.8 Identification and Overidentifying Restrictions 232
7.9 Durbin-Wu-Hausman Tests 237
7.10 Conclusion 242
8 The Method of Maximum Likelihood 243
8.1 Introduction 243
8.2 Fundamental Concepts and Notation 247
8.3 Transformations and Reparametrizations 253
8.4 Consistency 255
8.5 The Asymptotic Distribution of the ML Estimator 260
8.6 The Information Matrix Equality 263
8.7 Concentrating the Loglikelihood Function 267
8.8 Asymptotic Efficiency of the ML Estimator 270
8.9 The Three Classical Test Statistics 274
8.10 Nonlinear Regression Models 279
8.11 Conclusion 287
9 Maximum Likelihood and Generalized Least Squares 288
9.1 Introduction 288
9.2 Generalized Least Squares 289
9.3 The Geometry of GLS 292
9.4 The Gauss-Newton Regression 295
9.5 Feasible Generalized Least Squares 298
9.6 Maximum Likelihood and GNLS 301
9.7 Introduction to Multivariate Regression Models 305
9.8 GLS Estimation of Multivariate Regression Models 309
9.9 ML Estimation of Multivariate Regression Models 315
9.10 Modeling Time-Series/Cross-Section Data 320
9.11 Conclusion 325
10 Serial Correlation 327
10.1 Introduction 327
10.2 Serial Correlation and Least Squares Estimation 329
10.3 Estimating Regression Models with AR(1) Errors 331
10.4 Standard Errors and Covariance Matrices 338
10.5 Higher-Order AR Processes 341
10.6 Initial Observations in Models with AR Errors 343
10.7 Moving Average and ARMA Processes 351
10.8 Testing for Serial Correlation 357
10.9 Common Factor Restrictions 364
10.10 Instrumental Variables and Serial Correlation 369
10.11 Serial Correlation and Multivariate Models 371
10.12 Conclusion 373
11 Tests Based on the Gauss-Newton Regression 374
11.1 Introduction 374
11.2 Tests for Equality of Two Parameter Vectors 375
11.3 Testing Nonnested Regression Models 381
11.4 Tests Based on Comparing Two Sets of Estimates 389
11.5 Testing for Heteroskedasticity 396
11.6 A Heteroskedasticity-Robust Version of the GNR 399
11.7 Conclusion 402
12 Interpreting Tests in Regression Directions 403
12.1 Introduction 403
12.2 Size and Power 405
12.3 Drifting DGPs 409
12.4 The Asymptotic Distribution of Test Statistics 411
12.5 The Geometry of Test Power 415
12.6 Asymptotic Relative Efficiency 421
12.7 Interpreting Test Statistics that Reject the Null 423
12.8 Test Statistics that Do Not Reject the Null 428
12.9 Conclusion 433
13 The Classical Hypothesis Tests 435
13.1 Introduction 435
13.2 The Geometry of the Classical Test Statistics 436
13.3 Asymptotic Equivalence of the Classical Tests 445
13.4 Classical Tests and Linear Regression Models 452
13.5 Alternative Covariance Matrix Estimators 458
13.6 Classical Test Statistics and Reparametrization 463
13.7 The Outer-Product-of-the-Gradient Regression 471
13.8 Further Reading and Conclusion 478
14 Transforming the Dependent Variable 480
14.1 Introduction 480
14.2 The Box-Cox Transformation 483
14.3 The Role of Jacobian Terms in ML Estimation 489
14.4 Double-Length Artificial Regressions 492
14.5 The DLR and Models Involving Transformations 498
14.6 Testing Linear and Loglinear Regression Models 502
14.7 Other Transformations 507
14.8 Conclusion 510
15 Qualitative and Limited Dependent Variables 511
15.1 Introduction 511
15.2 Binary Response Models 512
15.3 Estimation of Binary Response Models 517
15.4 An Artificial Regression 523
15.5 Models for More than Two Discrete Responses 529
15.6 Models for Truncated Data 534
15.7 Models for Censored Data 537
15.8 Sample Selectivity 542
15.9 Conclusion 545
16 Heteroskedasticity and Related Topics 547
16.1 Introduction 547
16.2 Least Squares and Heteroskedasticity 548
16.3 Covariance Matrix Estimation 552
16.4 Autoregressive Conditional Heteroskedasticity 556
16.5 Testing for Heteroskedasticity 560
16.6 Skedastic Directions and Regression Directions 564
16.7 Tests for Skewness and Excess Kurtosis 567
16.8 Conditional Moment Tests 571
16.9 Information Matrix Tests 578
16.10 Conclusion 582
17 The Generalized Method of Moments 583
17.1 Introduction and Definitions 583
17.2 Criterion Functions and M-Estimators 587
17.3 Efficient GMM Estimators 597
17.4 Estimation with Conditional Moments 602
17.5 Covariance Matrix Estimation 607
17.6 Inference with GMM Models 614
17.7 Conclusion 620
18 Simultaneous Equations Models 622
18.1 Introduction 622
18.2 Exogeneity and Causality 624
18.3 Identification in Simultaneous Equations Models 631
18.4 Full-Information Maximum Likelihood 637
18.5 Limited-Information Maximum Likelihood 644
18.6 Three-Stage Least Squares 651
18.7 Nonlinear Simultaneous Equations Models 661
18.8 Conclusion 667
19 Regression Models for Time-Series Data 669
19.1 Introduction 669
19.2 Spurious Regressions 669
19.3 Distributed Lags 673
19.4 Dynamic Regression Models 680
19.5 Vector Autoregressions 684
19.6 Seasonal Adjustment 687
19.7 Modeling Seasonality 696
19.8 Conclusion 699
20 Unit Roots and Cointegration 700
20.1 Introduction 700
20.2 Testing for Unit Roots 702
20.3 Asymptotic Theory for Unit Root Tests 705
20.4 Serial Correlation and Other Problems 710
20.5 Cointegration 715
20.6 Testing for Cointegration 720
20.7 Model-Building with Cointegrated Variables 723
20.8 Vector Autoregressions and Cointegration 726
20.9 Conclusion 730
21 Monte Carlo Experiments 731
21.1 Introduction 731
21.2 Generating Pseudo-Random Numbers 734
21.3 Generating Pseudo-Random Variates 735
21.4 Designing Monte Carlo Experiments 738
21.5 Variance Reduction: Antithetic Variates 744
21.6 Variance Reduction: Control Variates 747
21.7 Response Surfaces 755
21.8 The Bootstrap and Related Methods 763
21.9 Conclusion 768
Appendices
A Matrix Algebra 770
A.1 Introduction 770
A.2 Elementary Facts about Matrices 770
A.3 The Geometry of Vectors 775
A.4 Matrices as Mappings of Linear Spaces 777
A.5 Partitioned Matrices 779
A.6 Determinants 782
A.7 Positive Definite Matrices 787
A.8 Eigenvalues and Eigenvectors 789
B Results from Probability Theory 793
B.1 Introduction 793
B.2 Random Variables and Probability Distributions 793
B.3 Moments of Random Variables 797
B.4 Some Standard Probability Distributions 802
References 812
Author Index 851
Subject Index 857
S Supplement 875
S.1 Introduction 875
S.2 Functions of Parameter Estimates 875
S.3 Independence of Tests of Nested Hypotheses 877
S.4 Sandwich Covariance Matrices 882
S.5 Properties of Root-n Consistent Estimators 885
S.6 The Noncentral Chi-squared Distribution 887
1 The Geometry of Least Squares
1.1 Introduction
The most commonly used, and in many ways the most important, estimation technique in econometrics is least squares. It is useful to distinguish between two varieties of least squares, ordinary least squares, or OLS, and nonlinear least squares, or NLS. In the case of OLS the regression equation that is to be estimated is linear in all of the parameters, while in the case of NLS it is nonlinear in at least one parameter. OLS estimates can be obtained by direct calculation in several different ways (see Section 1.5), while NLS estimates require iterative procedures (see Chapter 6). In this chapter, we will discuss only ordinary least squares, since understanding linear regression is essential to understanding everything else in this book.
There is an important distinction between the numerical and the statistical properties of estimates obtained using OLS. Numerical properties are those that hold as a consequence of the use of ordinary least squares, regardless of how the data were generated. Since these properties are numerical, they can always be verified by direct calculation. An example is the well-known fact that OLS residuals sum to zero when the regressors include a constant term. Statistical properties, on the other hand, are those that hold only under certain assumptions about the way the data were generated. These can never be verified exactly, although in some cases they can be tested. An example is the well-known proposition that OLS estimates are, in certain circumstances, unbiased.
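Numerical properties, precisely because they hold regardless of how the data were generated, lend themselves to direct verification. The following minimal sketch, which assumes only the numpy library and uses wholly arbitrary made-up data, checks the first property mentioned above: the residuals sum to zero whenever a constant term is among the regressors, whatever y happens to be.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # constant term plus one regressor
y = rng.normal(size=n)  # arbitrary regressand; no statistical model is assumed

# OLS estimates by least squares
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat

# The sum is (numerically) zero, regardless of how y was generated.
print(residuals.sum())
```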
The distinction between numerical properties and statistical properties is obviously fundamental. In order to make this distinction as clear as possible, we will in this chapter discuss only the former. We will study ordinary least squares purely as a computational device, without formally introducing any sort of statistical model (although we will on occasion discuss quantities that are mainly of interest in the context of linear regression models). No statistical models will be introduced until Chapter 2, where we will begin discussing nonlinear regression models, of which linear regression models are of course a special case.
By saying that we will study OLS as a computational device, we do not mean that we will discuss computer algorithms for calculating OLS estimates (although we will do that to a limited extent in Section 1.5). Instead, we mean that we will discuss the numerical properties of ordinary least squares and, in particular, the geometrical interpretation of those properties. All of the numerical properties of OLS can be interpreted in terms of Euclidean geometry. This geometrical interpretation often turns out to be remarkably simple, involving little more than Pythagoras' Theorem and high-school trigonometry, in the context of finite-dimensional vector spaces. Yet the insight gained from this approach is very great. Once one has a thorough grasp of the geometry involved in ordinary least squares, one can often save oneself many tedious lines of algebra by a simple geometrical argument. Moreover, as we hope the remainder of this book will illustrate, understanding the geometrical properties of OLS is just as fundamental to understanding nonlinear models of all types as it is to understanding linear regression models.
1.2 The Geometry of Least Squares
The essential ingredients of a linear regression are a regressand y and a matrix of regressors X ≡ [x1 … xk]. The regressand y is an n-vector, and the matrix of regressors X is an n × k matrix, each column xi of which is an n-vector. The regressand y and each of the regressors x1 through xk can be thought of as points in n-dimensional Euclidean space, Eⁿ. The k regressors, provided they are linearly independent, span a k-dimensional subspace of Eⁿ. We will denote this subspace by S(X).¹

The subspace S(X) consists of all points z in Eⁿ such that z = Xγ for some γ, where γ is a k-vector. Strictly speaking, we should refer to S(X) as the subspace spanned by the columns of X, but less formally we will often refer to it simply as the span of X. The dimension of S(X) is always equal to ρ(X), the rank of X (i.e., the number of columns of X that are linearly independent). We will assume that k is strictly less than n, something which it is reasonable to do in almost all practical cases. If n were less than k, it would be impossible for X to have full column rank k.
A Euclidean space is not defined without defining an inner product. In this case, the inner product we are interested in is the so-called natural inner product. The natural inner product of any two points in Eⁿ, say zi and zj, may be denoted ⟨zi, zj⟩ and is defined by

\[ \langle z_i, z_j \rangle \;\equiv\; \sum_{t=1}^{n} z_{it} z_{jt} \;\equiv\; z_i^{\top} z_j \;\equiv\; z_j^{\top} z_i. \]
¹ The notation S(X) is not a standard one, there being no standard notation that we are comfortable with. We believe that this notation has much to recommend it and will therefore use it hereafter.
We remark that the natural inner product is not the only one that could be used; we might, for example, choose to give a different, positive, weight to each element of the sum, as in

\[ \sum_{t=1}^{n} w_t\, z_{it} z_{jt}, \qquad w_t > 0. \]

As we will see in Chapter 9, performing a linear regression using this inner product would correspond to using a particular form of generalized least squares. For the rest of the book, unless otherwise specified, whenever we speak of an inner product we will mean the natural Euclidean one.
If a point z (which is of course an n-vector) belongs to S(X), we can always write z as a linear combination of the columns of X:

\[ z = \sum_{i=1}^{k} \gamma_i x_i = X\gamma, \]

where γ1 through γk are scalars and γ is a k-vector with typical element γi. Thus a vector of k coefficients like γ identifies any point in S(X). Provided that the columns of X are linearly independent, it does so uniquely. The vectors x1 through xk are linearly independent if we cannot write any one of them as a linear combination of the others.
If the k regressors are not linearly independent, then they will span a subspace of dimension less than k, say k′, where k′ is the largest number of columns of X that are linearly independent of each other, that is, ρ(X). In this case, S(X) will be identical to S(X′), where X′ is an n × k′ matrix consisting of any k′ linearly independent columns of X. For example, consider the following X matrix, which is 6 × 3:
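Since any matrix with the dependence described below will serve, the following numpy sketch constructs one such 6 × 3 matrix; the particular entries are our own illustration, chosen only to satisfy x1 = .5x2 + x3, and the rank computations confirm the claims made in the text.

```python
import numpy as np

# Illustrative entries: any 6x3 matrix with x1 = 0.5*x2 + x3 will do.
x2 = np.array([2.0, 2.0, 2.0, 0.0, 0.0, 0.0])
x3 = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
x1 = 0.5 * x2 + x3
X = np.column_stack([x1, x2, x3])

print(np.linalg.matrix_rank(X))  # 2: the columns span a 2-dimensional subspace

# Dropping any one column leaves the rank, and hence the span, unchanged.
for drop in range(3):
    keep = [j for j in range(3) if j != drop]
    print(np.linalg.matrix_rank(X[:, keep]))  # 2 in every case
```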
The columns of this matrix are not linearly independent, since x1 = .5x2 + x3. However, any two of the columns are linearly independent, and so

S(X) = S(x1, x2) = S(x1, x3) = S(x2, x3).
We have introduced a new notation here: S(x1, x2) denotes the subspace spanned by the two vectors x1 and x2 jointly. More generally, the notation S(Z, W) will denote the subspace spanned by the columns of the matrices Z and W taken together; thus S(Z, W) means the same thing as S([Z W]). Note that, in many cases, S(Z, W) will be a space of dimension less than the sum of the ranks of Z and W, since some of the columns of Z may lie in S(W) and vice versa. For the remainder of this chapter, unless the contrary is explicitly assumed, we will, however, assume that the columns of X are linearly independent.
The first thing to note about S(X) is that we can subject X to any rank-preserving linear transformation without in any way changing the subspace spanned by the transformed X matrix. If z = Xγ and

\[ X^{*} = XA, \]

where A is a nonsingular k × k matrix, it follows that

\[ z = X^{*}A^{-1}\gamma \equiv X^{*}\gamma^{*}. \]
Thus any point z that can be written as a linear combination of the columns of X can just as well be written as a linear combination of any linear transformation of those columns. We conclude that if S(X) is the space spanned by the columns of X, it must also be the space spanned by the columns of X* = XA. This means that we could give the same space an infinite number of names, in this case S(X), S(X*), or whatever. Some authors (e.g., Seber, 1980; Fisher, 1981) have therefore adopted a notation in which the subspace that we have called S(X) is named without any explicit reference to X at all. We have avoided this coordinate-free notation because it tends to obscure the relationship between the results and the regression(s) they concern and because in most cases there is a natural choice for the matrix whose span we are interested in. As we will see, however, many of the principal results about linear regression are coordinate-free in the sense that they depend on X only through S(X).
The orthogonal complement of S(X) in Eⁿ, which is denoted S⊥(X), is the set of all points w in Eⁿ such that, for any z in S(X), w⊤z = 0. Thus every point in S⊥(X) is orthogonal to every point in S(X) (two points are said to be orthogonal if their inner product is zero). Since the dimension of S(X) is k, the dimension of S⊥(X) is n − k. It is sometimes convenient to refer not to the dimension of a linear subspace but to its codimension. A linear subspace of Eⁿ is said to have codimension j if the dimension of its orthogonal complement is j. Thus, in this case, S(X) has dimension k and codimension n − k, and S⊥(X) has dimension n − k and codimension k.
Before discussing Figure 1.1, which illustrates these concepts, we must say a word about geometrical conventions. The simplest way to represent an n-dimensional vector, say z, in a diagram is simply to show it as a point in an n-dimensional space; n of course must be limited to 2 or 3. It is often more intuitive, however, explicitly to show z as a vector, in the geometrical sense.
This is done by connecting the point z with the origin and putting an arrowhead at z. The resulting arrow then shows graphically the two things about a vector that matter, namely, its length and its direction. The Euclidean length of a vector z is

\[ \|z\| \;\equiv\; \Bigl(\sum_{t=1}^{n} z_t^2\Bigr)^{1/2}, \]

where the notation emphasizes that ‖z‖ is the positive square root of the sum of the squared elements of z. The direction is the vector itself normalized to have length unity, that is, z/‖z‖. One advantage of this convention is that if we move one of the arrows, being careful to change neither its length nor its direction, the new arrow represents the same vector, even though the arrowhead is now at a different point. It will often be very convenient to do this, and we therefore adopt this convention in most of our diagrams.
Figure 1.1 illustrates the concepts discussed above for the case n = 2 and k = 1. The matrix of regressors X has only one column in this case, and it is therefore represented by a single vector in the figure. As a consequence, S(X) is one-dimensional, and since n = 2, S⊥(X) is also one-dimensional. Notice that S(X) and S⊥(X) would be the same if X were any point on the straight line which is S(X), except for the origin. This illustrates the fact that S(X) is invariant to any nonsingular transformation of X.
As we have seen, any point in S(X) can be represented by a vector of the form Xβ for some k-vector β. If one wants to find the point in S(X) that is closest to a given vector y, the problem to be solved is that of minimizing, with respect to the choice of β, the distance between y and Xβ. Minimizing this distance is evidently equivalent to minimizing the square of this distance.
Figure 1.1 The spaces S(X) and S⊥(X)
Thus, solving the problem

\[ \min_{\beta}\; \|y - X\beta\|^2 \qquad (1.01) \]

will find the closest point to y in S(X). The value of β that solves (1.01), which is the OLS estimate, will be denoted β̂.
The squared distance between y and Xβ can also be written as

\[ \sum_{t=1}^{n} (y_t - X_t\beta)^2 = (y - X\beta)^{\top}(y - X\beta), \qquad (1.02) \]

where yt and Xt denote, respectively, the t-th element of the vector y and the t-th row of the matrix X.² Since the difference between yt and Xtβ is commonly referred to as a residual, this quantity is generally called the sum of squared residuals, or SSR. It is also sometimes called the residual sum of squares, which more closely parallels the terminology for its counterpart, the explained sum of squares. The acronyms would then be RSS and ESS. Unfortunately, some authors use the former to stand for the regression sum of squares and the latter for the error sum of squares, making it unclear what the acronyms RSS and ESS stand for. When we refer to SSR and ESS, there should be no such ambiguity.
The geometry of ordinary least squares is illustrated in Figure 1.2, which is Figure 1.1 with a few additions. The regressand is now shown as the vector y. The vector Xβ̂, which is often referred to as the vector of fitted values, is the closest point in S(X) to y; note that β̂ is a scalar in this case. It is evident that the line joining y and Xβ̂ must form a right angle with S(X) at Xβ̂. This line is simply the vector y − Xβ̂, translated so that its origin is at Xβ̂ instead of at zero. The right angle formed by y − Xβ̂ and S(X) is the key feature of least squares. At any other point in S(X), such as Xβ in the figure, y − Xβ does not form a right angle with S(X) and, as a consequence, ‖y − Xβ‖ must necessarily be larger than ‖y − Xβ̂‖.

² We refer to the t-th row of X as Xt rather than as xt to avoid confusion with the columns of X, which we have referred to as x1, x2, and so on.

Figure 1.2 The projection of y onto S(X)
The vector of derivatives of the SSR (1.02) with respect to the elements of β is

\[ -2X^{\top}(y - X\beta), \]

which must equal 0 at a minimum. Since we have assumed that the columns of X are linearly independent, the matrix X⊤X must have full rank. This, combined with the fact that any matrix of the form X⊤X is necessarily nonnegative definite, implies that the sum of squared residuals is a strictly convex function of β and must therefore have a unique minimum. Thus β̂ is uniquely determined by the normal equations

\[ X^{\top}(y - X\hat\beta) = 0. \qquad (1.03) \]
These normal equations say that the vector y − Xβ̂ must be orthogonal to all of the columns of X and hence to any vector that lies in the space spanned by those columns. The normal equations (1.03) are thus simply a way of stating algebraically what Figure 1.2 showed geometrically, namely, that y − Xβ̂ must form a right angle with S(X).
Since the matrix X⊤X has full rank, we can always invert it to solve the normal equations for β̂. We obtain the standard formula:

\[ \hat\beta = (X^{\top}X)^{-1}X^{\top}y. \qquad (1.04) \]
Even if X is not of full rank, the fitted values Xβ̂ are uniquely defined, because Xβ̂ is simply the point in S(X) that is closest to y. Look again at Figure 1.2 and suppose that X is an n × 2 matrix, but of rank only one. The geometrical point Xβ̂ is still uniquely defined. However, since β is now a 2-vector and S(X) is just one-dimensional, the vector β̂ is not uniquely defined. Thus the requirement that X have full rank is a purely algebraic requirement that is needed to obtain unique estimates β̂.
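Formulas (1.03) and (1.04) are easy to verify by direct calculation. The following minimal sketch, again assuming numpy and arbitrary made-up data, computes β̂ by solving the normal equations, which is numerically preferable to forming the inverse of X⊤X explicitly, and confirms that the resulting residual vector is orthogonal to every column of X.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 30, 3
X = rng.normal(size=(n, k))
y = rng.normal(size=n)

# Equation (1.04), computed by solving the normal equations (1.03)
# rather than inverting X'X explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Check (1.03): X'(y - X beta_hat) is a k-vector of (numerical) zeros.
print(X.T @ (y - X @ beta_hat))
```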
If we substitute the right-hand side of (1.04) for β̂ into Xβ̂, we obtain

\[ X\hat\beta = X(X^{\top}X)^{-1}X^{\top}y \equiv P_X y. \qquad (1.05) \]

This equation defines the n × n matrix P_X ≡ X(X⊤X)⁻¹X⊤, which projects the vector y orthogonally onto S(X). The matrix P_X is an example of an orthogonal projection matrix. Associated with every linear subspace of Eⁿ are two such matrices, one of which projects any point in Eⁿ onto that subspace, and one of which projects any point in Eⁿ onto its orthogonal complement. The matrix that projects onto S⊥(X) is

\[ M_X \equiv I - P_X = I - X(X^{\top}X)^{-1}X^{\top}, \]

where I is the n × n identity matrix. We say that S(X) is the range of the projection P_X while S⊥(X) is the range of M_X. Note that both P_X and M_X are symmetric matrices and that

\[ M_X + P_X = I. \]
Any point in Eⁿ, say z, is therefore equal to M_X z + P_X z. Thus these two projection matrices define an orthogonal decomposition of Eⁿ, because the two vectors M_X z and P_X z lie in two orthogonal subspaces.
Throughout this book, we will use P and M subscripted by matrix expressions to denote the matrices that respectively project onto and off the subspaces spanned by the columns of those matrix expressions. Thus P_Z would be the matrix that projects onto S(Z), M_{X,W} would be the matrix that projects off S(X, W), and so on. These projection matrices are of no use whatsoever for computation, because they are of dimension n × n, which makes them much too large to work with on a computer except when the sample size is quite small. But they are nevertheless extremely useful. It is frequently very convenient to express the quantities that arise in econometrics using these matrices, partly because the resulting expressions are relatively compact and partly because the properties of projection matrices often make it easy to understand what those expressions mean.
In the case of any linear regression with regressors X, the projection matrices of primary interest are P_X and M_X. These matrices have several important properties which can all be seen clearly from Figure 1.2. One property, which is often extremely convenient, is that they are idempotent. An idempotent matrix is one that, when multiplied by itself, yields itself again. Thus

\[ P_X P_X = P_X \quad\text{and}\quad M_X M_X = M_X. \]
These results are easily proved by a little algebra, but the geometry of the situation makes them obvious. If one takes any point, projects it onto S(X), and then projects it onto S(X) again, the second projection can have no effect, because the point is already in S(X). This implies that P_X P_X z = P_X z for any vector z; therefore, P_X P_X = P_X. A similar argument holds for M_X.
A second important property of P_X and M_X is that

\[ P_X M_X = 0. \qquad (1.06) \]

Thus P_X and M_X annihilate each other. Again, this can easily be proved algebraically using the definitions of P_X and M_X, but such a proof is quite unnecessary. It should be obvious that (1.06) must hold, because P_X projects onto S(X) and M_X projects onto S⊥(X). The only point that belongs to both S(X) and S⊥(X) is the origin, i.e., the zero vector. Thus, if we attempt to project any vector onto both S(X) and its orthogonal complement, we get the zero vector.
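All of these properties can be confirmed by direct calculation. The sketch below, again assuming numpy and arbitrary data, forms P_X and M_X explicitly, which is harmless at n = 8 even though, as noted above, such matrices are far too large to form at realistic sample sizes, and checks idempotency, equation (1.06), and the orthogonal decomposition of Eⁿ.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 8, 2
X = rng.normal(size=(n, k))

P = X @ np.linalg.solve(X.T @ X, X.T)  # P_X = X(X'X)^{-1}X'
M = np.eye(n) - P                      # M_X = I - P_X

print(np.allclose(P @ P, P))          # True: P_X is idempotent
print(np.allclose(M @ M, M))          # True: M_X is idempotent
print(np.allclose(P @ M, 0))          # True: equation (1.06), mutual annihilation
print(np.allclose(P + M, np.eye(n)))  # True: P_X z + M_X z recovers any z
```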
In fact, M_X annihilates not just P_X but all points that lie in S(X), and P_X annihilates not just M_X but all points that lie in S⊥(X). These properties can again be proved by straightforward algebra, but the geometry of the situation is even simpler. Consider Figure 1.2 again. It is evident that if we project any point in S⊥(X) orthogonally onto S(X), we end up at the origin (which is just a vector of zeros), as we do if we project any point in S(X) orthogonally onto S⊥(X).
Since the space spanned by the columns of X is invariant to nonsingular linear transformations of the columns of X, so must be the projection matrices P_X and M_X. This can also be seen algebraically. Consider what happens when we postmultiply X by any nonsingular k × k matrix A. The matrix that projects onto the span of XA is

\[ P_{XA} = XA\bigl(A^{\top}X^{\top}XA\bigr)^{-1}A^{\top}X^{\top} = XA A^{-1}\bigl(X^{\top}X\bigr)^{-1}(A^{\top})^{-1}A^{\top}X^{\top} = X\bigl(X^{\top}X\bigr)^{-1}X^{\top} = P_X. \]
This result suggests that perhaps the best way to characterize a linear subspace is by the matrix that projects orthogonally onto it, with which it is in a one-to-one correspondence.
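The invariance P_{XA} = P_X is likewise easy to check numerically. In this small sketch (numpy, arbitrary data, and a random A that is almost surely nonsingular), the two projection matrices agree to machine precision, illustrating that the projection depends on X only through S(X).

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 8, 2
X = rng.normal(size=(n, k))
A = rng.normal(size=(k, k))  # a random k x k matrix, almost surely nonsingular

def proj(Z):
    """Orthogonal projection matrix onto S(Z)."""
    return Z @ np.linalg.solve(Z.T @ Z, Z.T)

# P_{XA} = P_X: transforming the columns of X leaves the projection unchanged.
print(np.allclose(proj(X @ A), proj(X)))  # True
```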
If the rank of the matrix X is k, then so is the rank of P_X. This follows from the fact that the range of the projection matrix P_X is just S(X), the span of X, which has dimension equal to ρ(X). Thus, although P_X is an n × n matrix, its rank is in general much smaller than n. This crucial fact permits us to make much greater use of simple geometry than might at first seem possible. Since we are working with vectors that lie in an n-dimensional space, with n almost always greater than 3, it might seem that diagrams like Figure 1.2 would almost never be applicable. But most of the time we will be interested only in a small-dimensional subspace of the n-dimensional space in which the regressand and regressors are located. The small-dimensional subspace of interest will generally be either the space spanned by the regressors only or the space spanned by the regressand along with the regressors. These subspaces will have dimensions k and k + 1, respectively, whatever the sample size n. The former subspace is uniquely characterized by the orthogonal projection P_X, and the latter by the orthogonal projection P_{X,y}.
When we look at a figure that is two-dimensional, possibly intended as a two-dimensional projection of a three-dimensional image, the two or three dimensions that we can visualize will therefore be those of S(X) or S(X, y). What we lose in collapsing the original n dimensions into just two or three is the possibility of drawing coordinate axes that correspond to the separate observations of a sample. For that to be possible, it would indeed be necessary to restrict ourselves to samples of two or three. But this seems a small price to pay for the possibility of seeing the geometrical interpretation of a great many