Machine Learning for Signal Processing
Great Clarendon Street, Oxford, OX2 6DP, United Kingdom

Oxford University Press is a department of the University of Oxford. It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trademark of Oxford University Press in the UK and in certain other countries

© Max A. Little 2019

The moral rights of the author have been asserted

First Edition published in 2019

Impression: 1

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by licence or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above

You must not circulate this work in any other form and you must impose this same condition on any acquirer

Published in the United States of America by Oxford University Press, 198 Madison Avenue, New York, NY 10016, United States of America

British Library Cataloguing in Publication Data

Data available

Library of Congress Control Number: 2019944777

ISBN 978-0-19-871493-4
DOI: 10.1093/oso/9780198714934.001.0001

Printed and bound by CPI Group (UK) Ltd, Croydon, CR0 4YY
Preface
Digital signal processing (DSP) is one of the 'foundational', but somewhat invisible, engineering topics of the modern world, without which many of the technologies we take for granted (the digital telephone, digital radio, television, CD and MP3 players, WiFi, radar, to name just a few) would not be possible. A relative newcomer by comparison, statistical machine learning is the theoretical backbone of exciting technologies that are by now starting to reach a level of ubiquity, such as automatic techniques for car registration plate recognition, speech recognition, stock market prediction, defect detection on assembly lines, robot guidance and autonomous car navigation. Statistical machine learning has origins in the recent merging of classical probability and statistics with artificial intelligence, which exploits the analogy between intelligent information processing in biological brains and sophisticated statistical modelling and inference.

DSP and statistical machine learning are of such wide importance to the knowledge economy that both have undergone rapid changes and seen radical improvements in scope and applicability. Both DSP and statistical machine learning make use of key topics in applied mathematics such as probability and statistics, algebra, calculus, graphs and networks. Therefore, intimate formal links between the two subjects exist, and because of this, an emerging consensus view is that DSP and statistical machine learning should not be seen as separate subjects. The many overlaps that exist between the two subjects can be exploited to produce new digital signal processing tools of surprising utility and efficiency, and wide applicability, highly suited to the contemporary world of pervasive digital sensors and high-powered yet cheap computing hardware. This book gives a solid mathematical foundation to the topic of statistical machine learning for signal processing, including the contemporary concepts of the probabilistic graphical model (PGM) and nonparametric Bayes, concepts which have only more recently emerged as important for solving DSP problems.

The book is aimed at advanced undergraduates or first-year PhD students, as well as researchers and practitioners. It addresses the foundational mathematical concepts, illustrated with pertinent and practical examples across a range of problems in engineering and science. The aim is to enable students with an undergraduate background in mathematics, statistics or physics, from a wide range of quantitative disciplines, to get quickly up to speed with the latest techniques and concepts in this fast-moving field. The accompanying software will enable readers to test out the techniques on their own signal analysis problems. The presentation of the mathematics is much along the lines of a standard undergraduate physics or statistics textbook, free of distracting technical complexities and jargon, while not sacrificing rigour. It would be an excellent textbook for emerging courses in machine learning for signals.
4 Statistical modelling and inference
4.1 Statistical models
4.2 Optimal probability inferences
4.4 Distributions associated with metrics and norms
4.5 The exponential family (EF)
4.6 Distributions defined through quantiles
4.7 Densities associated with piecewise linear loss functions
5 Probabilistic graphical models
5.1 Statistical modelling with PGMs
5.3 Inference on PGMs
7.3 LTI signal processing
7.4 Exploiting statistical stability for linear-Gaussian DSP
Discrete-time Gaussian processes (GPs) and DSP
7.5 The Kalman filter (KF)
7.6 Time-varying linear systems
8 Discrete signals: sampling, quantization and coding
8.1 Discrete-time sampling
8.2 Quantization
List of Algorithms
2.1 Iteratively reweighted least squares (IRLS)
2.2 Gradient descent
2.3 Backtracking line search
2.4 Golden section search
2.5 Newton's method
2.6 The subgradient method
2.7 A primal-dual interior-point method for linear programming (LP)
2.8 Greedy search for discrete optimization
2.9 (Simple) tabu search
2.10 Simulated annealing
2.11 Random restarting
3.1 Newton's method for numerical inverse transform sampling
3.2 Adaptive rejection sampling (ARS) for log-concave densities
3.3 Sequential search inversion (simplified version)
3.4 Binary (subdivision) search sampling
3.5 Gibbs Markov-Chain Monte Carlo (MCMC) sampling
3.6 Markov-Chain Monte Carlo (MCMC) Metropolis-Hastings (MH) sampling
5.1 The junction tree algorithm for (semi-ring) marginalization inference of a single variable clique
5.2 Iterative conditional modes (ICM)
5.3 Expectation-maximization (E-M)
6.1 Expectation-maximization (E-M) for general i.i.d. mixture models
6.2 The K-means algorithm
7.1 Recursive, Cooley-Tukey, radix-2, decimation in time, fast Fourier transform (FFT) algorithm
7.2 Overlap-add FIR convolution
8.1 The Lloyd-Max algorithm for fixed-rate quantizer design
8.2 The K-means algorithm for fixed-rate quantizer design
8.3 Iterative variable-rate entropy-constrained quantizer design
9.1 Baum-Welch expectation-maximization (E-M) for hidden Markov models (HMMs)
9.2 Viterbi training for hidden Markov models (HMMs)
10.1 Gaussian process (GP) informative vector machine (IVM) regression
10.2 Dirichlet process mixture model (DPMM) Gibbs sampler
10.3 Dirichlet process means (DP-means) algorithm
10.4 Maximum a-posteriori Dirichlet process mixture collapsed (MAP-DP) algorithm for conjugate exponential family distributions
List of Figures
1.1 Mapping between abstract groups
2.1 Lagrange multipliers in constrained optimization
2.7 Convergence of Newton's method with and without line search
2.8 Piecewise linear objective function (1D)
4.1 Overfitting and underfitting: polynomial models
4.2 Minimum description length (MDL)
4.5 Convergence properties of Metropolis-Hastings (MH) sampling
5.1 Simple 5-node probabilistic graphical model (PGM)
5.2 Notation for repeated connectivity in graphical models
5.3 Some complex graphical models
5.4 Three basic probabilistic graphical model patterns
6.1 Gaussian mixture model for Gamma ray intensity strati-
6.2 Convergence of expectation-maximization (E-M) for the
6.3 Classifier decision boundaries
6.14 Principal components analysis (PCA): traffic count signals
6.16 Locally linear embedding (LLE): chirp signals
7.8 Computational complexity of long-duration FIR implementations
7.9 Impulse response and transfer function of discretized kernel regression
7.11 Various nonparametric power spectral density estimators
7.12 Linear predictive power spectral density estimators
7.13 Regularized linear prediction analysis model selection
7.14 Wiener filtering of human walking signals
7.15 Sinusoidal subspace principal components analysis (PCA) filtering
7.16 Subspace MUSIC frequency estimation
7.17 Typical 2D Kalman filter trajectories
7.18 Kalman filter smoothing by Tikhonov regularization
7.19 Short-time Fourier transform analysis
7.20 Time-frequency tiling: Fourier versus wavelet analysis
7.21 A selection of wavelet functions
7.22 Vanishing moments of the Daubechies wavelet
7.23 Discrete wavelet transform wavelet shrinkage
8.1 Digital MEMS sensors
8.2 Discrete-time sampling hardware block diagram
8.3 Non-uniform and uniform and bandlimited sampling
8.4 Shannon-Whittaker uniform sampling in the frequency domain
8.5 Digital Gamma ray intensity signal from a drill hole
8.6 Quadratic B-spline uniform sampling
8.7 Rate-distortion curves for an i.i.d. Gaussian source
8.8 Lloyd-Max scalar quantization
8.9 Entropy-constrained scalar quantization
8.10 Shannon-Whittaker reconstruction of density functions
8.11 Quantization and dithering
8.12 Scalar versus vector quantization
8.13 Companding quantization
8.14 Linear predictive coding for data compression
8.15 Transform coding: bit count versus coefficient variance
8.16 Transform coding of musical audio signals
8.17 Compressive sensing: discrete cosine transform basis
8.18 Random demodulation compressive sensing
9.1 Quantile running filter for exoplanetary light curves
9.2 First-order log-normal recursive filter: daily rainfall signals
9.3 Total variation denoising (TVD) of power spectral density time series
9.4 Bilateral filtering of human walking data
9.5 Hidden Markov modelling (HMM) of power spectral time series data
9.6 Cepstral analysis of voice signals
10.1 Infinite exchangeability in probabilistic graphical model form
10.2 Linking kernel and regularized basis function regression
10.3 Random function draws from the linear basis regression model
10.4 Gaussian process (GP) function draws with various covariance functions
10.6 Gaussian process (GP) regression for classification
10.7 Draws from the Dirichlet distribution for K = 3
10.8 The Dirichlet process (DP)
10.10 Comparing partition sizes of Dirichlet versus Pitman-Yor processes
10.11 Parametric versus nonparametric estimation of linear predictive coding (LPC) coefficients
10.12 DP-means for signal noise reduction: exoplanet light curves
10.13 Probabilistic graphical model (PGM) for the Bayesian mixture model
10.14 Performance of MAP-DP versus DP-means nonparametric clustering
1 Mathematical foundations
Statistical machine learning and signal processing are topics in applied mathematics, which are based upon many abstract mathematical concepts. Defining these concepts clearly is the most important first step in this book. The purpose of this chapter is to introduce these foundational mathematical concepts. It also justifies the statement that much of the art of statistical machine learning as applied to signal processing lies in the choice of convenient mathematical models that happen to be useful in practice. Convenient in this context means that the algebraic consequences of the choice of mathematical modelling assumptions are in some sense manageable. The seeds of this manageability are the elementary mathematical concepts upon which the subject is built.
1.1 Abstract algebras
We will take the simple view in this book that mathematics is based on logic applied to sets: a set is an unordered collection of objects, often real numbers such as the set {π, 1, e} (which has three elements), or the set of all real numbers R (with an infinite number of elements). From this modest origin it is a remarkable fact that we can build the entirety of the mathematical methods we need. We first start by reviewing some elementary principles of (abstract) algebras.
Groups
An algebra is a structure that defines the rules of what happens when pairs of elements of a set are acted upon by operations. A kind of algebra known as a group is (+, R), the usual notion of addition acting on pairs of real numbers. It is a group because it has an identity, the number zero (when zero is added to any number it remains unchanged, i.e. a + 0 = 0 + a = a), and every element in the set has an inverse (for any number a, there is an inverse -a, which means that a + (-a) = 0). Finally, the operation is associative, which is to say that when operating on three or more numbers, addition does not depend on the order in which the numbers are added (i.e. a + (b + c) = (a + b) + c). Addition also has the intuitive property that a + b = b + a, i.e. it does not matter if the numbers are swapped: the operator is called commutative, and the group is then called an Abelian group. Mirroring addition is multiplication acting on the set of real numbers with zero removed, (×, R − {0}), which is also an Abelian group. The identity element is 1, and the inverses
Fig. 1.1: Illustrating abstract groups and mapping between them. Shown are the two continuous groups of real numbers under addition (left column) and multiplication (right column), with identities 0 and 1 respectively. The homomorphism of exponentiation maps addition onto multiplication (left to right column), and the inverse, the logarithm, maps multiplication back onto addition (right to left column). Therefore, these two groups are homomorphic.
are the reciprocals of each number. Multiplication is also associative, and commutative. Note that we cannot include zero because this would require the inclusion of the inverse of zero, 1/0, which does not exist (Figure 1.1).
Groups are naturally associated with symmetries. For example, the set of rigid geometric transformations of a rectangle that leave the rectangle unchanged in the same position, together with compositions of these transformations, forms a group (there are flips along the horizontal and vertical midlines, one clockwise rotation through 180° about the centre, and the identity transformation that does nothing). This group can be denoted as V4 = (◦, {e, h, v, r}), where e is the identity, h the horizontal flip, v the vertical flip, and r the rotation, with the composition operation ◦. For the rectangle, we can see that h ◦ v = r, i.e. a horizontal followed by a vertical flip corresponds to a 180° rotation (Figure 1.2).
Very often, the fact that we are able to make some convenient algebraic calculations in statistical machine learning and signal processing can be traced to the existence of one or more symmetry groups that arise due to the choice of mathematical assumptions, and we will encounter many examples of this phenomenon in later chapters, which often lead to significant computational efficiencies. A striking example of the consequences of groups in classical algebra is the explanation for why there are no solutions that can be written in terms of addition, multiplication and roots, to the general polynomial equation $\sum_{i=0}^{N} a_i x^i = 0$ when $N \geq 5$. This fact has many practical consequences, for example, it is possible to find the eigenvalues of a general matrix of size $N \times N$ using simple analytical calculations when $N < 5$ (although the analytical calculations do become prohibitively complex), but there is no possibility of using similar analytical techniques when $N \geq 5$, and one must resort to numerical methods, and these methods sometimes cannot guarantee to find all solutions!
Fig. 1.2: The group of symmetries of the rectangle, V4 = (◦, {e, h, v, r}). It consists of horizontal and vertical flips, and a rotation of 180° about the centre. This group is isomorphic to the group M8 = (×8, {1, 3, 5, 7}) (see Figure 1.3).
Many simple groups with the same number of elements are isomorphic to each other, that is, there is a unique function that maps the elements of one group to the elements of the other, such that the operations can be applied consistently to the mapped elements. Intuitively then, the identity in one group is mapped to that of the other group. For example, the symmetry group V4 above is isomorphic to the group M8 = (×8, {1, 3, 5, 7}), where ×8 indicates multiplication modulo 8 (that is, taking the remainder of the multiplication on division by 8, see Figure 1.3).
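As a concrete check of this isomorphism, the following Python sketch (illustrative only, not from the book) builds the composition table of V4 and verifies that the mapping e → 1, h → 3, v → 5, r → 7 preserves the operation in M8:

```python
# Sketch: verify the isomorphism between the rectangle symmetry group V4
# and M8 = (multiplication mod 8, {1, 3, 5, 7}). Names are illustrative.

# Composition table of V4 = (o, {e, h, v, r}).
compose = {
    ('e', 'e'): 'e', ('e', 'h'): 'h', ('e', 'v'): 'v', ('e', 'r'): 'r',
    ('h', 'e'): 'h', ('h', 'h'): 'e', ('h', 'v'): 'r', ('h', 'r'): 'v',
    ('v', 'e'): 'v', ('v', 'h'): 'r', ('v', 'v'): 'e', ('v', 'r'): 'h',
    ('r', 'e'): 'r', ('r', 'h'): 'v', ('r', 'v'): 'h', ('r', 'r'): 'e',
}

# Proposed isomorphism phi: V4 -> M8.
phi = {'e': 1, 'h': 3, 'v': 5, 'r': 7}

# Check phi(a o b) == phi(a) * phi(b) mod 8 for every pair (a, b).
for (a, b), ab in compose.items():
    assert phi[ab] == (phi[a] * phi[b]) % 8, (a, b)
print("phi preserves the group operation for all pairs")
```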
Whilst two groups might not be isomorphic, they are sometimes homomorphic: there is a function from one group to the other that maps each element in the first group to an element in the second group, possibly many-to-one, and the mapping is still consistent under the second operation. A very important example is the exponential map, exp(x), which converts addition over the set of real numbers into multiplication over the set of positive real numbers: $e^{a+b} = e^a e^b$; a powerful variant of this map is widely used in statistical inference to simplify and stabilize calculations involving the probabilities of independent statistical events, by converting them into calculations with their associated information content. This is the negative logarithmic map, $-\ln(x)$, which converts probabilities under multiplication into entropies under addition. This map is very widely used in statistical inference as we shall see.
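To see why this homomorphism matters in practice, the short Python sketch below (illustrative, with made-up probabilities) computes the joint probability of many independent events in the log domain: the product of small probabilities, which underflows in floating point, becomes a well-behaved sum of logs.

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.uniform(0.001, 0.01, size=2000)   # probabilities of 2000 independent events

# The direct product underflows to zero in double precision...
direct = np.prod(p)

# ...but the homomorphism log(a * b) = log(a) + log(b) lets us work with
# the (finite) sum of log probabilities instead.
log_joint = np.sum(np.log(p))

print(direct)      # 0.0 (underflow)
print(log_joint)   # a perfectly representable negative number
```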
For a more detailed but accessible background to group theory, read Humphreys (1996).
Rings
Whilst groups deal with one operation on a set of numbers, rings are a slightly more complex structure that often arises when two operations are applied to the same set. The most immediately tangible example is the operations of addition and multiplication on the set of integers Z (the positive and negative whole numbers with zero). Using the definition above, the set of integers under addition forms an Abelian group, whereas under multiplication the integers form a simpler structure known as a monoid, a group without inverses. Multiplication with the integers is associative, and there is an identity (the positive number one), but the multiplicative inverses are not integers (they are fractions such as 1/2, 1/5 etc.). Finally, in combination, integer multiplication distributes over integer addition: a × (b + c) = a × b + a × c = (b + c) × a. These properties define a ring: it has one operation that together with the set forms an Abelian group, and another operation that, with the set, forms a monoid, and the second operation distributes over the first. As with the integers, the set of real numbers under the usual addition and multiplication also has the structure of a ring. Another very important example is the set of square matrices, all of size N × N with real elements, under normal matrix addition and multiplication. Here the multiplicative identity element is the identity matrix of size N × N, and the additive identity element is the same size square matrix with all zero elements.
Rings are powerful structures that can lead to very substantial computational savings for many statistical machine learning and signal processing problems. For example, if we remove the condition that the additive operation must have inverses, then we have a pair of monoids that are distributive. This structure is known as a semiring or semifield, and it turns out that the existence of this structure in many machine learning and signal processing problems makes these otherwise computationally intractable problems feasible. For example, the classical Viterbi algorithm for determining the most likely sequence of hidden states in a hidden Markov model (HMM) is an application of the max-sum semifield on the dynamic Bayesian network that defines the stochastic dependencies in the model.
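The following Python sketch (illustrative only; hidden Markov models are treated properly in Chapter 9, and all the numbers here are made up) shows the max-sum recursion on a tiny two-state chain. The semiring 'sum' is the max operation and the 'product' is ordinary addition of log probabilities, which turns an exponential enumeration over state sequences into a linear-time recursion.

```python
import numpy as np

# Tiny illustrative HMM in the log domain (values invented for the sketch).
log_A = np.log(np.array([[0.7, 0.3],      # transition probabilities
                         [0.4, 0.6]]))
log_pi = np.log(np.array([0.5, 0.5]))     # initial state probabilities
# Log-likelihood of each of 3 observations under each of the 2 states.
log_obs = np.log(np.array([[0.9, 0.2],
                           [0.1, 0.8],
                           [0.2, 0.7]]))

# Max-sum (Viterbi-style) recursion.
delta = log_pi + log_obs[0]
back = []
for t in range(1, len(log_obs)):
    scores = delta[:, None] + log_A      # score of every (previous, next) state pair
    back.append(scores.argmax(axis=0))   # best predecessor of each state
    delta = scores.max(axis=0) + log_obs[t]

# Backtrack the most likely state sequence.
path = [int(delta.argmax())]
for b in reversed(back):
    path.append(int(b[path[-1]]))
path.reverse()
print(path)   # [0, 1, 1] for the numbers above
```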
Both Dummit and Foote (2004) and Rotman (2000) contain detailed introductions to abstract algebra, including groups and rings.
Fig. 1.3: The table for the symmetry group V4 = (◦, {e, h, v, r}) (top), and the group M8 = (×8, {1, 3, 5, 7}) (bottom), showing the isomorphism between them obtained by mapping e → 1, h → 3, v → 5 and r → 7.
Fig. 1.4: Metric 2D circles d(x, 0) = c for various distance metrics. From top to bottom: the Euclidean distance, the absolute distance, the distance $d(\mathbf{x}, \mathbf{y}) = \left( \sum_{i=1}^{D} |x_i - y_i|^{0.3} \right)^{1/0.3}$, and the Mahalanobis distance for $\Sigma_{11} = \Sigma_{22} = 1.0$, $\Sigma_{12} = \Sigma_{21} = 0.5$. The contours are c = 1 (red lines) and c = 0.5 (blue lines).
1.2 Metrics
Distance is a fundamental concept in mathematics. Distance functions play a key role in machine learning and signal processing, particularly as measures of similarity between objects, for example, digital signals encoded as items of a set. We will also see that a statistical model often implies the use of a particular measure of distance, and this measure determines the properties of statistical inferences that can be made.
A geometry is obtained by attaching a notion of distance to a set: it becomes a metric space. A metric takes two points in the set and returns a single (usually real) value representing the distance between them. A metric must have the following properties to satisfy intuitive notions of distance:
(1) Non-negativity: d(x, y) ≥ 0,
(2) Symmetry: d(x, y) = d(y, x),
(3) Coincidence: d(x, x) = 0, and
(4) Triangle inequality: d(x, z) ≤ d(x, y) + d(y, z).
Respectively, these requirements are that (1) distance cannot be negative, (2) the distance going from x to y is the same as that from y to x, (3) only points lying on top of each other have zero distance between them, and (4) the length of any one side of a triangle defined by three points cannot be greater than the sum of the lengths of the other two sides. For example, the Euclidean metric on a D-dimensional set is:

$d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{D} (x_i - y_i)^2}$    (1.1)
This represents the notion of distance that we experience in everyday geometry. The defining properties of distance lead to a vast range of possible geometries, for example, the city-block geometry is defined by the absolute distance metric:

$d(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{D} |x_i - y_i|$    (1.2)
City-block distance is so named because it measures distances on a grid parallel to the co-ordinate axes. Distance need not take on real values, for example, the discrete metric is defined as d(x, y) = 0 if x = y and d(x, y) = 1 otherwise. Another very important metric is the Mahalanobis distance:

$d(\mathbf{x}, \mathbf{y}) = \sqrt{(\mathbf{x} - \mathbf{y})^T \Sigma^{-1} (\mathbf{x} - \mathbf{y})}$    (1.3)
This distance may not be axis-aligned: it corresponds to finding the Euclidean distance after applying an arbitrary stretch or compression along each axis, followed by an arbitrary D-dimensional rotation (indeed if Σ = I, the identity matrix, this is identical to the Euclidean distance).
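These three metrics are straightforward to compute; below is a brief Python sketch (illustrative, with made-up points and the covariance from Figure 1.4):

```python
import numpy as np

def euclidean(x, y):
    return np.sqrt(np.sum((x - y) ** 2))           # equation (1.1)

def city_block(x, y):
    return np.sum(np.abs(x - y))                   # equation (1.2)

def mahalanobis(x, y, Sigma):
    d = x - y
    return np.sqrt(d @ np.linalg.inv(Sigma) @ d)   # equation (1.3)

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])   # the covariance used in Figure 1.4

print(euclidean(x, y), city_block(x, y), mahalanobis(x, y, Sigma))
# With Sigma = I the Mahalanobis distance reduces to the Euclidean distance:
print(np.isclose(mahalanobis(x, y, np.eye(2)), euclidean(x, y)))
```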
Figure 1.4 shows plots of 2D circles d(x, 0) = c for various metrics, in particular c = 1, which is known as the unit circle in that metric space. For further reading, metric spaces are introduced in Sutherland (2009) in the context of real analysis and topology.
1.3 Vector spaces
A space is just the name given to a set endowed with some additional mathematical structure. A (real) vector space is the key structure of linear algebra, which is a central topic in most of classical signal processing (all digital signals are vectors, for example). The definition of a (finite) vector space begins with an ordered set (often written as a column) of N real numbers called a vector, and a single real number called a scalar. To that vector we attach the addition operation, which is both associative and commutative, and which simply adds every corresponding pair of elements of two vectors together, written as v + u. The identity for this operation is the vector of N zeros, 0.
Additionally, we define a scalar multiplication operation that multiplies each element of the vector by a scalar, λ. Using scalar multiplication by λ = −1, we can then form inverses of any vector. For scalar multiplication it should not matter in which order two scalar multiplications occur, e.g. λ(µv) = (λµ)v = (µλ)v = µ(λv). We also require that scalar multiplication distributes over vector addition, λ(v + u) = λv + λu, and scalar addition distributes over scalar multiplication, (λ + µ)v = λv + µv.
Every vector space has at least one basis for the space: this is a set of linearly independent vectors, such that every vector in the vector space can be written as a unique linear combination of these basis vectors (Figure 1.5). Since our vectors have N entries, there are always N vectors in the basis. Thus, N is the dimension of the vector space. The simplest basis is the so-called standard basis, consisting of the N vectors $\mathbf{e}_1 = (1, 0, \ldots, 0)^T$, $\mathbf{e}_2 = (0, 1, \ldots, 0)^T$, etc. It is easy to see that a vector $\mathbf{v} = (v_1, v_2, \ldots, v_N)^T$ can be expressed in terms of this basis as $\mathbf{v} = v_1 \mathbf{e}_1 + v_2 \mathbf{e}_2 + \cdots + v_N \mathbf{e}_N$.
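A brief Python sketch (illustrative; the non-standard basis vectors are chosen arbitrarily) expresses a vector in the standard basis and in another basis by solving a small linear system:

```python
import numpy as np

x = np.array([2.0, 3.0])

# Standard basis: the coefficients are just the entries of x.
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
assert np.allclose(x, x[0] * e1 + x[1] * e2)

# A non-standard basis (linearly independent, chosen arbitrarily).
v1, v2 = np.array([1.0, 1.0]), np.array([1.0, -2.0])
V = np.column_stack([v1, v2])

# Coefficients a solve V a = x; they exist and are unique because
# the columns of V are linearly independent.
a = np.linalg.solve(V, x)
assert np.allclose(x, a[0] * v1 + a[1] * v2)
print(a)
```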
By attaching a norm to a vector space (see below), we can measure the length of any vector; the vector space is then referred to as a normed space. To satisfy intuitive notions of length, a norm V(u) must have the following properties:
(1) Non-negativity: V(u) ≥ 0,
(2) Positive scalability: V(αu) = |α| V(u),
(3) Separation: if V(u) = 0 then u = 0, and
(4) Triangle inequality: V(u + v) ≤ V(u) + V(v).
Often, the notation $\|\mathbf{u}\|$ is used. Probably most familiar is the Euclidean norm $\|\mathbf{u}\|_2 = \sqrt{\sum_{i=1}^{N} u_i^2}$, but another norm that gets heavy use in statistical machine learning is the Lp-norm:

$\|\mathbf{u}\|_p = \left( \sum_{i=1}^{N} |u_i|^p \right)^{1/p}$    (1.4)
Fig. 1.5: Important concepts in 2D vector spaces. The standard basis (e1, e2) is aligned with the axes. Two other vectors (v1, v2) can also be used as a basis for the space; all that is required is that they are linearly independent of each other (in the 2D case, that they are not simply scalar multiples of each other). Then x can be represented in either basis. The dot product between two vectors is proportional to the cosine of the angle between them: cos(θ) ∝ ⟨v1, v2⟩. Because (e1, e2) are at mutual right angles, they have zero dot product and therefore the basis is orthogonal. Additionally, it is an orthonormal basis because the vectors have unit norm (length), ‖e1‖ = ‖e2‖ = 1. The other basis vectors are neither orthogonal nor unit norm.
of which the Euclidean (p = 2) and city-block (p = 1) norms are special cases. Also of importance is the max-norm $\|\mathbf{u}\|_\infty = \max_{i=1 \ldots N} |u_i|$, which is just the length of the largest co-ordinate.
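All of these norms are one-line computations; a brief Python sketch (illustrative, with an arbitrary vector):

```python
import numpy as np

u = np.array([3.0, -4.0, 1.0])

def lp_norm(u, p):
    return np.sum(np.abs(u) ** p) ** (1.0 / p)           # equation (1.4)

print(lp_norm(u, 2), np.linalg.norm(u, 2))               # Euclidean norm, two ways
print(lp_norm(u, 1), np.linalg.norm(u, 1))               # city-block norm
print(np.max(np.abs(u)), np.linalg.norm(u, np.inf))      # max-norm
```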
There are several ways in which a product of vectors can be formed. The most important in our context is the inner product between two vectors:

$\langle \mathbf{u}, \mathbf{v} \rangle = \sum_{i=1}^{N} u_i v_i$    (1.5)
This is sometimes also described as the dot product u · v. For complex vectors, this is defined as:

$\langle \mathbf{u}, \mathbf{v} \rangle = \sum_{i=1}^{N} u_i \bar{v}_i$    (1.6)
where $\bar{a}$ denotes the complex conjugate of $a \in \mathbb{C}$. We will see later that the dot product plays a central role in the statistical notion of correlation. When two vectors have zero inner product, they are said to be orthogonal; geometrically, they meet at a right angle. This also has a statistical interpretation: for certain random variables, orthogonality implies statistical independence. Thus, orthogonality leads to significant simplifications in common calculations in classical DSP.
A special and very useful kind of basis is an orthogonal basis, where the inner product between every pair of distinct basis vectors is zero:
$\langle \mathbf{v}_i, \mathbf{v}_j \rangle = 0 \quad \text{for all } i \neq j$    (1.7)
In addition, the basis is orthonormal if every basis vector has unit norm, $\|\mathbf{v}_i\| = 1$ (the standard basis is orthonormal, for example; Figure 1.5). Orthogonality/orthonormality dramatically simplifies many calculations over vector spaces, partly because it is straightforward to find the N scalar coefficients $a_i$ of an arbitrary vector u in this basis using the inner product:
$a_i = \frac{\langle \mathbf{u}, \mathbf{v}_i \rangle}{\langle \mathbf{v}_i, \mathbf{v}_i \rangle}$    (1.8)

which simplifies to $a_i = \langle \mathbf{u}, \mathbf{v}_i \rangle$ in the orthonormal case (see the short sketch after the list below). Orthonormal bases are the backbone of many methods in DSP and machine learning. We can express the Euclidean norm using the inner product: $\|\mathbf{u}\|_2 = \sqrt{\mathbf{u} \cdot \mathbf{u}}$. An inner product satisfies the following properties:
(1) Non-negativity: u · u ≥ 0,
(2) Symmetry: u · v = v · u, and
(3) Linearity: (αu) · v = α (u · v).
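The following Python sketch (illustrative; the rotated basis is an arbitrary choice) finds the coefficients of a vector in an orthonormal basis via the orthonormal form of equation (1.8), $a_i = \langle \mathbf{u}, \mathbf{v}_i \rangle$, and reconstructs the vector from them:

```python
import numpy as np

u = np.array([2.0, -1.0, 0.5])

# An orthonormal basis of R^3: a rotation of the standard basis.
theta = 0.3
V = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])   # columns are the basis vectors

# Orthonormality: V^T V = I, so each coefficient is just an inner product.
assert np.allclose(V.T @ V, np.eye(3))
a = V.T @ u                  # a_i = <u, v_i>
u_reconstructed = V @ a      # u = sum_i a_i v_i
assert np.allclose(u, u_reconstructed)
print(a)
```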
There is an intuitive connection between distance and length: assuming that the metric is homogeneous, $d(\alpha\mathbf{u}, \alpha\mathbf{v}) = |\alpha|\, d(\mathbf{u}, \mathbf{v})$, and translation invariant, $d(\mathbf{u}, \mathbf{v}) = d(\mathbf{u} + \mathbf{a}, \mathbf{v} + \mathbf{a})$, a norm can be defined as the distance to the origin, $\|\mathbf{u}\| = d(\mathbf{0}, \mathbf{u})$. A commonly occurring example of this is the so-called squared L2 weighted norm $\|\mathbf{u}\|_A^2 = \mathbf{u}^T A \mathbf{u}$, which is just the squared Mahalanobis distance $d(\mathbf{0}, \mathbf{u})^2$ discussed earlier, with $\Sigma^{-1} = A$.
On the other hand, there is one sense in which every norm induces an associated metric, with the construction $d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\|$. This construction enjoys extensive use in machine learning and statistical DSP to quantify the "discrepancy" or "error" between two signals. In fact, since norms are convex (discussed later), it follows that metrics constructed this way from norms are also convex, a fact of crucial importance in practice.
A final product we will have need for in later chapters is the elementwise product w = u ◦ v, which is obtained by multiplying each corresponding pair of elements of the two vectors together, $w_n = u_n v_n$.
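A brief Python check (illustrative, with arbitrary vectors and weight matrix) of these constructions: the metric induced by a norm, the squared weighted L2 norm as a squared Mahalanobis distance, and the elementwise product.

```python
import numpy as np

u = np.array([1.0, 2.0])
v = np.array([-0.5, 1.5])
A = np.array([[2.0, 0.3],
              [0.3, 1.0]])   # A plays the role of Sigma^{-1}

# Norm-induced metric: d(u, v) = ||u - v||.
print(np.linalg.norm(u - v))

# Squared weighted L2 norm ||u||_A^2 = u^T A u, i.e. the squared
# Mahalanobis distance d(0, u)^2 with Sigma^{-1} = A.
print(u @ A @ u)

# Elementwise product w = u o v, with w_n = u_n v_n.
print(u * v)
```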
Linear operators
A linear operator or map acts on vectors to create other vectors, and while doing so, preserves the operations of vector addition and scalar multiplication. Linear operators are homomorphisms between vector spaces. They are fundamental to classical digital signal processing and statistics, and so find heavy use in machine learning. A linear operator L has the linear combination property:

$L(\alpha_1 \mathbf{u}_1 + \alpha_2 \mathbf{u}_2) = \alpha_1 L(\mathbf{u}_1) + \alpha_2 L(\mathbf{u}_2)$
What this says is that the operator commutes with scalar multiplication and vector addition: we get the same result if we first scale, then add the vectors, and then apply the operator to the result, or apply the operator to each vector, then scale them, and then add up the results (Figure 1.6).
Matrices (which we discuss next), differentiation and integration, and expectation in probability are all examples of linear operators. The linearity of integration and differentiation are standard rules which can be derived from the basic definitions. Linear maps in two-dimensional space have a nice geometric interpretation: straight lines in the vector space are mapped onto other straight lines (or onto a point if they are degenerate maps). This idea extends to higher dimensional vector spaces in the natural way.
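Both matrix multiplication and the first-difference operator (a discrete derivative) are linear; below is a brief Python sketch (illustrative, with randomly generated vectors and matrix) checking the linear combination property for each:

```python
import numpy as np

rng = np.random.default_rng(1)
u1, u2 = rng.normal(size=5), rng.normal(size=5)
a1, a2 = 2.5, -0.7

# A matrix acting on vectors is a linear operator...
M = rng.normal(size=(5, 5))
assert np.allclose(M @ (a1 * u1 + a2 * u2), a1 * (M @ u1) + a2 * (M @ u2))

# ...and so is the first-difference operator.
assert np.allclose(np.diff(a1 * u1 + a2 * u2), a1 * np.diff(u1) + a2 * np.diff(u2))
print("linear combination property holds for both operators")
```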
Matrix algebra
When vectors are 'stacked together' they form a powerful structure
Fig. 1.6: A 'flow diagram' depicting linear operators. All linear operators share the property that the operator L applied to the scaled sum of (two or more) vectors, α1u1 + α2u2 (top panel), is the same as the scaled sum of the same operator applied to each of these vectors first (bottom panel). In other words, it does not matter whether the operator is applied before or after the scaled sum.