Machine Learning for Signal Processing: Data Science, Algorithms, and Computational Statistics. Max A. Little.


Machine Learning for Signal Processing

Great Clarendon Street, Oxford, OX2 6DP, United Kingdom

Oxford University Press is a department of the University of Oxford. It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries

© Max A. Little 2019

The moral rights of the author have been asserted

First Edition published in 2019

Impression: 1

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by licence or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above

You must not circulate this work in any other form and you must impose this same condition on any acquirer

Published in the United States of America by Oxford University Press, 198 Madison Avenue, New York, NY 10016, United States of America

British Library Cataloguing in Publication Data

Data available

Library of Congress Control Number: 2019944777

ISBN 978–0–19–871493–4
DOI: 10.1093/oso/9780198714934.001.0001

Printed and bound by CPI Group (UK) Ltd, Croydon, CR0 4YY

Preface

Digital signal processing (DSP) is one of the 'foundational', but somewhat invisible, engineering topics of the modern world, without which many of the technologies we take for granted: the digital telephone, digital radio, television, CD and MP3 players, WiFi, radar, to name just a few, would not be possible. A relative newcomer by comparison, statistical machine learning is the theoretical backbone of exciting technologies that are by now starting to reach a level of ubiquity, such as automatic techniques for car registration plate recognition, speech recognition, stock market prediction, defect detection on assembly lines, robot guidance and autonomous car navigation. Statistical machine learning has origins in the recent merging of classical probability and statistics with artificial intelligence, which exploits the analogy between intelligent information processing in biological brains and sophisticated statistical modelling and inference.

DSP and statistical machine learning are of such wide importance to the knowledge economy that both have undergone rapid changes and seen radical improvements in scope and applicability. Both DSP and statistical machine learning make use of key topics in applied mathematics such as probability and statistics, algebra, calculus, graphs and networks. Therefore, intimate formal links between the two subjects exist and because of this, an emerging consensus view is that DSP and statistical machine learning should not be seen as separate subjects. The many overlaps that exist between the two subjects can be exploited to produce new digital signal processing tools of surprising utility and efficiency, and wide applicability, highly suited to the contemporary world of pervasive digital sensors and high-powered and yet cheap computing hardware. This book gives a solid mathematical foundation to the topic of statistical machine learning for signal processing, including the contemporary concepts of the probabilistic graphical model (PGM) and nonparametric Bayes, concepts which have only more recently emerged as important for solving DSP problems.

The book is aimed at advanced undergraduates or first-year PhD students, as well as researchers and practitioners. It addresses the foundational mathematical concepts, illustrated with pertinent and practical examples across a range of problems in engineering and science. The aim is to enable students with an undergraduate background in mathematics, statistics or physics, from a wide range of quantitative disciplines, to get quickly up to speed with the latest techniques and concepts in this fast-moving field. The accompanying software will enable readers to test out the techniques on their own signal analysis problems. The presentation of the mathematics is much along the lines of a standard undergraduate physics or statistics textbook, free of distracting technical complexities and jargon, while not sacrificing rigour. It would be an excellent textbook for emerging courses in machine learning for signals.

4 Statistical modelling and inference

4.1 Statistical models
4.2 Optimal probability inferences
4.4 Distributions associated with metrics and norms
4.5 The exponential family (EF)
4.6 Distributions defined through quantiles
4.7 Densities associated with piecewise linear loss functions

5 Probabilistic graphical models

5.1 Statistical modelling with PGMs
5.3 Inference on PGMs

7.3 LTI signal processing
7.4 Exploiting statistical stability for linear-Gaussian DSP
Discrete-time Gaussian processes (GPs) and DSP
7.5 The Kalman filter (KF)
7.6 Time-varying linear systems

8 Discrete signals: sampling, quantization and coding

8.1 Discrete-time sampling
8.2 Quantization

List of Algorithms

2.1 Iteratively reweighted least squares (IRLS)
2.2 Gradient descent
2.3 Backtracking line search
2.4 Golden section search
2.5 Newton's method
2.6 The subgradient method
2.7 A primal-dual interior-point method for linear programming (LP)
2.8 Greedy search for discrete optimization
2.9 (Simple) tabu search
2.10 Simulated annealing
2.11 Random restarting
3.1 Newton's method for numerical inverse transform sampling
3.2 Adaptive rejection sampling (ARS) for log-concave densities
3.3 Sequential search inversion (simplified version)
3.4 Binary (subdivision) search sampling
3.5 Gibbs Markov chain Monte Carlo (MCMC) sampling
3.6 Markov chain Monte Carlo (MCMC) Metropolis-Hastings (MH) sampling
5.1 The junction tree algorithm for (semi-ring) marginalization inference of a single variable clique
5.2 Iterative conditional modes (ICM)
5.3 Expectation-maximization (E-M)
6.1 Expectation-maximization (E-M) for general i.i.d. mixture models
6.2 The K-means algorithm
7.1 Recursive, Cooley-Tukey, radix-2, decimation-in-time, fast Fourier transform (FFT) algorithm
7.2 Overlap-add FIR convolution
8.1 The Lloyd-Max algorithm for fixed-rate quantizer design
8.2 The K-means algorithm for fixed-rate quantizer design
8.3 Iterative variable-rate entropy-constrained quantizer design
9.1 Baum-Welch expectation-maximization (E-M) for hidden Markov models (HMMs)
9.2 Viterbi training for hidden Markov models (HMMs)
10.1 Gaussian process (GP) informative vector machine (IVM) regression
10.2 Dirichlet process mixture model (DPMM) Gibbs sampler
10.3 Dirichlet process means (DP-means) algorithm
10.4 Maximum a-posteriori Dirichlet process mixture collapsed (MAP-DP) algorithm for conjugate exponential family distributions

List of Figures

1.1 Mapping between abstract groups
2.1 Lagrange multipliers in constrained optimization
2.7 Convergence of Newton's method with and without line search
2.8 Piecewise linear objective function (1D)
4.1 Overfitting and underfitting: polynomial models
4.2 Minimum description length (MDL)
4.5 Convergence properties of Metropolis-Hastings (MH) sampling
5.1 Simple 5-node probabilistic graphical model (PGM)
5.2 Notation for repeated connectivity in graphical models
5.3 Some complex graphical models
5.4 Three basic probabilistic graphical model patterns
6.1 Gaussian mixture model for Gamma ray intensity strati-
6.2 Convergence of expectation-maximization (E-M) for the
6.3 Classifier decision boundaries
6.14 Principal components analysis (PCA): traffic count signals
6.16 Locally linear embedding (LLE): chirp signals
7.8 Computational complexity of long-duration FIR implementations
7.9 Impulse response and transfer function of discretized kernel regression
7.11 Various nonparametric power spectral density estimators
7.12 Linear predictive power spectral density estimators
7.13 Regularized linear prediction analysis model selection
7.14 Wiener filtering of human walking signals
7.15 Sinusoidal subspace principal components analysis (PCA) filtering
7.16 Subspace MUSIC frequency estimation
7.17 Typical 2D Kalman filter trajectories
7.18 Kalman filter smoothing by Tikhonov regularization
7.19 Short-time Fourier transform analysis
7.20 Time-frequency tiling: Fourier versus wavelet analysis
7.21 A selection of wavelet functions
7.22 Vanishing moments of the Daubechies wavelet
7.23 Discrete wavelet transform wavelet shrinkage
8.1 Digital MEMS sensors
8.2 Discrete-time sampling hardware block diagram
8.3 Non-uniform and uniform and bandlimited sampling
8.4 Shannon-Whittaker uniform sampling in the frequency domain
8.5 Digital Gamma ray intensity signal from a drill hole
8.6 Quadratic B-spline uniform sampling
8.7 Rate-distortion curves for an i.i.d. Gaussian source
8.8 Lloyd-Max scalar quantization
8.9 Entropy-constrained scalar quantization
8.10 Shannon-Whittaker reconstruction of density functions
8.11 Quantization and dithering
8.12 Scalar versus vector quantization
8.13 Companding quantization
8.14 Linear predictive coding for data compression
8.15 Transform coding: bit count versus coefficient variance
8.16 Transform coding of musical audio signals
8.17 Compressive sensing: discrete cosine transform basis
8.18 Random demodulation compressive sensing
9.1 Quantile running filter for exoplanetary light curves
9.2 First-order log-normal recursive filter: daily rainfall signals
9.3 Total variation denoising (TVD) of power spectral density time series
9.4 Bilateral filtering of human walking data
9.5 Hidden Markov modelling (HMM) of power spectral time series data
9.6 Cepstral analysis of voice signals
10.1 Infinite exchangeability in probabilistic graphical model form
10.2 Linking kernel and regularized basis function regression
10.3 Random function draws from the linear basis regression model
10.4 Gaussian process (GP) function draws with various covariance functions
10.6 Gaussian process (GP) regression for classification
10.7 Draws from the Dirichlet distribution for K = 3
10.8 The Dirichlet process (DP)
10.10 Comparing partition sizes of Dirichlet versus Pitman-Yor processes
10.11 Parametric versus nonparametric estimation of linear predictive coding (LPC) coefficients
10.12 DP-means for signal noise reduction: exoplanet light curves
10.13 Probabilistic graphical model (PGM) for the Bayesian mixture model
10.14 Performance of MAP-DP versus DP-means nonparametric clustering

1 Mathematical foundations

Statistical machine learning and signal processing are topics in applied mathematics, which are based upon many abstract mathematical concepts. Defining these concepts clearly is the most important first step in this book. The purpose of this chapter is to introduce these foundational mathematical concepts. It also justifies the statement that much of the art of statistical machine learning as applied to signal processing lies in the choice of convenient mathematical models that happen to be useful in practice. Convenient in this context means that the algebraic consequences of the choice of mathematical modelling assumptions are in some sense manageable. The seeds of this manageability are the elementary mathematical concepts upon which the subject is built.

1.1 Abstract algebras

We will take the simple view in this book that mathematics is based on logic applied to sets: a set is an unordered collection of objects, often real numbers such as the set {π, 1, e} (which has three elements), or the set of all real numbers ℝ (with an infinite number of elements). From this modest origin it is a remarkable fact that we can build the entirety of the mathematical methods we need. We first start by reviewing some elementary principles of (abstract) algebras.

Groups

An algebra is a structure that defines the rules of what happens when pairs of elements of a set are acted upon by operations. A kind of algebra known as a group, (+, ℝ), is the usual notion of addition with pairs of real numbers. It is a group because it has an identity, the number zero (when zero is added to any number it remains unchanged, i.e. a + 0 = 0 + a = a), and every element in the set has an inverse (for any number a, there is an inverse −a which means that a + (−a) = 0). Finally, the operation is associative, which is to say that when operating on three or more numbers, addition does not depend on the order in which the numbers are added (i.e. a + (b + c) = (a + b) + c). Addition also has the intuitive property that a + b = b + a, i.e. it does not matter if the numbers are swapped: the operator is called commutative, and the group is then called an Abelian group. Mirroring addition is multiplication acting on the set of real numbers with zero removed, (×, ℝ − {0}), which is also an Abelian group. The identity element is 1, and the inverses

Fig. 1.1: Illustrating abstract groups and mapping between them. Shown are the two continuous groups of real numbers under addition (left column) and multiplication (right column), with identities 0 and 1 respectively. The homomorphism of exponentiation maps addition onto multiplication (left to right column), and its inverse, the logarithm, maps multiplication back onto addition (right to left column). Therefore, these two groups are homomorphic.

are the reciprocals of each number. Multiplication is also associative, and commutative. Note that we cannot include zero because this would require the inclusion of the inverse of zero, 1/0, which does not exist (Figure 1.1).

Groups are naturally associated with symmetries. For example, the set of rigid geometric transformations of a rectangle that leave the rectangle unchanged in the same position forms a group together with compositions of these transformations (there are flips along the horizontal and vertical midlines, one clockwise rotation through 180° about the centre, and the identity transformation that does nothing). This group can be denoted as V4 = (◦, {e, h, v, r}), where e is the identity, h the horizontal flip, v the vertical flip, and r the rotation, with the composition operation ◦. For the rectangle, we can see that h ◦ v = r, i.e. a horizontal followed by a vertical flip corresponds to a 180° rotation (Figure 1.2).

Very often, the fact that we are able to make some convenient algebraic calculations in statistical machine learning and signal processing can be traced to the existence of one or more symmetry groups that arise due to the choice of mathematical assumptions, and we will encounter many examples of this phenomenon in later chapters, which often lead to significant computational efficiencies. A striking example of the consequences of groups in classical algebra is the explanation for why there are no solutions that can be written in terms of addition, multiplication and roots, to the general polynomial equation $\sum_{i=0}^{N} a_i x^i = 0$ when N ≥ 5. This fact has many practical consequences, for example, it is possible to find the eigenvalues of a general matrix of size N × N using simple analytical calculations when N < 5 (although the analytical calculations do become prohibitively complex), but there is no possibility of using similar analytical techniques when N ≥ 5, and one must resort to numerical methods, and these methods sometimes cannot guarantee to find all solutions!

Fig. 1.2: The group of symmetries of the rectangle, V4 = (◦, {e, h, v, r}). It consists of horizontal and vertical flips, and a rotation of 180° about the centre. This group is isomorphic to the group M8 = (×8, {1, 3, 5, 7}) (see Figure 1.3).

Many simple groups with the same number of elements are isomorphic to each other, that is, there is a unique function that maps the elements of one group to the elements of the other, such that the operations can be applied consistently to the mapped elements. Intuitively then, the identity in one group is mapped to that of the other group. For example, the rectangle symmetry group V4 above is isomorphic to the group M8 = (×8, {1, 3, 5, 7}), where ×8 indicates multiplication modulo 8 (that is, taking the remainder of the multiplication on division by 8, see Figure 1.3).
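As a quick numerical check of this isomorphism, each element of V4 can be represented as a pair of axis signs, so that composition is elementwise sign multiplication, and we can confirm that the mapping e → 1, h → 3, v → 5, r → 7 preserves the group operation. This is an illustrative sketch, not code from the book's accompanying software; the names `V4`, `compose` and `phi` are our own.

```python
# Elements of V4 as (x-sign, y-sign) pairs: composing two transformations
# multiplies the signs componentwise.
V4 = {'e': (1, 1), 'h': (-1, 1), 'v': (1, -1), 'r': (-1, -1)}

def compose(a, b):
    """Composition a ∘ b in V4, returned as an element name."""
    sx, sy = V4[a][0] * V4[b][0], V4[a][1] * V4[b][1]
    return next(name for name, signs in V4.items() if signs == (sx, sy))

# The candidate isomorphism phi: V4 -> M8.
phi = {'e': 1, 'h': 3, 'v': 5, 'r': 7}

# Check phi(a ∘ b) = phi(a) ×8 phi(b) for every pair of elements.
for a in V4:
    for b in V4:
        assert phi[compose(a, b)] == (phi[a] * phi[b]) % 8

print(compose('h', 'v'))  # 'r': a horizontal then vertical flip is the rotation
```

Exhaustively checking all 16 pairs is exactly what the tables in Figure 1.3 show at a glance.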

Whilst two groups might not be isomorphic, they are sometimes homomorphic: there is a function between one group and the other that maps each element in the first group to one or more elements in the second group, but the mapping is still consistent under the second operation. A very important example is the exponential map, exp(x), that converts addition over the set of real numbers, to multiplication over the set of positive real numbers: $e^{a+b} = e^a e^b$; a powerful variant of this map is widely used in statistical inference to simplify and stabilize calculations involving the probabilities of independent statistical events, by converting them into calculations with their associated information content. This is the negative logarithmic map, −ln(x), that converts probabilities under multiplication into entropies under addition. This map is very widely used in statistical inference as we shall see.
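The practical value of this homomorphism is easy to demonstrate: a product of many small probabilities underflows in floating point, but the equivalent sum of negative log probabilities is perfectly stable. A minimal sketch, with arbitrary illustrative probabilities:

```python
import math

# 1,000 independent events, each with probability 0.01.
probs = [0.01] * 1000

# Direct multiplication underflows to exactly 0.0 in double precision.
direct = 1.0
for p in probs:
    direct *= p
print(direct)  # 0.0 (underflow)

# The homomorphism -ln maps multiplication onto addition: the total
# information content (in nats) is the sum of the -ln(p) terms.
neg_log = sum(-math.log(p) for p in probs)
print(neg_log)  # about 4605.2, i.e. -ln of the true product 10^-2000
```

This log-domain trick recurs throughout statistical inference, for instance in likelihood computations for the hidden Markov models of Chapter 9.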

For a more detailed but accessible background to group theory, read Humphreys (1996).

Rings

Whilst groups deal with one operation on a set of numbers, rings are a slightly more complex structure that often arises when two operations are applied to the same set. The most immediately tangible example is the operations of addition and multiplication on the set of integers ℤ (the positive and negative whole numbers with zero). Using the definition above, the set of integers under addition forms an Abelian group, whereas under multiplication the integers form a simple structure known as a monoid – a group without inverses. Multiplication with the integers is associative, and there is an identity (the positive number one), but the multiplicative inverses are not integers (they are fractions such as 1/2, 1/5 etc.) Finally, in combination, integer multiplication distributes over integer addition: a × (b + c) = a × b + a × c = (b + c) × a. These properties define a ring: it has one operation that together with the set forms an Abelian group, and another operation that, with the set, forms a monoid, and the second operation distributes over the first. As with integers, the set of real numbers under the usual addition and multiplication also has the structure of a ring. Another very important example is the set of square matrices all of size N × N with real elements under normal matrix addition and multiplication. Here the multiplicative identity element is the identity matrix of size N × N, and the additive identity element is the same size square matrix with all zero elements.

Rings are powerful structures that can lead to very substantial computational savings for many statistical machine learning and signal processing problems. For example, if we remove the condition that the additive operation must have inverses, then we have a pair of monoids that are distributive. This structure is known as a semiring or semifield and it turns out that the existence of this structure in many machine learning and signal processing problems makes these otherwise computationally intractable problems feasible. For example, the classical Viterbi algorithm for determining the most likely sequence of hidden states in a hidden Markov model (HMM) is an application of the max-sum semifield on the dynamic Bayesian network that defines the stochastic dependencies in the model.
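The computational saving comes from distributivity in the (max, +) semifield: max(a + c, b + c) = max(a, b) + c, so a shared term factors out of a maximization exactly as multiplication factors out of a sum. A small sketch under our own toy two-stage "path score" setting (not an example from the book):

```python
# Distributivity in the (max, +) semifield: max over sums factors just
# like sums over products in an ordinary ring.
a, b, c = 2.0, 5.0, 3.0
assert max(a + c, b + c) == max(a, b) + c

# This is what makes Viterbi-style dynamic programming work: the best
# two-stage path score can be computed stage by stage instead of
# enumerating every complete path. Toy additive (log-domain) scores:
step1 = [0.5, 1.5, 1.0]                       # score of reaching each of 3 states
step2 = [[0.2, 0.7], [0.1, 0.9], [0.6, 0.3]]  # state -> next-state scores

# Brute force: maximize over all 3 x 2 complete paths.
brute = max(step1[i] + step2[i][j] for i in range(3) for j in range(2))

# Dynamic programming: factor the maximization one stage at a time.
best = [max(step1[i] + step2[i][j] for i in range(3)) for j in range(2)]
dp = max(best)

assert brute == dp
print(dp)  # 2.4
```

For a chain of T stages with K states, the same factoring turns an O(K^T) enumeration into an O(T K²) recursion.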

Both Dummit and Foote (2004) and Rotman (2000) contain detailed introductions to abstract algebra including groups and rings.

Fig. 1.3: The table for the symmetry group V4 = (◦, {e, h, v, r}) (top), and the group M8 = (×8, {1, 3, 5, 7}) (bottom), showing the isomorphism between them obtained by mapping e → 1, h → 3, v → 5 and r → 7.

Fig. 1.4: Metric 2D circles d(x, 0) = c for various distance metrics. From top to bottom: Euclidean distance, absolute distance, the distance $d(\mathbf{x},\mathbf{y}) = \left( \sum_{i=1}^{D} |x_i - y_i|^{0.3} \right)^{1/0.3}$, and the Mahalanobis distance for Σ₁₁ = Σ₂₂ = 1.0, Σ₁₂ = Σ₂₁ = 0.5. The contours are c = 1 (red lines) and c = 0.5 (blue lines).

1.2 Metrics

Distance is a fundamental concept in mathematics. Distance functions play a key role in machine learning and signal processing, particularly as measures of similarity between objects, for example, digital signals encoded as items of a set. We will also see that a statistical model often implies the use of a particular measure of distance, and this measure determines the properties of statistical inferences that can be made.

A geometry is obtained by attaching a notion of distance to a set: it becomes a metric space. A metric takes two points in the set and returns a single (usually real) value representing the distance between them. A metric must have the following properties to satisfy intuitive notions of distance:

(1) Non-negativity: d(x, y) ≥ 0,
(2) Symmetry: d(x, y) = d(y, x),
(3) Coincidence: d(x, x) = 0, and
(4) Triangle inequality: d(x, z) ≤ d(x, y) + d(y, z).

Respectively, these requirements are that (1) distance cannot be negative, (2) the distance going from x to y is the same as that from y to x, (3) only points lying on top of each other have zero distance between them, and (4) the length of any one side of a triangle defined by three points cannot be greater than the sum of the lengths of the other two sides. For example, the Euclidean metric on a D-dimensional set is:

$$d(\mathbf{x},\mathbf{y}) = \sqrt{\sum_{i=1}^{D} (x_i - y_i)^2} \quad (1.1)$$

This represents the notion of distance that we experience in everyday geometry. The defining properties of distance lead to a vast range of possible geometries, for example, the city-block geometry is defined by the absolute distance metric:

$$d(\mathbf{x},\mathbf{y}) = \sum_{i=1}^{D} |x_i - y_i| \quad (1.2)$$

City-block distance is so named because it measures distances on a grid parallel to the co-ordinate axes. Distance need not take on real values, for example, the discrete metric is defined as d(x, y) = 0 if x = y and d(x, y) = 1 otherwise. Another very important metric is the Mahalanobis distance:

$$d(\mathbf{x},\mathbf{y}) = \sqrt{(\mathbf{x}-\mathbf{y})^T \Sigma^{-1} (\mathbf{x}-\mathbf{y})} \quad (1.3)$$

This distance may not be axis-aligned: it corresponds to finding the Euclidean distance after applying an arbitrary stretch or compression along each axis followed by an arbitrary D-dimensional rotation (indeed if Σ = I, the identity matrix, this is identical to the Euclidean distance).

Figure 1.4 shows plots of 2D circles d(x, 0) = c for various metrics, in particular c = 1 which is known as the unit circle in that metric space. For further reading, metric spaces are introduced in Sutherland (2009) in the context of real analysis and topology.

1.3 Vector spaces

A space is just the name given to a set endowed with some additional mathematical structure. A (real) vector space is the key structure of linear algebra that is a central topic in most of classical signal processing: all digital signals are vectors, for example. The definition of a (finite) vector space begins with an ordered set (often written as a column) of N real numbers called a vector, and a single real number called a scalar. To that vector we attach the addition operation, which is both associative and commutative, that simply adds every corresponding element of the numbers in each vector together, written as v + u. The identity for this operation is the vector with N zeros, 0.

Additionally, we define a scalar multiplication operation that multiplies each element of the vector with a scalar, λ. Using scalar multiplication by λ = −1, we can then form inverses of any vector. For scalar multiplication it should not matter in which order two scalar multiplications occur, e.g. λ(µv) = (λµ)v = (µλ)v = µ(λv). We also require that scalar multiplication distributes over vector addition, λ(v + u) = λv + λu, and that scalar addition distributes over scalar multiplication, (λ + µ)v = λv + µv.

Every vector space has at least one basis for the space: this is a set of linearly independent vectors, such that every vector in the vector space can be written as a unique linear combination of these basis vectors (Figure 1.5). Since our vectors have N entries, there are always N vectors in the basis. Thus, N is the dimension of the vector space. The simplest basis is the so-called standard basis, consisting of the N vectors e1 = (1, 0, ..., 0)^T, e2 = (0, 1, ..., 0)^T etc. It is easy to see that a vector v = (v1, v2, ..., vN)^T can be expressed in terms of this basis as v = v1 e1 + v2 e2 + ··· + vN eN.

By attaching a norm to a vector space (see below), we can measure the length of any vector; the vector space is then referred to as a normed space. To satisfy intuitive notions of length, a norm V(u) must have the following properties:

(1) Non-negativity: V(u) ≥ 0,

(2) Positive scalability: V(αu) = |α| V(u),

(3) Separation: If V(u) = 0 then u = 0, and

(4) Triangle inequality: V(u + v) ≤ V(u) + V(v).

Often, the notation ‖u‖ is used. Probably most familiar is the Euclidean norm $\|\mathbf{u}\|_2 = \sqrt{\sum_{i=1}^{N} u_i^2}$, but another norm that gets heavy use in statistical machine learning is the Lp-norm:

$$\|\mathbf{u}\|_p = \left( \sum_{i=1}^{N} |u_i|^p \right)^{1/p} \quad (1.4)$$

Fig. 1.5: Important concepts in 2D vector spaces. The standard basis (e1, e2) is aligned with the axes. Two other vectors (v1, v2) can also be used as a basis for the space; all that is required is that they are linearly independent of each other (in the 2D case, that they are not simply scalar multiples of each other). Then x can be represented in either basis. The dot product between two vectors is proportional to the cosine of the angle between them: cos(θ) ∝ ⟨v1, v2⟩. Because (e1, e2) are at mutual right angles, they have zero dot product and therefore the basis is orthogonal. Additionally, it is an orthonormal basis because they are unit norm (length) (‖e1‖ = ‖e2‖ = 1). The other basis vectors are neither orthogonal nor unit norm.

of which the Euclidean (p = 2) and city-block (p = 1) norms are special cases. Also of importance is the max-norm $\|\mathbf{u}\|_\infty = \max_{i=1,\ldots,N} |u_i|$, which is just the length of the largest co-ordinate.
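The Lp family is easy to explore numerically. The sketch below, using illustrative vectors of our own choosing, computes a few members and checks the triangle inequality for each:

```python
import numpy as np

def lp_norm(u, p):
    """The Lp-norm: (sum of |u_i|^p) ** (1/p)."""
    return np.sum(np.abs(u) ** p) ** (1.0 / p)

u = np.array([3.0, -4.0])
v = np.array([1.0, 2.0])

print(lp_norm(u, 2))      # 5.0 -> Euclidean norm
print(lp_norm(u, 1))      # 7.0 -> city-block norm
print(np.max(np.abs(u)))  # 4.0 -> max-norm (the p -> infinity limit)

# The triangle inequality holds for every p >= 1.
for p in (1, 1.5, 2, 3):
    assert lp_norm(u + v, p) <= lp_norm(u, p) + lp_norm(v, p)
```

For p < 1 the triangle inequality fails, which is why such "norms" (as in the third panel of Figure 1.4) define more exotic geometries.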

There are several ways in which a product of vectors can be formed. The most important in our context is the inner product between two vectors:

$$\langle \mathbf{u}, \mathbf{v} \rangle = \sum_{i=1}^{N} u_i v_i \quad (1.5)$$

This is sometimes also described as the dot product u · v. For complex vectors, this is defined as:

$$\langle \mathbf{u}, \mathbf{v} \rangle = \sum_{i=1}^{N} u_i \bar{v}_i \quad (1.6)$$

where $\bar{a}$ is the complex conjugate of a ∈ ℂ. We will see later that the dot product plays a central role in the statistical notion of correlation. When two vectors have zero inner product, they are said to be orthogonal; geometrically they meet at a right angle. This also has a statistical interpretation: for certain random variables, orthogonality implies statistical independence. Thus, orthogonality leads to significant simplifications in common calculations in classical DSP.

A special and very useful kind of basis is an orthogonal basis, where the inner product between every pair of distinct basis vectors is zero:

$$\langle \mathbf{v}_i, \mathbf{v}_j \rangle = 0 \quad \text{for all } i \neq j \quad (1.7)$$

In addition, the basis is orthonormal if every basis vector has unit norm, ‖vᵢ‖ = 1 – the standard basis is orthonormal, for example (Figure 1.5). Orthogonality/orthonormality dramatically simplifies many calculations over vector spaces, partly because it is straightforward to find the N scalar coefficients aᵢ of an arbitrary vector u in this basis using the inner product:

$$a_i = \frac{\langle \mathbf{u}, \mathbf{v}_i \rangle}{\langle \mathbf{v}_i, \mathbf{v}_i \rangle} \quad (1.8)$$

which simplifies to $a_i = \langle \mathbf{u}, \mathbf{v}_i \rangle$ in the orthonormal case. Orthonormal bases are the backbone of many methods in DSP and machine learning. We can express the Euclidean norm using the inner product: $\|\mathbf{u}\|_2 = \sqrt{\mathbf{u} \cdot \mathbf{u}}$. An inner product satisfies the following properties:
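A quick numerical illustration of the coefficient formula, with vectors chosen arbitrarily for the sketch: project a vector onto an orthogonal basis, then reconstruct it exactly from the coefficients.

```python
import numpy as np

# An orthogonal (but not orthonormal) basis for 2D.
v1 = np.array([1.0, 1.0])
v2 = np.array([2.0, -2.0])
assert np.dot(v1, v2) == 0.0          # orthogonality

u = np.array([3.0, 5.0])

# Coefficients a_i = <u, v_i> / <v_i, v_i>.
a1 = np.dot(u, v1) / np.dot(v1, v1)
a2 = np.dot(u, v2) / np.dot(v2, v2)

# The basis expansion recovers u exactly.
reconstruction = a1 * v1 + a2 * v2
print(reconstruction)  # [3. 5.]

# For an orthonormal basis the denominators are 1, so a_i = <u, v_i>.
q1, q2 = v1 / np.linalg.norm(v1), v2 / np.linalg.norm(v2)
assert np.allclose(np.dot(u, q1) * q1 + np.dot(u, q2) * q2, u)
```

This projection is precisely the operation performed, at scale, by orthonormal transforms such as the Fourier and wavelet transforms of later chapters.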

(1) Non-negativity: u · u ≥ 0,

(2) Symmetry: u · v = v · u, and

(3) Linearity: (αu) · v = α(u · v).

There is an intuitive connection between distance and length: assuming that the metric is homogeneous, d(αu, αv) = |α| d(u, v), and translation invariant, d(u, v) = d(u + a, v + a), a norm can be defined as the distance to the origin, ‖u‖ = d(0, u). A commonly occurring example of this is the so-called squared L2 weighted norm $\|\mathbf{u}\|_A^2 = \mathbf{u}^T A \mathbf{u}$, which is just the squared Mahalanobis distance d(0, u)² discussed earlier with Σ⁻¹ = A.

On the other hand, there is one sense in which every norm induces an associated metric with the construction d(u, v) = ‖u − v‖. This construction enjoys extensive use in machine learning and statistical DSP to quantify the "discrepancy" or "error" between two signals. In fact, since norms are convex (discussed later), it follows that metrics constructed this way from norms are also convex, a fact of crucial importance in practice.

A final product we will have need for in later chapters is the elementwise product w = u ◦ v, which is obtained by multiplying each element in the vectors together: wₙ = uₙvₙ.

Linear operators

A linear operator or map acts on vectors to create other vectors, and while doing so, preserves the operations of vector addition and scalar multiplication. They are homomorphisms between vector spaces. Linear operators are fundamental to classical digital signal processing and statistics, and so find heavy use in machine learning. Linear operators L have the linear combination property:

$$L(\alpha_1 \mathbf{u}_1 + \alpha_2 \mathbf{u}_2) = \alpha_1 L(\mathbf{u}_1) + \alpha_2 L(\mathbf{u}_2) \quad (1.9)$$

What this says is that the operator commutes with scalar multiplication and vector addition: we get the same result if we first scale, then add the vectors, and then apply the operator to the result, or, apply the operator to each vector, then scale them, and then add up the results (Figure 1.6).

Matrices (which we discuss next), differentiation and integration, and expectation in probability are all examples of linear operators. The linearity of integration and differentiation are standard rules which can be derived from the basic definitions. Linear maps in two-dimensional space have a nice geometric interpretation: straight lines in the vector space are mapped onto other straight lines (or onto a point if they are degenerate maps). This idea extends to higher dimensional vector spaces in the natural way.
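The linear combination property is easy to verify for a matrix operator. The sketch below, with an arbitrary illustrative matrix and vectors, checks that applying the operator before or after the scaled sum gives the same result, exactly as in the 'flow diagram' of Figure 1.6:

```python
import numpy as np

# A matrix acts as a linear operator on vectors: L(u) = A @ u.
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

u1 = np.array([1.0, -2.0])
u2 = np.array([4.0, 0.5])
a1, a2 = 0.7, -1.3

# Operator applied to the scaled sum...
lhs = A @ (a1 * u1 + a2 * u2)
# ...equals the scaled sum of the operator applied to each vector.
rhs = a1 * (A @ u1) + a2 * (A @ u2)

assert np.allclose(lhs, rhs)
print(lhs)
```

A nonlinear map such as u ↦ u² fails this check, which is a convenient way to test whether a proposed signal-processing operation is linear.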

Matrix algebra

When vectors are 'stacked together' they form a powerful structure

Fig. 1.6: A 'flow diagram' depicting linear operators. All linear operators share the property that the operator L applied to the scaled sum of (two or more) vectors α1 u1 + α2 u2 (top panel) is the same as the scaled sum of the same operator applied to each of these vectors first (bottom panel). In other words, it does not matter whether the operator is applied before or after the scaled sum.
