NetworkAlgorithmics
SecondEdition
GeorgeVarghese
UCLADepartmentofComputerScience
LosAngeles,CA,UnitedStates
JunXu
SchoolofComputerScience
GeorgiaInstituteofTechnology
Atlanta,GA,UnitedStates
MorganKaufmannisanimprintofElsevier 50HampshireStreet,5thFloor,Cambridge,MA02139,UnitedStates
Copyright©2022ElsevierInc.Allrightsreserved.
Nopartofthispublicationmaybereproducedortransmittedinanyformorbyanymeans,electronicormechanical, includingphotocopying,recording,oranyinformationstorageandretrievalsystem,withoutpermissioninwritingfromthe publisher.Detailsonhowtoseekpermission,furtherinformationaboutthePublisher’spermissionspoliciesandour arrangementswithorganizationssuchastheCopyrightClearanceCenterandtheCopyrightLicensingAgency,canbefound atourwebsite: www.elsevier.com/permissions
ThisbookandtheindividualcontributionscontainedinitareprotectedundercopyrightbythePublisher(otherthanasmay benotedherein).
Notices
Knowledgeandbestpracticeinthisfieldareconstantlychanging.Asnewresearchandexperiencebroadenour understanding,changesinresearchmethods,professionalpractices,ormedicaltreatmentmaybecomenecessary.
Practitionersandresearchersmustalwaysrelyontheirownexperienceandknowledgeinevaluatingandusingany information,methods,compounds,orexperimentsdescribedherein.Inusingsuchinformationormethodstheyshouldbe mindfuloftheirownsafetyandthesafetyofothers,includingpartiesforwhomtheyhaveaprofessionalresponsibility.
Tothefullestextentofthelaw,neitherthePublishernortheauthors,contributors,oreditors,assumeanyliabilityforany injuryand/ordamagetopersonsorpropertyasamatterofproductsliability,negligenceorotherwise,orfromanyuseor operationofanymethods,products,instructions,orideascontainedinthematerialherein.
ISBN:978-0-12-809927-8
ForinformationonallMorganKaufmannandElsevierpublications visitourwebsiteat https://www.elsevier.com/books-and-journals
Publisher: MaraConner
EditorialProjectManager: LindsayC.Lawrence
ProductionProjectManager: ManchuMohan
CoverDesigner: MatthewLimbert
TypesetbyVTeX
ForAjuandTimandAndrew,whomadeallthispossible...
PART1THERULESOFTHEGAME
2.2.5Memorysubsystemdesigntechniques
2.2.6Component-leveldesign
2.2.7Finalhardwarelessons...................................32 2.3
2.3.1Endnodearchitecture..
2.6
2.4.2Infinitememoryviavirtualmemory.........................42
3.3.2Principlesformodularitywithefficiency..
3.3.3Principlesforspeedinguproutines
4.2 Schedulerforasynchronoustransfermodeflowcontrol
PART2PLAYINGWITHENDNODES
CHAPTER5Copyingdata
5.1 Whydatacopies
5.2 Reducingcopyingvialocalrestructuring
5.2.1Exploitingadaptormemory...
5.2.2Usingcopy-on-write.....................................117
5.2.3Fbufs:optimizingpageremapping
5.2.4Transparentlyemulatingcopysemantics......................123
5.2.5Arezerocopiesusedtoday?..
5.3 AvoidingcopyingusingremoteDMA..............................126
5.3.1Avoidingcopyinginacluster..............................127
5.3.2Modern-dayincarnationsofRDMA
5.4 Broadeningtofilesystems .......................................131
5.4.1Sharedmemory .........................................131
5.4.2IO-lite:aunifiedviewofbuffering. ..........................132
5.4.3AvoidingfilesystemcopiesviaI/Osplicing...
5.5 Broadeningbeyondcopies .......................................134
5.6 Broadeningbeyonddatamanipulations...
5.6.1Usingcacheseffectively ..................................137
5.6.2DirectmemoryaccessversusprogrammedI/O
5.7 Conclusions ..................................................142
5.8 Exercises....................................................143
............................................
6.1 Whycontroloverhead? .........................................147
6.2 Avoidingschedulingoverheadinnetworkingcode
6.2.1Makinguser-levelprotocolimplementationsreal
6.3 Avoidingcontext-switchingoverheadinapplications..
6.3.1Processperclient .......................................154
6.3.2Threadperclient........................................155
6.3.3Event-drivenscheduler ...................................155
6.3.4Event-drivenserverwithhelperprocesses.....................157
6.3.5Task-basedstructuring....................................158
6.4 ScalableI/ONotification ........................................159
6.4.1Aservermystery........................................159
6.4.2Problemswithimplementationsof select() .....................160
6.4.3Analysisof select() ......................................162
6.4.4Speedingup select() withoutchangingtheAPI
6.4.5Speedingup select() bychangingtheAPI .....................164
6.5 AvoidingsystemcallsorKernelBypass... ..........................166
6.5.1Thevirtualinterfacearchitectureproposal .....................169
6.5.2DataPlaneDevelopmentKit(DPDK)........................169
6.5.3SingleRootI/OVirtualization(SR-IOV)......................170
6.6 RadicalRestructuringofOperatingSystems..........................171
6.7 Reducinginterrupts ............................................172
6.7.1Avoidingreceiverlivelock .................................173
6.8 Conclusions ..................................................174
6.9 Exercises....................................................175
CHAPTER7Maintainingtimers
7.1 Whytimers?.................................................180
7.2 Modelandperformancemeasures .................................181
7.3 Simplesttimerschemes .........................................182
7.4 Timingwheels ................................................183
7.5 Hashedwheels ................................................185
7.6 Hierarchicalwheels ............................................187
7.7 BSDimplementation...........................................189
8.1 Opportunitiesandchallengesofearlydemultiplexing
8.2
8.3 CMU/Stanfordpacketfilter:pioneeringpacketfilters
8.4 Berkeleypacketfilter:enablinghigh-performancemonitoring.
8.5 Pathfinder:factoringoutcommonchecks
8.6 Dynamicpacketfilter:compilerstotherescue
8.7 Conclusions...
8.8
PART3PLAYINGWITHROUTERS
11.1.3Lookupmodel
11.2.3Statusoftagswitching,flowswitching,andmultiprotocollabel
11.3 Non-algorithmictechniquesforprefixmatching
12.4.6Content-addressablememories
12.5 Two-dimensionalschemes. ......................................304
12.5.1Fastsearchingusingset-pruningtries........................305
12.5.2Reducingmemoryusingbacktracking
12.5.3Thebestofbothworlds:gridoftries.........................308
12.6 Approachestogeneralrulesets ...................................311
12.6.1Geometricviewofclassification............................311
12.6.2Beyondtwodimensions:thebadnews
12.6.3Beyondtwodimensions:thegoodnews...
12.7 Extendingtwo-dimensionalschemes..
13.13.1Derive R(t) fromthearrivalgraph...........................353 13.13.2Merge R(t) with S(t 1)
13.17 Combinedinputandoutputqueueing..
13.18 Scalingtolargerandfasterswitches
13.18.2Adivide-and-conquerapproachtobuildinglargeswitches.
13.18.3Closnetworksformedium-sizedrouters......................366
13.18.4Benesnetworksforlargerrouters.
13.18.5Load-balancedswitching
13.19 Scalingtofasterlinkspeeds...
14.9.2RecursivedefinitionofGPSvirtualstartandfinishtimes..
14.9.3Anotherpacketarrivalscenario...
Weightedfairqueueing
14.12 ThedatastructureandalgorithmforefficientGPSclocktracking..
14.12.2Augmenteddatastructures................................411
14.12.3The“shape”datastructure .................................413
14.13 ImplementingWFQandWF2 Q...................................418 14.14 Quickfairqueueing(QFQ)...
14.14.1Fairroundrobin(FRR)algorithm.
14.14.2HowQFQimprovesuponFRR...
14.15 Towardsprogrammablepacketscheduling.
14.15.1Push-InFirst-Out(PIFO)framework.........................422
14.15.2Universalpacketscheduler ................................423
14.16 Scalablefairqueuing ...........................................424
14.16.1Randomaggregation.. ...................................425
PART4ENDGAME
16.15 Generatingbettertrafficlogsviadatastreaming
16.16 Countingthenumberofdistinctflows..............................477
16.16.1Themin-hashalgorithm..................................477
16.16.2Anextensionofmin-hashforestimating |A
16.16.3Applicationtodatamining................................479
16.16.4Bitmapsketch:aworthyalternativetomin-hash................480
16.17 Detectionofheavyhitters
16.18 Estimationofflow-sizedistribution................................483
TheTug-of-Waralgorithmforestimating
17.1 Searchingformultiplestringsinpacketpayloads
17.1.1IntegratedstringmatchingusingAho–Corasick
Approximatestringmatching.....................................495 17.3 IPtracebackviaprobabilisticmarking
A.2 Hardwaremodels
A.2.1Fromtransistorstologicgates..............................533
A.2.2Timingdelays..........................................534
A.2.3Hardwaredesignbuildingblocks
A.2.4Memories:theinsidescoop...
A.3.1MatchingalgorithmsforClosnetworkswith
A.4 TheinterconnectionnetworkZoo
Prefacetothesecondedition
Unlessotherwisestated,thisprefaceiswrittenbyauthorXu,whoisreferredtoas“I”inthefollowing. AuthorVargheseisreferredtoas“George”inthefollowing.
WhenGeorgeinvitedmein2015towritethesecondeditionandbecomeaco-authorofthislegendarybook,Ifeltbothhumbledandhonored.Ialsofeltadeepsenseofmission,asthebulkofmy researchandteachinghadbeenonnetworkalgorithmicssincethemid-1990sandstillis.
WhenIsignedtheneweditioncontractwiththepublisher,Iwasoriginallycommittedtosignificantlyrevisingonlythreechapters:Chapter 13 (Switching),Chapter 14 (Schedulingpackets),and Chapter 16 (Measuringnetworktraffic).InAugust2020,atthesuggestionofGeorge,Iagreedto addtwosections,onEarlyBird(Section 17.6)andCarousel(Section 17.7)respectively,toChapter 17 (Networksecurity),andonesectionond-Leftapproach(Section 10.3.3),toChapter 10 (Exact-match lookups).
IhavemadethefollowingmajorrevisionstoChapter 13 (Switching).First,Ihavemadetheuseof virtualoutputqueueing(VOQ)inswitchingaseparatesection(Section 13.7)insteadofanintegralpart oftheparalleliterativematching(PIM)algorithm(Section 13.7 inthefirstedition),sincetheconcept ofVOQwasintroducedafewyearsearlierthanPIM.Sections 13.11 through 13.16 areentirelynew, inwhichafewmoresingle-crossbarswitchingalgorithms,includingSample-and-Compare,SERENA, QPS,SB-QPS,andSW-QPS,areintroduced.Thisadditioniscritical,becausePIM–iSLIPwastheonly single-crossbarswitchingalgorithm“series”describedinthefirstedition,whichdidnotshowreaders anyalternativewayofdesigningsuchswitchingalgorithms.Thesenewlyaddedalgorithmsprovide suchanalternativeviewpoint.Eachofthemwasselected(tobeincludedinthisbook)fortheirconceptualsimplicity,lowcomputationalandcommunicationcomplexity,andelegance.InSection 13.17, Ihavedescribedthecombinedinputandoutputqueueing(CIOQ)proposalthatadvocatescombining switchingwithpacketschedulingforprovidingQoSguarantees.Finally,Ihaveaddedashortsection (13.18.5)onload-balancedswitching,aresearchtopicthatjustemergedwhenthefirsteditionwentinto print.Afewothersectionshavebeenupdatedwith“modernmaterials.”Forexample,inSection 13.18.3 (Closnetworksformedium-sizerouters),Ihaveaddedafewparagraphsdescribinghowa3-stageClos networkcanbeusedfordatacenternetworkingandswitching.
IhavemadethefollowingmajorrevisionstoChapter 14 (Schedulingpackets).InSection 14.3, Ihaveaddedapproximatefairdropping(AFD),alow-complexitytechniqueforfairbandwidthallocation.InSections 14.8 through 14.14,IhaveaddedGPS,WFQ,WF2Q,QFQ,andanefficient (O(log n) timecomplexityperpacket)algorithm,calledtheshapedatastructure(publishedin2004),fortrackingtheGPSclock,whichmakesWFQandWF2 Qefficientlyimplementable(alsowith O(log n) time complexityperpacket).ThefactthatWFQ,analgorithmthatisknowntoprovideverystrongfairness guarantees,isefficientlyimplementableusingtheshapedatastructureisextremelyimportantsince WFQhasbeenwidelybelievedtobenotefficientlyimplementable(withatimecomplexityof O(n) perpacket),eventothisday.InSection 14.15,Ihavedescribedtworesearchproposalstowardsmaking packetschedulingreprogrammableinswitchesandrouters.
IhavemadethefollowingmajorrevisionstoChapter 16 (Measuringnetworktraffic).Ihaveadded athirdSRAM/DRAMhybridcounterarrayschemeinSection 16.3 tothetwosuchschemesdescribed xvii
inSection 16.2.Theformerisverydifferentthanthelattertwo,astheformerisrandomized,whereas thelattertwoarebothdeterministic.However,allthreecounterschemesarepassiveinthesensethey allowfastincrementsbutdonotallowfastreads.Hence,inSections 16.4 and 16.5,Iintroducean activecounterarrayschemecalledBRICKandaflow-statelookupscheme(thatbydefinitionhastobe active)calledRIH.Finally,inSections 16.15 through 16.19,Iprovideacrashcourseonnetworkdata streamingandsketching(DSS).DSS,whichoriginatedfirstintheareaofdatabases,hasevolvedover thepasttwodecadesintoaboomingresearchsubtopicofnetworkmeasurementandmonitoring.For example,inthepastdecadeorso,SIGCOMMandNSDItogetheracceptseveralnetworkDSSpapers almosteveryyear.
GeorgehasmadesignificantupdatestoChapters 5 (Copyingdata); 6 (Transferringcontrol); 7 (Maintainingtimers); 11 (Prefix-matchlookups); 12 (Packetclassification); 15 (Routersasdistributed systems);and 18 (Conclusions).
MyworkinwritingthiseditionhasbeensupportedinpartbyUSNationalScienceFoundationthroughgrantsNeTS-1423182,CNS-1909048,andCNS-2007006.Ihavereportedmyeffortand progresseveryyearintheannualorfinalprojectreports.
Aspecialthankstomycurrentandformereditors,LindsayLawrenceandBrianRomerandTodd Green;tomyco-authorGeorge,whocameupwiththeingeniousterm“networkalgorithmics”that definedthebulkofmyresearchinthepast25years;tomyPh.D.advisor,MukeshSinghal,whotaught mehowtowriteresearchpapers;toallmycollaboratorsonvariousnetworkalgorithmicstopics,especiallytoBillLin;tomanycolleaguesatGeorgiaTech,especiallytoMostafaAmmarandEllenZegura; toformerandcurrentSchoolChairsLanceFortnowandVivekSarkarwhogavemeareducedservice loadforthisbook-writingeffort;toformerandcurrentPh.D.studentswhoadventuredinthefieldof networkalgorithmicswithmeandwhohelpedmewithdrawingfiguresandtables,proofreading,and fixing100+ “bookbugs”inthefirsteditioncollectedbyGeorge;toanonymousreviewersofthisbook; tomyparentsandmybrother;tomywifeLinda;andtomydaughterEllie.
Preface
Computernetworkshavebecomeanintegralpartofsociety.Wetakeforgrantedtheabilitytotransact commerceovertheInternetandthatuserscanavailthemselvesofaburgeoningsetofcommunication methods,whichrangefromfilesharingtoWeblogs.However,fornetworkstotaketheirplaceaspart ofthefundamentalinfrastructureofsociety,theymustprovideperformanceguarantees.
Wetakeforgrantedthatelectricitywillflowwhenaswitchisflickedandthattelephonecalls willberoutedonMother’sDay.ButtheperformanceofcomputernetworkssuchastheInternetis stillnotoriouslyunreliable.Whiletherearemanyfactorsthatgointoperformance,onemajorissue isthatofnetworkbottlenecks.Therearetwotypesofnetworkbottlenecks: resourcebottlenecks and implementationbottlenecks.
Resourcebottlenecksoccurwhennetworkperformanceislimitedbythespeedoftheunderlying hardware;examplesincludeslowprocessorsinserverplatformsandslowcommunicationlinks.Resourcebottleneckscanbeworkedaround,atsomecost,bybuyingfasterhardware.However,itisquite oftenthecasethattheunderlyinghardwareisperfectlyadequatebutthattherealbottleneckisadesignissueintheimplementation.Forexample,aWebserverrunningonthefastestprocessorsmay runslowlybecauseofredundantdatacopying.Similarly,arouterwithasimplepacketclassification algorithmmaystartdroppingpacketswhenthenumberofACLrulesgrowsbeyondalimit,thoughit keepsupwithlinkspeedswhenclassificationisturnedoff.Thisbookconcentratesonsuchnetwork implementationbottlenecks,especiallyatserversandrouters.
Beyondserversandrouters,newbreedsofnetworkingdevicesthatintroducenewperformance bottlenecksarebecomingpopular.Asnetworksbecomemoreintegrated,devicessuchasstoragearea networks(SANs)andmultimediaswitchesarebecomingcommon.Further,asnetworksgetmorecomplex,variousspecial-purposenetworkappliancesforfilesystemsandsecurityareproliferating.While thefirstgenerationofsuchdevicesjustifiedthemselvesbythenewfunctionstheyprovided,itisbecomingcriticalthatfuturenetworkapplianceskeepupwithlinkspeeds.
Thustheobjectiveofthisbookistoprovideasetoftechniquestoovercomeimplementationbottlenecksat all networkingdevicesandtoprovideasetofprinciplesandmodelstohelpovercomecurrent and future networkingbottlenecks.
Audience
Thisbookwaswrittentoansweraneedforatextonefficientprotocolimplementations.Thevast majorityofnetworkingbooksareonnetworkprotocols;eventheimplementationbooksare,forthe mostpart,detailedexplanationsoftheprotocol.Whileprotocolsformthefoundationofthefield,there arejustahandfuloffundamentalnetworkinfrastuctureprotocolsleft,suchasTCPandIP.Ontheother hand,therearemanyimplementationsasmostcompaniesandstart-upscustomizetheirproductsto gaincompetitiveadvantage.ThisisexacerbatedbythetendencytoplaceTCPandIPeverywhere,from bridgestoSANswitchestotoasters.
Thustherearemanymorepeopleimplementingprotocolsthandesigningthem. Thisisatextbook forimplementors,networkingstudents,andnetworkingresearchers,coveringgroundfromtheartof buildingafastWebservertobuildingafastrouterandbeyond.
Todoso,thisbookdescribesacollectionofefficientimplementationtechniques;infact,aninitial sectionofeachchapterconcludeswithaQuickReferenceGuideforimplementorsthatpointstothe mostusefultechniquesforeachtopic.However,thebookgoesfurtheranddistillsafundamentalmethod ofcraftingsolutionstonewnetworkbottlenecksthatwecall networkalgorithmics.Thisprovidesthe readertoolstodesigndifferentimplementationsforspecificcontextsandtodealwithnewbottlenecks thatwillundoubtedlyariseinachangingworld.
Hereisadetailedprofileofourintendedaudience.
• NetworkProtocolImplementors: Thisgroupincludesimplementorsofendnodenetworkingstacks forlargeservers,PCs,andworkstationsandfornetworkappliances.Italsoincludesimplementorsof classicnetworkinterconnectiondevices,suchasrouters,bridges,switches,andgateways,aswellas devicesthatmonitornetworksformeasurementandsecuritypurposes.Italsoincludesimplementors ofstorageareanetworks,distributedcomputinginfrastructures,multimediaswitchesandgateways, andothernewnetworkingdevices.Thisbookcanbeespeciallyusefulforimplementorsinstart-ups aswellasinestablishedcompanies,forwhomimprovedperformancecanprovideanedge.
• NetworkingStudents: Undergraduateandgraduatestudentswhohavemasteredthebasicsofnetwork protocolscanusethisbookasatextthatdescribeshowprotocolsshouldbeimplementedtoimprove performance,potentiallyanimportantaspectoftheirfuturejobs.
• Instructors: Instructorscanusethisbookasatextbookforaone-semestercourseonnetworkalgorithmics.
• SystemsResearchers: Networkingandothersystemsresearcherscanusethistextasareferenceand asastimulusforfurtherresearchinimprovingsystemperformance.Giventhatdistributedoperating systemsanddistributedcomputinginfrastructures(e.g.,theGrid)relyonanunderlyingnetworking corewhoseperformancecanbecritical,thisbookcanbeusefultogeneralsystemsresearchers.
Whatthisbookisabout
Chapter 1 providesamoredetailedintroductiontonetworkalgorithmics.Fornow,weinformallydefine networkalgorithmicsasinterdisciplinarysystemsapproachtostreamliningnetworkimplementations. Networkalgorithmicsis interdisciplinary,becauseitrequirestechniquesfromdiversefieldssuchas architecture,operatingsystems,hardwaredesign,andalgorithms.Networkalgorithmicsisalsoa systems approach,becauseroutersandserversaresystemsinwhichefficienciescanbegainedbymoving functionsintimeandspacebetweensubsystems.
Inessence,thisbookisaboutthreethings:fundamentalnetworkingimplementationbottlenecks, generalprinciplestoaddressnewbottlenecks,andtechniquesforspecificbottlenecksthatcanbederivedfromthegeneralprinciples.
FundamentalbottlenecksforanendnodesuchasaPCorworkstationincludedatacopying,controltransfer,demultiplexing,timers,bufferallocation,checksums,andprotocolprocessing.Similarly, fundamentalbottlenecksforinterconnectdevicessuchasroutersandSANswitchesincludeexactand
prefixlookups,packetclassification,switching,andtheimplementationofmeasurementandsecurity primitives.Chapter 1 goesintomoredetailabouttheinherentcausesofthesebottlenecks.
Thefundamentalmethodsthatencompassnetworkalgorithmicsincludeimplementationmodels (Chapter 2)and15implementationprinciples(Chapter 3).Theimplementationmodelsincludemodels ofoperatingsystems,protocols,hardware,andarchitecture.Theyareincludedbecausetheworldof networkprotocolimplementationrequirestheskillsofseveraldifferentcommunities,includingoperatingsystemexperts,protocolpundits,hardwaredesigners,andcomputerarchitects.Theimplementation modelsareanattempttobridgethegapsbetweenthesetraditionallyseparatecommunities.
Ontheotherhand,theimplementationprinciplesareanattempttoabstractthemainideasbehind manyspecificimplementationtechniques.Theyincludesuchwell-knownprinciplesas“Optimizethe expectedcase.”Theyalsoincludesomewhatlesswell-knownprinciples,suchas“CombineDRAM withSRAM,”whichisasurprisinglypowerfulprincipleforproducingfasthardwaredesignsfornetworkdevices.
WhilePart 1 ofthebooklaysoutthemethodologyofnetworkalgorithmics,Part 2 applies the methodologytospecificnetworkbottlenecksinendnodesandservers.Forexample,Part 2 discusses copyavoidancetechniques(suchaspassingvirtualmemorypointersandRDMA)andefficientcontrol transfermethods(suchasbypassingthekernel,asintheVIAproposal,andtechniquesforbuilding event-drivenservers).
Similarly,Part 3 ofthebookappliesthemethodologyofPart 1 tointerconnectdevices,suchas networkrouters.Forexample,Part 3 discussesefficientprefix-lookupschemes(suchasmultibitor compressedtries)andefficientswitchingschemes(suchasthosebasedonvirtualoutputqueuesand bipartitematching).
Finally,Part 4 ofthebookappliesthemethodologyofPart 1 tonewfunctionsforsecurityandmeasurementthatcouldbehousedineitherserversorinterconnectdevices.Forexample,Part 4 discusses efficientmethodstocompresslargetrafficreportsandefficientmethodstodetectattacks.
Organizationofthebook
Thisbookisorganizedintofouroverallparts.Eachpartismadeasself-containedaspossibletoallowdetailedstudy.ReadersthatarepressedfortimecanconsulttheindexorTableofContentsfor aparticulartopic(e.g.,IPlookups).Moreimportantly,theopeningsectionofeachchapterconcludes withaQuickReferenceGuidethatpointstothemostimportanttopicsforimplementors.TheQuick ReferenceGuidemaybethefastestguideforusefullyskimmingachapter.
Part 1 ofthebookaimstofamiliarizethereaderwiththerulesandphilosophyofnetworkalgorithmics.ItstartswithChapter 2,whichdescribessimplemodelsofprotocols,operatingsystems,hardware design,andendnodeandrouterarchitectures.Chapter 3 describesindetailthe15principlesusedas acornerstoneforthebook.Chapter 4 roundsoutthefirstpartbyproviding15examples,drawnfor themostpartfromrealimplementationproblems,toallowthereaderafirstopportunitytoseethe principlesinactiononrealproblems.
Part 2 ofthebook,called“PlayingwithEndnodes,”showshowtobuildfastendnodeimplementations,suchasWebservers,thatrunongeneral-purposeoperatingsystemsandstandardcomputer architectures.ItstartswithChapter 5,whichshowshowtoreduceoravoidextradatacopying.(Copyingoftenoccurswhennetworkdataispassedbetweenimplementationmodules)andhowtoincrease
cacheefficiency.Chapter 6 showshowtoreduceoravoidtheoverheadoftransferringcontrolbetween implementationmodules,suchasthedevicedriver,thekernel,andtheapplication.Chapter 7 describes howtoefficientlymanagethousandsofoutstandingtimers,acriticalissueforlargeservers.Chapter 8 describeshowtoefficientlydemultiplexdatatoreceivingapplicationsinasinglestep,allowing innovationssuchasuser-levelnetworking.Chapter 9 describeshowtoimplementspecificfunctions thatoftenrecurinspecificprotocolimplementations,suchasbufferallocation,checksums,sequence numberbookkeeping,andreassembly.AnoverviewofPart 2 canbefoundinFig. 1.1
Part 3 ofthebook,called“PlayingwithRouters,”showshowtobuildfastrouters,bridges,and gateways.Itbeginswiththreechaptersthatdescribestatelookupsofincreasingcomplexity.Chapter 10 describesexact-matchlookups,whichareessentialforthedesignofbridgesandARPcaches.Chapter 11 describesprefix-matchlookups,whichareusedbyInternetrouterstoforwardpackets.Chapter 12 describespacketclassification,amoresophisticatedformoflookuprequiredforsecurityandqualityof service.Chapter 13 describeshowtobuildcrossbarswitches,whichinterconnectinputandoutputlinks ofdevicessuchasrouters.Finally,Chapter 14 describespacket-schedulingalgorithms,whichareused toprovidequality-of-service,andChapter 15 discussesroutersasdistributedsystems,withexamples focusingonperformanceandtheuseofdesignandreasoningtechniquesfromdistributedalgorithms. Whilethislistoffunctionsseemsshort,onecanbuildafastrouterbydesigningafastlookupalgorithm, afastswitch,andfastpacket-schedulingalgorithms.Part 4,called“Endgame,”startsbyspeculatingon thepotentialneedforimplementingmorecomplextasksinthefuture.Forexample,Chapter 16 describesefficientimplementationtechniquesformeasurementprimitives,whileChapter 17 describes efficientimplementationtechniquesforsecurityprimitives.Thebookendswithashortchapter,Chapter 18,whichreachesclosurebydistillingtheunitiesthatunderlythemanydifferenttopicsinthisbook. Thischapteralsobrieflypresentsexamplesoftheuseofalgorithmicsinacanonicalrouter(theCisco GSR)andacanonicalserver(theFlashWebserver).AmoredetailedoverviewofParts 3 and 4 ofthe bookcanbefoundinFig. 1.2
Features
Thebookhasthefollowingfeaturesthatreaders,implementors,students,andinstructorscantake advantageof.
Intuitiveintroduction: TheintroductoryparagraphofeachchapterinParts 2, 3,and 4 usesanintuitive,real-worldanalogytomotivateeachbottleneck.Forexample,weusetheanalogyofmaking multiplephotocopiesofadocumentfordatacopyingandtheanalogyofaflightdatabaseforprefix lookups.
QuickReferenceGuide: Forreadersfamiliarwithatopicandpressedfortime,theopeningsection ofeachchapterconcludeswithaQuickReferenceGuidethatpointstothemostimportantimplementationideasandthecorrespondingsectionnumbers.
Chapterorganization: Tohelporientthereader,immediatelyaftertheQuickReferenceGuidein eachchapterisamapoftheentirechapter.
Summaryoftechniques: Toemphasizethecorrelationbetweenprinciplesandtechniques,atthestart ofeachchapterisatablethatsummarizesthetechniquesdescribed,togetherwiththecorresponding principles.
Consistentuseofprinciples: AfteradetaileddescriptioninChapter 3 of15principles,therestofthe bookconsistentlyusestheseprinciplesindescribingspecifictechniques.Forreference,theprinciplesaresummarizedinsidethefrontcover.Principlesarereferredtoconsistentlybynumber—for example, P9 forPrinciple9.Sinceprinciplenumbersarehardtoremember,threeaidsareprovided. Besidestheinsidefrontcoversummaryandthesummaryatthestartofeachchapter,thefirstuse ofaprincipleinanychapterisaccompaniedbyanexplicitstatementoftheprinciple.
Exercises: Chapter 4 ofthebookprovidesasetofreal-lifeexamplesofapplyingtheprinciplesthat havebeenenjoyedbypastattendeesoftutorialsonnetworkalgorithmics.Everysubsequentchapter throughChapter 17 isfollowedbyasetofexercises.Briefsolutionstotheseexercisescanbefound inaninstructor’smanualobtainablefromElsevier.
AdditionalTeacherandStudentResources: Helpfulancillarieshavebeenpreparedtoaidlearning andteaching.
•Forstudents,resourcesareavailablehere: https://www.elsevier.com/books-and-journals/bookcompanion/9780443136795.
•Forqualifiedprofessors,instructor-onlyteachingmaterials(includingLectureslidesinPDFfor mostchapters)canberequestedhere: https://educate.elsevier.com/book/details/9780128099278
Usage
Thisbookcanbeusedinmanyways.
Textbook: Studentsandinstructorscanusethisbookasthebasisofaone-semesterclass.Asemester classonnetworkalgorithmicscanincludemostofPart 1 andcansamplechaptersfromPart 2 (e.g., Chapter 5 oncopying,Chapter 6 oncontroloverhead)andfromPart 3 (e.g.,Chapter 11 onprefix lookups,Chapter 13 onswitching).
Implementationguide: ImplementorswhocareaboutperformancemaywishtoreadallofPart 1 and thensampleParts 2 and 3 accordingtotheirneeds.
Referencebook: Implementorsandstudentscanalsousethisbookasareferencebookinadditionto otherbooksonnetworkprotocols.
Whythisbookwaswritten
Theimpetusforthisbookcamefrommyacademicresearchintoefficientprotocolimplementation.It alsocamefromthreenetworkingproductsIworkedonwithcolleagues:thefirstbridge,theGigaswitch, andtheProcket40Gbpsrouter.Toproveitselfagainstdetractors,thefirstbridgewasdesignedto operateatwirespeed,anideathatspreadtoroutersandtheentireindustry.Myexperiencewatching theworkofMarkKempfonthefirstbridge(seeChapter 10)ledtoalastinginterestinspeedingup networkingdevices.
Next,theDECGigaswitchintroducedmetotheworldofswitching.Finally,theProcketrouter wasdesignedbyaninterdisciplinaryteamthatincludeddigitaldesignerswhohaddesignedprocessors, expertswhohadwrittenvastamountsofthesoftwareincorerouters,andsomepeoplelikemyselfwho
wereinterestedinalgorithms.Despitethevariedbackgrounds,theteamproducedinnovativenewideas, whichconvincedmeoftheimportanceofinterdisciplinarythinkingforperformancebreakthroughs. ThismotivatedthewritingofChapter 2 onimplementationmodels,anattempttobridgethegaps betweenthedifferentcommunitiesinvolvedinhigh-performancedesigns.
Forseveralyears,Itaughtaclassthatcollectedtogetherthesetechniques.The15principlesemerged asawaytobreakupthetechniquesmorefinelyandsystematically.Inretrospect,someprinciplesseem redundantandglib.However,theyserveaspartofafirstattempttoorganizeavastamountofmaterial.
Ihavetaughtfiveclassesandthreetutorialsbasedonthematerialinthisbook,andsothisbookhas beengreatlyinfluencedbystudentresponsesandideas.
Acknowledgments
Aspecialthankstomyeditors:KarenGettmanandRickAdamsandKarynJohnson;toallmyadvisors,whotaughtmesomuch:WushowChou,ArneNillson,BaruchAwerbuch,NancyLynch;toall mymentors:AlanKirby,RadiaPerlman,TonyLauck,BobThomas,BobSimcoe,JonTurner;tonumerouscolleagesatDECandothercompanies,especiallytoSharadMerhotra,BillLynch,andTony LiofProcketNetworks,whotaughtmeaboutrealrouters;tostudentswhoadventuredinthefieldof networkalgorithmicswithme;tonumerousreviewersofthisbookandespeciallytoJonSnader,Tony Lauck,BrianKernighan,CraigPartridge,andRadiaPerlmanfordetailedcomments;toKevinD’Souza, StefanoPrevidi,AneesShaikh,andDarrylVeitchfortheirreviewsandideas;tomyfamily,mymother, mywife’sfatherandmother,andmysister;and,ofcourse,tomywife,Aju,andmysons,Timand Andrew.
I’dliketoendbyacknowledgingmyheroes:fourteacherswhohaveinfluencedme.Thefirstis LeonardBernstein,whotaughtmeinhislecturesonmusicthatateacher’senthusiasmforthematerial canbeinfectious.ThesecondisGeorgePolya,whotaughtmeinhisbooksonproblemsolvingthat theprocessofdiscoveryisasimportantasthefinaldiscoveriesthemselves.ThethirdisSocrates,who taughtmethroughPlatothatitisworthalwaysquestioningassumptions.ThefourthisJesus,whohas taughtmethatlife,andindeedthisbook,isnotamatterofmeritbutofgraceandgift.