Network algorithmics: an interdisciplinary approach to designing fast networked devices george vargh

Page 1


https://ebookmass.com/product/network-algorithmics-an-

Instant digital products (PDF, ePub, MOBI) ready for you

Download now and discover formats that fit your needs...

Network Algorithmics George Varghese

https://ebookmass.com/product/network-algorithmics-george-varghese/

ebookmass.com

Network Algorithmics 2nd Edition Varghese

https://ebookmass.com/product/network-algorithmics-2nd-editionvarghese/

ebookmass.com

(eTextbook PDF) for Introduction to One Health: An Interdisciplinary Approach to Planetary Health

https://ebookmass.com/product/etextbook-pdf-for-introduction-to-onehealth-an-interdisciplinary-approach-to-planetary-health/

ebookmass.com

Principles of Rock Deformation and Tectonics Jean-Luc Bouchez

https://ebookmass.com/product/principles-of-rock-deformation-andtectonics-jean-luc-bouchez/

ebookmass.com

Oceaning:

Governing marine life with drones Adam Fish

https://ebookmass.com/product/oceaning-governing-marine-life-withdrones-adam-fish/

ebookmass.com

Phenomenalism: A Metaphysics of Chance and Experience 1st Edition Michael Pelczar

https://ebookmass.com/product/phenomenalism-a-metaphysics-of-chanceand-experience-1st-edition-michael-pelczar/

ebookmass.com

Race, Class, & Gender: An Anthology 9TH Edition

https://ebookmass.com/product/race-class-gender-an-anthology-9thedition/

ebookmass.com

Inky the Octopus: Based on a Real-Life Aquatic Escape! Erin Guendelsberger & David Leonard

https://ebookmass.com/product/inky-the-octopus-based-on-a-real-lifeaquatic-escape-erin-guendelsberger-david-leonard/

ebookmass.com

Consumer

Behavior: Buying, Having, and Being (11th Edition ) 11th Edition

https://ebookmass.com/product/consumer-behavior-buying-having-andbeing-11th-edition-11th-edition/

ebookmass.com

Computing: Concepts, Technology, Security, and Architecture, Second Edition Thomas Erl

https://ebookmass.com/product/cloud-computing-concepts-technologysecurity-and-architecture-second-edition-thomas-erl/

ebookmass.com

NetworkAlgorithmics

SecondEdition

GeorgeVarghese

UCLADepartmentofComputerScience

LosAngeles,CA,UnitedStates

JunXu

SchoolofComputerScience

GeorgiaInstituteofTechnology

Atlanta,GA,UnitedStates

MorganKaufmannisanimprintofElsevier 50HampshireStreet,5thFloor,Cambridge,MA02139,UnitedStates

Copyright©2022ElsevierInc.Allrightsreserved.

Nopartofthispublicationmaybereproducedortransmittedinanyformorbyanymeans,electronicormechanical, includingphotocopying,recording,oranyinformationstorageandretrievalsystem,withoutpermissioninwritingfromthe publisher.Detailsonhowtoseekpermission,furtherinformationaboutthePublisher’spermissionspoliciesandour arrangementswithorganizationssuchastheCopyrightClearanceCenterandtheCopyrightLicensingAgency,canbefound atourwebsite: www.elsevier.com/permissions

ThisbookandtheindividualcontributionscontainedinitareprotectedundercopyrightbythePublisher(otherthanasmay benotedherein).

Notices

Knowledgeandbestpracticeinthisfieldareconstantlychanging.Asnewresearchandexperiencebroadenour understanding,changesinresearchmethods,professionalpractices,ormedicaltreatmentmaybecomenecessary.

Practitionersandresearchersmustalwaysrelyontheirownexperienceandknowledgeinevaluatingandusingany information,methods,compounds,orexperimentsdescribedherein.Inusingsuchinformationormethodstheyshouldbe mindfuloftheirownsafetyandthesafetyofothers,includingpartiesforwhomtheyhaveaprofessionalresponsibility.

Tothefullestextentofthelaw,neitherthePublishernortheauthors,contributors,oreditors,assumeanyliabilityforany injuryand/ordamagetopersonsorpropertyasamatterofproductsliability,negligenceorotherwise,orfromanyuseor operationofanymethods,products,instructions,orideascontainedinthematerialherein.

ISBN:978-0-12-809927-8

ForinformationonallMorganKaufmannandElsevierpublications visitourwebsiteat https://www.elsevier.com/books-and-journals

Publisher: MaraConner

EditorialProjectManager: LindsayC.Lawrence

ProductionProjectManager: ManchuMohan

CoverDesigner: MatthewLimbert

TypesetbyVTeX

ForAjuandTimandAndrew,whomadeallthispossible...

PART1THERULESOFTHEGAME

2.2.5Memorysubsystemdesigntechniques

2.2.6Component-leveldesign

2.2.7Finalhardwarelessons...................................32 2.3

2.3.1Endnodearchitecture..

2.6

2.4.2Infinitememoryviavirtualmemory.........................42

3.3.2Principlesformodularitywithefficiency..

3.3.3Principlesforspeedinguproutines

4.2 Schedulerforasynchronoustransfermodeflowcontrol

PART2PLAYINGWITHENDNODES

CHAPTER5Copyingdata

5.1 Whydatacopies

5.2 Reducingcopyingvialocalrestructuring

5.2.1Exploitingadaptormemory...

5.2.2Usingcopy-on-write.....................................117

5.2.3Fbufs:optimizingpageremapping

5.2.4Transparentlyemulatingcopysemantics......................123

5.2.5Arezerocopiesusedtoday?..

5.3 AvoidingcopyingusingremoteDMA..............................126

5.3.1Avoidingcopyinginacluster..............................127

5.3.2Modern-dayincarnationsofRDMA

5.4 Broadeningtofilesystems .......................................131

5.4.1Sharedmemory .........................................131

5.4.2IO-lite:aunifiedviewofbuffering. ..........................132

5.4.3AvoidingfilesystemcopiesviaI/Osplicing...

5.5 Broadeningbeyondcopies .......................................134

5.6 Broadeningbeyonddatamanipulations...

5.6.1Usingcacheseffectively ..................................137

5.6.2DirectmemoryaccessversusprogrammedI/O

5.7 Conclusions ..................................................142

5.8 Exercises....................................................143

............................................

6.1 Whycontroloverhead? .........................................147

6.2 Avoidingschedulingoverheadinnetworkingcode

6.2.1Makinguser-levelprotocolimplementationsreal

6.3 Avoidingcontext-switchingoverheadinapplications..

6.3.1Processperclient .......................................154

6.3.2Threadperclient........................................155

6.3.3Event-drivenscheduler ...................................155

6.3.4Event-drivenserverwithhelperprocesses.....................157

6.3.5Task-basedstructuring....................................158

6.4 ScalableI/ONotification ........................................159

6.4.1Aservermystery........................................159

6.4.2Problemswithimplementationsof select() .....................160

6.4.3Analysisof select() ......................................162

6.4.4Speedingup select() withoutchangingtheAPI

6.4.5Speedingup select() bychangingtheAPI .....................164

6.5 AvoidingsystemcallsorKernelBypass... ..........................166

6.5.1Thevirtualinterfacearchitectureproposal .....................169

6.5.2DataPlaneDevelopmentKit(DPDK)........................169

6.5.3SingleRootI/OVirtualization(SR-IOV)......................170

6.6 RadicalRestructuringofOperatingSystems..........................171

6.7 Reducinginterrupts ............................................172

6.7.1Avoidingreceiverlivelock .................................173

6.8 Conclusions ..................................................174

6.9 Exercises....................................................175

CHAPTER7Maintainingtimers

7.1 Whytimers?.................................................180

7.2 Modelandperformancemeasures .................................181

7.3 Simplesttimerschemes .........................................182

7.4 Timingwheels ................................................183

7.5 Hashedwheels ................................................185

7.6 Hierarchicalwheels ............................................187

7.7 BSDimplementation...........................................189

8.1 Opportunitiesandchallengesofearlydemultiplexing

8.2

8.3 CMU/Stanfordpacketfilter:pioneeringpacketfilters

8.4 Berkeleypacketfilter:enablinghigh-performancemonitoring.

8.5 Pathfinder:factoringoutcommonchecks

8.6 Dynamicpacketfilter:compilerstotherescue

8.7 Conclusions...

8.8

PART3PLAYINGWITHROUTERS

11.1.3Lookupmodel

11.2.3Statusoftagswitching,flowswitching,andmultiprotocollabel

11.3 Non-algorithmictechniquesforprefixmatching

12.4.6Content-addressablememories

12.5 Two-dimensionalschemes. ......................................304

12.5.1Fastsearchingusingset-pruningtries........................305

12.5.2Reducingmemoryusingbacktracking

12.5.3Thebestofbothworlds:gridoftries.........................308

12.6 Approachestogeneralrulesets ...................................311

12.6.1Geometricviewofclassification............................311

12.6.2Beyondtwodimensions:thebadnews

12.6.3Beyondtwodimensions:thegoodnews...

12.7 Extendingtwo-dimensionalschemes..

13.13.1Derive R(t) fromthearrivalgraph...........................353 13.13.2Merge R(t) with S(t 1)

13.17 Combinedinputandoutputqueueing..

13.18 Scalingtolargerandfasterswitches

13.18.2Adivide-and-conquerapproachtobuildinglargeswitches.

13.18.3Closnetworksformedium-sizedrouters......................366

13.18.4Benesnetworksforlargerrouters.

13.18.5Load-balancedswitching

13.19 Scalingtofasterlinkspeeds...

14.9.2RecursivedefinitionofGPSvirtualstartandfinishtimes..

14.9.3Anotherpacketarrivalscenario...

Weightedfairqueueing

14.12 ThedatastructureandalgorithmforefficientGPSclocktracking..

14.12.2Augmenteddatastructures................................411

14.12.3The“shape”datastructure .................................413

14.13 ImplementingWFQandWF2 Q...................................418 14.14 Quickfairqueueing(QFQ)...

14.14.1Fairroundrobin(FRR)algorithm.

14.14.2HowQFQimprovesuponFRR...

14.15 Towardsprogrammablepacketscheduling.

14.15.1Push-InFirst-Out(PIFO)framework.........................422

14.15.2Universalpacketscheduler ................................423

14.16 Scalablefairqueuing ...........................................424

14.16.1Randomaggregation.. ...................................425

PART4ENDGAME

16.15 Generatingbettertrafficlogsviadatastreaming

16.16 Countingthenumberofdistinctflows..............................477

16.16.1Themin-hashalgorithm..................................477

16.16.2Anextensionofmin-hashforestimating |A

16.16.3Applicationtodatamining................................479

16.16.4Bitmapsketch:aworthyalternativetomin-hash................480

16.17 Detectionofheavyhitters

16.18 Estimationofflow-sizedistribution................................483

TheTug-of-Waralgorithmforestimating

17.1 Searchingformultiplestringsinpacketpayloads

17.1.1IntegratedstringmatchingusingAho–Corasick

Approximatestringmatching.....................................495 17.3 IPtracebackviaprobabilisticmarking

A.2 Hardwaremodels

A.2.1Fromtransistorstologicgates..............................533

A.2.2Timingdelays..........................................534

A.2.3Hardwaredesignbuildingblocks

A.2.4Memories:theinsidescoop...

A.3.1MatchingalgorithmsforClosnetworkswith

A.4 TheinterconnectionnetworkZoo

Prefacetothesecondedition

Unlessotherwisestated,thisprefaceiswrittenbyauthorXu,whoisreferredtoas“I”inthefollowing. AuthorVargheseisreferredtoas“George”inthefollowing.

WhenGeorgeinvitedmein2015towritethesecondeditionandbecomeaco-authorofthislegendarybook,Ifeltbothhumbledandhonored.Ialsofeltadeepsenseofmission,asthebulkofmy researchandteachinghadbeenonnetworkalgorithmicssincethemid-1990sandstillis.

WhenIsignedtheneweditioncontractwiththepublisher,Iwasoriginallycommittedtosignificantlyrevisingonlythreechapters:Chapter 13 (Switching),Chapter 14 (Schedulingpackets),and Chapter 16 (Measuringnetworktraffic).InAugust2020,atthesuggestionofGeorge,Iagreedto addtwosections,onEarlyBird(Section 17.6)andCarousel(Section 17.7)respectively,toChapter 17 (Networksecurity),andonesectionond-Leftapproach(Section 10.3.3),toChapter 10 (Exact-match lookups).

IhavemadethefollowingmajorrevisionstoChapter 13 (Switching).First,Ihavemadetheuseof virtualoutputqueueing(VOQ)inswitchingaseparatesection(Section 13.7)insteadofanintegralpart oftheparalleliterativematching(PIM)algorithm(Section 13.7 inthefirstedition),sincetheconcept ofVOQwasintroducedafewyearsearlierthanPIM.Sections 13.11 through 13.16 areentirelynew, inwhichafewmoresingle-crossbarswitchingalgorithms,includingSample-and-Compare,SERENA, QPS,SB-QPS,andSW-QPS,areintroduced.Thisadditioniscritical,becausePIM–iSLIPwastheonly single-crossbarswitchingalgorithm“series”describedinthefirstedition,whichdidnotshowreaders anyalternativewayofdesigningsuchswitchingalgorithms.Thesenewlyaddedalgorithmsprovide suchanalternativeviewpoint.Eachofthemwasselected(tobeincludedinthisbook)fortheirconceptualsimplicity,lowcomputationalandcommunicationcomplexity,andelegance.InSection 13.17, Ihavedescribedthecombinedinputandoutputqueueing(CIOQ)proposalthatadvocatescombining switchingwithpacketschedulingforprovidingQoSguarantees.Finally,Ihaveaddedashortsection (13.18.5)onload-balancedswitching,aresearchtopicthatjustemergedwhenthefirsteditionwentinto print.Afewothersectionshavebeenupdatedwith“modernmaterials.”Forexample,inSection 13.18.3 (Closnetworksformedium-sizerouters),Ihaveaddedafewparagraphsdescribinghowa3-stageClos networkcanbeusedfordatacenternetworkingandswitching.

IhavemadethefollowingmajorrevisionstoChapter 14 (Schedulingpackets).InSection 14.3, Ihaveaddedapproximatefairdropping(AFD),alow-complexitytechniqueforfairbandwidthallocation.InSections 14.8 through 14.14,IhaveaddedGPS,WFQ,WF2Q,QFQ,andanefficient (O(log n) timecomplexityperpacket)algorithm,calledtheshapedatastructure(publishedin2004),fortrackingtheGPSclock,whichmakesWFQandWF2 Qefficientlyimplementable(alsowith O(log n) time complexityperpacket).ThefactthatWFQ,analgorithmthatisknowntoprovideverystrongfairness guarantees,isefficientlyimplementableusingtheshapedatastructureisextremelyimportantsince WFQhasbeenwidelybelievedtobenotefficientlyimplementable(withatimecomplexityof O(n) perpacket),eventothisday.InSection 14.15,Ihavedescribedtworesearchproposalstowardsmaking packetschedulingreprogrammableinswitchesandrouters.

IhavemadethefollowingmajorrevisionstoChapter 16 (Measuringnetworktraffic).Ihaveadded athirdSRAM/DRAMhybridcounterarrayschemeinSection 16.3 tothetwosuchschemesdescribed xvii

inSection 16.2.Theformerisverydifferentthanthelattertwo,astheformerisrandomized,whereas thelattertwoarebothdeterministic.However,allthreecounterschemesarepassiveinthesensethey allowfastincrementsbutdonotallowfastreads.Hence,inSections 16.4 and 16.5,Iintroducean activecounterarrayschemecalledBRICKandaflow-statelookupscheme(thatbydefinitionhastobe active)calledRIH.Finally,inSections 16.15 through 16.19,Iprovideacrashcourseonnetworkdata streamingandsketching(DSS).DSS,whichoriginatedfirstintheareaofdatabases,hasevolvedover thepasttwodecadesintoaboomingresearchsubtopicofnetworkmeasurementandmonitoring.For example,inthepastdecadeorso,SIGCOMMandNSDItogetheracceptseveralnetworkDSSpapers almosteveryyear.

GeorgehasmadesignificantupdatestoChapters 5 (Copyingdata); 6 (Transferringcontrol); 7 (Maintainingtimers); 11 (Prefix-matchlookups); 12 (Packetclassification); 15 (Routersasdistributed systems);and 18 (Conclusions).

MyworkinwritingthiseditionhasbeensupportedinpartbyUSNationalScienceFoundationthroughgrantsNeTS-1423182,CNS-1909048,andCNS-2007006.Ihavereportedmyeffortand progresseveryyearintheannualorfinalprojectreports.

Aspecialthankstomycurrentandformereditors,LindsayLawrenceandBrianRomerandTodd Green;tomyco-authorGeorge,whocameupwiththeingeniousterm“networkalgorithmics”that definedthebulkofmyresearchinthepast25years;tomyPh.D.advisor,MukeshSinghal,whotaught mehowtowriteresearchpapers;toallmycollaboratorsonvariousnetworkalgorithmicstopics,especiallytoBillLin;tomanycolleaguesatGeorgiaTech,especiallytoMostafaAmmarandEllenZegura; toformerandcurrentSchoolChairsLanceFortnowandVivekSarkarwhogavemeareducedservice loadforthisbook-writingeffort;toformerandcurrentPh.D.studentswhoadventuredinthefieldof networkalgorithmicswithmeandwhohelpedmewithdrawingfiguresandtables,proofreading,and fixing100+ “bookbugs”inthefirsteditioncollectedbyGeorge;toanonymousreviewersofthisbook; tomyparentsandmybrother;tomywifeLinda;andtomydaughterEllie.

Preface

Computernetworkshavebecomeanintegralpartofsociety.Wetakeforgrantedtheabilitytotransact commerceovertheInternetandthatuserscanavailthemselvesofaburgeoningsetofcommunication methods,whichrangefromfilesharingtoWeblogs.However,fornetworkstotaketheirplaceaspart ofthefundamentalinfrastructureofsociety,theymustprovideperformanceguarantees.

Wetakeforgrantedthatelectricitywillflowwhenaswitchisflickedandthattelephonecalls willberoutedonMother’sDay.ButtheperformanceofcomputernetworkssuchastheInternetis stillnotoriouslyunreliable.Whiletherearemanyfactorsthatgointoperformance,onemajorissue isthatofnetworkbottlenecks.Therearetwotypesofnetworkbottlenecks: resourcebottlenecks and implementationbottlenecks.

Resourcebottlenecksoccurwhennetworkperformanceislimitedbythespeedoftheunderlying hardware;examplesincludeslowprocessorsinserverplatformsandslowcommunicationlinks.Resourcebottleneckscanbeworkedaround,atsomecost,bybuyingfasterhardware.However,itisquite oftenthecasethattheunderlyinghardwareisperfectlyadequatebutthattherealbottleneckisadesignissueintheimplementation.Forexample,aWebserverrunningonthefastestprocessorsmay runslowlybecauseofredundantdatacopying.Similarly,arouterwithasimplepacketclassification algorithmmaystartdroppingpacketswhenthenumberofACLrulesgrowsbeyondalimit,thoughit keepsupwithlinkspeedswhenclassificationisturnedoff.Thisbookconcentratesonsuchnetwork implementationbottlenecks,especiallyatserversandrouters.

Beyondserversandrouters,newbreedsofnetworkingdevicesthatintroducenewperformance bottlenecksarebecomingpopular.Asnetworksbecomemoreintegrated,devicessuchasstoragearea networks(SANs)andmultimediaswitchesarebecomingcommon.Further,asnetworksgetmorecomplex,variousspecial-purposenetworkappliancesforfilesystemsandsecurityareproliferating.While thefirstgenerationofsuchdevicesjustifiedthemselvesbythenewfunctionstheyprovided,itisbecomingcriticalthatfuturenetworkapplianceskeepupwithlinkspeeds.

Thustheobjectiveofthisbookistoprovideasetoftechniquestoovercomeimplementationbottlenecksat all networkingdevicesandtoprovideasetofprinciplesandmodelstohelpovercomecurrent and future networkingbottlenecks.

Audience

Thisbookwaswrittentoansweraneedforatextonefficientprotocolimplementations.Thevast majorityofnetworkingbooksareonnetworkprotocols;eventheimplementationbooksare,forthe mostpart,detailedexplanationsoftheprotocol.Whileprotocolsformthefoundationofthefield,there arejustahandfuloffundamentalnetworkinfrastuctureprotocolsleft,suchasTCPandIP.Ontheother hand,therearemanyimplementationsasmostcompaniesandstart-upscustomizetheirproductsto gaincompetitiveadvantage.ThisisexacerbatedbythetendencytoplaceTCPandIPeverywhere,from bridgestoSANswitchestotoasters.

Thustherearemanymorepeopleimplementingprotocolsthandesigningthem. Thisisatextbook forimplementors,networkingstudents,andnetworkingresearchers,coveringgroundfromtheartof buildingafastWebservertobuildingafastrouterandbeyond.

Todoso,thisbookdescribesacollectionofefficientimplementationtechniques;infact,aninitial sectionofeachchapterconcludeswithaQuickReferenceGuideforimplementorsthatpointstothe mostusefultechniquesforeachtopic.However,thebookgoesfurtheranddistillsafundamentalmethod ofcraftingsolutionstonewnetworkbottlenecksthatwecall networkalgorithmics.Thisprovidesthe readertoolstodesigndifferentimplementationsforspecificcontextsandtodealwithnewbottlenecks thatwillundoubtedlyariseinachangingworld.

Hereisadetailedprofileofourintendedaudience.

• NetworkProtocolImplementors: Thisgroupincludesimplementorsofendnodenetworkingstacks forlargeservers,PCs,andworkstationsandfornetworkappliances.Italsoincludesimplementorsof classicnetworkinterconnectiondevices,suchasrouters,bridges,switches,andgateways,aswellas devicesthatmonitornetworksformeasurementandsecuritypurposes.Italsoincludesimplementors ofstorageareanetworks,distributedcomputinginfrastructures,multimediaswitchesandgateways, andothernewnetworkingdevices.Thisbookcanbeespeciallyusefulforimplementorsinstart-ups aswellasinestablishedcompanies,forwhomimprovedperformancecanprovideanedge.

• NetworkingStudents: Undergraduateandgraduatestudentswhohavemasteredthebasicsofnetwork protocolscanusethisbookasatextthatdescribeshowprotocolsshouldbeimplementedtoimprove performance,potentiallyanimportantaspectoftheirfuturejobs.

• Instructors: Instructorscanusethisbookasatextbookforaone-semestercourseonnetworkalgorithmics.

• SystemsResearchers: Networkingandothersystemsresearcherscanusethistextasareferenceand asastimulusforfurtherresearchinimprovingsystemperformance.Giventhatdistributedoperating systemsanddistributedcomputinginfrastructures(e.g.,theGrid)relyonanunderlyingnetworking corewhoseperformancecanbecritical,thisbookcanbeusefultogeneralsystemsresearchers.

Whatthisbookisabout

Chapter 1 providesamoredetailedintroductiontonetworkalgorithmics.Fornow,weinformallydefine networkalgorithmicsasinterdisciplinarysystemsapproachtostreamliningnetworkimplementations. Networkalgorithmicsis interdisciplinary,becauseitrequirestechniquesfromdiversefieldssuchas architecture,operatingsystems,hardwaredesign,andalgorithms.Networkalgorithmicsisalsoa systems approach,becauseroutersandserversaresystemsinwhichefficienciescanbegainedbymoving functionsintimeandspacebetweensubsystems.

Inessence,thisbookisaboutthreethings:fundamentalnetworkingimplementationbottlenecks, generalprinciplestoaddressnewbottlenecks,andtechniquesforspecificbottlenecksthatcanbederivedfromthegeneralprinciples.

FundamentalbottlenecksforanendnodesuchasaPCorworkstationincludedatacopying,controltransfer,demultiplexing,timers,bufferallocation,checksums,andprotocolprocessing.Similarly, fundamentalbottlenecksforinterconnectdevicessuchasroutersandSANswitchesincludeexactand

prefixlookups,packetclassification,switching,andtheimplementationofmeasurementandsecurity primitives.Chapter 1 goesintomoredetailabouttheinherentcausesofthesebottlenecks.

Thefundamentalmethodsthatencompassnetworkalgorithmicsincludeimplementationmodels (Chapter 2)and15implementationprinciples(Chapter 3).Theimplementationmodelsincludemodels ofoperatingsystems,protocols,hardware,andarchitecture.Theyareincludedbecausetheworldof networkprotocolimplementationrequirestheskillsofseveraldifferentcommunities,includingoperatingsystemexperts,protocolpundits,hardwaredesigners,andcomputerarchitects.Theimplementation modelsareanattempttobridgethegapsbetweenthesetraditionallyseparatecommunities.

Ontheotherhand,theimplementationprinciplesareanattempttoabstractthemainideasbehind manyspecificimplementationtechniques.Theyincludesuchwell-knownprinciplesas“Optimizethe expectedcase.”Theyalsoincludesomewhatlesswell-knownprinciples,suchas“CombineDRAM withSRAM,”whichisasurprisinglypowerfulprincipleforproducingfasthardwaredesignsfornetworkdevices.

WhilePart 1 ofthebooklaysoutthemethodologyofnetworkalgorithmics,Part 2 applies the methodologytospecificnetworkbottlenecksinendnodesandservers.Forexample,Part 2 discusses copyavoidancetechniques(suchaspassingvirtualmemorypointersandRDMA)andefficientcontrol transfermethods(suchasbypassingthekernel,asintheVIAproposal,andtechniquesforbuilding event-drivenservers).

Similarly,Part 3 ofthebookappliesthemethodologyofPart 1 tointerconnectdevices,suchas networkrouters.Forexample,Part 3 discussesefficientprefix-lookupschemes(suchasmultibitor compressedtries)andefficientswitchingschemes(suchasthosebasedonvirtualoutputqueuesand bipartitematching).

Finally,Part 4 ofthebookappliesthemethodologyofPart 1 tonewfunctionsforsecurityandmeasurementthatcouldbehousedineitherserversorinterconnectdevices.Forexample,Part 4 discusses efficientmethodstocompresslargetrafficreportsandefficientmethodstodetectattacks.

Organizationofthebook

Thisbookisorganizedintofouroverallparts.Eachpartismadeasself-containedaspossibletoallowdetailedstudy.ReadersthatarepressedfortimecanconsulttheindexorTableofContentsfor aparticulartopic(e.g.,IPlookups).Moreimportantly,theopeningsectionofeachchapterconcludes withaQuickReferenceGuidethatpointstothemostimportanttopicsforimplementors.TheQuick ReferenceGuidemaybethefastestguideforusefullyskimmingachapter.

Part 1 ofthebookaimstofamiliarizethereaderwiththerulesandphilosophyofnetworkalgorithmics.ItstartswithChapter 2,whichdescribessimplemodelsofprotocols,operatingsystems,hardware design,andendnodeandrouterarchitectures.Chapter 3 describesindetailthe15principlesusedas acornerstoneforthebook.Chapter 4 roundsoutthefirstpartbyproviding15examples,drawnfor themostpartfromrealimplementationproblems,toallowthereaderafirstopportunitytoseethe principlesinactiononrealproblems.

Part 2 ofthebook,called“PlayingwithEndnodes,”showshowtobuildfastendnodeimplementations,suchasWebservers,thatrunongeneral-purposeoperatingsystemsandstandardcomputer architectures.ItstartswithChapter 5,whichshowshowtoreduceoravoidextradatacopying.(Copyingoftenoccurswhennetworkdataispassedbetweenimplementationmodules)andhowtoincrease

cacheefficiency.Chapter 6 showshowtoreduceoravoidtheoverheadoftransferringcontrolbetween implementationmodules,suchasthedevicedriver,thekernel,andtheapplication.Chapter 7 describes howtoefficientlymanagethousandsofoutstandingtimers,acriticalissueforlargeservers.Chapter 8 describeshowtoefficientlydemultiplexdatatoreceivingapplicationsinasinglestep,allowing innovationssuchasuser-levelnetworking.Chapter 9 describeshowtoimplementspecificfunctions thatoftenrecurinspecificprotocolimplementations,suchasbufferallocation,checksums,sequence numberbookkeeping,andreassembly.AnoverviewofPart 2 canbefoundinFig. 1.1

Part 3 ofthebook,called“PlayingwithRouters,”showshowtobuildfastrouters,bridges,and gateways.Itbeginswiththreechaptersthatdescribestatelookupsofincreasingcomplexity.Chapter 10 describesexact-matchlookups,whichareessentialforthedesignofbridgesandARPcaches.Chapter 11 describesprefix-matchlookups,whichareusedbyInternetrouterstoforwardpackets.Chapter 12 describespacketclassification,amoresophisticatedformoflookuprequiredforsecurityandqualityof service.Chapter 13 describeshowtobuildcrossbarswitches,whichinterconnectinputandoutputlinks ofdevicessuchasrouters.Finally,Chapter 14 describespacket-schedulingalgorithms,whichareused toprovidequality-of-service,andChapter 15 discussesroutersasdistributedsystems,withexamples focusingonperformanceandtheuseofdesignandreasoningtechniquesfromdistributedalgorithms. Whilethislistoffunctionsseemsshort,onecanbuildafastrouterbydesigningafastlookupalgorithm, afastswitch,andfastpacket-schedulingalgorithms.Part 4,called“Endgame,”startsbyspeculatingon thepotentialneedforimplementingmorecomplextasksinthefuture.Forexample,Chapter 16 describesefficientimplementationtechniquesformeasurementprimitives,whileChapter 17 describes efficientimplementationtechniquesforsecurityprimitives.Thebookendswithashortchapter,Chapter 18,whichreachesclosurebydistillingtheunitiesthatunderlythemanydifferenttopicsinthisbook. Thischapteralsobrieflypresentsexamplesoftheuseofalgorithmicsinacanonicalrouter(theCisco GSR)andacanonicalserver(theFlashWebserver).AmoredetailedoverviewofParts 3 and 4 ofthe bookcanbefoundinFig. 1.2

Features

Thebookhasthefollowingfeaturesthatreaders,implementors,students,andinstructorscantake advantageof.

Intuitiveintroduction: TheintroductoryparagraphofeachchapterinParts 2, 3,and 4 usesanintuitive,real-worldanalogytomotivateeachbottleneck.Forexample,weusetheanalogyofmaking multiplephotocopiesofadocumentfordatacopyingandtheanalogyofaflightdatabaseforprefix lookups.

QuickReferenceGuide: Forreadersfamiliarwithatopicandpressedfortime,theopeningsection ofeachchapterconcludeswithaQuickReferenceGuidethatpointstothemostimportantimplementationideasandthecorrespondingsectionnumbers.

Chapterorganization: Tohelporientthereader,immediatelyaftertheQuickReferenceGuidein eachchapterisamapoftheentirechapter.

Summaryoftechniques: Toemphasizethecorrelationbetweenprinciplesandtechniques,atthestart ofeachchapterisatablethatsummarizesthetechniquesdescribed,togetherwiththecorresponding principles.

Consistentuseofprinciples: AfteradetaileddescriptioninChapter 3 of15principles,therestofthe bookconsistentlyusestheseprinciplesindescribingspecifictechniques.Forreference,theprinciplesaresummarizedinsidethefrontcover.Principlesarereferredtoconsistentlybynumber—for example, P9 forPrinciple9.Sinceprinciplenumbersarehardtoremember,threeaidsareprovided. Besidestheinsidefrontcoversummaryandthesummaryatthestartofeachchapter,thefirstuse ofaprincipleinanychapterisaccompaniedbyanexplicitstatementoftheprinciple.

Exercises: Chapter 4 ofthebookprovidesasetofreal-lifeexamplesofapplyingtheprinciplesthat havebeenenjoyedbypastattendeesoftutorialsonnetworkalgorithmics.Everysubsequentchapter throughChapter 17 isfollowedbyasetofexercises.Briefsolutionstotheseexercisescanbefound inaninstructor’smanualobtainablefromElsevier.

AdditionalTeacherandStudentResources: Helpfulancillarieshavebeenpreparedtoaidlearning andteaching.

•Forstudents,resourcesareavailablehere: https://www.elsevier.com/books-and-journals/bookcompanion/9780443136795.

•Forqualifiedprofessors,instructor-onlyteachingmaterials(includingLectureslidesinPDFfor mostchapters)canberequestedhere: https://educate.elsevier.com/book/details/9780128099278

Usage

Thisbookcanbeusedinmanyways.

Textbook: Studentsandinstructorscanusethisbookasthebasisofaone-semesterclass.Asemester classonnetworkalgorithmicscanincludemostofPart 1 andcansamplechaptersfromPart 2 (e.g., Chapter 5 oncopying,Chapter 6 oncontroloverhead)andfromPart 3 (e.g.,Chapter 11 onprefix lookups,Chapter 13 onswitching).

Implementationguide: ImplementorswhocareaboutperformancemaywishtoreadallofPart 1 and thensampleParts 2 and 3 accordingtotheirneeds.

Referencebook: Implementorsandstudentscanalsousethisbookasareferencebookinadditionto otherbooksonnetworkprotocols.

Whythisbookwaswritten

Theimpetusforthisbookcamefrommyacademicresearchintoefficientprotocolimplementation.It alsocamefromthreenetworkingproductsIworkedonwithcolleagues:thefirstbridge,theGigaswitch, andtheProcket40Gbpsrouter.Toproveitselfagainstdetractors,thefirstbridgewasdesignedto operateatwirespeed,anideathatspreadtoroutersandtheentireindustry.Myexperiencewatching theworkofMarkKempfonthefirstbridge(seeChapter 10)ledtoalastinginterestinspeedingup networkingdevices.

Next,theDECGigaswitchintroducedmetotheworldofswitching.Finally,theProcketrouter wasdesignedbyaninterdisciplinaryteamthatincludeddigitaldesignerswhohaddesignedprocessors, expertswhohadwrittenvastamountsofthesoftwareincorerouters,andsomepeoplelikemyselfwho

wereinterestedinalgorithms.Despitethevariedbackgrounds,theteamproducedinnovativenewideas, whichconvincedmeoftheimportanceofinterdisciplinarythinkingforperformancebreakthroughs. ThismotivatedthewritingofChapter 2 onimplementationmodels,anattempttobridgethegaps betweenthedifferentcommunitiesinvolvedinhigh-performancedesigns.

Forseveralyears,Itaughtaclassthatcollectedtogetherthesetechniques.The15principlesemerged asawaytobreakupthetechniquesmorefinelyandsystematically.Inretrospect,someprinciplesseem redundantandglib.However,theyserveaspartofafirstattempttoorganizeavastamountofmaterial.

Ihavetaughtfiveclassesandthreetutorialsbasedonthematerialinthisbook,andsothisbookhas beengreatlyinfluencedbystudentresponsesandideas.

Acknowledgments

Aspecialthankstomyeditors:KarenGettmanandRickAdamsandKarynJohnson;toallmyadvisors,whotaughtmesomuch:WushowChou,ArneNillson,BaruchAwerbuch,NancyLynch;toall mymentors:AlanKirby,RadiaPerlman,TonyLauck,BobThomas,BobSimcoe,JonTurner;tonumerouscolleagesatDECandothercompanies,especiallytoSharadMerhotra,BillLynch,andTony LiofProcketNetworks,whotaughtmeaboutrealrouters;tostudentswhoadventuredinthefieldof networkalgorithmicswithme;tonumerousreviewersofthisbookandespeciallytoJonSnader,Tony Lauck,BrianKernighan,CraigPartridge,andRadiaPerlmanfordetailedcomments;toKevinD’Souza, StefanoPrevidi,AneesShaikh,andDarrylVeitchfortheirreviewsandideas;tomyfamily,mymother, mywife’sfatherandmother,andmysister;and,ofcourse,tomywife,Aju,andmysons,Timand Andrew.

I’dliketoendbyacknowledgingmyheroes:fourteacherswhohaveinfluencedme.Thefirstis LeonardBernstein,whotaughtmeinhislecturesonmusicthatateacher’senthusiasmforthematerial canbeinfectious.ThesecondisGeorgePolya,whotaughtmeinhisbooksonproblemsolvingthat theprocessofdiscoveryisasimportantasthefinaldiscoveriesthemselves.ThethirdisSocrates,who taughtmethroughPlatothatitisworthalwaysquestioningassumptions.ThefourthisJesus,whohas taughtmethatlife,andindeedthisbook,isnotamatterofmeritbutofgraceandgift.

15principlesusedtoovercomenetwork bottlenecks

Number

UsedIn/NetworkingExample

P1 Avoidobviouswaste Zero-copyinterfaces

P2 Shiftcomputationintime

P2a Precompute Applicationdevicechannels

P2b Evaluatelazily Copy-on-write

P2c Shareexpenses,batch Integratedlayerprocessing

P3 Relaxsystemrequirements

P3a Tradecertaintyfortime Stochasticfairqueueing

P3b Tradeaccuracyfortime Switchloadbalancing

P3c Shiftcomputationinspace IPv6fragmentation

P4 Leverageoffsystemcomponents

P4a Exploitlocality Locality-drivenreceiver

P4b Tradememoryforspeed Processing;LuleaIPlookups

P4c Exploitexistinghardware FastTCPchecksum

P5 Addhardwareg

P5a Usememoryinterleavingandpipelinin PipelinedIPlookups

P5b Usewidewordparallelism Sharedmemoryswitches

P5c CombineDRAMandSRAMeffectively Maintainingcounters

P6 Createefficientspecializedroutines UDPchecksums

P7 Avoidunnecessarygenerality Fbufs

P8 Don’tbetiedtoreferenceimplementation Upcalls

P9 Passhintsinlayerinterfaces Packetfilters

P10 Passhintsinprotocolheaders Tagswitching

P11 Optimizetheexpectedcase Headerprediction

P11a Usecaches Fbufs

P12 Addstateforspeed ActiveVClist

P12a Computeincrementally RecomputingCRCs

P13 Optimizedegreesoffreedom IPtrielookups

P14 Usebucketsorting,bitmaps Timingwheels

P15 Createefficientdatastructures Level-4switching

Turn static files into dynamic content formats.

Create a flipbook
Issuu converts static files into: digital portfolios, online yearbooks, online catalogs, digital photo albums and more. Sign up and create your flipbook.