Renu Tewari
IBM T.J. Watson Research Center
tewarir@watson.ibm.com

Michael Dahlin, Harrick M. Vin
Department of Computer Sciences
University of Texas at Austin
dahlin,vin@cs.utexas.edu

Jonathan S. Kay
Cephalapod Proliferationists, Inc.
jkay@cs.utexas.edu
Abstract
In this paper, we describe the design and implementation of an integrated architecture for cache systems that scale to hundreds or thousands of caches with thousands to millions of users. Rather than simply try to maximize hit rates, we take an end-to-end approach to improving response time by also considering hit times and miss times. We begin by studying several Internet caches and workloads, and we derive three core design principles for large scale distributed caches: (1) minimize the number of hops to locate and access data on both hits and misses, (2) share data among many users and scale to many caches, and (3) cache data close to clients. Our strategies for addressing these issues are built around a scalable, high-performance data-location service that tracks where objects are replicated. We describe how to construct such a service and how to use this service to provide direct access to remote data and push-based data replication. We evaluate our system through trace-driven simulation and find that these strategies together provide response time speedups of 1.27 to 2.43 compared to a traditional three-level cache hierarchy for a range of trace workloads and simulated environments.
1. Introduction
The growth of the Internet and the World Wide Web allow increasing numbers of users to access vast amounts of information.
among many users and scale to many caches, and (3) cache data close to clients. Although these principles may seem obvious in retrospect, current cache architectures routinely violate them at a significant performance cost. For example, hierarchical caches in the United States are often seen as a way to reduce bandwidth consumption rather than as a way to improve response time.
To address these principles, we design a scalable, high-performance data-location service that tracks where objects are replicated. We describe how to construct such a service and how to use this service to meet our design principles via direct access to remote data and push-based data replication. Through simulation using a range of workloads and network environments, we find that direct access to remote data can achieve speedups of 1.3 to 2.3 compared to a standard hierarchy. We also find that pushing additional replicas of data provides additional speedups of 1.12 to 1.25.
We construct our data-location service using a scalable hint hierarchy in which each node tracks the nearest location of each object. Scalability and performance of the hint hierarchy come from four sources. First, we use simple, compact data structures to allow each node's view of the hint hierarchy to track the location of many objects. Second, the location system satisfies all on-line requests locally using the hint cache; the system only sends network messages through the hierarchy to propagate information in the background, off the critical path for end-user requests. Third, the hierarchy prunes updates so that updates are propagated only to the affected nodes. Fourth, we adapt Plaxton's algorithm [24] to build a scalable, fault tolerant hierarchy for distributing information.
We have implemented a prototype of our system by augmenting the widely-deployed Squid proxy cache [34]. It implements hint caches, push-on-write, and self-configuring dynamic hierarchies.
The rest of the paper is organized as follows. Section 2 evaluates the performance of traditional cache hierarchies, examines the characteristics of several large workloads, and then derives a set of basic design principles for large-scale, distributed caches. Section 3 provides an overview of our design, and Section 4 discusses implementation details and evaluates system performance. Section 5 surveys related work, and Section 6 summarizes our conclusions and outlines areas for future research.
Table 1. Trace workloads examined in this study: DEC [6], Berkeley [13], and Prodigy, characterized by number of accesses (millions), number of distinct objects, and collection dates.
2. Evaluating traditional cache hierarchies
In this section, we evaluate the performance of traditional cache hierarchies using measurements of several caches on the Internet and trace-driven simulations, with the goal of understanding the factors that limit cache performance.
Our simulations assume that the system maintains strong cache consistency by invalidating all cached copies whenever data change. We do this for two reasons. First, techniques for approximating or providing strong cache consistency in this environment are improving [36], so we expect this assumption to be a good reflection of achievable future cache technology. Second, weak consistency distorts performance either by increasing apparent hit rates by counting "hits" to stale data or by reducing apparent hit rates by discarding good data. In either case, this would add a source of "noise" to our results.
These traces have two primary limitations that affect our results. First, although we use traces with thousands of clients, they still represent only a small fraction of the client population on the web. Several studies [3, 7, 13] suggest that hit rates will improve as more clients are included in a cache system. The second limitation of these traces is that they are gathered at proxies rather than at clients. Thus, all of these traces display less locality and lower total hit rates than would be seen by clients.
2.1.2. Sources of cache misses
Figure 1 shows the breakdown of cache miss rates and byte miss rates for a global cache shared by all clients in the system as cache size is varied. The cache uses LRU replacement. Misses fall into four categories:
1. Compulsory misses. These misses correspond to the first access to an object. The two key strategies for reducing compulsory misses are increasing the number of clients sharing a cache system and prefetching. Facilitating sharing is an important factor in designing large scale caches, and we discuss sharing in detail in the next subsection. We do not address prefetching in this paper.

2. Capacity misses. These misses occur when the system references an object that it has previously discarded from the cache to make space for another object. The data suggest that for shared caches, capacity misses are a relatively minor problem that can be adequately addressed by building cache nodes with a reasonable amount of disk space.

3. Communication/consistency. These misses occur when a cache holds a stale copy of data that has been modified since it was read into the cache. Providing efficient cache consistency in large systems is a current research topic [36], and we do not focus on that problem here. We do note, however, that the data location abstraction we construct could also be a building block for a scalable consistency system.

4. Uncachable/error. Objects are marked "uncachable" or encounter errors for a number of reasons, some of which might be addressed by more sophisticated cache protocols that support better cache consistency, caching dynamically generated results [31], dynamically replicating servers [33], negative result caching [5], and caching programs along with data [4, 32]. We do not address such protocol extensions here. Also, because we are interested in studying the effectiveness of caching strategies, for the remainder of this study, we do not include "Uncachable" or "Error" requests in our results.
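The first two categories above can be illustrated with a toy simulator. The sketch below (our own illustration, not the paper's simulator) classifies misses against an LRU cache: a miss on a never-before-seen object is compulsory, while a miss on a previously evicted object is a capacity miss. Consistency and uncachable misses require update and error information that this toy trace lacks.

```python
from collections import OrderedDict

def classify_misses(trace, capacity):
    """Replay a trace of object ids against an LRU cache of the given
    capacity (in objects) and count hits, compulsory misses, and
    capacity misses."""
    cache, seen = OrderedDict(), set()
    counts = {"hit": 0, "compulsory": 0, "capacity": 0}
    for obj in trace:
        if obj in cache:
            cache.move_to_end(obj)          # refresh LRU position
            counts["hit"] += 1
            continue
        counts["compulsory" if obj not in seen else "capacity"] += 1
        seen.add(obj)
        cache[obj] = True
        if len(cache) > capacity:
            cache.popitem(last=False)       # evict least recently used
    return counts
```

For the trace `["a", "b", "c", "a"]` with capacity 2, the first three accesses are compulsory misses and the final access to "a" is a capacity miss, since "a" was evicted to make room for "c".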
For all of the traces, even an ideal cache will suffer a significant number of misses. Thus, one key design principle is that in addition to having good hit rates and good hit times, cache systems should not slow down misses.
2.1.3. Sharing
Figure 2 illustrates the importance of enabling widespread sharing in large cache systems. In this experiment, we configure the system as a three-level hierarchy with 256 clients sharing an L1 proxy, eight L1 proxies (2048 clients) sharing an L2 proxy, and all L2 proxies sharing an L3 proxy. As more clients share a cache, the compulsory miss rate for that cache falls because it becomes less likely that any given access to an object is the first access to that object. For example, in the DEC traces, going from a 256-client shared cache to a 16,336-client shared cache improves the byte hit rate by nearly a factor of two. Prior studies [3, 7, 13] have also reached similar conclusions. This characteristic of the workload suggests that cache systems should accommodate large numbers of clients and thereby reduce compulsory miss rates.
For caches with different degrees of sharing, Figure 3 depicts the variation in request response times as the distance between clients and the shared cache increases. It illustrates a dilemma faced by the designers of shared proxy caches: although it is important to share a cache among many clients, it is also important that the shared cache be close to clients. For example, a 256-client cache with an average hit time of 50 ms can outperform a 16,336-client cache that averages 300 ms per access.
2.2. Analyzing traditional cache hierarchies
Hierarchical caches attempt to resolve the dilemma of sharing versus locality and scalability by putting different caches near different groups of clients, and sending misses from these L1 caches through additional layers of caches to exploit sharing. This subsection examines how such arrangements work in large-scale systems.
Traditional hierarchical cache architectures such as Harvest [5] or Squid [34] define parent-child relationships among caches. Each cache in the hierarchy is shared by a group of clients or a group of children caches. Data access proceeds as follows: if the lowest-level cache contains the data requested by a client, it sends the data to the client.
Figure 1. Request miss rates and byte miss rates for the traces examined in this study: (a) per-read miss rate; (b) per-byte miss rate. For each graph, we categorize misses as capacity for accesses to data that have been discarded from the cache to make space for other data, communication for accesses to data that have been invalidated from the cache due to an update, error for requests that generate an error reply, uncachable for requests that specify that the cache must contact the server to retrieve the result, for non-GET requests, or for requests that are designated as CGI requests in the traces, and compulsory for the first access to an object.
Otherwise, the cache asks each of its neighbors for the data. If none of the neighbors possess the data, then the cache sends a request to its parent. This process recursively continues up the hierarchy until the data is located or the root cache fetches the data from the server. The caches then send the data down the hierarchy, and each cache along the path stores the data.
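The lookup procedure above can be sketched in a few lines. This is our own illustrative model (not Squid's actual code, and it omits ICP timeouts and neighbor races): a miss queries neighbors, then recurses to the parent, and the reply is stored at every cache on the way back down.

```python
class Cache:
    """A node in a traditional cache hierarchy (illustrative sketch)."""
    def __init__(self, name, parent=None, neighbors=()):
        self.name = name
        self.parent = parent
        self.neighbors = list(neighbors)
        self.store = {}                        # url -> object

    def lookup(self, url, fetch_from_server):
        if url in self.store:                  # local hit
            return self.store[url]
        for n in self.neighbors:               # ask neighbors first
            if url in n.store:
                obj = n.store[url]
                break
        else:
            if self.parent is not None:        # recurse up the hierarchy
                obj = self.parent.lookup(url, fetch_from_server)
            else:                              # root: fetch from origin server
                obj = fetch_from_server(url)
        self.store[url] = obj                  # store on the way back down
        return obj
```

Note that a miss at the leaf pays one network round trip per level, the "store-and-forward" cost analyzed below.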
Although hierarchies such as Harvest and Squid were designed under the assumption that caches could be layered without adding much delay [5], we hypothesize that two aspects of this architecture as applied to Internet caches can significantly limit performance. First, the cost of accessing a series of caches in the hierarchy adds significant "store-and-forward" delays to higher-level cache hits and to cache misses [22]. Second, when high-level caches service a large number of clients distributed over a large region, the network delays between a client and a high-level cache may be large, which reduces the benefit of hits to these caches. To examine these effects, we use two sources of information. First, to understand the details of performance in a controlled environment, we construct a test hierarchy and examine it under a synthetic workload. Second, to understand how such systems perform in real hierarchies and under real workloads, we examine Rousskov's measurements of several Squid caches deployed at different levels of a hierarchy [26]. Although Squid supports the Internet Cache
Figure 2. Overall per-read hit rate (HR) and per-byte hit rate (BHR) with infinite L1 caches shared by 256 clients, infinite L2 caches shared by 2048, and infinite L3 caches shared by all clients in the trace. As sharing increases, so does the achievable hit rate.

Figure 3. Response time for caches with different degrees of sharing as the distance between clients and caches increases.
Table 2. Testbed cache locations: UC Berkeley, UC San Diego, UT Austin, and Cornell University.
Figure 4. Measured access times in the testbed for objects of various sizes: (a) objects accessed through a three-level hierarchy; (b) objects fetched directly from each cache and server; and (c) all requests go through the L1 proxy and then directly to the proxy or server.

Table 3. Summary of Squid cache hierarchy performance based on Rousskov's measurements: min/max client connect, disk, and proxy reply times at leaf and root caches, and miss times. All the measurements are in ms.
3. Design overview
In the previous section, we derived three basic design principles for large-scale caches: (1) minimize the number of hops to locate and access data on both hits and misses, (2) share data among many users and scale to many caches, and (3) cache data close to clients. In this section, we describe the strategies we pursue to address these design principles and provide a high-level description of our specific implementation of these strategies. The next section will provide a more detailed, quantitative analysis of the system and our design decisions.
Our system is built around a scalable data-location service called the hint hierarchy, which allows each node to know the nearest location of each object. The hint hierarchy provides the information needed by two basic strategies for providing good performance.
The first basic strategy is direct access of remotely cached data. Under the ideal direct access strategy, a client locates the nearest copy of data using an oracle and sends its request directly to the nearest shared cache that contains the desired data, or directly to the server if no cache contains the data. Caches and servers that receive such requests send the data directly back to the client. As we illustrated in the previous section, traditional single-level caches can limit sharing, while multi-level caches suffer high per-hop costs when they force hits and misses to traverse multiple layers of a hierarchy. This ideal direct access strategy, on the other hand, minimizes the number of hops to locate and access data (principle 1) while allowing widespread sharing (principle 2).
The hint hierarchy allows us to approximate this direct access strategy using hint caches. Clusters of nearby clients send their requests to a shared L1 proxy cache. The proxy consults a local hint cache that contains a mapping from the ID of the object being requested to the ID of the nearest proxy cache that contains the object. The first proxy sends the request directly to the second, which sends the data directly back. The client's proxy then sends the data to the client that issued the request.
Our second basic strategy is push. As the data in the previous section indicate, even with direct access, the cost of accessing data from a nearby cache is much lower than the cost of accessing data from a distant cache. For example, in the testbed hierarchy, L1 cache accesses for 8 KB objects are 4.7 times faster than direct accesses to caches that are as far away as L2 caches, and 6.2 times faster than direct accesses to caches that are as far away as L3 caches. The goal of push is to improve hit time by pushing objects near caches that are likely to reference them in the future. The ideal push algorithm postulates that whenever an object is brought into the system, it is magically pushed to all L1 caches in the system that will reference the data in the future; thus all L2 and L3 hits become L1 hits.
We examine several simple, practical approximations to this ideal that are enabled by the information in the hint hierarchy. Push-on-update is based on the observation that when an object is modified, the proxies caching the old version of the object are a good list of candidates to reference the new version of the object. Thus, when a cache fetches an object due to a cache consistency miss, it pushes the object to caches storing old versions of it. The push-shared algorithm is based on the intuition that if two subtrees of a node in the metadata hierarchy access an item, it is likely that many subtrees in the hierarchy will access the item. Thus, when one proxy fetches data from another, it also pushes the data to a subset of proxies that share a common metadata ancestor. Both of these algorithms are simple, and they achieve a significant fraction of the total performance achieved by the ideal strategy. However, there is room for more sophisticated algorithms to achieve further gains.
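The two triggers can be sketched as follows. This is our own simplified model (function and variable names are ours, and the real system selects push targets from the metadata hierarchy rather than a flat sibling list), built on a hint table mapping each object to the set of caches holding it.

```python
def push_on_update(obj_id, new_version, hints, caches):
    """On a consistency miss, push the fresh object to every cache that
    the hint table says held the (now stale) old version."""
    for cache_id in hints.get(obj_id, set()):
        caches[cache_id][obj_id] = new_version

def push_shared(obj_id, version, requester, source, siblings, caches,
                fraction=0.5):
    """When one proxy fetches from another, also push the object to a
    subset of proxies sharing a common metadata ancestor. fraction=0.5
    corresponds to push-half; 1.0 to push-all."""
    targets = [s for s in siblings if s not in (requester, source)]
    for cache_id in targets[: max(1, int(len(targets) * fraction))]:
        caches[cache_id][obj_id] = version
```

The `fraction` parameter mirrors the push-1/push-half/push-all variants evaluated later in the paper.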
4. Detailed design and evaluation
This section details our design and implementation. For each aspect of the design, we examine overall performance and major design decisions, and we discuss areas of potential concern. When appropriate, we use simulation results to examine our decisions.
Scalability and performance of the hint hierarchy come from four sources. First, we use simple, compact data structures to allow each node's view of the hint hierarchy to track the location of many objects. Second, the location system satisfies all on-line requests locally using the hint cache; the system only sends network messages through the hierarchy to propagate information in the background, off the critical path for end-user requests. Third, the hierarchy prunes updates by propagating them only to the affected nodes to support a global index while using minimal bandwidth. Fourth, we adapt Plaxton's algorithm [24] to build a scalable, fault tolerant hierarchy for distributing information.
We have implemented a prototype of our system that implements a hint hierarchy, hint caching, and push caching [16]. It is based on Squid 1.1.20 [34] and named Cuttlefish.
4.1. Hint caches and direct access
Local hint caches allow caches to approximate the direct access strategy. A hint is an (objectId, nodeId) pair, where nodeId identifies the closest cache that has a copy of objectId. In order to facilitate the principles of minimizing hops and allowing widespread sharing, the hint cache design allows caches to store large numbers of hints and to access them quickly.
An important implementation detail in our prototype is our decision to use small, fixed-sized records and to store hints in a simple array managed as a k-way associative cache. In particular, our design stores a node's hint cache in a memory mapped file consisting of an array of small, fixed-sized entries. Each entry consumes 16 bytes: an 8-byte hash of a URL and an 8-byte machine identifier.
We store a URL hash rather than a complete URL to save space in the table and to ensure that entries are of fixed size. Small records improve performance by allowing a node to store hint entries for a large number of objects; this facilitates the principle of maximizing sharing. Small records also reduce the network cost of propagating hints, and they allow a larger fraction of the hint table to be cached in a given amount of physical memory to avoid disk accesses. If two distinct URLs hash to the same value, the hint for one of them will be incorrect, and a node may waste time asking another cache for data it may not have. In that case, because the read request contains the full URL, the remote node returns an error and the first node treats the request as a miss. With 64-bit hash keys based on MD5 signatures of URLs, we do not anticipate that hash collisions will hurt performance. In fact, some hint cache implementors may consider reducing the keys to 32 bits to increase the reach of the cache for a given hint cache size.
Fixed-sized records simplify and speed up access when the hint cache does not fit in physical memory. The system currently stores hints in an array that it manages as a 4-way associative cache indexed by the URL hash, and it maps this array to a file. Thus, if a needed hint is not already cached in memory, the system can locate and read it with a single disk access.
We store entries in a 4-way associative cache rather than, for instance, maintaining a fully-associative LRU list of entries, to reduce the cost of maintaining the hint cache when it does not fit in memory. We include a modest amount of associativity to guard against the case when several hot URLs land in the same hash bucket.
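The data structure described above can be sketched compactly. The following is our own in-memory model, not the prototype's C code: 16-byte records (an 8-byte MD5-derived URL hash plus an 8-byte node id) packed into a 4-way set-associative array. The real system maps the array to a file; here a `bytearray` stands in for the mapped region, and the eviction choice within a set is simplified to "overwrite way 0."

```python
import hashlib
import struct

RECORD = struct.Struct("<QQ")              # (url_hash, node_id): 16 bytes
WAYS = 4

def url_hash(url):
    """First 8 bytes of the MD5 signature of the URL, as an integer."""
    return struct.unpack("<Q", hashlib.md5(url.encode()).digest()[:8])[0]

class HintCache:
    def __init__(self, num_sets):
        self.num_sets = num_sets
        self.table = bytearray(num_sets * WAYS * RECORD.size)

    def _slots(self, h):
        base = (h % self.num_sets) * WAYS * RECORD.size
        return [base + i * RECORD.size for i in range(WAYS)]

    def insert(self, url, node_id):
        h = url_hash(url)
        slots = self._slots(h)
        for off in slots:                  # reuse an empty or matching slot
            rec_h, _ = RECORD.unpack_from(self.table, off)
            if rec_h in (0, h):
                RECORD.pack_into(self.table, off, h, node_id)
                return
        RECORD.pack_into(self.table, slots[0], h, node_id)  # set full: evict

    def lookup(self, url):
        h = url_hash(url)
        for off in self._slots(h):
            rec_h, node = RECORD.unpack_from(self.table, off)
            if rec_h == h:
                return node                # may be a false hit on collision
        return None
```

Because the set index and the stored key both derive from the URL hash, a lookup touches exactly one 64-byte set, which is what makes the one-disk-access property possible when the array is file-backed.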
Our implementation of hint caches approximates the ideal of direct access. However, three aspects of the design may limit its performance. First, to simplify deployment, we focus on a proxy-based implementation of hint caches rather than a configuration where all clients maintain local hint caches. Second, the hint cache data structure is smaller and slower than the ideal oracle. Third, hint caches may contain an out-of-date picture of where objects are cached.

4.1.1. Overall performance and configuration
We will use the following simulation configuration throughout the paper. As in Figure 2, we assume that when the system is configured as a 3-level hierarchy, clusters of 256 nearby clients share an L1 cache, groups of 2048 clients share an L2 cache, and all clients in the trace share an L3 cache. We parameterize the distance between clients in these categories using the Testbed times shown in Figure 4 and the Min and Max access times measured by Rousskov. Our direct access and hint configurations are similar, but they use the Total Direct and Total via L1 times from those figures. We simulated performance for both infinite-sized caches and finite, 5 GB caches. For the finite cache case, we reserved 10% of the space for hint caches when appropriate. Due to space limitations, unless otherwise noted we display the graph for infinite caches and omit the graphs for finite caches. As Figure 1 suggests, we found that cache size is a secondary factor for performance, and our finite cache results are qualitatively the same as our infinite cache results.

Figure 6. Hit rate assuming that groups of 256 clients from the DEC trace each access an infinite proxy cache and that each proxy cache can access other proxy caches via a hint cache of the specified size. Each entry in the hint cache takes 16 bytes, and the entries are stored in a 4-way set-associative array with total size specified in MB on the x-axis.
To quantify the overall performance of the system and examine the impact of assuming a proxy-based implementation, Figure 5 shows the simulated performance for the DEC, Berkeley, and Prodigy traces under a series of algorithms. The figure provides two baselines for performance. It includes a traditional 3-level Hierarchy and an unrealizable best case, SharedAll, which is an L1 cache that is shared by all clients in the system, but that is as close to all clients as the smaller L1 caches used in the standard configuration.
First, note that this configuration is designed to let us explore how different architectures balance sharing, locality, and scaling. As the bars for the SharedAll L1 cache show, if the underlying physical network and cache machines allow sharing to be increased without increasing cache access delays, the best configuration is a large, shared cache.
Direct access significantly outperforms the traditional 3-level hierarchies, with speedups ranging from 1.3 to 2.9. The additional network hop for the proxy-based Hint Caching configuration hurts performance modestly compared to the Ideal Forwarding system. The realistic implementation achieves speedups of 1.3 to 2.3. For this set of network topologies and workloads, another reasonable alternative is for each group of 256 clients to share an L1 cache but not to share caches further. Such an approach falls short of the ideal direct access protocol by as much as a factor of 1.53 and falls short of the hint cache protocol by as much as a factor of 1.3.
4.1.2. Hint cache size
One obvious worry associated with hints is the storage requirement. How much storage is required to track the nearest copy of all objects in a large cache system? This number turns out to be surprisingly reasonable.
A node's hint cache will only be effective if it can index significantly more data objects than the node can store locally, and it will observe the design principle of maximizing sharing if it can index most or all of the data stored by the caches in the cache system. As we describe in detail in Section 4.1, our system compresses hints to 16-byte, fixed-sized records. At this size, each hint is almost three orders of magnitude smaller than an average 10 KB data object stored in a cache [2]. Thus, if a cache dedicates 10% of its capacity for hints, its hint cache will index about two orders of magnitude more data than it can store locally. Even if there were no overlap of what different caches store, such a directory would allow a node to directly access the contents of about 63 nearby caches. But, because the hint cache need only store one entry when an object is stored in multiple remote caches, coverage should be much broader in practice. Another way of viewing capacity is to consider the reach of a 500 MB index (10% of a modest, 5 GB proxy cache). Such an index could track the location of over 30 million unique objects stored in a cache system. Finally, consider that in October 1998 a 6 GB disk costs under $160, suggesting that flat indices of hundreds of millions to billions of objects are feasible.
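The capacity arithmetic above can be checked back-of-the-envelope, using the paper's own numbers (16-byte hints, 10 KB average objects, 10% of a 5 GB cache reserved for hints):

```python
hint_size = 16                    # bytes per hint record
obj_size = 10 * 1024              # average cached object, in bytes
ratio = obj_size / hint_size      # data indexed per byte of hint space
assert ratio == 640               # hint ~3 orders of magnitude smaller

hint_space = 0.10 * 5 * 2**30     # 10% of a 5 GB cache = 512 MB for hints
local_space = 0.90 * 5 * 2**30    # remaining 90% for data
indexed = (hint_space / hint_size) * obj_size
print(indexed / local_space)      # ~71x local capacity: ~2 orders of magnitude
print(hint_space / hint_size)     # ~33.6 million objects tracked
```

The ~71x figure is the no-overlap worst case corresponding to the "about 63 nearby caches" claim; with overlap between caches, the effective reach is broader.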
Figure 6 shows how the size of the hint cache affects overall performance in the DEC trace. Very small hint caches provide little improvement because they index little more than what is stored locally. For this workload, hint caches smaller than 10 MB provide little additional "reach" beyond what is already cached locally, but a 100 MB hint cache can track almost all data in the system.
4.1.3. Delayed hint propagation
Although the system uses direct requests and direct data transfers for client requests, it propagates hints through a hierarchy. We defer detailed discussion of hint propagation until after our description of how the system configures the hierarchy in Section 4.3. Here, we examine the general issue of delayed hint propagation. Whereas the oracle in the idealized direct access protocol always knows where to find the nearest copy of data, actual hint updates take time to propagate through the system, so caches may make suboptimal decisions about where to send their requests.
Figure 7 quantifies the dependence of global hit rate on the amount of time it takes to update hints in the system. In this experiment, we assume that whenever an object is dropped from a cache to make space for another object, or an object is added to a cache, none of the hint caches learn of the change for an amount of time specified on the x-axis in the figure. This experiment suggests that the performance of hint caches will be good as long as updates can be propagated through the system within a few minutes. As Section 4.3 will detail, to facilitate fast propagation, our system uses a metadata hierarchy that preserves locality for hint updates, it uses a scalable hierarchy to avoid bottlenecks, and it uses small hint records to reduce the network overheads of propagation.

Figure 5. Simulated performance for the DEC, Berkeley, and Prodigy traces. The three groups of bars for each trace show the performance when the access times are parameterized by the Testbed times shown in Figure 4 or the Min and Max access times measured by Rousskov shown in the Total Hierarchical and Total via L1 columns of Table 3. Within each group, we show performance for a 3-level hierarchy, idealized forwarding, hint caches, and an idealized cache shared by all clients in the system and as close to each as their normal L1 cache.

Figure 7. Hit rate assuming that groups of 256 clients from the DEC trace each access an infinite proxy cache and that each proxy cache can access other proxy caches via a hint cache that is updated the specified number of minutes after objects appear or disappear from caches.
4.2. Push
The ideal push algorithm uses future knowledge to redistribute data, thus transforming all L2 and L3 hits into L1 hits. Actual push algorithms must approximate this by sending data they predict will be referenced near caches they predict will reference it.
4.2.1. Algorithms
Because of the large numbers of objects that pass through the system, we focus on simple algorithms that do not explicitly track the reference frequency of individual objects.
Figure 9. Simulated response time for the DEC trace workload for six algorithms: no push (data hierarchy), no push (hint hierarchy), update push, push-1 (send a copy to 1 node in each eligible subtree), push-half (send a copy to half of the nodes in each eligible subtree), and push-all (send a copy to all nodes in an eligible subtree).

Figure 10. Bandwidth of push algorithms (DEC trace).

Figure 10 shows the bandwidth consumed by the algorithms. The push-shared algorithms increase the bandwidth consumed by up to a factor of four compared to the demand-only case. This may be an acceptable trade-off of bandwidth for latency in many environments.
Although the push-shared algorithm does not explicitly track popularity, it results in the desired effect that popular items are more widely replicated than unpopular ones. Also note that if there is locality within subtrees, items popular in one subtree but not another will be more widely replicated in the subtree where they are popular.
4.2.2. Evaluation
Actual push algorithms may fall short of ideal ones in two ways. First, they may fail to push the right data near the right caches, so latency will suffer. Second, they may push data unnecessarily, thereby consuming excess bandwidth. Figures 9 and 10 compare simulations of the ideal push algorithm and base case algorithms without push to push-on-update and push-shared.
Although we do not expect capacity misses to be a major factor in systems that only replicate data on demand, in push-based systems speculative replication may displace more valuable data from caches. To monitor this effect in our simulations, we report results for the space-constrained configuration in which each of the 64 L1 caches has 5 GB of capacity. Due to time constraints and memory limitations of our simulation machines, these results use only the first seven days of the DEC and Berkeley traces.
Figure 9 shows the simulated response time for the DEC trace under a range of push options. This experiment suggests that an ideal push algorithm could achieve speedups of 1.54 to 2.63 compared to the no-push data hierarchy, and speedups of 1.21 to 1.62 compared to the no-push hint hierarchy; the largest speedups come when the cost of accessing remote data is high, such as the Max value in Rousskov's measurements (see Table 3). The push-shared algorithms described here achieve speedups of 1.42 to 2.03 compared to the no-push data hierarchy, and speedups of 1.12 to 1.25 compared to the no-push hint hierarchy. Although a large fraction of objects pushed by push-on-update are used by their destinations, updates are infrequent, so its overall performance gains are small.
4.3. Hint hierarchy
To maximize sharing, our goal is to allow the system to scale to large numbers of caches. The discussion above described how the hint hierarchy uses simple, compact data structures to track the location of many objects and how hint caches satisfy on-line requests locally to avoid the network delays encountered when locating data in a traditional hierarchy or using ICP [35]. This section examines two remaining aspects of the hierarchy's design.
First, although the hierarchy can be visualized as a tree, in reality we use a more scalable data structure that dynamically configures subtrees to reflect network locality and that maps each node of a subtree across the subtree's leaves to provide scalability and fault tolerance, using an algorithm developed by Plaxton et al. [24]. Due to space limitations, we omit a detailed description of this mapping. Details of our implementation may be found in the extended version of this paper [29].
Second, to reduce the amount of information sent globally, the hierarchy prunes updates so that updates are propagated only to the affected nodes. Whenever a cache loads a new object (or discards a previously cached one), it informs its parent. The parent propagates that information to its children or its parent or both using a limited flooding algorithm in which nodes only propagate changes relating to the nearest copies of data. For example, if a node already knows that one of its children has a copy of an object, then on receiving a hint from its parent about a new copy of the object, the node does not propagate the hint to its children. This is because, assuming that the hierarchy's topology reflects network locality, propagating the hint will not alter the information regarding the nearest copy of the object with respect to any of its children nodes. Table 4 examines how effective the metadata hierarchy is at filtering traffic.
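The pruning rule in the example above can be sketched as follows. This is our own simplification (node naming and the delivery log are ours, and it models only the downward direction of the flood): a node drops a hint arriving from its parent when one of its own children already holds a copy, since the nearest-copy information for that subtree is then unchanged.

```python
class Node:
    """A node in the metadata hierarchy (illustrative sketch)."""
    def __init__(self, children=()):
        self.children = list(children)
        self.child_copies = {}        # obj_id -> set of children holding it

    def note_child_copy(self, obj_id, child):
        self.child_copies.setdefault(obj_id, set()).add(child)

    def hint_from_parent(self, obj_id, delivered):
        if self.child_copies.get(obj_id):
            return                    # pruned: subtree already has a nearer copy
        for c in self.children:       # otherwise continue the limited flood
            delivered.append((self, c, obj_id))
            c.hint_from_parent(obj_id, delivered)
```

The effect is that a new replica announcement only reaches the regions of the tree whose nearest-copy answer it actually changes, which is what keeps the global update bandwidth small.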
One potential danger is that when an object is first
Table 4. Hint update traffic for a centralized directory organization versus the metadata hierarchy (90th percentile and peak values).
References
[1] A. Rousskov and D. Wessels. Cache Digests. In Proceedings of the 3rd International WWW Caching Workshop, June 1998.
[2] M. Arlitt and C. Williamson. Web Server Workload Characterization: The Search for Invariants. In Proceedings of the SIGMETRICS Conference on Measurement and Modeling of Computer Systems, May 1996. http://www.cs.usask.ca/projects/discus/discus
[35] D. Wessels and K. Claffy. Application of Internet Cache Protocol (ICP), version 2. Request for Comments RFC-2187, Network Working Group, 1997. http://ds.internic.net/rfc/rfc2187.txt.
[36] J. Yin, L. Alvisi, M. Dahlin, and C. Lin. Volume Leases to Support Consistency in Large-Scale Systems. IEEE Transactions on Knowledge and Data Engineering, February 1999.
[37] L. Zhang, S. Floyd, and V. Jacobson. Adaptive Web Caching. In Proceedings of the NLANR Web Cache Workshop, June 1997. http://ircache.nlanr.net/Cache/Workshop97/.