Equivalence of Backpropagation and Contrastive Hebbian Learning in a Layered Network

Xiaohui Xie
xhx@ai.mit.edu
Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, U.S.A.

H. Sebastian Seung
seung@mit.edu
Howard Hughes Medical Institute and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, U.S.A.
Backpropagation and contrastive Hebbian learning are two methods of training networks with hidden neurons. Backpropagation computes an error signal for the output neurons and spreads it over the hidden neurons. Contrastive Hebbian learning involves clamping the output neurons at desired values and letting the effect spread through feedback connections over the entire network. To investigate the relationship between these two forms of learning, we consider a special case in which they are identical: a multilayer perceptron with linear output units, to which weak feedback connections have been added. In this case, the change in network state caused by clamping the output neurons turns out to be the same as the error signal spread by backpropagation, except for a scalar prefactor. This suggests that the functionality of backpropagation can be realized alternatively by a Hebbian-type learning algorithm, which is suitable for implementation in biological networks.

1 Introduction
Backpropagation and contrastive Hebbian learning (CHL) are two supervised learning algorithms for training networks with hidden neurons. They are of interest because they are generally applicable to wide classes of network architectures. In backpropagation (Rumelhart, Hinton, & Williams, 1986b, 1986a), an error signal for the output neurons is computed and propagated back into the hidden neurons through a separate teacher network. Synaptic weights are updated based on the product between the error signal and network activities. CHL updates the synaptic weights based on the steady states of neurons in two different phases: one with the output neurons clamped to the desired values and the other with the output neurons free (Movellan, 1990; Baldi & Pineda, 1991). Clamping the output neurons
causes the hidden neurons to change their activities, and this change constitutes the basis for the CHL update rule.
CHL was originally formulated for the Boltzmann machine (Ackley, Hinton, & Sejnowski, 1985) and was extended later to deterministic networks (Peterson & Anderson, 1987; Hinton, 1989), in which case it can be interpreted as a mean-field approximation of the Boltzmann machine learning algorithm. However, this interpretation is not necessary, and CHL can be formulated purely for deterministic networks (Movellan, 1990; Baldi & Pineda, 1991). Compared to backpropagation, CHL appears to be quite different. Backpropagation is typically implemented in feedforward networks, whereas CHL is implemented in networks with feedback. Backpropagation is an algorithm driven by error, whereas CHL is a Hebbian-type algorithm, with update rules based on the correlation of pre- and postsynaptic activities. CHL has been shown to be equivalent to backpropagation for networks with only a single layer (Movellan, 1990), and there has been some other work to relate CHL to the general framework of backpropagation (Hopfield, 1987; O'Reilly, 1996; Hinton & McClelland, 1988). However, a direct link between them for networks with hidden neurons has been lacking.

To investigate the relationship between these two algorithms, we consider a special network for which CHL and backpropagation are equivalent. This is a multilayer perceptron to which weak feedback connections have been added and whose output neurons are linear. The equivalence holds because in CHL, clamping the output neurons at their desired values causes the hidden neurons to change their activities, and this change turns out to be equal to the error signal spread by backpropagation, except for a scalar factor.
2 The Learning Algorithms
In this section, we describe the backpropagation and CHL algorithms. Backpropagation is in the standard form, implemented in a multilayer perceptron (Rumelhart et al., 1986b). CHL is formulated in a layered network with feedback connections between neighboring layers of neurons. It is an extension of the typical CHL algorithm formulated for recurrent symmetric networks (Movellan, 1990).
2.1 Backpropagation. Consider a multilayer perceptron with L+1 layers of neurons and L layers of synaptic weights (see Figure 1A). The activities of the kth layer of neurons are denoted by the vector x_k, their biases by the vector b_k, and the synaptic connections from layer k−1 to layer k by the matrix W_k. All neurons in the kth layer are assumed to have the same transfer function f_k, but this transfer function may vary from layer to layer. In particular, we will be interested in the case where f_L is linear, though the other transfer functions may be nonlinear. In the basic definition, f_k acts on a scalar and returns a scalar. However, we will generally use f_k to act on a vector, in which case it returns a vector, operating component by component by f_k. f_k' is the derivative of f_k with respect to its argument. Similar to f_k, when f_k' acts on a vector, it returns a vector as well, operating on each component by f_k'. We assume that f_k is monotonically increasing.

Figure 1: Diagram of the network structures of (A) the multilayer perceptron and (B) the layered network with feedback connections. Layer 0 is the input, layer L is the output, and the others are hidden layers. The forward connections are the same for both networks. In B, there are feedback connections between neighboring layers of neurons.
Backpropagation learning is implemented by repeating the following steps for each example in a training set of input-output pairs:

1. In the forward pass,

   x_k = f_k(W_k x_{k−1} + b_k)    (2.1)

   is evaluated for k = 1 to L, thereby mapping the input x_0 to the output x_L.
2. The desired output d of the network, provided by some teacher, is compared with the actual output x_L to compute an error signal,

   y_L = D_L (d − x_L).    (2.2)

   (The matrix D_k ≡ diag{f_k'(W_k x_{k−1} + b_k)} is defined by placing the components of the vector f_k'(W_k x_{k−1} + b_k) in the diagonal entries of a matrix.)

3. The error signal is propagated backward from the output layer by evaluating

   y_{k−1} = D_{k−1} W_k^T y_k    (2.3)

   for k = L to 2.

4. The weight update

   ΔW_k = η y_k x_{k−1}^T    (2.4)

   is made for k = 1 to L, where η > 0 is a parameter controlling the learning rate.
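For concreteness, here is a minimal NumPy sketch of steps 1 through 4. It is an illustration, not the authors' code: the layer sizes, the tanh hidden nonlinearity, and the helper names (forward, backprop_update) are assumptions, and the output layer is kept linear as in the case analyzed below.

```python
import numpy as np

def forward(x0, W, b, f, df):
    """Forward pass (2.1): x_k = f_k(W_k x_{k-1} + b_k) for k = 1..L."""
    xs, Ds = [x0], []
    for Wk, bk, fk, dfk in zip(W, b, f, df):
        a = Wk @ xs[-1] + bk
        xs.append(fk(a))
        Ds.append(np.diag(dfk(a)))          # D_k = diag{f_k'(W_k x_{k-1} + b_k)}
    return xs, Ds

def backprop_update(x0, d, W, b, f, df, eta=0.1):
    """Error signal (2.2), backward pass (2.3), and weight updates (2.4)."""
    xs, Ds = forward(x0, W, b, f, df)
    L = len(W)
    y = [None] * (L + 1)
    y[L] = Ds[L - 1] @ (d - xs[L])                      # (2.2)
    for k in range(L, 1, -1):
        y[k - 1] = Ds[k - 2] @ W[k - 1].T @ y[k]        # (2.3)
    return [eta * np.outer(y[k], xs[k - 1]) for k in range(1, L + 1)]  # (2.4)

# Example: a 3-2-1 network with a tanh hidden layer and a linear output unit.
rng = np.random.default_rng(0)
W = [rng.standard_normal((2, 3)), rng.standard_normal((1, 2))]
b = [np.zeros(2), np.zeros(1)]
f = [np.tanh, lambda z: z]
df = [lambda z: 1 - np.tanh(z) ** 2, lambda z: np.ones_like(z)]
dW = backprop_update(rng.standard_normal(3), np.array([0.5]), W, b, f, df)
```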
2.2 Contrastive Hebbian Learning. To formulate CHL, we consider a modified network in which, in addition to the feedforward connections from layer k−1 to layer k, there are also feedback connections between neighboring layers (see Figure 1B). The feedback connections are assumed to be symmetric with the feedforward connections, except that they are multiplied by a positive factor γ. In other words, the matrix γ W_k^T contains the feedback connections from layer k back to layer k−1.

CHL is implemented by repeating the following steps for each example of the training set:
1. The input layer x_0 is held fixed, and the dynamical equations

   dx_k/dt + x_k = f_k(W_k x_{k−1} + γ W_{k+1}^T x_{k+1} + b_k)    (2.5)

   for k = 1 to L are run until convergence to a fixed point. The case k = L is defined by setting x_{L+1} = 0 and W_{L+1} = 0. Convergence to a fixed point is guaranteed under rather general conditions, to be shown later. This is called the free state of the network and is denoted by x̌_k for the kth layer neurons.

2. The anti-Hebbian update,

   ΔW_k = −η γ^{k−L} x̌_k x̌_{k−1}^T,    (2.6)

   is made for k = 1, ..., L.
3. The output layer x_L is clamped at the desired value d, and the dynamical equation 2.5 for k = 1 to L−1 is run until convergence to a fixed point. This is called the clamped state and is denoted by x̂_k for the kth layer neurons.

4. The Hebbian update,

   ΔW_k = η γ^{k−L} x̂_k x̂_{k−1}^T,    (2.7)

   is made for k = 1, ..., L.
Alternatively, the weight updates could be combined and made after both clamped and free states are computed,

   ΔW_k = η γ^{k−L} (x̂_k x̂_{k−1}^T − x̌_k x̌_{k−1}^T).    (2.8)

This form is the one used in our analysis.
This version of CHL should look familiar to anyone who knows the conventional version, implemented in symmetric networks. It will be derived in section 4, but first we prove its equivalence to the backpropagation algorithm.
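The two phases and the combined update 2.8 can be sketched as follows. This is a hedged illustration, not the authors' implementation: a fixed-point iteration stands in for integrating the dynamics 2.5, and the function names (settle, chl_update) and iteration count are assumptions.

```python
import numpy as np

def settle(x0, W, b, f, gamma, clamp=None, n_iter=500):
    """Iterate x_k <- f_k(W_k x_{k-1} + gamma W_{k+1}^T x_{k+1} + b_k) (eq. 2.5)
    toward a fixed point. If clamp is given, the output layer is held at it."""
    L = len(W)
    xs = [x0] + [np.zeros(Wk.shape[0]) for Wk in W]
    if clamp is not None:
        xs[L] = clamp.copy()
    top = L - 1 if clamp is not None else L
    for _ in range(n_iter):
        for k in range(1, top + 1):
            fb = gamma * W[k].T @ xs[k + 1] if k < L else 0.0   # feedback from layer k+1
            xs[k] = f[k - 1](W[k - 1] @ xs[k - 1] + fb + b[k - 1])
    return xs

def chl_update(x0, d, W, b, f, gamma, eta=0.1):
    """Combined CHL update (2.8): Hebbian on the clamped state, anti-Hebbian on the free state."""
    free = settle(x0, W, b, f, gamma)
    clamped = settle(x0, W, b, f, gamma, clamp=d)
    L = len(W)
    return [eta * gamma ** (k - L)
            * (np.outer(clamped[k], clamped[k - 1]) - np.outer(free[k], free[k - 1]))
            for k in range(1, L + 1)]
```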
3 Equivalence in the Limit of Weak Feedback
Next, we prove that CHL in equation 2.8 is equivalent to the backpropagation algorithm in equation 2.4, provided that the feedback is sufficiently weak and the output neurons are linear.
In notation, x_k, x̂_k, and x̌_k represent the kth layer activities of the feedforward network, the clamped state, and the free state, respectively. We consider the case of weak feedback connections, γ ≪ 1, and use ≈ to mean that terms of higher order in γ have been neglected and ~ to denote the order.
The proof consists of the following four steps:

1. Show that the difference between the feedforward and free states is of order γ in each component,

   δx̌_k ≡ x̌_k − x_k ~ γ,    (3.1)

   for all k = 1, ..., L.

2. Show that in the limit of weak feedback, the difference between the clamped and free states satisfies the following iterative relationship,

   δx_k ≡ x̂_k − x̌_k = γ D_k W_{k+1}^T δx_{k+1} + O(γ^{L−k+1}),    (3.2)

   for k = 1, ..., L−1, and δx_L = d − x̌_L.
3. Show that if the output neurons are linear, δx_k is related to the error signal in backpropagation through

   δx_k = γ^{L−k} y_k + O(γ^{L−k+1}).    (3.3)

4. Show that the CHL update can be approximated by

   ΔW_k = η y_k x_{k−1}^T + O(γ).    (3.4)
In the CHL algorithm, clamping the output layer causes changes in the output neurons to spread backward to the hidden layers because of the feedback connections. Hence, the clamped state differs from the free state over the entire network, including the hidden neurons. Equation 3.2 states that δx_k decays exponentially with distance from the output layer of the network. This is because the feedback is weak, so that δx_k is reduced from δx_{k+1} by a factor of γ.

Remarkably, as indicated in equation 3.3, the difference between the clamped and free states is equivalent to the error signal y_k computed in the backward pass of backpropagation, except for a factor of γ^{L−k}, when the output neurons are linear. Moreover, this factor annihilates the factor of γ^{k−L} in the CHL rule of equation 2.8, resulting in the update rule equation 3.4.
3.1 Proof. To prove the first step, we start from the steady-state equation of the free phase,

   x̌_k = f_k(W_k x̌_{k−1} + b_k + γ W_{k+1}^T x̌_{k+1}),    (3.5)

for k = 1, ..., L−1. Subtracting this equation from equation 2.1 and performing a Taylor expansion, we derive

   δx̌_k ≡ x̌_k − x_k    (3.6)
        = f_k(W_k x̌_{k−1} + b_k + γ W_{k+1}^T x̌_{k+1}) − f_k(W_k x_{k−1} + b_k)    (3.7)
        = D_k (W_k δx̌_{k−1} + γ W_{k+1}^T x̌_{k+1}) + O(‖W_k δx̌_{k−1} + γ W_{k+1}^T x̌_{k+1}‖²)    (3.8)

for all hidden layers, and δx̌_L = D_L W_L δx̌_{L−1} + O(‖W_L δx̌_{L−1}‖²) for the output layer. In equation 3.7, the expansion is done around W_k x_{k−1} + b_k. Since the zeroth layer is fixed at the input, δx̌_0 = 0, and under the above iterative relationships we must have δx̌_k ~ γ in each component, that is, δx̌_k is of order γ, for all k = 1, ..., L.
To prove equation 3.2, we compare the fixed-point equations of the clamped and free states,

   f_k^{-1}(x̌_k) = W_k x̌_{k−1} + b_k + γ W_{k+1}^T x̌_{k+1},    (3.9)
   f_k^{-1}(x̂_k) = W_k x̂_{k−1} + b_k + γ W_{k+1}^T x̂_{k+1},    (3.10)

for k = 1, ..., L−1, where f^{-1} is the inverse function of f in each component. Subtract them, and perform a Taylor expansion around x̌_k. Recall the definition of δx_k ≡ x̂_k − x̌_k. We have

   f^{-1}(x̂_k) − f^{-1}(x̌_k) = W_k δx_{k−1} + γ W_{k+1}^T δx_{k+1}    (3.11)
                              = J_k δx_k + O(‖δx_k‖²),    (3.12)

where the matrix J_k ≡ diag{∂f^{-1}(x̌_k)/∂x̌_k}. Since x̌_k − x_k ~ γ, to the leading order in γ, the matrix J_k can be approximated by J_k ≈ diag{∂f^{-1}(x_k)/∂x_k} = D_k^{-1}, and J_k δx_k = D_k^{-1} δx_k + O(γ‖δx_k‖). Substituting this back into equation 3.12, we get

   δx_k = D_k (W_k δx_{k−1} + γ W_{k+1}^T δx_{k+1}) + O(γ‖δx_k‖) + O(‖δx_k‖²).    (3.13)
Let us start from k = 1. Since the input is fixed (δx_0 = 0), we have δx_1 = γ D_1 W_2^T δx_2 + O(γ‖δx_1‖), and therefore, δx_1 = γ D_1 W_2^T δx_2 + O(γ²‖δx_2‖). Next, we check k = 2, in which case δx_2 = D_2 W_2 δx_1 + γ D_2 W_3^T δx_3 + O(γ‖δx_2‖). Substituting δx_1, we have δx_2 = γ D_2 W_3^T δx_3 + O(γ‖δx_2‖), and therefore, δx_2 = γ D_2 W_3^T δx_3 + O(γ²‖δx_3‖). Following this iteratively, we have

   δx_k = γ D_k W_{k+1}^T δx_{k+1} + O(γ²‖δx_{k+1}‖),    (3.14)

for all k = 1, ..., L−1. Notice that δx_L = d − x̌_L = d − x_L + O(γ) is of order 1. Hence, δx_{L−1} = γ D_{L−1} W_L^T δx_L + O(γ²), and iteratively, we have

   δx_k = γ D_k W_{k+1}^T δx_{k+1} + O(γ^{L−k+1}),    (3.15)

for k = 1, ..., L−1. Therefore, equation 3.2 is proved. This equation indicates that δx_k ~ γ δx_{k+1} and δx_k ~ γ^{L−k} in each component.
If the output neurons are linear, then y_L = δx_L + O(γ). Consequently, δx_k = γ^{L−k} y_k + O(γ^{L−k+1}) for all k = 1, ..., L.
Finally, the weight update rule of CHL follows:

   ΔW_k = η γ^{k−L} (x̂_k x̂_{k−1}^T − x̌_k x̌_{k−1}^T)    (3.16)
        = η γ^{k−L} δx_k x̌_{k−1}^T + η γ^{k−L} x̌_k δx_{k−1}^T + η γ^{k−L} δx_k δx_{k−1}^T    (3.17)
        = η γ^{k−L} δx_k x_{k−1}^T + O(γ)    (3.18)
        = η y_k x_{k−1}^T + O(γ).    (3.19)

The last approximation is made because x̌_{k−1} − x_{k−1} ~ γ. This result shows that the CHL algorithm in the layered network with linear output neurons is identical to backpropagation as γ → 0.
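A small numerical check of equation 3.3 is easy to run and may make the result concrete. The following self-contained sketch (with arbitrary sizes, tanh hidden units, a linear output, and γ = 10⁻³ chosen purely for illustration) compares the rescaled clamped-free difference of the hidden layer with the backpropagated error signal.

```python
import numpy as np

rng = np.random.default_rng(1)
gamma = 1e-3
W1, W2 = 0.5 * rng.standard_normal((3, 4)), 0.5 * rng.standard_normal((2, 3))
b1, b2 = np.zeros(3), np.zeros(2)
x0, d = rng.standard_normal(4), rng.standard_normal(2)

# Feedforward pass (2.1) with a tanh hidden layer and a linear output layer.
x1 = np.tanh(W1 @ x0 + b1)
x2 = W2 @ x1 + b2
y2 = d - x2                               # (2.2); D_L is the identity for a linear output
y1 = np.diag(1 - x1 ** 2) @ W2.T @ y2     # (2.3)

def settle(clamp=None, n_iter=1000):
    """Fixed-point iteration of eq. 2.5 for this two-weight-layer network."""
    h = np.zeros(3)
    o = d.copy() if clamp is not None else np.zeros(2)
    for _ in range(n_iter):
        h = np.tanh(W1 @ x0 + gamma * W2.T @ o + b1)
        if clamp is None:
            o = W2 @ h + b2
    return h

dx1 = settle(clamp=d) - settle()          # delta x_1 = clamped minus free hidden state
print(dx1 / gamma)                        # should approximate y_1 up to O(gamma), eq. 3.3
print(y1)
```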
Remark. For nonlinear output neurons, δx_L of CHL is different from y_L in equation 2.2 computed in backpropagation. However, they are within 90 degrees of each other if viewed as vectors. Moreover, if the activation function for the output neurons is taken to be the sigmoid function, that is, f_L(z) = 1/(1 + exp(−z)), the CHL algorithm is equivalent to backpropagation based on the cost function

   −d^T log(x_L) − (1 − d)^T log(1 − x_L),    (3.20)

since in this case, y_L = d − x_L. In the above, the function log acts on a vector and returns a vector as well.
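As a brief check (our elaboration, not part of the original text) of why y_L = d − x_L under this cost: with the sigmoid output, f_L'(z) = x_L ⊙ (1 − x_L) componentwise, so the negative gradient of the cost 3.20 with respect to the output pre-activation z is

$$
-\frac{\partial}{\partial z}\Bigl[-d^T\log x_L-(1-d)^T\log(1-x_L)\Bigr]
=\Bigl(\frac{d}{x_L}-\frac{1-d}{1-x_L}\Bigr)\odot x_L(1-x_L)
= d-x_L,
$$

where the divisions act componentwise. The error signal injected at the output layer is therefore d − x_L, matching the leading-order clamped-free difference δx_L produced by CHL.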
4 Contrastive Function
The CHL algorithm stated in section 2.2 can be shown to perform gradient descent on a contrastive function that is defined as the difference of the network's Lyapunov functions between the clamped and free states (Movellan, 1990; Baldi & Pineda, 1991).
Suppose E(x) is a Lyapunov function of the dynamics in equation 2.5. Construct the contrastive function C(W) ≡ E(x̂) − E(x̌), where x̂ and x̌ are steady states of the whole network in the clamped and free phase, respectively, and W ≡ {W_1, ..., W_L}. For simplicity, let us first assume that E(x) has a unique global minimum in the range of x and no local minima. According to the definition of Lyapunov functions, x̌ is the global minimum of E, and so is x̂, but under the extra constraint that the output neurons are clamped at d. Therefore, C(W) = E(x̂) − E(x̌) ≥ 0, and it achieves zero if and only if x̌ = x̂, that is, when the output neurons reach the desired values. Performing gradient descent on C(W) leads to the CHL algorithm. On the other hand, if E(x) does not have a unique minimum, x̌ and x̂ may be only local minima. However, the above discussion still holds, provided that x̂ is in the basin of attraction of x̌ under the free-phase dynamics. This imposes some constraints on how to reset the initial state of the network after each phase. One strategy is to let the clamped phase settle to the steady state first and then run the free phase without resetting the hidden neurons. This will guarantee that C(W) is always nonnegative and constitutes a proper cost function.
Next, we introduce a Lyapunov function for the network dynamics in equation 2.5,

   E(x) = Σ_{k=1}^{L} γ^{k−L} [1^T F̄_k(x_k) − x_k^T W_k x_{k−1} − b_k^T x_k],    (4.1)

where the function F̄_k is defined so that F̄_k'(x) = f_k^{-1}(x), and x ≡ {x_1, ..., x_L} represents the states of all layers of the network. Equation 4.1 is extended from Lyapunov functions previously introduced for recurrent networks (Hopfield, 1984; Cohen & Grossberg, 1983).
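To make equation 4.1 concrete, here is a hedged NumPy sketch of the energy for sigmoid units, for which f_k^{-1}(x) = log(x/(1−x)) and its componentwise antiderivative is F̄_k(x) = x log x + (1−x) log(1−x). The function names and the choice of sigmoid units are illustrative assumptions.

```python
import numpy as np

def fbar(x):
    """Antiderivative of the inverse sigmoid, F̄(x) = x log x + (1 - x) log(1 - x),
    valid for activities strictly between 0 and 1."""
    return x * np.log(x) + (1 - x) * np.log(1 - x)

def energy(xs, W, b, gamma):
    """Lyapunov function (4.1); xs = [x_0, x_1, ..., x_L] are the layer activities."""
    L = len(W)
    E = 0.0
    for k in range(1, L + 1):
        E += gamma ** (k - L) * (fbar(xs[k]).sum()
                                 - xs[k] @ W[k - 1] @ xs[k - 1]
                                 - b[k - 1] @ xs[k])
    return E
```

Evaluating this quantity at the clamped and free fixed points gives the contrastive function C(W); as shown next, it is nonincreasing under the dynamics 2.5.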
For E(x) to be a Lyapunov function, it must be nonincreasing under the dynamics equation 2.5. This can be shown by

   Ė = Σ_{k=1}^{L} (∂E/∂x_k)^T ẋ_k    (4.2)
     = Σ_{k=1}^{L} γ^{k−L} [f_k^{-1}(x_k) − W_k x_{k−1} − γ W_{k+1}^T x_{k+1} − b_k]^T ẋ_k    (4.3)
     = −Σ_{k=1}^{L} γ^{k−L} [f_k^{-1}(x_k) − f_k^{-1}(ẋ_k + x_k)]^T [x_k − (ẋ_k + x_k)]    (4.4)
     ≤ 0,    (4.5)

where the last inequality holds because f_k is monotonic, as we have assumed. (In passing from equation 4.3 to 4.4, the dynamics 2.5 were used to substitute f_k^{-1}(ẋ_k + x_k) for W_k x_{k−1} + γ W_{k+1}^T x_{k+1} + b_k.) Therefore, E(x) is nonincreasing under the dynamics and is stationary if and only if the network is at a fixed point. Furthermore, with appropriately chosen f_k, such as sigmoid functions, E(x) is also bounded below, in which case E(x) is a Lyapunov function.
Given the Lyapunov function, we can form the contrastive function C(W) and derive the gradient-descent algorithm on C accordingly.

The derivative of E(x̂) with respect to W_k is

   dE(x̂)/dW_k = ∂E/∂W_k + Σ_j (∂E/∂x̂_j)^T ∂x̂_j/∂W_k    (4.6)
              = ∂E/∂W_k    (4.7)
              = −γ^{k−L} x̂_k x̂_{k−1}^T,    (4.8)

where the second equality holds because ∂E/∂x̂_j = 0 for all j at the steady state. Similarly, we derive

   dE(x̌)/dW_k = −γ^{k−L} x̌_k x̌_{k−1}^T.    (4.9)

Combining equations 4.8 and 4.9, we find that the derivative of C(W) with respect to W_k reads

   dC/dW_k = dE(x̂)/dW_k − dE(x̌)/dW_k = −γ^{k−L} (x̂_k x̂_{k−1}^T − x̌_k x̌_{k−1}^T).    (4.10)

With a suitable learning rate, gradient descent on C(W) leads to the CHL algorithm in equation 2.8.
5 Equivalence of Cost Functions

In section 3, we proved that the CHL algorithm in the layered network with linear output neurons is equivalent to backpropagation in the weak-feedback limit. Since both algorithms perform gradient descent on some cost function, the equivalence of the update rules implies that their cost functions should be equal, up to a multiplicative or an additive constant difference. Next, we demonstrate this directly by comparing the cost functions of these two algorithms.

The backpropagation learning algorithm is gradient descent on the squared difference, ‖d − x_L‖²/2, between the desired and actual outputs of the network.
For the CHL algorithm, the cost function is the difference of Lyapunov functions between the clamped and free states, as shown in the previous section. After reordering, it can be written as

   C = Σ_{k=1}^{L} γ^{k−L} [1^T (F̄_k(x̂_k) − F̄_k(x̌_k)) − δx_k^T (W_k x̌_{k−1} + b_k) − δx_{k−1}^T W_k^T x̂_k].    (5.1)

Recall that δx_k ~ γ^{L−k}. Therefore, the δx_k term above multiplied by the factor γ^{k−L} is of order 1, whereas the δx_{k−1} term multiplied by the same factor is of order γ and thus can be neglected in the leading-order approximation. After this, we get

   C = Σ_{k=1}^{L} γ^{k−L} [1^T (F̄_k(x̂_k) − F̄_k(x̌_k)) − δx_k^T (W_k x̌_{k−1} + b_k)] + O(γ).    (5.2)
If the output neurons are linear (f_L(x) = x), then F̄_L(x) = x^T x/2 and W_L x̌_{L−1} + b_L = x̌_L. Substituting these into C and separating terms of the output and hidden layers, we derive

   C = (1/2)(x̂_L^T x̂_L − x̌_L^T x̌_L) − δx_L^T x̌_L + Σ_{k=1}^{L−1} γ^{k−L} δx_k^T [f_k^{-1}(x̌_k) − W_k x̌_{k−1} − b_k] + O(γ)    (5.3)
     = (1/2)‖d − x_L‖² + O(γ),

where the second term with the sum vanishes because of the fixed-point equations (and the last line uses x̂_L = d and x̌_L = x_L + O(γ)).
In conclusion, to leading order in γ, the contrastive function in CHL is equal to the squared-error cost function of backpropagation. This demonstration of the equality of the cost functions provides another perspective on the equivalence between these two forms of learning.
So far, we have always assumed that the output neurons are linear. If this is not true, how different is the cost function of CHL from that of backpropagation? Repeating the above derivation, we get the cost function of CHL for nonlinear output neurons,

   C = 1^T F̄_L(x̂_L) − 1^T F̄_L(x̌_L) − δx_L^T f_L^{-1}(x̌_L) + O(γ)    (5.4)
     = −1^T F̄_L(x̌_L) − δx_L^T f_L^{-1}(x̌_L) + 1^T F̄_L(d) + O(γ).    (5.5)

With a sigmoidal-type activation function for the output neurons, F̄_L(z) = z log(z) + (1 − z) log(1 − z). Substituting this into equation 5.5, we find

   C = −d^T log(x_L) − (1 − d)^T log(1 − x_L) + 1^T F̄_L(d) + O(γ),    (5.6)

which is the same as the cost function in equation 3.20 in the small-γ limit, except for a constant difference.
6 Simulation
In this section, we use the backpropagation and CHL algorithms to train a 784-10-10 three-layer network to perform handwritten digit recognition. The data we use are abridged from the MNIST database and contain 5000 training examples and 1000 testing examples.

We use the sigmoid function f_k(x) = 1/(1 + exp(−x)) for the hidden and output layers. The backpropagation algorithm is based on the cost function in equation 3.20. We simulate the CHL algorithm in the layered network with three different feedback strengths: γ = 0.05, 0.1, and 0.5.

The label of an input example is determined by the index of the largest output. After each epoch of on-line training, the classification and squared errors for both training and testing examples are computed and plotted in Figure 2. The classification error is defined as the percentage of examples classified incorrectly. The squared error is defined as the mean squared difference between the actual and desired outputs over all training or testing examples. The desired output is 1 on the output neuron whose index matches the label and 0 on the other output neurons.

The learning curves of the CHL algorithm are very similar to those of backpropagation for γ = 0.05 and 0.1, and deviate from the backpropagation learning curve for γ = 0.5 (see Figure 2). In the simulation, we find that the overall simulation time of the CHL algorithm is not significantly longer than that of backpropagation. This is because the layered network tends to converge to a steady state quickly in the case of weak feedback.
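A hedged sketch of the structure of this experiment is given below for the backpropagation side of the comparison; the CHL updates of section 2.2 would slot into the same online loop in place of the update shown. The data-loading step, initialization scale, epoch count, learning rate, and the omission of bias learning are all placeholder assumptions (the abridged MNIST subset used by the authors is not specified beyond its size), so this illustrates the setup rather than reproducing the reported curves.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_online(images, labels, epochs=20, eta=0.1, seed=0):
    """784-10-10 sigmoid network trained online with the error signal y_L = d - x_L,
    i.e., gradient descent on the cross-entropy cost of equation 3.20."""
    rng = np.random.default_rng(seed)
    W1 = 0.1 * rng.standard_normal((10, 784))
    W2 = 0.1 * rng.standard_normal((10, 10))
    b1, b2 = np.zeros(10), np.zeros(10)              # biases kept fixed in this sketch
    targets = np.eye(10)[labels]                     # desired output: 1 at the label index
    for epoch in range(epochs):
        errors = 0
        for x0, d in zip(images, targets):
            x1 = sigmoid(W1 @ x0 + b1)               # forward pass (2.1)
            x2 = sigmoid(W2 @ x1 + b2)
            y2 = d - x2                              # output error signal (see eq. 3.20)
            y1 = (x1 * (1 - x1)) * (W2.T @ y2)       # backward pass (2.3)
            W2 += eta * np.outer(y2, x1)             # weight updates (2.4)
            W1 += eta * np.outer(y1, x0)
            errors += int(np.argmax(x2) != np.argmax(d))
        print(f"epoch {epoch}: training classification error {errors / len(images):.3f}")

# Hypothetical usage: `images` is an (N, 784) array of pixel values scaled to [0, 1]
# and `labels` an (N,) integer array, both drawn from the MNIST subset.
# train_online(images, labels)
```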
7 Discussion
We have shown that backpropagation in multilayer perceptrons can be implemented equivalently by the CHL algorithm if weak feedback is added. This is demonstrated from two perspectives: evaluating the two algorithms directly and comparing their cost functions. The essence behind this equivalence is that CHL effectively extracts the error signal of backpropagation from the difference between the clamped and free steady states.

Figure 2: Comparison of performance between the backpropagation and CHL algorithms. The two algorithms are used to train a 784-10-10 three-layer network to perform handwritten digit recognition. The CHL algorithm is used to train the three-layer network with feedback γ = 0.05, 0.1, and 0.5 added. (A, B) Classification and squared errors for training examples. (C, D) Test examples.
The equivalence between CHL and backpropagation in layered networks holds mathematically in the limit of weak feedback. This, however, does not imply that in engineering applications we should substitute CHL for backpropagation when training neural networks. In networks with many hidden layers, the difference between the clamped and free states in the first few layers becomes very small in the limit of weak feedback, so CHL will not be robust against noise during training. In practice, when CHL is used to train layered networks with many hidden layers, the feedback strength should not be chosen too small, in which case the approximation of CHL to the backpropagation algorithm will be inaccurate.
The investigation of the relationship between backpropagation and CHL is motivated by research looking for biologically plausible learning algorithms. It is believed by many that backpropagation is not biologically realistic. However, in an interesting study of coordinate transforms in the posterior parietal cortex of monkeys, Zipser and Andersen (1988) showed that hidden neurons in a network model trained by backpropagation share very similar properties with real neurons recorded from that area. This work prompted the search for a learning algorithm that has similar functionality to backpropagation (Crick, 1989; Mazzoni, Andersen, & Jordan, 1991) and at the same time is biologically plausible. CHL is a Hebbian-type learning algorithm, relying only on pre- and postsynaptic activities. The equivalent implementation of backpropagation by CHL suggests that CHL could be a possible solution to this problem.
Mazzoni et al. (1991) also proposed a biologically plausible learning rule as an alternative to backpropagation. Their algorithm is a reinforcement-type learning algorithm, which is usually slow, has large variance, and depends on global signals. In contrast, the CHL algorithm is deterministic and could be much faster than reinforcement learning algorithms. However, a disadvantage of CHL is its dependence on special network structures, such as the layered network in our case. Whether either algorithm is used by biological systems is an important question that needs further investigation in experiments and theory.
Acknowledgments

We acknowledge helpful discussions with J. J. Hopfield and S. Roweis. We thank J. Movellan for suggesting the error function for nonlinear output neurons.
References

Ackley, D. H., Hinton, G. E., & Sejnowski, T. J. (1985). A learning algorithm for Boltzmann machines. Cognitive Science, 9, 147–169.

Baldi, P., & Pineda, F. (1991). Contrastive learning and neural oscillations. Neural Computation, 3, 526–545.

Cohen, M. A., & Grossberg, S. (1983). Absolute stability of global pattern formation and parallel memory storage by competitive neural networks. IEEE Transactions on Systems, Man, and Cybernetics, 13, 815–826.

Crick, F. (1989). The recent excitement about neural networks. Nature, 337, 129–132.

Hinton, G. E. (1989). Deterministic Boltzmann learning performs steepest descent in weight-space. Neural Computation, 1, 143–150.

Hinton, G. E., & McClelland, J. (1988). Learning representations by recirculation. In D. Z. Anderson (Ed.), Neural Information Processing Systems. New York: American Institute of Physics.

Hopfield, J. J. (1984). Neurons with graded response have collective computational properties like those of two-state neurons. Proc. Natl. Acad. Sci. USA, 81, 3088–3092.

Hopfield, J. J. (1987). Learning algorithms and probability distributions in feed-forward and feed-back networks. Proc. Natl. Acad. Sci. USA, 84, 8429–8433.

Mazzoni, P., Andersen, R. A., & Jordan, M. I. (1991). A more biologically plausible learning rule for neural networks. Proc. Natl. Acad. Sci. USA, 88, 4433–4437.

Movellan, J. (1990). Contrastive Hebbian learning in the continuous Hopfield model. In D. Touretzky, J. Elman, T. Sejnowski, & G. Hinton (Eds.), Proceedings of the 1990 Connectionist Models Summer School (pp. 10–17). San Mateo, CA: Morgan Kaufmann.

O'Reilly, R. (1996). Biologically plausible error-driven learning using local activation differences: The generalized recirculation algorithm. Neural Computation, 8, 895–938.

Peterson, C., & Anderson, J. (1987). A mean field theory learning algorithm for neural networks. Complex Systems, 1, 995–1019.

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986a). Learning internal representations by error propagation. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 1: Foundations (pp. 318–362). Cambridge, MA: MIT Press.

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986b). Learning representations by back-propagating errors. Nature, 323, 533–536.

Zipser, D., & Andersen, R. A. (1988). A back-propagation programmed network that simulates response properties of a subset of posterior parietal neurons. Nature, 331, 679–684.
Received January 25, 2002; accepted August 1, 2002.