Equivalence of Backpropagation and Contrastive Hebbian Learning in a Layered Network

Xiaohui Xie
xhx@ai.mit.edu
Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, U.S.A.

H. Sebastian Seung
seung@mit.edu
Howard Hughes Medical Institute and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, U.S.A.
Backpropagation and contrastive Hebbian learning are two methods of training networks with hidden neurons. Backpropagation computes an error signal for the output neurons and spreads it over the hidden neurons. Contrastive Hebbian learning involves clamping the output neurons at desired values and letting the effect spread through feedback connections over the entire network. To investigate the relationship between these two forms of learning, we consider a special case in which they are identical: a multilayer perceptron with linear output units, to which weak feedback connections have been added. In this case, the change in network state caused by clamping the output neurons turns out to be the same as the error signal spread by backpropagation, except for a scalar prefactor. This suggests that the functionality of backpropagation can be realized alternatively by a Hebbian-type learning algorithm, which is suitable for implementation in biological networks.

1 Introduction
Backpropagation and contrastive Hebbian learning (CHL) are two supervised learning algorithms for training networks with hidden neurons. They are of interest because they are generally applicable to wide classes of network architectures. In backpropagation (Rumelhart, Hinton, & Williams, 1986b, 1986a), an error signal for the output neurons is computed and propagated back into the hidden neurons through a separate teacher network. Synaptic weights are updated based on the product between the error signal and network activities. CHL updates the synaptic weights based on the steady states of neurons in two different phases: one with the output neurons clamped to the desired values and the other with the output neurons free (Movellan, 1990; Baldi & Pineda, 1991). Clamping the output neurons
causes the hidden neurons to change their activities, and this change constitutes the basis for the CHL update rule.
CHL was originally formulated for the Boltzmann machine (Ackley, Hinton, & Sejnowski, 1985) and was extended later to deterministic networks (Peterson & Anderson, 1987; Hinton, 1989), in which case it can be interpreted as a mean-field approximation of the Boltzmann machine learning algorithm. However, this interpretation is not necessary, and CHL can be formulated purely for deterministic networks (Movellan, 1990; Baldi & Pineda, 1991). Compared to backpropagation, CHL appears to be quite different. Backpropagation is typically implemented in feedforward networks, whereas CHL is implemented in networks with feedback. Backpropagation is an algorithm driven by error, whereas CHL is a Hebbian-type algorithm, with update rules based on the correlation of pre- and postsynaptic activities. CHL has been shown to be equivalent to backpropagation for networks with only a single layer (Movellan, 1990), and there has been some other work to relate CHL to the general framework of backpropagation (Hopfield, 1987; O'Reilly, 1996; Hinton & McClelland, 1988). However, a direct link between them for networks with hidden neurons has been lacking.

To investigate the relationship between these two algorithms, we consider a special network for which CHL and backpropagation are equivalent. This is a multilayer perceptron to which weak feedback connections have been added and whose output neurons are linear. The equivalence holds because in CHL, clamping the output neurons at their desired values causes the hidden neurons to change their activities, and this change turns out to be equal to the error signal spread by backpropagation, except for a scalar factor.
2 The Learning Algorithms
In this section, we describe the backpropagation and CHL algorithms. Backpropagation is in the standard form, implemented in a multilayer perceptron (Rumelhart et al., 1986b). CHL is formulated in a layered network with feedback connections between neighboring layers of neurons. It is an extension of the typical CHL algorithm formulated for recurrent symmetric networks (Movellan, 1990).
2.1 Backpropagation. Consider a multilayer perceptron with L+1 layers of neurons and L layers of synaptic weights (see Figure 1A). The activities of the kth layer of neurons are denoted by the vector x_k, their biases by the vector b_k, and the synaptic connections from layer k−1 to layer k by the matrix W_k. All neurons in the kth layer are assumed to have the same transfer function f_k, but this transfer function may vary from layer to layer. In particular, we will be interested in the case where f_L is linear, though the other transfer functions may be nonlinear. In the basic definition, f_k acts on a scalar and returns a scalar. However, we will generally use f_k to act on a vector, in which case it returns a vector, operating component by component by f_k. f_k' is the derivative of f_k with respect to its argument. Similar to f_k, when f_k' acts on a vector, it returns a vector as well, operating on each component by f_k'. We assume that f_k is monotonically increasing.

Figure 1: Diagram of the network structures of (A) the multilayer perceptron and (B) the layered network with feedback connections. Layer 0 is the input, layer L is the output, and the others are hidden layers. The forward connections are the same for both networks. In B, there are feedback connections between neighboring layers of neurons.
Backpropagation learning is implemented by repeating the following steps for each example in a training set of input-output pairs:

1. In the forward pass,

   x_k = f_k(W_k x_{k−1} + b_k)    (2.1)

   is evaluated for k = 1 to L, thereby mapping the input x_0 to the output x_L.
2. The desired output d of the network, provided by some teacher, is compared with the actual output x_L to compute an error signal,

   y_L = D_L (d − x_L).    (2.2)

   (The matrix D_k ≡ diag{f_k'(W_k x_{k−1} + b_k)} is defined by placing the components of the vector f_k'(W_k x_{k−1} + b_k) in the diagonal entries of a matrix.)

3. The error signal is propagated backward from the output layer by evaluating

   y_{k−1} = D_{k−1} W_k^T y_k    (2.3)

   for k = L to 2.

4. The weight update

   ΔW_k = η y_k x_{k−1}^T    (2.4)

   is made for k = 1 to L, where η > 0 is a parameter controlling the learning rate.
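For concreteness, here is a minimal NumPy sketch of steps 1 through 4. It is an illustration, not the authors' code: the layer sizes, the tanh hidden nonlinearity, and the helper names (forward, backprop_update) are assumptions, and the output layer is kept linear as in the case analyzed below.

```python
import numpy as np

def forward(x0, W, b, f, df):
    """Forward pass (2.1): x_k = f_k(W_k x_{k-1} + b_k) for k = 1..L."""
    xs, Ds = [x0], []
    for Wk, bk, fk, dfk in zip(W, b, f, df):
        a = Wk @ xs[-1] + bk
        xs.append(fk(a))
        Ds.append(np.diag(dfk(a)))          # D_k = diag{f_k'(W_k x_{k-1} + b_k)}
    return xs, Ds

def backprop_update(x0, d, W, b, f, df, eta=0.1):
    """Error signal (2.2), backward pass (2.3), and weight updates (2.4)."""
    xs, Ds = forward(x0, W, b, f, df)
    L = len(W)
    y = [None] * (L + 1)
    y[L] = Ds[L - 1] @ (d - xs[L])                      # (2.2)
    for k in range(L, 1, -1):
        y[k - 1] = Ds[k - 2] @ W[k - 1].T @ y[k]        # (2.3)
    return [eta * np.outer(y[k], xs[k - 1]) for k in range(1, L + 1)]  # (2.4)

# Example: a 3-2-1 network with a tanh hidden layer and a linear output unit.
rng = np.random.default_rng(0)
W = [rng.standard_normal((2, 3)), rng.standard_normal((1, 2))]
b = [np.zeros(2), np.zeros(1)]
f = [np.tanh, lambda z: z]
df = [lambda z: 1 - np.tanh(z) ** 2, lambda z: np.ones_like(z)]
dW = backprop_update(rng.standard_normal(3), np.array([0.5]), W, b, f, df)
```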
2.2 Contrastive Hebbian Learning. To formulate CHL, we consider a modified network in which, in addition to the feedforward connections from layer k−1 to layer k, there are also feedback connections between neighboring layers (see Figure 1B). The feedback connections are assumed to be symmetric with the feedforward connections, except that they are multiplied by a positive factor γ. In other words, the matrix γ W_k^T contains the feedback connections from layer k back to layer k−1.

CHL is implemented by repeating the following steps for each example of the training set:
1. The input layer x_0 is held fixed, and the dynamical equations

   dx_k/dt + x_k = f_k(W_k x_{k−1} + γ W_{k+1}^T x_{k+1} + b_k)    (2.5)

   for k = 1 to L are run until convergence to a fixed point. The case k = L is defined by setting x_{L+1} = 0 and W_{L+1} = 0. Convergence to a fixed point is guaranteed under rather general conditions, to be shown later. This is called the free state of the network and is denoted by x̌_k for the kth layer neurons.

2. The anti-Hebbian update,

   ΔW_k = −η γ^{k−L} x̌_k x̌_{k−1}^T,    (2.6)

   is made for k = 1, ..., L.
3. The output layer x_L is clamped at the desired value d, and the dynamical equation 2.5 for k = 1 to L−1 is run until convergence to a fixed point. This is called the clamped state and is denoted by x̂_k for the kth layer neurons.

4. The Hebbian update,

   ΔW_k = η γ^{k−L} x̂_k x̂_{k−1}^T,    (2.7)

   is made for k = 1, ..., L.
Alternatively, the weight updates could be combined and made after both clamped and free states are computed,

   ΔW_k = η γ^{k−L} (x̂_k x̂_{k−1}^T − x̌_k x̌_{k−1}^T).    (2.8)

This form is the one used in our analysis.
This version of CHL should look familiar to anyone who knows the conventional version, implemented in symmetric networks. It will be derived in section 4, but first we prove its equivalence to the backpropagation algorithm.
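The two phases and the combined update 2.8 can be sketched as follows. This is a hedged illustration, not the authors' implementation: a fixed-point iteration stands in for integrating the dynamics 2.5, and the function names (settle, chl_update) and iteration count are assumptions.

```python
import numpy as np

def settle(x0, W, b, f, gamma, clamp=None, n_iter=500):
    """Iterate x_k <- f_k(W_k x_{k-1} + gamma W_{k+1}^T x_{k+1} + b_k) (eq. 2.5)
    toward a fixed point. If clamp is given, the output layer is held at it."""
    L = len(W)
    xs = [x0] + [np.zeros(Wk.shape[0]) for Wk in W]
    if clamp is not None:
        xs[L] = clamp.copy()
    top = L - 1 if clamp is not None else L
    for _ in range(n_iter):
        for k in range(1, top + 1):
            fb = gamma * W[k].T @ xs[k + 1] if k < L else 0.0   # feedback from layer k+1
            xs[k] = f[k - 1](W[k - 1] @ xs[k - 1] + fb + b[k - 1])
    return xs

def chl_update(x0, d, W, b, f, gamma, eta=0.1):
    """Combined CHL update (2.8): Hebbian on the clamped state, anti-Hebbian on the free state."""
    free = settle(x0, W, b, f, gamma)
    clamped = settle(x0, W, b, f, gamma, clamp=d)
    L = len(W)
    return [eta * gamma ** (k - L)
            * (np.outer(clamped[k], clamped[k - 1]) - np.outer(free[k], free[k - 1]))
            for k in range(1, L + 1)]
```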
3 Equivalence in the Limit of Weak Feedback
Next, we prove that CHL in equation 2.8 is equivalent to the backpropagation algorithm in equation 2.4, provided that the feedback is sufficiently weak and the output neurons are linear.
In notation, x_k, x̂_k, and x̌_k represent the kth layer activities of the feedforward network, the clamped state, and the free state, respectively. We consider the case of weak feedback connections, γ ≪ 1, and use ≈ to mean that terms of higher order in γ have been neglected and ~ to denote the order.
The proof consists of the following four steps:

1. Show that the difference between the feedforward and free states is of order γ in each component,

   δx̌_k ≡ x̌_k − x_k ~ γ,    (3.1)

   for all k = 1, ..., L.

2. Show that in the limit of weak feedback, the difference between the clamped and free states satisfies the following iterative relationship,

   δx_k ≡ x̂_k − x̌_k = γ D_k W_{k+1}^T δx_{k+1} + O(γ^{L−k+1}),    (3.2)

   for k = 1, ..., L−1, and δx_L = d − x̌_L.
3. Show that if the output neurons are linear, δx_k is related to the error signal in backpropagation through

   δx_k = γ^{L−k} y_k + O(γ^{L−k+1}).    (3.3)

4. Show that the CHL update can be approximated by

   ΔW_k = η y_k x_{k−1}^T + O(γ).    (3.4)
In the CHL algorithm, clamping the output layer causes changes in the output neurons to spread backward to the hidden layers because of the feedback connections. Hence, the clamped state differs from the free state over the entire network, including the hidden neurons. Equation 3.2 states that δx_k decays exponentially with distance from the output layer of the network. This is because the feedback is weak, so that δx_k is reduced from δx_{k+1} by a factor of γ.

Remarkably, as indicated in equation 3.3, the difference between the clamped and free states is equivalent to the error signal y_k computed in the backward pass of backpropagation, except for a factor of γ^{L−k}, when the output neurons are linear. Moreover, this factor annihilates the factor of γ^{k−L} in the CHL rule of equation 2.8, resulting in the update rule equation 3.4.
3.1 Proof. To prove the first step, we start from the steady-state equation of the free phase,

   x̌_k = f_k(W_k x̌_{k−1} + b_k + γ W_{k+1}^T x̌_{k+1}),    (3.5)

for k = 1, ..., L−1. Subtracting this equation from equation 2.1 and performing a Taylor expansion, we derive

   δx̌_k ≡ x̌_k − x_k    (3.6)
        = f_k(W_k x̌_{k−1} + b_k + γ W_{k+1}^T x̌_{k+1}) − f_k(W_k x_{k−1} + b_k)    (3.7)
        = D_k (W_k δx̌_{k−1} + γ W_{k+1}^T x̌_{k+1}) + O(‖W_k δx̌_{k−1} + γ W_{k+1}^T x̌_{k+1}‖²)    (3.8)

for all hidden layers, and δx̌_L = D_L W_L δx̌_{L−1} + O(‖W_L δx̌_{L−1}‖²) for the output layer. In equation 3.7, the expansion is done around W_k x_{k−1} + b_k. Since the zeroth layer is fixed at the input, δx̌_0 = 0, and under the above iterative relationships we must have δx̌_k ~ γ in each component, that is, δx̌_k is of order γ, for all k = 1, ..., L.
To prove equation 3.2, we compare the fixed-point equations of the clamped and free states,

   f_k^{-1}(x̌_k) = W_k x̌_{k−1} + b_k + γ W_{k+1}^T x̌_{k+1},    (3.9)
   f_k^{-1}(x̂_k) = W_k x̂_{k−1} + b_k + γ W_{k+1}^T x̂_{k+1},    (3.10)

for k = 1, ..., L−1, where f^{-1} is the inverse function of f in each component. Subtract them, and perform a Taylor expansion around x̌_k. Recall the definition of δx_k ≡ x̂_k − x̌_k. We have

   f^{-1}(x̂_k) − f^{-1}(x̌_k) = W_k δx_{k−1} + γ W_{k+1}^T δx_{k+1}    (3.11)
                              = J_k δx_k + O(‖δx_k‖²),    (3.12)

where the matrix J_k ≡ diag{∂f^{-1}(x̌_k)/∂x̌_k}. Since x̌_k − x_k ~ γ, to the leading order in γ, the matrix J_k can be approximated by J_k ≈ diag{∂f^{-1}(x_k)/∂x_k} = D_k^{-1}, and J_k δx_k = D_k^{-1} δx_k + O(γ‖δx_k‖). Substituting this back into equation 3.12, we get

   δx_k = D_k (W_k δx_{k−1} + γ W_{k+1}^T δx_{k+1}) + O(γ‖δx_k‖) + O(‖δx_k‖²).    (3.13)
Let us start from k = 1. Since the input is fixed (δx_0 = 0), we have δx_1 = γ D_1 W_2^T δx_2 + O(γ‖δx_1‖), and therefore, δx_1 = γ D_1 W_2^T δx_2 + O(γ²‖δx_2‖). Next, we check k = 2, in which case δx_2 = D_2 W_2 δx_1 + γ D_2 W_3^T δx_3 + O(γ‖δx_2‖). Substituting δx_1, we have δx_2 = γ D_2 W_3^T δx_3 + O(γ‖δx_2‖), and therefore, δx_2 = γ D_2 W_3^T δx_3 + O(γ²‖δx_3‖). Following this iteratively, we have

   δx_k = γ D_k W_{k+1}^T δx_{k+1} + O(γ²‖δx_{k+1}‖),    (3.14)

for all k = 1, ..., L−1. Notice that δx_L = d − x̌_L = d − x_L + O(γ) is of order 1. Hence, δx_{L−1} = γ D_{L−1} W_L^T δx_L + O(γ²), and iteratively, we have

   δx_k = γ D_k W_{k+1}^T δx_{k+1} + O(γ^{L−k+1}),    (3.15)

for k = 1, ..., L−1. Therefore, equation 3.2 is proved. This equation indicates that δx_k ~ γ δx_{k+1} and δx_k ~ γ^{L−k} in each component.
If the output neurons are linear, then y_L = δx_L + O(γ). Consequently, δx_k = γ^{L−k} y_k + O(γ^{L−k+1}) for all k = 1, ..., L.
Finally, the weight update rule of CHL follows:

   ΔW_k = η γ^{k−L} (x̂_k x̂_{k−1}^T − x̌_k x̌_{k−1}^T)    (3.16)
        = η γ^{k−L} δx_k x̌_{k−1}^T + η γ^{k−L} x̌_k δx_{k−1}^T + η γ^{k−L} δx_k δx_{k−1}^T    (3.17)
        = η γ^{k−L} δx_k x_{k−1}^T + O(γ)    (3.18)
        = η y_k x_{k−1}^T + O(γ).    (3.19)

The last approximation is made because x̌_{k−1} − x_{k−1} ~ γ. This result shows that the CHL algorithm in the layered network with linear output neurons is identical to backpropagation as γ → 0.
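A small numerical check of equation 3.3 is easy to run and may make the result concrete. The following self-contained sketch (with arbitrary sizes, tanh hidden units, a linear output, and γ = 10⁻³ chosen purely for illustration) compares the rescaled clamped-free difference of the hidden layer with the backpropagated error signal.

```python
import numpy as np

rng = np.random.default_rng(1)
gamma = 1e-3
W1, W2 = 0.5 * rng.standard_normal((3, 4)), 0.5 * rng.standard_normal((2, 3))
b1, b2 = np.zeros(3), np.zeros(2)
x0, d = rng.standard_normal(4), rng.standard_normal(2)

# Feedforward pass (2.1) with a tanh hidden layer and a linear output layer.
x1 = np.tanh(W1 @ x0 + b1)
x2 = W2 @ x1 + b2
y2 = d - x2                               # (2.2); D_L is the identity for a linear output
y1 = np.diag(1 - x1 ** 2) @ W2.T @ y2     # (2.3)

def settle(clamp=None, n_iter=1000):
    """Fixed-point iteration of eq. 2.5 for this two-weight-layer network."""
    h = np.zeros(3)
    o = d.copy() if clamp is not None else np.zeros(2)
    for _ in range(n_iter):
        h = np.tanh(W1 @ x0 + gamma * W2.T @ o + b1)
        if clamp is None:
            o = W2 @ h + b2
    return h

dx1 = settle(clamp=d) - settle()          # delta x_1 = clamped minus free hidden state
print(dx1 / gamma)                        # should approximate y_1 up to O(gamma), eq. 3.3
print(y1)
```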
Remark. For nonlinear output neurons, δx_L of CHL is different from y_L in equation 2.2 computed in backpropagation. However, they are within 90 degrees of each other if viewed as vectors. Moreover, if the activation function for the output neurons is taken to be the sigmoid function, that is, f_L(z) = 1/(1 + exp(−z)), the CHL algorithm is equivalent to backpropagation based on the cost function

   −d^T log(x_L) − (1 − d)^T log(1 − x_L),    (3.20)

since in this case, y_L = d − x_L. In the above, the function log acts on a vector and returns a vector as well.
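As a brief check (our elaboration, not part of the original text) of why y_L = d − x_L under this cost: with the sigmoid output, f_L'(z) = x_L ⊙ (1 − x_L) componentwise, so the negative gradient of the cost 3.20 with respect to the output pre-activation z is

$$
-\frac{\partial}{\partial z}\Bigl[-d^T\log x_L-(1-d)^T\log(1-x_L)\Bigr]
=\Bigl(\frac{d}{x_L}-\frac{1-d}{1-x_L}\Bigr)\odot x_L(1-x_L)
= d-x_L,
$$

where the divisions act componentwise. The error signal injected at the output layer is therefore d − x_L, matching the leading-order clamped-free difference δx_L produced by CHL.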
4 Contrastive Function
The CHL algorithm stated in section 2.2 can be shown to perform gradient descent on a contrastive function that is defined as the difference of the network's Lyapunov functions between the clamped and free states (Movellan, 1990; Baldi & Pineda, 1991).
Suppose E(x) is a Lyapunov function of the dynamics in equation 2.5. Construct the contrastive function C(W) ≡ E(x̂) − E(x̌), where x̂ and x̌ are steady states of the whole network in the clamped and free phase, respectively, and W ≡ {W_1, ..., W_L}. For simplicity, let us first assume that E(x) has a unique global minimum in the range of x and no local minima. According to the definition of Lyapunov functions, x̌ is the global minimum of E, and so is x̂, but under the extra constraint that the output neurons are clamped at d. Therefore, C(W) = E(x̂) − E(x̌) ≥ 0, and it achieves zero if and only if x̌ = x̂, that is, when the output neurons reach the desired values. Performing gradient descent on C(W) leads to the CHL algorithm. On the other hand, if E(x) does not have a unique minimum, x̌ and x̂ may be only local minima. However, the above discussion still holds, provided that x̂ is in the basin of attraction of x̌ under the free-phase dynamics. This imposes some constraints on how to reset the initial state of the network after each phase. One strategy is to let the clamped phase settle to the steady state first and then run the free phase without resetting the hidden neurons. This will guarantee that C(W) is always nonnegative and constitutes a proper cost function.
Next, we introduce a Lyapunov function for the network dynamics in equation 2.5,

   E(x) = Σ_{k=1}^{L} γ^{k−L} [1^T F̄_k(x_k) − x_k^T W_k x_{k−1} − b_k^T x_k],    (4.1)

where the function F̄_k is defined so that F̄_k'(x) = f_k^{-1}(x), and x ≡ {x_1, ..., x_L} represents the states of all layers of the network. Equation 4.1 is extended from Lyapunov functions previously introduced for recurrent networks (Hopfield, 1984; Cohen & Grossberg, 1983).
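To make equation 4.1 concrete, here is a hedged NumPy sketch of the energy for sigmoid units, for which f_k^{-1}(x) = log(x/(1−x)) and its componentwise antiderivative is F̄_k(x) = x log x + (1−x) log(1−x). The function names and the choice of sigmoid units are illustrative assumptions.

```python
import numpy as np

def fbar(x):
    """Antiderivative of the inverse sigmoid, F̄(x) = x log x + (1 - x) log(1 - x),
    valid for activities strictly between 0 and 1."""
    return x * np.log(x) + (1 - x) * np.log(1 - x)

def energy(xs, W, b, gamma):
    """Lyapunov function (4.1); xs = [x_0, x_1, ..., x_L] are the layer activities."""
    L = len(W)
    E = 0.0
    for k in range(1, L + 1):
        E += gamma ** (k - L) * (fbar(xs[k]).sum()
                                 - xs[k] @ W[k - 1] @ xs[k - 1]
                                 - b[k - 1] @ xs[k])
    return E
```

Evaluating this quantity at the clamped and free fixed points gives the contrastive function C(W); as shown next, it is nonincreasing under the dynamics 2.5.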
For E(x) to be a Lyapunov function, it must be nonincreasing under the dynamics equation 2.5. This can be shown by

   Ė = Σ_{k=1}^{L} (∂E/∂x_k)^T ẋ_k    (4.2)
     = Σ_{k=1}^{L} γ^{k−L} [f_k^{-1}(x_k) − W_k x_{k−1} − γ W_{k+1}^T x_{k+1} − b_k]^T ẋ_k    (4.3)
     = −Σ_{k=1}^{L} γ^{k−L} [f_k^{-1}(x_k) − f_k^{-1}(ẋ_k + x_k)]^T [x_k − (ẋ_k + x_k)]    (4.4)
     ≤ 0,    (4.5)

where the last inequality holds because f_k is monotonic, as we have assumed. (In passing from equation 4.3 to 4.4, the dynamics 2.5 were used to substitute f_k^{-1}(ẋ_k + x_k) for W_k x_{k−1} + γ W_{k+1}^T x_{k+1} + b_k.) Therefore, E(x) is nonincreasing under the dynamics and is stationary if and only if the network is at a fixed point. Furthermore, with appropriately chosen f_k, such as sigmoid functions, E(x) is also bounded below, in which case E(x) is a Lyapunov function.
Given the Lyapunov function, we can form the contrastive function C(W) and derive the gradient-descent algorithm on C accordingly.

The derivative of E(x̂) with respect to W_k is

   dE(x̂)/dW_k = ∂E/∂W_k + Σ_j (∂E/∂x̂_j)^T ∂x̂_j/∂W_k    (4.6)
              = ∂E/∂W_k    (4.7)
              = −γ^{k−L} x̂_k x̂_{k−1}^T,    (4.8)

where the second equality holds because ∂E/∂x̂_j = 0 for all j at the steady state. Similarly, we derive

   dE(x̌)/dW_k = −γ^{k−L} x̌_k x̌_{k−1}^T.    (4.9)

Combining equations 4.8 and 4.9, we find that the derivative of C(W) with respect to W_k reads

   dC/dW_k = dE(x̂)/dW_k − dE(x̌)/dW_k = −γ^{k−L} (x̂_k x̂_{k−1}^T − x̌_k x̌_{k−1}^T).    (4.10)

With a suitable learning rate, gradient descent on C(W) leads to the CHL algorithm in equation 2.8.
5 Equivalence of Cost Functions

In section 3, we proved that the CHL algorithm in the layered network with linear output neurons is equivalent to backpropagation in the weak-feedback limit. Since both algorithms perform gradient descent on some cost function, the equivalence of the update rules implies that their cost functions should be equal, up to a multiplicative or an additive constant difference. Next, we demonstrate this directly by comparing the cost functions of these two algorithms.

The backpropagation learning algorithm is gradient descent on the squared difference, ‖d − x_L‖²/2, between the desired and actual outputs of the network.
For the CHL algorithm, the cost function is the difference of Lyapunov functions between the clamped and free states, as shown in the previous section. After reordering, it can be written as

   C = Σ_{k=1}^{L} γ^{k−L} [1^T (F̄_k(x̂_k) − F̄_k(x̌_k)) − δx_k^T (W_k x̌_{k−1} + b_k) − δx_{k−1}^T W_k^T x̂_k].    (5.1)

Recall that δx_k ~ γ^{L−k}. Therefore, the δx_k term above multiplied by the factor γ^{k−L} is of order 1, whereas the δx_{k−1} term multiplied by the same factor is of order γ and thus can be neglected in the leading-order approximation. After this, we get

   C = Σ_{k=1}^{L} γ^{k−L} [1^T (F̄_k(x̂_k) − F̄_k(x̌_k)) − δx_k^T (W_k x̌_{k−1} + b_k)] + O(γ).    (5.2)
If the output neurons are linear (f_L(x) = x), then F̄_L(x) = x^T x/2 and W_L x̌_{L−1} + b_L = x̌_L. Substituting these into C and separating terms of the output and hidden layers, we derive

   C = (1/2)(x̂_L^T x̂_L − x̌_L^T x̌_L) − δx_L^T x̌_L + Σ_{k=1}^{L−1} γ^{k−L} δx_k^T [f_k^{-1}(x̌_k) − W_k x̌_{k−1} − b_k] + O(γ)    (5.3)
     = (1/2)‖d − x_L‖² + O(γ),

where the second term with the sum vanishes because of the fixed-point equations (and the last line uses x̂_L = d and x̌_L = x_L + O(γ)).
In conclusion, to leading order in γ, the contrastive function in CHL is equal to the squared-error cost function of backpropagation. This demonstration of the equality of the cost functions provides another perspective on the equivalence between these two forms of learning.
So far, we have always assumed that the output neurons are linear. If this is not true, how different is the cost function of CHL from that of backpropagation? Repeating the above derivation, we get the cost function of CHL for nonlinear output neurons,

   C = 1^T F̄_L(x̂_L) − 1^T F̄_L(x̌_L) − δx_L^T f_L^{-1}(x̌_L) + O(γ)    (5.4)
     = −1^T F̄_L(x̌_L) − δx_L^T f_L^{-1}(x̌_L) + 1^T F̄_L(d) + O(γ).    (5.5)

With a sigmoidal-type activation function for the output neurons, F̄_L(z) = z log(z) + (1 − z) log(1 − z). Substituting this into equation 5.5, we find

   C = −d^T log(x_L) − (1 − d)^T log(1 − x_L) + 1^T F̄_L(d) + O(γ),    (5.6)

which is the same as the cost function in equation 3.20 in the small-γ limit, except for a constant difference.
6 Simulation
In this section, we use the backpropagation and CHL algorithms to train a 784-10-10 three-layer network to perform handwritten digit recognition. The data we use are abridged from the MNIST database and contain 5000 training examples and 1000 testing examples.

We use the sigmoid function f_k(x) = 1/(1 + exp(−x)) for the hidden and output layers. The backpropagation algorithm is based on the cost function in equation 3.20. We simulate the CHL algorithm in the layered network with three different feedback strengths: γ = 0.05, 0.1, and 0.5.

The label of an input example is determined by the index of the largest output. After each epoch of on-line training, the classification and squared errors for both training and testing examples are computed and plotted in Figure 2. The classification error is defined as the percentage of examples classified incorrectly. The squared error is defined as the mean squared difference between the actual and desired outputs over all training or testing examples. The desired output is 1 on the output neuron whose index matches the label and 0 on the other output neurons.

The learning curves of the CHL algorithm are very similar to those of backpropagation for γ = 0.05 and 0.1, and deviate from the backpropagation learning curve for γ = 0.5 (see Figure 2). In the simulation, we find that the overall simulation time of the CHL algorithm is not significantly longer than that of backpropagation. This is because the layered network tends to converge to a steady state quickly in the case of weak feedback.
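A hedged sketch of the structure of this experiment is given below for the backpropagation side of the comparison; the CHL updates of section 2.2 would slot into the same online loop in place of the update shown. The data-loading step, initialization scale, epoch count, learning rate, and the omission of bias learning are all placeholder assumptions (the abridged MNIST subset used by the authors is not specified beyond its size), so this illustrates the setup rather than reproducing the reported curves.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_online(images, labels, epochs=20, eta=0.1, seed=0):
    """784-10-10 sigmoid network trained online with the error signal y_L = d - x_L,
    i.e., gradient descent on the cross-entropy cost of equation 3.20."""
    rng = np.random.default_rng(seed)
    W1 = 0.1 * rng.standard_normal((10, 784))
    W2 = 0.1 * rng.standard_normal((10, 10))
    b1, b2 = np.zeros(10), np.zeros(10)              # biases kept fixed in this sketch
    targets = np.eye(10)[labels]                     # desired output: 1 at the label index
    for epoch in range(epochs):
        errors = 0
        for x0, d in zip(images, targets):
            x1 = sigmoid(W1 @ x0 + b1)               # forward pass (2.1)
            x2 = sigmoid(W2 @ x1 + b2)
            y2 = d - x2                              # output error signal (see eq. 3.20)
            y1 = (x1 * (1 - x1)) * (W2.T @ y2)       # backward pass (2.3)
            W2 += eta * np.outer(y2, x1)             # weight updates (2.4)
            W1 += eta * np.outer(y1, x0)
            errors += int(np.argmax(x2) != np.argmax(d))
        print(f"epoch {epoch}: training classification error {errors / len(images):.3f}")

# Hypothetical usage: `images` is an (N, 784) array of pixel values scaled to [0, 1]
# and `labels` an (N,) integer array, both drawn from the MNIST subset.
# train_online(images, labels)
```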
7 Discussion
We have shown that backpropagation in multilayer perceptrons can be implemented equivalently by the CHL algorithm if weak feedback is added. This is demonstrated from two perspectives: evaluating the two algorithms directly and comparing their cost functions. The essence behind this equivalence is that CHL effectively extracts the error signal of backpropagation from the difference between the clamped and free steady states.

Figure 2: Comparison of performance between the backpropagation and CHL algorithms. The two algorithms are used to train a 784-10-10 three-layer network to perform handwritten digit recognition. The CHL algorithm is used to train the three-layer network with feedback γ = 0.05, 0.1, and 0.5 added. (A, B) Classification and squared errors for training examples. (C, D) Test examples.
The equivalence between CHL and backpropagation in layered networks holds mathematically in the limit of weak feedback. This, however, does not imply that in engineering applications we should substitute CHL for backpropagation when training neural networks. In networks with many hidden layers, the difference between the clamped and free states in the first few layers becomes very small in the limit of weak feedback, so CHL will not be robust against noise during training. In practice, when CHL is used to train layered networks with many hidden layers, the feedback strength should not be chosen too small, in which case the approximation of CHL to the backpropagation algorithm will be inaccurate.
The investigation of the relationship between backpropagation and CHL is motivated by research looking for biologically plausible learning algorithms. It is believed by many that backpropagation is not biologically realistic. However, in an interesting study of coordinate transforms in the posterior parietal cortex of monkeys, Zipser and Andersen (1988) showed that hidden neurons in a network model trained by backpropagation share very similar properties with real neurons recorded from that area. This work prompted the search for a learning algorithm that has similar functionality to backpropagation (Crick, 1989; Mazzoni, Andersen, & Jordan, 1991) and at the same time is biologically plausible. CHL is a Hebbian-type learning algorithm, relying only on pre- and postsynaptic activities. The equivalent implementation of backpropagation by CHL suggests that CHL could be a possible solution to this problem.
Mazzoni et al. (1991) also proposed a biologically plausible learning rule as an alternative to backpropagation. Their algorithm is a reinforcement-type learning algorithm, which is usually slow, has large variance, and depends on global signals. In contrast, the CHL algorithm is deterministic and could be much faster than reinforcement learning algorithms. However, a disadvantage of CHL is its dependence on special network structures, such as the layered network in our case. Whether either algorithm is used by biological systems is an important question that needs further investigation in experiments and theory.
Acknowledgments

We acknowledge helpful discussions with J. J. Hopfield and S. Roweis. We thank J. Movellan for suggesting the error function for nonlinear output neurons.
References

Ackley, D. H., Hinton, G. E., & Sejnowski, T. J. (1985). A learning algorithm for Boltzmann machines. Cognitive Science, 9, 147–169.

Baldi, P., & Pineda, F. (1991). Contrastive learning and neural oscillations. Neural Computation, 3, 526–545.

Cohen, M. A., & Grossberg, S. (1983). Absolute stability of global pattern formation and parallel memory storage by competitive neural networks. IEEE Transactions on Systems, Man, and Cybernetics, 13, 815–826.

Crick, F. (1989). The recent excitement about neural networks. Nature, 337, 129–132.

Hinton, G. E. (1989). Deterministic Boltzmann learning performs steepest descent in weight-space. Neural Computation, 1, 143–150.

Hinton, G. E., & McClelland, J. (1988). Learning representations by recirculation. In D. Z. Anderson (Ed.), Neural Information Processing Systems. New York: American Institute of Physics.

Hopfield, J. J. (1984). Neurons with graded response have collective computational properties like those of two-state neurons. Proc. Natl. Acad. Sci. USA, 81, 3088–3092.

Hopfield, J. J. (1987). Learning algorithms and probability distributions in feed-forward and feed-back networks. Proc. Natl. Acad. Sci. USA, 84, 8429–8433.

Mazzoni, P., Andersen, R. A., & Jordan, M. I. (1991). A more biologically plausible learning rule for neural networks. Proc. Natl. Acad. Sci. USA, 88, 4433–4437.

Movellan, J. (1990). Contrastive Hebbian learning in the continuous Hopfield model. In D. Touretzky, J. Elman, T. Sejnowski, & G. Hinton (Eds.), Proceedings of the 1990 Connectionist Models Summer School (pp. 10–17). San Mateo, CA: Morgan Kaufmann.

O'Reilly, R. (1996). Biologically plausible error-driven learning using local activation differences: The generalized recirculation algorithm. Neural Computation, 8, 895–938.

Peterson, C., & Anderson, J. (1987). A mean field theory learning algorithm for neural networks. Complex Systems, 1, 995–1019.

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986a). Learning internal representations by error propagation. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 1: Foundations (pp. 318–362). Cambridge, MA: MIT Press.

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986b). Learning representations by back-propagating errors. Nature, 323, 533–536.

Zipser, D., & Andersen, R. A. (1988). A back-propagation programmed network that simulates response properties of a subset of posterior parietal neurons. Nature, 331, 679–684.
Received January 25, 2002; accepted August 1, 2002.