The error says you are trying to update W with two or more different expressions. You may only provide one update per shared variable; maybe what you want is to sum them?
Post by David Chik
File "code/mlp_momentum.py", line 465, in <module>
    test_mlp()
File "code/mlp_momentum.py", line 313, in test_mlp
    y: train_set_y[index * batch_size:(index + 1) * batch_size]})
File "/Users/david/anaconda/lib/python2.7/site-packages/theano/compile/function.py", line 223, in function
    profile=profile)
File "/Users/david/anaconda/lib/python2.7/site-packages/theano/compile/pfunc.py", line 490, in pfunc
    no_default_updates=no_default_updates)
File "/Users/david/anaconda/lib/python2.7/site-packages/theano/compile/pfunc.py", line 198, in rebuild_collect_shared
    (store_into, update_d[store_into]))
ValueError: ('this shared variable already has an update expression', (W, Elemwise{sub,no_inplace}.0))
updates = []
delta = eta * gparam + momentum * oldparam
updates.append((param, param - delta))
updates.append((oldparam, gparam))
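For reference, here is a sketch of how that fix might sit inside the full parameter loop; oldparams is assumed here to be a list of shared variables, one per parameter, initialized with zeros so the first step reduces to plain gradient descent:

updates = []
for param, gparam, oldparam in zip(classifier.params, gparams, oldparams):
    # the step is the learning-rate term plus the momentum term,
    # subtracted from the parameter as a single delta
    delta = eta * gparam + momentum * oldparam
    updates.append((param, param - delta))
    # remember this gradient for the next minibatch
    updates.append((oldparam, gparam))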
Now it works, and the training runs more smoothly, with less oscillation in the error, as expected.
Thanks for the help guys!
Al
Yes. Also, although I've almost never played with momentum myself, I'd expect the momentum term to be equal to the previous update rather than the previous gradient. Otherwise it seems to me it is pretty much the same as increasing the learning rate.
-=- Olivier
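For reference, a minimal sketch of the velocity-based formulation Olivier is describing, where the stored quantity is the previous update rather than the previous gradient (params, gparams, eta and momentum are assumed from the snippets elsewhere in the thread):

import numpy
import theano

updates = []
for param, gparam in zip(params, gparams):
    # one zero-initialized "velocity" per parameter, same shape and dtype
    velocity = theano.shared(numpy.zeros_like(param.get_value(borrow=True)))
    # the new update is a decayed copy of the previous update
    # minus the current gradient step
    new_velocity = momentum * velocity - eta * gparam
    updates.append((velocity, new_velocity))
    updates.append((param, param + new_velocity))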
Post by Al Docherty
You're referring to param-eta * gparam ...
In a NN, we want to update the weights by subtracting (eta * the gradient) + (momentum * the old gradient), yes?
Presumably that isn't what the code is currently doing, so the sign error is in the way we are adding the momentum term, correct?
Well, there's actually a sign error in the code below; it might be just that.
-=- Olivier
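Concretely, the sign problem comes from operator precedence in the quoted update line; a small illustration, reusing the names from the snippet further down in the thread:

# param - eta * gparam + momentum * old_grad_param
# parses as
# param - (eta * gparam) + (momentum * old_grad_param)
# so the momentum term is added back instead of being part of the step.
# Grouping the whole step before subtracting gives the intended update:
delta = eta * gparam + momentum * old_grad_param
updates.append((param, param - delta))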
If you are confident that the implementation is good, then it might be because your hyperparameters aren't properly tuned, but you'll have to tune those yourself.
Nevertheless, once implemented, the errors of my network shoot up.
I shamelessly edited your quoted code below to hopefully get something that works ;)
-=- Olivier
Post by Al Docherty
updates = []
updates.append((param, param-eta * gparam + momentum * old_grad_param))
updates.append((old_grad_param, gparam))
The content of classifier.old_grad_params would be a new set of shared variables with the same sizes as the parameters, but initialized with zeros.
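One possible way to build what Olivier describes, a zero-filled shared variable matching each parameter's shape and dtype (the attribute name is taken from his message; the rest is an assumption):

import numpy
import theano

classifier.old_grad_params = [
    theano.shared(numpy.zeros_like(param.get_value(borrow=True)),
                  name='old_grad_' + (param.name or 'param'))
    for param in classifier.params
]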
So something like this, you mean? Of course, accounting for there being no old gradient at the start of training.
updates = []
old_grad_param = gparam
updates.append((param, param-eta * gparam + momentum * old_grad_param))
I didn't look at the code and may be missing something, but it seems to me all you need is to add to your update dict: old_grad_param=gparam (with one entry per param).
-=- Olivier
I'd dare say your implementation differs a lot from mine, so much so that I think it would be very hard to hack in momentum your way without rearranging a lot of the code (and, in doing so, possibly leaving errors around).
updates.append((param, param-eta * gparam))
updates.append((param, (param - eta * gparam) + (momentum * old_grad)))
But it's establishing the old_grad that I'm having trouble with. The worst part is that, while I know how to get the gradients to print out during training, I have no idea how to print them out independently, i.e. just print them out after I've defined them.
Al
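On the printing question: one way (a sketch, assuming index, x, y, batch_size, train_set_x and train_set_y are defined as in the Theano MLP tutorial) is to compile a throwaway function whose outputs are the symbolic gradients and call it on a minibatch:

import theano

print_grads = theano.function(
    inputs=[index],
    outputs=gparams,
    givens={
        x: train_set_x[index * batch_size:(index + 1) * batch_size],
        y: train_set_y[index * batch_size:(index + 1) * batch_size],
    },
)
print(print_grads(0))  # gradient values for the first minibatch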
Hi Al,
As Arnaud suggests, you need to store or cache the previous set of updates to your parameters, so that you can use these values when calculating the next update. This gist might help; it's part of an SdA class (adapted from the Theano tutorial) that I modified to use momentum and weight decay when performing parameter updates.
Hope this helps,
Lee.
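For reference, a rough sketch (not Lee's gist itself) of caching the previous update per parameter and adding L2 weight decay, with params, gparams, eta, momentum and weight_decay assumed:

import numpy
import theano

# one cached "previous update" per parameter, initialized to zero
prev_update = {
    param: theano.shared(numpy.zeros_like(param.get_value(borrow=True)))
    for param in params
}

updates = []
for param, gparam in zip(params, gparams):
    # weight decay folded into the gradient term; momentum reuses the
    # previous update stored for this parameter
    step = momentum * prev_update[param] - eta * (gparam + weight_decay * param)
    updates.append((prev_update[param], step))
    updates.append((param, param + step))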
Hello again,
### OBTAIN PARAMETERS AND GRADIENTS
gparams = []
for param in classifier.params:
    gparam = T.grad(printcost, param)
    gparams.append(gparam)

### CALCULATE CHANGE IN WEIGHTS
updates = []
for param, gparam in zip(classifier.params, gparams):
    updates.append((param, param - eta * gparam))
I know I need to add the momentum term to the updates.append line. But how do I store an old set of gradients?
Al