Discussion:
Momentum term
Al Docherty
2014-03-06 16:25:01 UTC
Hello again,

I'm considering adding momentum to my neural network implementation. The gradients and updates are calculated as follows:

### OBTAIN PARAMETERS AND GRADIENTS
gparams = []
for param in classifier.params:
    gparam = T.grad(printcost, param)
    gparams.append(gparam)

### CALCULATE CHANGE IN WEIGHTS
updates = []
for param, gparam in zip(classifier.params, gparams):
    updates.append((param, param - eta * gparam))


I know I need to add the momentum term to the updates.append line. But how
do I store an old set of gradients?

Al
Arnaud Bergeron
2014-03-06 19:06:55 UTC
You can add a second set of shared variables to store the gradients of the previous run in.
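
For illustration, a minimal sketch of what that second set of shared variables could look like (the attribute name classifier.old_grads is made up here; it assumes classifier.params is a list of Theano shared variables, as in the code above):

import numpy as np
import theano

# One zero-initialized shared variable per parameter, with matching shape and
# dtype, to hold the gradient computed on the previous update.
classifier.old_grads = [
    theano.shared(np.zeros_like(p.get_value()), name='old_grad_%s' % (p.name or 'param'))
    for p in classifier.params
]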
Lee Zamparo
2014-03-06 21:28:21 UTC
Hi Al,

As Arnaud suggests, you need to store or cache the previous set of updates to your parameters so that you can use those values when calculating the next update. This gist <https://gist.github.com/lzamparo/9400026> might help; it's part of an SdA class (adapted from the Theano tutorial) that I modified to use momentum and weight decay when performing parameter updates.

Hope this helps,

Lee.
Al Docherty
2014-03-06 21:57:32 UTC
Hi Lee,

Yes, I think an example is more informative. I'll take a look now. Thanks as well to Arnaud for the input.

Al
Al Docherty
2014-03-06 22:09:11 UTC
I'd dare say your implementation differs a lot from mine, so much so that I think it'd be very hard to hack momentum in your way without rearranging a lot of the code (and, in doing so, possibly leaving errors around).

I guess I'm stuck on this one. I could potentially add the momentum term in here:

updates.append((param, param - eta * gparam))

As so:

updates.append((param, (param - eta * gparam) + (momentum * old_grad)))

But it's establishing old_grad that I'm having trouble with. The worst part is that, while I know how to get the gradients to print out during training, I have no idea how to make them print out independently, i.e. just printing them out after I've defined them.

Al
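
One way to inspect the gradients outside the training loop is to compile a separate Theano function that only returns them. A sketch, assuming x and y are the symbolic inputs that printcost depends on, and x_batch / y_batch are plain numpy arrays:

import theano

# Returns the numeric gradient arrays for one batch, without applying any updates.
get_grads = theano.function([x, y], gparams)

grad_values = get_grads(x_batch, y_batch)
for p, g in zip(classifier.params, grad_values):
    print('%s  shape=%s  mean|grad|=%g' % (p.name, g.shape, abs(g).mean()))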
Olivier Delalleau
2014-03-06 22:42:51 UTC
I didn't look at the code and may be missing something, but it seems to me all you need is to add old_grad_param=gparam to your update dict (with one entry per param).

-=- Olivier
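
In Theano, the updates argument to theano.function can be a dictionary (or a list of pairs) mapping each shared variable to its new value, so the extra entry simply pairs each old-gradient shared variable with the freshly computed gradient. A sketch, using a hypothetical classifier.old_grads list of zero-initialized shared variables (one per parameter):

from collections import OrderedDict

updates = OrderedDict()
for param, gparam, old_grad in zip(classifier.params, gparams, classifier.old_grads):
    # usual descent step, with the momentum contribution subtracted as well
    updates[param] = param - (eta * gparam + momentum * old_grad)
    # remember this step's gradient for the next call
    updates[old_grad] = gparam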
Al Docherty
2014-03-06 22:44:25 UTC
Update dict?
Al Docherty
2014-03-06 22:47:22 UTC
So something like this, you mean? Of course, accounting for there being no old gradient at the start of training.

updates = []
for param, gparam in zip(classifier.params, gparams):
    old_grad_param = gparam
    updates.append((param, param - eta * gparam + momentum * old_grad_param))
Arnaud Bergeron
2014-03-06 23:10:50 UTC
Rather something like this:

updates = []
for param, gparam in zip(classifier.params, gparams):
    updates.append((param, param - eta * gparam + momentum * old_grad_param))
for old_param, gparam in zip(classifier.old_params, gparams):
    updates.append((old_param, gparam))

The content of classifier.old_params would be a new set of shared variables
with the same sizes as the parameters, but initialized with zeros.
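
For completeness, a sketch of how such an updates list is then handed to theano.function when compiling the training step (x and y are assumed to be the symbolic inputs that printcost depends on, and x_batch / y_batch ordinary minibatch arrays):

import theano

# Every (shared_variable, new_value) pair in updates is applied on each call.
train_step = theano.function(inputs=[x, y], outputs=printcost, updates=updates)

cost = train_step(x_batch, y_batch)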
Olivier Delalleau
2014-03-06 23:22:26 UTC
I shamelessly edited your quoted code below to hopefully get something that works ;)

-=- Olivier
Post by Arnaud Bergeron
updates = []
for param, gparam, old_grad_param in izip(classifier.params, gparams, classifier.old_grad_params):
    updates.append((param, param - eta * gparam + momentum * old_grad_param))
    updates.append((old_grad_param, gparam))
The content of classifier.old_grad_params would be a new set of shared variables with the same sizes as the parameters, but initialized with zeros.
Al Docherty
2014-03-07 13:43:55 UTC
Haha thank you. I'm not so much of a newcomer to programming but Theano is
a wholllle new kettle of fish!
Al Docherty
2014-03-07 14:10:17 UTC
Nevertheless, once implemented, the errors of my network shoot up.
Arnaud Bergeron
2014-03-07 19:29:06 UTC
If you are confident that the implementation is good, then it might be
because your hyperparameters aren't properly tuned, but you'll have to do
that yourself.
Olivier Delalleau
2014-03-07 22:39:00 UTC
Well, there's actually a sign error in the code from the earlier messages (the momentum term is added while the gradient term is subtracted), and that might be just it.

-=- Olivier
Al Docherty
2014-03-10 14:42:17 UTC
You're referring to param - eta * gparam ...

In a NN, we want to update the weights by subtracting (eta * the gradient) + (momentum * old gradient), yes?

Presumably that isn't what the code is currently doing, so the sign error is in the way we are adding the momentum term, correct?
Olivier Delalleau
2014-03-10 22:24:39 UTC
Yes. Also, although I've almost never played with momentum myself, I'd expect the momentum term to be equal to the previous update rather than the previous gradient. Otherwise it seems to me it is pretty much the same as increasing the learning rate.

-=- Olivier
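
A sketch of that variant, in which the extra shared variables store the previous update (the velocity) rather than the previous gradient; classifier.velocities is a made-up name for a list of zero-initialized shared variables shaped like the parameters:

updates = []
for param, gparam, vel in zip(classifier.params, gparams, classifier.velocities):
    # classical momentum: decay the previous update and take a new gradient step
    new_vel = momentum * vel - eta * gparam
    updates.append((vel, new_vel))
    updates.append((param, param + new_vel))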
Al Docherty
2014-03-11 12:34:49 UTC
I implemented momentum, and took the addition out of the append, as follows:

updates = []
for param, gparam, oldparam in zip(classifier.params, gparams, classifier.oldparams):
    delta = eta * gparam + momentum * oldparam
    updates.append((param, param - delta))
for oldparam, gparam in zip(classifier.oldparams, gparams):
    updates.append((oldparam, gparam))

Now it works, and the training runs more smoothly, with fewer oscillations in the error, as expected.

Thanks for the help guys!

Al
David Chik
2014-08-07 08:20:02 UTC
I used your code but got this error:

Traceback (most recent call last):
  File "code/mlp_momentum.py", line 465, in <module>
    test_mlp()
  File "code/mlp_momentum.py", line 313, in test_mlp
    y: train_set_y[index * batch_size:(index + 1) * batch_size]})
  File "/Users/david/anaconda/lib/python2.7/site-packages/theano/compile/function.py", line 223, in function
    profile=profile)
  File "/Users/david/anaconda/lib/python2.7/site-packages/theano/compile/pfunc.py", line 490, in pfunc
    no_default_updates=no_default_updates)
  File "/Users/david/anaconda/lib/python2.7/site-packages/theano/compile/pfunc.py", line 198, in rebuild_collect_shared
    (store_into, update_d[store_into]))
ValueError: ('this shared variable already has an update expression', (W, Elemwise{sub,no_inplace}.0))
Olivier Delalleau
2014-08-07 11:36:41 UTC
The error says you are trying to update W with 2+ different expressions. You may only provide one update per shared variable; maybe what you want is to sum them?

-=- Olivier
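
To illustrate, a hedged sketch of one way to merge two update expressions for the same shared variable into a single entry before compiling (updates is assumed to be a list of (shared_variable, new_value) pairs):

from collections import OrderedDict

combined = OrderedDict()
for var, expr in updates:
    if var in combined:
        # apply both deltas to the same variable instead of registering it twice
        combined[var] = combined[var] + (expr - var)
    else:
        combined[var] = expr

# combined can then be passed as the updates argument to theano.function.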
Post by David Chik
  File "code/mlp_momentum.py", line 465, in <module>
    test_mlp()
  File "code/mlp_momentum.py", line 313, in test_mlp
    y: train_set_y[index * batch_size:(index + 1) * batch_size]})
  File "/Users/david/anaconda/lib/python2.7/site-packages/theano/compile/function.py", line 223, in function
    profile=profile)
  File "/Users/david/anaconda/lib/python2.7/site-packages/theano/compile/pfunc.py", line 490, in pfunc
    no_default_updates=no_default_updates)
  File "/Users/david/anaconda/lib/python2.7/site-packages/theano/compile/pfunc.py", line 198, in rebuild_collect_shared
    (store_into, update_d[store_into]))
ValueError: ('this shared variable already has an update expression', (W, Elemwise{sub,no_inplace}.0))

Post by Al Docherty
I implemented momentum and took the addition out of the append:

updates = []
for param, gparam, oldparam in zip(classifier.params, gparams, oldparams):
    delta = eta * gparam + momentum * oldparam
    updates.append((param, param - delta))
    updates.append((oldparam, gparam))

Now it works, and the training runs more smoothly with less oscillation in the error, as expected.
Thanks for the help guys!
Al

Post by Olivier Delalleau
Yes. Also, although I've almost never played with momentum myself, I'd expect the momentum term to be equal to the previous update rather than the previous gradient. Otherwise it seems to me it is pretty much the same as increasing the learning rate.
-=- Olivier

Post by Al Docherty
You're referring to param - eta * gparam ...
In a NN, we want to update the weights by subtracting (eta * the gradient) + (momentum * old gradient), yes? Presumably this isn't what the code is currently doing, so the sign error is in the way we are adding the momentum term, correct?

Post by Olivier Delalleau
Well, there's actually a sign error in the code below, that might be just that. If you are confident that the implementation is good, then it might be because your hyperparameters aren't properly tuned, but you'll have to do that yourself.
-=- Olivier

Post by Al Docherty
Nevertheless, once implemented, the errors of my network shoot up.

Post by Olivier Delalleau
I shamelessly edited your quoted code below to hopefully get something that works ;)

updates = []
for param, gparam, old_grad_param in izip(classifier.params, gparams, classifier.old_grad_params):
    updates.append((param, param - eta * gparam + momentum * old_grad_param))
for old_grad_param, gparam in izip(classifier.old_grad_params, gparams):
    updates.append((old_grad_param, gparam))

The content of classifier.old_grad_params would be a new set of shared variables with the same sizes as the parameters, but initialized with zeros.
-=- Olivier

Post by Al Docherty
So something like this you mean? Of course accounting for there being no old gradient at the start of training.

updates = []
old_grad_param = gparam
updates.append((param, param - eta * gparam + momentum * old_grad_param))

Post by Olivier Delalleau
I didn't look at the code and may be missing something, but it seems to me all you need is to add to your update dict: old_grad_param=gparam (with one entry per param).
-=- Olivier

Post by Al Docherty
I'd dare say your implementation differs a lot from mine, so much so that I think it'd be very hard to hack in momentum your way without rearranging a lot of the code (and in doing so possibly leave errors around). I guess I'm stuck on this one. I could potentially change

updates.append((param, param - eta * gparam))

to

updates.append((param, (param - eta * gparam) + (momentum * old_grad)))

But it's establishing the old_grad that I'm having trouble with. The worst part is that, while I know how to get the gradients to print out during training, I have no idea how to make them print out independently, i.e. just printing them out after I've defined them.
Al
jasonfu
2014-08-12 08:51:04 UTC
Permalink
hello,

I encountered the same problem when using the following code:

for param_i, grad_i, oldparam_i in zip(params, grads, oldparams):
    delta = 0.9 * oldparam_i - learning_rate * grad_i
    updates.append((oldparam_i, delta))
    updates.append((param_i, param_i - delta))

It says "ValueError: ('this shared variable already has an update expression', (W, GpuFromHost.0))". Is your problem solved?

best,

Jason
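A hedged guess at what can trigger that ValueError even when the loop looks right: if oldparams holds the parameter shared variables themselves (or the same variable twice), one shared variable ends up with two update expressions. Below is a small self-contained sketch of the separation Olivier described earlier ("a new set of shared variables ... initialized with zeros"), with a toy one-parameter model standing in for the real params/grads; all names are illustrative.

    # Sketch only (toy model, hypothetical names): the momentum state lives in
    # separate, zero-initialized shared variables so that every shared variable
    # appears in `updates` exactly once.
    import numpy
    import theano
    import theano.tensor as T

    learning_rate = 0.1
    x = T.vector('x')
    params = [theano.shared(numpy.zeros(3, dtype=theano.config.floatX), name='W')]
    cost = T.sum(T.dot(x, params[0]) ** 2)
    grads = [T.grad(cost, p) for p in params]

    # Fresh shared variables, one per parameter -- not the parameters themselves.
    oldparams = [theano.shared(numpy.zeros_like(p.get_value()), name='old_' + p.name)
                 for p in params]

    updates = []
    for param_i, grad_i, oldparam_i in zip(params, grads, oldparams):
        delta = 0.9 * oldparam_i - learning_rate * grad_i
        updates.append((oldparam_i, delta))
        updates.append((param_i, param_i + delta))  # + delta: the minus sign already sits inside delta

    train = theano.function([x], cost, updates=updates)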

David Chik
2014-08-12 10:13:03 UTC
Permalink
No I have not solved the problem yet.

Hopefully someone will provide a complete, working example.

D
Yifeng Li
2014-08-14 06:35:13 UTC
Permalink
Use this, it works for me:
---------------------------------------------------

delta_before = []
for param_i in params:
    delta_before_i = theano.shared(value=numpy.zeros(param_i.get_value().shape))
    delta_before.append(delta_before_i)

updates = []
alpha = 0.01
for param_i, grad_i, delta_before_i in zip(params, grads, delta_before):
    delta_i = -learning_rate_shared * grad_i + alpha * delta_before_i
    updates.append((param_i, param_i + delta_i))
    updates.append((delta_before_i, delta_i))

train_model = theano.function([index], cost, updates=updates,
    givens={
        x: train_set_x[index * batch_size: (index + 1) * batch_size],
        y: train_set_y[index * batch_size: (index + 1) * batch_size]})
---------------------------------------------------
Yifeng Li
http://www.cmmt.ubc.ca/directory/faculty/yifeng-li
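For completeness, a hypothetical driver loop for the train_model function above (n_train_batches, n_epochs and the printing are assumptions for illustration, not part of Yifeng's message):

    # Assumed usage sketch: call the compiled function once per minibatch index.
    n_epochs = 10
    for epoch in range(n_epochs):
        epoch_costs = [train_model(minibatch_index)
                       for minibatch_index in range(n_train_batches)]
        print('epoch %d, mean training cost %f' % (epoch, numpy.mean(epoch_costs)))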
Yifeng Li
2014-08-14 06:41:59 UTC
Permalink
Don't know why the indentation did not show up in the last email... see below again:

delta_before = []
for param_i in params:
    delta_before_i = theano.shared(value=numpy.zeros(param_i.get_value().shape))
    delta_before.append(delta_before_i)

updates = []
alpha = 0.01
for param_i, grad_i, delta_before_i in zip(params, grads, delta_before):
    delta_i = -learning_rate_shared * grad_i + alpha * delta_before_i
    updates.append((param_i, param_i + delta_i))
    updates.append((delta_before_i, delta_i))

train_model = theano.function([index], cost, updates=updates,
    givens={x: train_set_x[index * batch_size: (index + 1) * batch_size],
            y: train_set_y[index * batch_size: (index + 1) * batch_size]})

Yifeng Li
Abhishek Shivkumar
2015-06-18 17:30:20 UTC
Permalink
Hi,

When I use the code provided by Yifeng Li, I get the following error. Any idea how I can resolve it?

TypeError: ('An update must have the same type as the original shared variable (shared_var=W, shared_var.type=TensorType(float32, matrix), update_val=Elemwise{add,no_inplace}.0, update_val.type=TensorType(float64, matrix)).', 'If the difference is related to the broadcast pattern, you can call the tensor.unbroadcast(var, axis_to_unbroadcast[, ...]) function to remove broadcastable dimensions.')
Pascal Lamblin
2015-06-18 20:46:30 UTC
Permalink
The problem is that your original variable (parameter or velocity)
is in single precision (float32), but the update is double precision
(float64).

You can use theano.printing.debugprint(..., print_type=True) to check
which operation introduced the precision bump.
--
Pascal
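In this thread the usual culprit is numpy.zeros(...), which defaults to float64, so the velocity variables (and hence the update expressions) end up double precision while the parameters are float32. A hedged, self-contained sketch of both the fix and Pascal's diagnostic follows; every name below is illustrative, not taken from the posted scripts.

    # Sketch only: create the momentum state with the same dtype as the
    # parameter it tracks, and keep scalar hyperparameters in that dtype too,
    # so the update expression stays float32 when the parameter is float32.
    import numpy
    import theano
    import theano.tensor as T

    W = theano.shared(numpy.zeros((4, 2), dtype='float32'), name='W')
    delta_before_W = theano.shared(numpy.zeros_like(W.get_value()),  # inherits float32
                                   name='delta_before_W')
    learning_rate = numpy.asarray(0.1, dtype=W.dtype)
    alpha = numpy.asarray(0.01, dtype=W.dtype)

    x = T.matrix('x', dtype=W.dtype)
    cost = T.sum(T.dot(x, W) ** 2)
    grad_W = T.grad(cost, W)

    delta_W = -learning_rate * grad_W + alpha * delta_before_W
    updates = [(W, W + delta_W), (delta_before_W, delta_W)]

    # Pascal's diagnostic: print the graph with dtypes to spot where float64 creeps in.
    theano.printing.debugprint(delta_W, print_type=True)

    train = theano.function([x], cost, updates=updates)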
Frédéric Bastien
2015-06-18 21:55:06 UTC
Permalink
You can use the Theano flag warn_float64=pdb to get into pdb and find where this problem happens more quickly.

Fred
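A small illustration of that flag (the flag name is real; the script name is a placeholder). It has to be set before Theano is imported, either on the command line or in the environment:

    # Shell:   THEANO_FLAGS='warn_float64=pdb' python train_mlp.py
    # or from Python, before the import:
    import os
    os.environ.setdefault('THEANO_FLAGS', 'warn_float64=pdb')
    import theano  # creating any float64 tensor variable now drops into pdb

The same flag also accepts 'warn' and 'raise' if dropping into the debugger is too intrusive.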
Post by Pascal Lamblin
The problem is that your original variable (parameter or velocity)
is in single precision (float32), but the update is double precision
(float64).
You can use theano.printing.debugprint(..., print_type=True) to check
which operation introduced the precision bump.
Post by Abhishek Shivkumar
Hi,
When I use the code provided by Yifeng li, I get the following error.
Any
Post by Abhishek Shivkumar
idea how I can resolve it ?
TypeError: ('An update must have the same type as the original shared
variable (shared_var=W, shared_var.type=TensorType(float32, matrix),
update_val=Elemwise{add,no_inplace}.0,
update_val.type=TensorType(float64,
Post by Abhishek Shivkumar
matrix)).', 'If the difference is related to the broadcast pattern, you
can
Post by Abhishek Shivkumar
call the tensor.unbroadcast(var, axis_to_unbroadcast[, ...]) function to
remove broadcastable dimensions.')
Post by Yifeng Li
Do not know why indent does not show up in the last email...see below
again
Post by Abhishek Shivkumar
Post by Yifeng Li
delta_before=[]
delta_before_i=theano.shared(value=numpy.zeros(param_i.get_value().shape))
Post by Abhishek Shivkumar
Post by Yifeng Li
delta_before.append(delta_before_i)
updates = []
alpha=0.01
for param_i, grad_i, delta_before_i in zip(params, grads,
delta_i=-learning_rate_shared * grad_i + alpha*delta_before_i
updates.append((param_i, param_i + delta_i ))
updates.append((delta_before_i,delta_i))
train_model = theano.function([index], cost, updates=updates,
givens={x: train_set_x[index * batch_size: (index + 1) * batch_size],
y: train_set_y[index * batch_size: (index + 1) * batch_size]})
Yifeng Li
Post by Yifeng Li
---------------------------------------------------
delta_before=[]
delta_before_i=theano.shared(value=numpy.zeros(param_i.get_value().shape))
Post by Abhishek Shivkumar
Post by Yifeng Li
Post by Yifeng Li
delta_before.append(delta_before_i)
updates = []
alpha=0.01
for param_i, grad_i, delta_before_i in zip(params, grads,
delta_i=-learning_rate_shared * grad_i + alpha*delta_before_i
updates.append((param_i, param_i + delta_i ))
updates.append((delta_before_i,delta_i))
train_model = theano.function([index], cost, updates=updates,
givens={
x: train_set_x[index * batch_size: (index + 1) * batch_size],
y: train_set_y[index * batch_size: (index + 1) * batch_size]})
---------------------------------------------------
Yifeng Li
http://www.cmmt.ubc.ca/directory/faculty/yifeng-li
Post by David Chik
No I have not solved the problem yet.
Hopefully someone will provide a complete, working example.
D
Post by jasonfu
hello,
delta = 0.9 * oldparam_i - learning_rate * grad_i
updates.append((oldparam_i,delta))
updates.append((param_i, param_i - delta))
it says "ValueError: ('this shared variable already has an update
expression', (W, GpuFromHost.0))". is your problem solved ?
best,
Jason
圚 2014幎8月7日星期四UTC+8䞋午4时20分02秒David Chik写道
Post by David Chik
File "code/mlp_momentum.py", line 465, in <module>
test_mlp()
File "code/mlp_momentum.py", line 313, in test_mlp
y: train_set_y[index * batch_size:(index + 1) * batch_size]})
File
"/Users/david/anaconda/lib/python2.7/site-packages/theano/compile/function.py",
Post by Abhishek Shivkumar
Post by Yifeng Li
Post by Yifeng Li
Post by David Chik
Post by jasonfu
Post by David Chik
line 223, in function
profile=profile)
File
"/Users/david/anaconda/lib/python2.7/site-packages/theano/compile/pfunc.py",
Post by Abhishek Shivkumar
Post by Yifeng Li
Post by Yifeng Li
Post by David Chik
Post by jasonfu
Post by David Chik
line 490, in pfunc
no_default_updates=no_default_updates)
File
"/Users/david/anaconda/lib/python2.7/site-packages/theano/compile/pfunc.py",
Post by Abhishek Shivkumar
Post by Yifeng Li
Post by Yifeng Li
Post by David Chik
Post by jasonfu
Post by David Chik
line 198, in rebuild_collect_shared
(store_into, update_d[store_into]))
ValueError: ('this shared variable already has an update
expression',
Post by Abhishek Shivkumar
Post by Yifeng Li
Post by Yifeng Li
Post by David Chik
Post by jasonfu
Post by David Chik
(W, Elemwise{sub,no_inplace}.0))
Post by Al Docherty
I implemented momentum, and took the addition out of the append
updates = []
for param, gparam, oldparam in zip(classifier.params, gparams,
delta = eta * gparam + momentum * oldparam
updates.append((param, param - delta))
updates.append((oldparam, gparam))
Now it works, and the training runs more smoothly with less
oscillations in the error, as expected.
Thanks for the help guys!
Al
Post by Olivier Delalleau
Yes. Also, although I've almost never played with momentum myself,
I'd expect the momentum term to be equal to the previous update rather
than the previous gradient. Otherwise it seems to me it is pretty much
the same as increasing the learning rate.
-=- Olivier
Post by Al Docherty
You're referring to param - eta * gparam ...
In a NN, we want to update the weights by subtracting (eta * the
gradient) + (momentum * old gradient), yes? Presumably, this isn't what
the code is currently doing. So the sign error is in the way we are
adding the momentum term, correct?
Post by Olivier Delalleau
Well, there's actually a sign error in the code below, that might
be just that.
-=- Olivier
Post by Olivier Delalleau
If you are confident that the implementation is good, then it might
be because your hyperparameters aren't properly tuned, but you'll have
to do that yourself.
Post by Al Docherty
Nevertheless, once implemented, the errors of my network shoot up.
On Thursday, 6 March 2014 18:22:26 UTC-5, Olivier Delalleau wrote:
Post by Olivier Delalleau
I shamelessly edited your quoted code below to hopefully get
something that works ;)
-=- Olivier
updates = []
for param, gparam, old_grad_param in izip(classifier.params, gparams,
                                          classifier.old_grad_params):
    updates.append((param, param - eta * gparam + momentum * old_grad_param))
for old_grad_param, gparam in izip(classifier.old_grad_params, gparams):
    updates.append((old_grad_param, gparam))
The content of classifier.old_grad_params would be a new set of shared
variables with the same sizes as the parameters, but initialized
with zeros.
Post by Al Docherty
So something like this you mean? Of course accounting for there
being no old gradient at the start of training.
updates = []
old_grad_param = gparam
updates.append((param, param - eta * gparam + momentum * old_grad_param))
On Thursday, 6 March 2014 17:42:51 UTC-5, Olivier Delalleau wrote:
Post by Olivier Delalleau
I didn't look at the code and may be missing something, but it [...]
old_grad_param = gparam [...] (with one entry per param).
-=- Olivier
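
To make the shared-variable approach described above concrete, here is a
minimal, self-contained sketch of momentum in Theano. It is an
illustration, not the original classifier code: the toy parameter w, the
quadratic cost, and the names velocities, eta and momentum are all
assumptions. Following Olivier's remark at the top of the thread, the
extra shared variables store the previous *update* (a "velocity"), not
the previous gradient.

import numpy as np
import theano
import theano.tensor as T

# Toy setup so the sketch compiles and runs on its own: one parameter
# vector and a quadratic cost. Replace with the classifier's own
# parameters and cost expression.
w = theano.shared(np.ones(3, dtype=theano.config.floatX), name='w')
x = T.vector('x')
cost = T.sum((T.dot(w, x) - 1.0) ** 2)
params = [w]

eta = 0.01       # learning rate
momentum = 0.9   # momentum coefficient

# One zero-initialised shared variable per parameter, with the same shape,
# holding the previous update.
velocities = [theano.shared(np.zeros_like(p.get_value()), name='v_' + p.name)
              for p in params]

updates = []
for p, v in zip(params, velocities):
    gparam = T.grad(cost, p)
    new_v = momentum * v - eta * gparam   # decayed previous update minus gradient step
    updates.append((v, new_v))            # remember this step's update
    updates.append((p, p + new_v))        # apply it to the parameter

train = theano.function([x], cost, updates=updates)
print(train(np.array([1.0, 2.0, 3.0], dtype=theano.config.floatX)))

With momentum = 0 this reduces to plain gradient descent, which is one
easy way to sanity-check the wiring before tuning the hyperparameters.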
Post by Al Docherty
I'd dare say your implementation differs a lot from mine, so much so I
think it'd be very hard to hack in momentum your way without
rearranging a lot of the code (and in doing so possibly leaving errors
around).
I guess I'm stuck on this one. I could potentially add in the momentum
term by changing
updates.append((param, param - eta * gparam))
to
updates.append((param, (param - eta * gparam) + (momentum * old_grad)))
But it's establishing the old_grad that I'm having trouble with.
The worst part is that, while I know how to get the gradients to print
out during training, I have no idea how to make them print out
independently, i.e. just printing them out after I've defined them.
Al
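
On inspecting gradients outside the training loop: one option is to
compile a separate Theano function whose outputs are the gradient
expressions, so they can be evaluated and printed on demand. This is a
sketch under the assumption that the cost is an ordinary Theano
expression over an input x and a list of shared parameters; the toy w,
x and cost below are placeholders, not the original network.

import numpy as np
import theano
import theano.tensor as T

# Placeholder model, as above.
w = theano.shared(np.ones(3, dtype=theano.config.floatX), name='w')
x = T.vector('x')
cost = T.sum((T.dot(w, x) - 1.0) ** 2)
params = [w]

gparams = [T.grad(cost, p) for p in params]

# Compiling a function that returns the gradient expressions lets you
# evaluate them independently of the training updates.
get_grads = theano.function([x], gparams)

grads = get_grads(np.array([1.0, 2.0, 3.0], dtype=theano.config.floatX))
for p, g in zip(params, grads):
    print(p.name, g)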
Beatriz G.
2017-07-18 09:32:37 UTC
Permalink
Hi everyone,

I would like to know what momentum is used for. I think it has
something to do with the weight updates, but I have been reading about
it and still don't fully understand. Does it have something to do with
a dynamic learning rate?

Regards.
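
Briefly: momentum is not a dynamic learning rate. It keeps a running
"velocity" that blends the previous update with the new gradient step,
so directions where successive gradients agree accumulate speed while
oscillating directions tend to cancel out. A tiny plain-Python
illustration of the rule (hypothetical values, toy quadratic cost):

def grad(w):
    # Gradient of the toy cost 0.5 * w ** 2
    return w

w, v = 5.0, 0.0    # parameter and its "velocity" (previous update)
eta, mu = 0.1, 0.9 # learning rate and momentum coefficient

for step in range(5):
    v = mu * v - eta * grad(w)  # reuse a decayed copy of the previous update
    w = w + v
    print(step, round(w, 4), round(v, 4))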