Discussion:
Momentum term
Al Docherty
2014-03-06 16:25:01 UTC
Hello again,

I'm considering adding momentum to my neural network implementation. The gradients and updates are calculated as follows:

### OBTAIN PARAMETERS AND GRADIENTS
gparams = []
for param in classifier.params:
    gparam = T.grad(printcost, param)
    gparams.append(gparam)

### CALCULATE CHANGE IN WEIGHTS
updates = []
for param, gparam in zip(classifier.params, gparams):
    updates.append((param, param - eta * gparam))


I know I need to add the momentum term to the updates.append line. But how
do I store an old set of gradients?

Al
Arnaud Bergeron
2014-03-06 19:06:55 UTC
You can add a second set of shared variables to store the gradients of the previous run in.
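
For illustration, a minimal sketch of what that second set of shared variables could look like (the attribute name classifier.old_grads is made up here; it assumes classifier.params is a list of Theano shared variables, as in the code above):

import numpy as np
import theano

# One zero-initialized shared variable per parameter, with matching shape and
# dtype, to hold the gradient computed on the previous update.
classifier.old_grads = [
    theano.shared(np.zeros_like(p.get_value()), name='old_grad_%s' % (p.name or 'param'))
    for p in classifier.params
]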
Lee Zamparo
2014-03-06 21:28:21 UTC
Hi Al,

As Arnaud suggests, you need to store or cache the previous set of updates to your parameters so that you can use those values when calculating the next update. This gist <https://gist.github.com/lzamparo/9400026> might help; it's part of an SdA class (adapted from the Theano tutorial) that I modified to use momentum and weight decay when performing parameter updates.

Hope this helps,

Lee.
Al Docherty
2014-03-06 21:57:32 UTC
Hi Lee,

Yes, I think an example is more informative. I'll take a look now. Thanks as well to Arnaud for the input.

Al
Al Docherty
2014-03-06 22:09:11 UTC
I'd dare say your implementation differs a lot from mine, so much so that I think it'd be very hard to hack momentum in your way without rearranging a lot of the code (and, in doing so, possibly leaving errors around).

I guess I'm stuck on this one. I could potentially add the momentum term in here:

updates.append((param, param - eta * gparam))

As so:

updates.append((param, (param - eta * gparam) + (momentum * old_grad)))

But it's establishing old_grad that I'm having trouble with. The worst part is that, while I know how to get the gradients to print out during training, I have no idea how to make them print out independently, i.e. just printing them out after I've defined them.

Al
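
One way to inspect the gradients outside the training loop is to compile a separate Theano function that only returns them. A sketch, assuming x and y are the symbolic inputs that printcost depends on, and x_batch / y_batch are plain numpy arrays:

import theano

# Returns the numeric gradient arrays for one batch, without applying any updates.
get_grads = theano.function([x, y], gparams)

grad_values = get_grads(x_batch, y_batch)
for p, g in zip(classifier.params, grad_values):
    print('%s  shape=%s  mean|grad|=%g' % (p.name, g.shape, abs(g).mean()))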
Olivier Delalleau
2014-03-06 22:42:51 UTC
I didn't look at the code and may be missing something, but it seems to me all you need is to add old_grad_param=gparam to your update dict (with one entry per param).

-=- Olivier
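
In Theano, the updates argument to theano.function can be a dictionary (or a list of pairs) mapping each shared variable to its new value, so the extra entry simply pairs each old-gradient shared variable with the freshly computed gradient. A sketch, using a hypothetical classifier.old_grads list of zero-initialized shared variables (one per parameter):

from collections import OrderedDict

updates = OrderedDict()
for param, gparam, old_grad in zip(classifier.params, gparams, classifier.old_grads):
    # usual descent step, with the momentum contribution subtracted as well
    updates[param] = param - (eta * gparam + momentum * old_grad)
    # remember this step's gradient for the next call
    updates[old_grad] = gparam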
Al Docherty
2014-03-06 22:44:25 UTC
Update dict?
Al Docherty
2014-03-06 22:47:22 UTC
So something like this, you mean? Of course, accounting for there being no old gradient at the start of training.

updates = []
for param, gparam in zip(classifier.params, gparams):
    old_grad_param = gparam
    updates.append((param, param - eta * gparam + momentum * old_grad_param))
Arnaud Bergeron
2014-03-06 23:10:50 UTC
Rather something like this:

updates = []
for param, gparam in zip(classifier.params, gparams):
    updates.append((param, param - eta * gparam + momentum * old_grad_param))
for old_param, gparam in zip(classifier.old_params, gparams):
    updates.append((old_param, gparam))

The content of classifier.old_params would be a new set of shared variables
with the same sizes as the parameters, but initialized with zeros.
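
For completeness, a sketch of how such an updates list is then handed to theano.function when compiling the training step (x and y are assumed to be the symbolic inputs that printcost depends on, and x_batch / y_batch ordinary minibatch arrays):

import theano

# Every (shared_variable, new_value) pair in updates is applied on each call.
train_step = theano.function(inputs=[x, y], outputs=printcost, updates=updates)

cost = train_step(x_batch, y_batch)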
Olivier Delalleau
2014-03-06 23:22:26 UTC
I shamelessly edited your quoted code below to hopefully get something that works ;)

-=- Olivier
Post by Arnaud Bergeron
updates = []
for param, gparam, old_grad_param in izip(classifier.params, gparams, classifier.old_grad_params):
    updates.append((param, param - eta * gparam + momentum * old_grad_param))
    updates.append((old_grad_param, gparam))
The content of classifier.old_grad_params would be a new set of shared variables with the same sizes as the parameters, but initialized with zeros.
Al Docherty
2014-03-07 13:43:55 UTC
Haha thank you. I'm not so much of a newcomer to programming but Theano is
a wholllle new kettle of fish!
Al Docherty
2014-03-07 14:10:17 UTC
Nevertheless, once implemented, the errors of my network shoot up.
Arnaud Bergeron
2014-03-07 19:29:06 UTC
If you are confident that the implementation is good, then it might be
because your hyperparameters aren't properly tuned, but you'll have to do
that yourself.
Olivier Delalleau
2014-03-07 22:39:00 UTC
Well, there's actually a sign error in the code from the earlier messages (the momentum term is added while the gradient term is subtracted), and that might be just it.

-=- Olivier
Al Docherty
2014-03-10 14:42:17 UTC
You're referring to param - eta * gparam ...

In a NN, we want to update the weights by subtracting (eta * the gradient) + (momentum * old gradient), yes?

Presumably that isn't what the code is currently doing, so the sign error is in the way we are adding the momentum term, correct?
Olivier Delalleau
2014-03-10 22:24:39 UTC
Yes. Also, although I've almost never played with momentum myself, I'd expect the momentum term to be equal to the previous update rather than the previous gradient. Otherwise it seems to me it is pretty much the same as increasing the learning rate.

-=- Olivier
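
A sketch of that variant, in which the extra shared variables store the previous update (the velocity) rather than the previous gradient; classifier.velocities is a made-up name for a list of zero-initialized shared variables shaped like the parameters:

updates = []
for param, gparam, vel in zip(classifier.params, gparams, classifier.velocities):
    # classical momentum: decay the previous update and take a new gradient step
    new_vel = momentum * vel - eta * gparam
    updates.append((vel, new_vel))
    updates.append((param, param + new_vel))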
Al Docherty
2014-03-11 12:34:49 UTC
I implemented momentum, and took the addition out of the append, as follows:

updates = []
for param, gparam, oldparam in zip(classifier.params, gparams, classifier.oldparams):
    delta = eta * gparam + momentum * oldparam
    updates.append((param, param - delta))
for oldparam, gparam in zip(classifier.oldparams, gparams):
    updates.append((oldparam, gparam))

Now it works, and the training runs more smoothly, with fewer oscillations in the error, as expected.

Thanks for the help guys!

Al
David Chik
2014-08-07 08:20:02 UTC
I used your code but got this error:

Traceback (most recent call last):
  File "code/mlp_momentum.py", line 465, in <module>
    test_mlp()
  File "code/mlp_momentum.py", line 313, in test_mlp
    y: train_set_y[index * batch_size:(index + 1) * batch_size]})
  File "/Users/david/anaconda/lib/python2.7/site-packages/theano/compile/function.py", line 223, in function
    profile=profile)
  File "/Users/david/anaconda/lib/python2.7/site-packages/theano/compile/pfunc.py", line 490, in pfunc
    no_default_updates=no_default_updates)
  File "/Users/david/anaconda/lib/python2.7/site-packages/theano/compile/pfunc.py", line 198, in rebuild_collect_shared
    (store_into, update_d[store_into]))
ValueError: ('this shared variable already has an update expression', (W, Elemwise{sub,no_inplace}.0))
Olivier Delalleau
2014-08-07 11:36:41 UTC
The error says you are trying to update W with 2+ different expressions. You may only provide one update per shared variable; maybe what you want is to sum them?

-=- Olivier
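
To illustrate, a hedged sketch of one way to merge two update expressions for the same shared variable into a single entry before compiling (updates is assumed to be a list of (shared_variable, new_value) pairs):

from collections import OrderedDict

combined = OrderedDict()
for var, expr in updates:
    if var in combined:
        # apply both deltas to the same variable instead of registering it twice
        combined[var] = combined[var] + (expr - var)
    else:
        combined[var] = expr

# combined can then be passed as the updates argument to theano.function.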
Post by David Chik
  File "code/mlp_momentum.py", line 465, in <module>
    test_mlp()
  File "code/mlp_momentum.py", line 313, in test_mlp
    y: train_set_y[index * batch_size:(index + 1) * batch_size]})
  File "/Users/david/anaconda/lib/python2.7/site-packages/theano/compile/function.py", line 223, in function
    profile=profile)
  File "/Users/david/anaconda/lib/python2.7/site-packages/theano/compile/pfunc.py", line 490, in pfunc
    no_default_updates=no_default_updates)
  File "/Users/david/anaconda/lib/python2.7/site-packages/theano/compile/pfunc.py", line 198, in rebuild_collect_shared
    (store_into, update_d[store_into]))
ValueError: ('this shared variable already has an update expression', (W, Elemwise{sub,no_inplace}.0))

Post by Al Docherty
I implemented momentum and took the addition out of the append:

updates = []
for param, gparam, oldparam in zip(classifier.params, gparams, oldparams):
    delta = eta * gparam + momentum * oldparam
    updates.append((param, param - delta))
    updates.append((oldparam, gparam))

Now it works, and the training runs more smoothly with less oscillation in the error, as expected.
Thanks for the help guys!
Al

Post by Olivier Delalleau
Yes. Also, although I've almost never played with momentum myself, I'd expect the momentum term to be equal to the previous update rather than the previous gradient. Otherwise it seems to me it is pretty much the same as increasing the learning rate.
-=- Olivier

Post by Al Docherty
You're referring to param - eta * gparam ...
In a NN, we want to update the weights by subtracting (eta * the gradient) + (momentum * old gradient), yes? Presumably this isn't what the code is currently doing, so the sign error is in the way we are adding the momentum term, correct?

Post by Olivier Delalleau
Well, there's actually a sign error in the code below, that might be just that. If you are confident that the implementation is good, then it might be because your hyperparameters aren't properly tuned, but you'll have to do that yourself.
-=- Olivier

Post by Al Docherty
Nevertheless, once implemented, the errors of my network shoot up.

Post by Olivier Delalleau
I shamelessly edited your quoted code below to hopefully get something that works ;)

updates = []
for param, gparam, old_grad_param in izip(classifier.params, gparams, classifier.old_grad_params):
    updates.append((param, param - eta * gparam + momentum * old_grad_param))
for old_grad_param, gparam in izip(classifier.old_grad_params, gparams):
    updates.append((old_grad_param, gparam))

The content of classifier.old_grad_params would be a new set of shared variables with the same sizes as the parameters, but initialized with zeros.
-=- Olivier

Post by Al Docherty
So something like this you mean? Of course accounting for there being no old gradient at the start of training.

updates = []
old_grad_param = gparam
updates.append((param, param - eta * gparam + momentum * old_grad_param))

Post by Olivier Delalleau
I didn't look at the code and may be missing something, but it seems to me all you need is to add to your update dict: old_grad_param=gparam (with one entry per param).
-=- Olivier

Post by Al Docherty
I'd dare say your implementation differs a lot from mine, so much so that I think it'd be very hard to hack in momentum your way without rearranging a lot of the code (and in doing so possibly leave errors around). I guess I'm stuck on this one. I could potentially change

updates.append((param, param - eta * gparam))

to

updates.append((param, (param - eta * gparam) + (momentum * old_grad)))

But it's establishing the old_grad that I'm having trouble with. The worst part is that, while I know how to get the gradients to print out during training, I have no idea how to make them print out independently, i.e. just printing them out after I've defined them.
Al
jasonfu
2014-08-12 08:51:04 UTC
Permalink
hello,

I encountered the same problem when using the following code:

for param_i, grad_i, oldparam_i in zip(params, grads, oldparams):
    delta = 0.9 * oldparam_i - learning_rate * grad_i
    updates.append((oldparam_i, delta))
    updates.append((param_i, param_i - delta))

It says "ValueError: ('this shared variable already has an update expression', (W, GpuFromHost.0))". Is your problem solved?

best,

Jason
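A hedged guess at what can trigger that ValueError even when the loop looks right: if oldparams holds the parameter shared variables themselves (or the same variable twice), one shared variable ends up with two update expressions. Below is a small self-contained sketch of the separation Olivier described earlier ("a new set of shared variables ... initialized with zeros"), with a toy one-parameter model standing in for the real params/grads; all names are illustrative.

    # Sketch only (toy model, hypothetical names): the momentum state lives in
    # separate, zero-initialized shared variables so that every shared variable
    # appears in `updates` exactly once.
    import numpy
    import theano
    import theano.tensor as T

    learning_rate = 0.1
    x = T.vector('x')
    params = [theano.shared(numpy.zeros(3, dtype=theano.config.floatX), name='W')]
    cost = T.sum(T.dot(x, params[0]) ** 2)
    grads = [T.grad(cost, p) for p in params]

    # Fresh shared variables, one per parameter -- not the parameters themselves.
    oldparams = [theano.shared(numpy.zeros_like(p.get_value()), name='old_' + p.name)
                 for p in params]

    updates = []
    for param_i, grad_i, oldparam_i in zip(params, grads, oldparams):
        delta = 0.9 * oldparam_i - learning_rate * grad_i
        updates.append((oldparam_i, delta))
        updates.append((param_i, param_i + delta))  # + delta: the minus sign already sits inside delta

    train = theano.function([x], cost, updates=updates)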

David Chik
2014-08-12 10:13:03 UTC
Permalink
No I have not solved the problem yet.

Hopefully someone will provide a complete, working example.

D
Yifeng Li
2014-08-14 06:35:13 UTC
Permalink
Use this, it works for me:
---------------------------------------------------

delta_before = []
for param_i in params:
    delta_before_i = theano.shared(value=numpy.zeros(param_i.get_value().shape))
    delta_before.append(delta_before_i)

updates = []
alpha = 0.01
for param_i, grad_i, delta_before_i in zip(params, grads, delta_before):
    delta_i = -learning_rate_shared * grad_i + alpha * delta_before_i
    updates.append((param_i, param_i + delta_i))
    updates.append((delta_before_i, delta_i))

train_model = theano.function([index], cost, updates=updates,
    givens={
        x: train_set_x[index * batch_size: (index + 1) * batch_size],
        y: train_set_y[index * batch_size: (index + 1) * batch_size]})
---------------------------------------------------
Yifeng Li
http://www.cmmt.ubc.ca/directory/faculty/yifeng-li
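For completeness, a hypothetical driver loop for the train_model function above (n_train_batches, n_epochs and the printing are assumptions for illustration, not part of Yifeng's message):

    # Assumed usage sketch: call the compiled function once per minibatch index.
    n_epochs = 10
    for epoch in range(n_epochs):
        epoch_costs = [train_model(minibatch_index)
                       for minibatch_index in range(n_train_batches)]
        print('epoch %d, mean training cost %f' % (epoch, numpy.mean(epoch_costs)))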
Yifeng Li
2014-08-14 06:41:59 UTC
Permalink
Don't know why the indentation did not show up in the last email... see below again:

delta_before = []
for param_i in params:
    delta_before_i = theano.shared(value=numpy.zeros(param_i.get_value().shape))
    delta_before.append(delta_before_i)

updates = []
alpha = 0.01
for param_i, grad_i, delta_before_i in zip(params, grads, delta_before):
    delta_i = -learning_rate_shared * grad_i + alpha * delta_before_i
    updates.append((param_i, param_i + delta_i))
    updates.append((delta_before_i, delta_i))

train_model = theano.function([index], cost, updates=updates,
    givens={x: train_set_x[index * batch_size: (index + 1) * batch_size],
            y: train_set_y[index * batch_size: (index + 1) * batch_size]})

Yifeng Li
Abhishek Shivkumar
2015-06-18 17:30:20 UTC
Permalink
Hi,

When I use the code provided by Yifeng Li, I get the following error. Any idea how I can resolve it?

TypeError: ('An update must have the same type as the original shared variable (shared_var=W, shared_var.type=TensorType(float32, matrix), update_val=Elemwise{add,no_inplace}.0, update_val.type=TensorType(float64, matrix)).', 'If the difference is related to the broadcast pattern, you can call the tensor.unbroadcast(var, axis_to_unbroadcast[, ...]) function to remove broadcastable dimensions.')
Pascal Lamblin
2015-06-18 20:46:30 UTC
Permalink
The problem is that your original variable (parameter or velocity)
is in single precision (float32), but the update is double precision
(float64).

You can use theano.printing.debugprint(..., print_type=True) to check
which operation introduced the precision bump.
--
Pascal
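In this thread the usual culprit is numpy.zeros(...), which defaults to float64, so the velocity variables (and hence the update expressions) end up double precision while the parameters are float32. A hedged, self-contained sketch of both the fix and Pascal's diagnostic follows; every name below is illustrative, not taken from the posted scripts.

    # Sketch only: create the momentum state with the same dtype as the
    # parameter it tracks, and keep scalar hyperparameters in that dtype too,
    # so the update expression stays float32 when the parameter is float32.
    import numpy
    import theano
    import theano.tensor as T

    W = theano.shared(numpy.zeros((4, 2), dtype='float32'), name='W')
    delta_before_W = theano.shared(numpy.zeros_like(W.get_value()),  # inherits float32
                                   name='delta_before_W')
    learning_rate = numpy.asarray(0.1, dtype=W.dtype)
    alpha = numpy.asarray(0.01, dtype=W.dtype)

    x = T.matrix('x', dtype=W.dtype)
    cost = T.sum(T.dot(x, W) ** 2)
    grad_W = T.grad(cost, W)

    delta_W = -learning_rate * grad_W + alpha * delta_before_W
    updates = [(W, W + delta_W), (delta_before_W, delta_W)]

    # Pascal's diagnostic: print the graph with dtypes to spot where float64 creeps in.
    theano.printing.debugprint(delta_W, print_type=True)

    train = theano.function([x], cost, updates=updates)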
Frédéric Bastien
2015-06-18 21:55:06 UTC
Permalink
You can use the Theano flag warn_float64=pdb to get into pdb and find where this problem happens more quickly.

Fred
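A small illustration of that flag (the flag name is real; the script name is a placeholder). It has to be set before Theano is imported, either on the command line or in the environment:

    # Shell:   THEANO_FLAGS='warn_float64=pdb' python train_mlp.py
    # or from Python, before the import:
    import os
    os.environ.setdefault('THEANO_FLAGS', 'warn_float64=pdb')
    import theano  # creating any float64 tensor variable now drops into pdb

The same flag also accepts 'warn' and 'raise' if dropping into the debugger is too intrusive.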
Post by Pascal Lamblin
The problem is that your original variable (parameter or velocity)
is in single precision (float32), but the update is double precision
(float64).
You can use theano.printing.debugprint(..., print_type=True) to check
which operation introduced the precision bump.
Post by Abhishek Shivkumar
Hi,
When I use the code provided by Yifeng li, I get the following error.
Any
Post by Abhishek Shivkumar
idea how I can resolve it ?
TypeError: ('An update must have the same type as the original shared
variable (shared_var=W, shared_var.type=TensorType(float32, matrix),
update_val=Elemwise{add,no_inplace}.0,
update_val.type=TensorType(float64,
Post by Abhishek Shivkumar
matrix)).', 'If the difference is related to the broadcast pattern, you
can
Post by Abhishek Shivkumar
call the tensor.unbroadcast(var, axis_to_unbroadcast[, ...]) function to
remove broadcastable dimensions.')
Post by Yifeng Li
Do not know why indent does not show up in the last email...see below
again
Post by Abhishek Shivkumar
Post by Yifeng Li
delta_before=[]
delta_before_i=theano.shared(value=numpy.zeros(param_i.get_value().shape))
Post by Abhishek Shivkumar
Post by Yifeng Li
delta_before.append(delta_before_i)
updates = []
alpha=0.01
for param_i, grad_i, delta_before_i in zip(params, grads,
delta_i=-learning_rate_shared * grad_i + alpha*delta_before_i
updates.append((param_i, param_i + delta_i ))
updates.append((delta_before_i,delta_i))
train_model = theano.function([index], cost, updates=updates,
givens={x: train_set_x[index * batch_size: (index + 1) * batch_size],
y: train_set_y[index * batch_size: (index + 1) * batch_size]})
Yifeng Li
Post by Yifeng Li
---------------------------------------------------
delta_before=[]
delta_before_i=theano.shared(value=numpy.zeros(param_i.get_value().shape))
Post by Abhishek Shivkumar
Post by Yifeng Li
Post by Yifeng Li
delta_before.append(delta_before_i)
updates = []
alpha=0.01
for param_i, grad_i, delta_before_i in zip(params, grads,
delta_i=-learning_rate_shared * grad_i + alpha*delta_before_i
updates.append((param_i, param_i + delta_i ))
updates.append((delta_before_i,delta_i))
train_model = theano.function([index], cost, updates=updates,
givens={
x: train_set_x[index * batch_size: (index + 1) * batch_size],
y: train_set_y[index * batch_size: (index + 1) * batch_size]})
---------------------------------------------------
Yifeng Li
http://www.cmmt.ubc.ca/directory/faculty/yifeng-li
Post by David Chik
No I have not solved the problem yet.
Hopefully someone will provide a complete, working example.
D
Post by jasonfu
hello,
delta = 0.9 * oldparam_i - learning_rate * grad_i
updates.append((oldparam_i,delta))
updates.append((param_i, param_i - delta))
it says "ValueError: ('this shared variable already has an update
expression', (W, GpuFromHost.0))". is your problem solved ?
best,
Jason
圚 2014幎8月7日星期四UTC+8䞋午4时20分02秒David Chik写道
Post by David Chik
File "code/mlp_momentum.py", line 465, in <module>
test_mlp()
File "code/mlp_momentum.py", line 313, in test_mlp
y: train_set_y[index * batch_size:(index + 1) * batch_size]})
File
"/Users/david/anaconda/lib/python2.7/site-packages/theano/compile/function.py",
Post by Abhishek Shivkumar
Post by Yifeng Li
Post by Yifeng Li
Post by David Chik
Post by jasonfu
Post by David Chik
line 223, in function
profile=profile)
File
"/Users/david/anaconda/lib/python2.7/site-packages/theano/compile/pfunc.py",
Post by Abhishek Shivkumar
Post by Yifeng Li
Post by Yifeng Li
Post by David Chik
Post by jasonfu
Post by David Chik
line 490, in pfunc
no_default_updates=no_default_updates)
File
"/Users/david/anaconda/lib/python2.7/site-packages/theano/compile/pfunc.py",
Post by Abhishek Shivkumar
Post by Yifeng Li
Post by Yifeng Li
Post by David Chik
Post by jasonfu
Post by David Chik
line 198, in rebuild_collect_shared
(store_into, update_d[store_into]))
ValueError: ('this shared variable already has an update
expression',
Post by Abhishek Shivkumar
Post by Yifeng Li
Post by Yifeng Li
Post by David Chik
Post by jasonfu
Post by David Chik
(W, Elemwise{sub,no_inplace}.0))
Post by Al Docherty
I implemented momentum, and took the addition out of the append
updates = []
for param, gparam, oldparam in zip(classifier.params, gparams,
delta = eta * gparam + momentum * oldparam
updates.append((param, param - delta))
updates.append((oldparam, gparam))
Now it works, and the training runs more smoothly with less
oscillations in the error, as expected.
Thanks for the help guys!
Al
Post by Olivier Delalleau
Yes. Also, although I've almost never played with momentum myself,
I'd expect the momentum term to be equal to the previous update rather
than the previous gradient. Otherwise it seems to me it is pretty much
the same as increasing the learning rate.
-=- Olivier
Post by Al Docherty
You're referring to param - eta * gparam ...
In a NN, we want to update the weights by subtracting (eta * the
gradient) + (momentum * old gradient), yes? Presumably, this isn't what
the code is currently doing. So the sign error is in the way we are
adding the momentum term, correct?
Post by Olivier Delalleau
Well, there's actually a sign error in the code below, that might
be just that.
-=- Olivier
Post by Olivier Delalleau
If you are confident that the implementation is good, then it might
be because your hyperparameters aren't properly tuned, but you'll have
to do that yourself.
Post by Al Docherty
Nevertheless, once implemented, the errors of my network shoot up.
On Thursday, 6 March 2014 18:22:26 UTC-5, Olivier Delalleau wrote:
Post by Olivier Delalleau
I shamelessly edited your quoted code below to hopefully get
something that works ;)
-=- Olivier
updates = []
for param, gparam, old_grad_param in izip(classifier.params, gparams,
                                          classifier.old_grad_params):
    updates.append((param, param - eta * gparam + momentum * old_grad_param))
for old_grad_param, gparam in izip(classifier.old_grad_params, gparams):
    updates.append((old_grad_param, gparam))
The content of classifier.old_grad_params would be a new set of shared
variables with the same sizes as the parameters, but initialized
with zeros.
Post by Al Docherty
So something like this you mean? Of course accounting for there
being no old gradient at the start of training.
updates = []
old_grad_param = gparam
updates.append((param, param - eta * gparam + momentum * old_grad_param))
On Thursday, 6 March 2014 17:42:51 UTC-5, Olivier Delalleau wrote:
Post by Olivier Delalleau
I didn't look at the code and may be missing something, but it [...]
old_grad_param = gparam [...] (with one entry per param).
-=- Olivier
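
To make the shared-variable approach described above concrete, here is a
minimal, self-contained sketch of momentum in Theano. It is an
illustration, not the original classifier code: the toy parameter w, the
quadratic cost, and the names velocities, eta and momentum are all
assumptions. Following Olivier's remark at the top of the thread, the
extra shared variables store the previous *update* (a "velocity"), not
the previous gradient.

import numpy as np
import theano
import theano.tensor as T

# Toy setup so the sketch compiles and runs on its own: one parameter
# vector and a quadratic cost. Replace with the classifier's own
# parameters and cost expression.
w = theano.shared(np.ones(3, dtype=theano.config.floatX), name='w')
x = T.vector('x')
cost = T.sum((T.dot(w, x) - 1.0) ** 2)
params = [w]

eta = 0.01       # learning rate
momentum = 0.9   # momentum coefficient

# One zero-initialised shared variable per parameter, with the same shape,
# holding the previous update.
velocities = [theano.shared(np.zeros_like(p.get_value()), name='v_' + p.name)
              for p in params]

updates = []
for p, v in zip(params, velocities):
    gparam = T.grad(cost, p)
    new_v = momentum * v - eta * gparam   # decayed previous update minus gradient step
    updates.append((v, new_v))            # remember this step's update
    updates.append((p, p + new_v))        # apply it to the parameter

train = theano.function([x], cost, updates=updates)
print(train(np.array([1.0, 2.0, 3.0], dtype=theano.config.floatX)))

With momentum = 0 this reduces to plain gradient descent, which is one
easy way to sanity-check the wiring before tuning the hyperparameters.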
Post by Al Docherty
I'd dare say your implementation differs a lot from mine, so much so I
think it'd be very hard to hack in momentum your way without
rearranging a lot of the code (and in doing so possibly leaving errors
around).
I guess I'm stuck on this one. I could potentially add in the momentum
term by changing
updates.append((param, param - eta * gparam))
to
updates.append((param, (param - eta * gparam) + (momentum * old_grad)))
But it's establishing the old_grad that I'm having trouble with.
The worst part is that, while I know how to get the gradients to print
out during training, I have no idea how to make them print out
independently, i.e. just printing them out after I've defined them.
Al
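
On inspecting gradients outside the training loop: one option is to
compile a separate Theano function whose outputs are the gradient
expressions, so they can be evaluated and printed on demand. This is a
sketch under the assumption that the cost is an ordinary Theano
expression over an input x and a list of shared parameters; the toy w,
x and cost below are placeholders, not the original network.

import numpy as np
import theano
import theano.tensor as T

# Placeholder model, as above.
w = theano.shared(np.ones(3, dtype=theano.config.floatX), name='w')
x = T.vector('x')
cost = T.sum((T.dot(w, x) - 1.0) ** 2)
params = [w]

gparams = [T.grad(cost, p) for p in params]

# Compiling a function that returns the gradient expressions lets you
# evaluate them independently of the training updates.
get_grads = theano.function([x], gparams)

grads = get_grads(np.array([1.0, 2.0, 3.0], dtype=theano.config.floatX))
for p, g in zip(params, grads):
    print(p.name, g)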
Beatriz G.
2017-07-18 09:32:37 UTC
Permalink
Hi everyone,

I would like to know what momentum is used for. I think it has
something to do with the weight updates, but I have been reading about
it and still don't fully understand. Does it have something to do with
a dynamic learning rate?

Regards.
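
Briefly: momentum is not a dynamic learning rate. It keeps a running
"velocity" that blends the previous update with the new gradient step,
so directions where successive gradients agree accumulate speed while
oscillating directions tend to cancel out. A tiny plain-Python
illustration of the rule (hypothetical values, toy quadratic cost):

def grad(w):
    # Gradient of the toy cost 0.5 * w ** 2
    return w

w, v = 5.0, 0.0    # parameter and its "velocity" (previous update)
eta, mu = 0.1, 0.9 # learning rate and momentum coefficient

for step in range(5):
    v = mu * v - eta * grad(w)  # reuse a decayed copy of the previous update
    w = w + v
    print(step, round(w, 4), round(v, 4))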