Discussion:
[theano-users] how to adjust learning rate
Ji Qiujia
2017-05-28 02:03:41 UTC
Recently I have been doing MNIST image classification with a ResNet, and I
found something strange, or at least interesting. First, although it is
usually said that we should do early stopping, I found it is always better
to run more epochs at the initial learning rate, which I set to 0.1 or 0.01,
and then scale the learning rate down quickly. For example, my schedule
starts at 0.1 and is scaled down by a factor of 0.1 at the 200th, 210th and
220th epochs, with a batch size of 64 and 230 epochs in total. I also found
that the last downscaling of the learning rate usually degrades performance.
Am I doing anything wrong? You are welcome to share your parameter-tuning
experience.
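
For concreteness, here is a minimal sketch of that kind of step schedule
using a Theano shared variable (the variable names, the 1-based epoch
counter and the placeholder training step are illustrative, not taken from
an actual script):

import numpy as np
import theano

# Keep the learning rate in a shared variable so it can be changed between
# epochs without recompiling the training function.
learning_rate = theano.shared(np.float32(0.1))

num_epochs = 230
decay_epochs = {200, 210, 220}  # scale down by 0.1 at these (1-based) epochs

for epoch in range(1, num_epochs + 1):
    if epoch in decay_epochs:
        learning_rate.set_value(np.float32(learning_rate.get_value() * 0.1))
    # ... run one epoch of SGD training here, with `learning_rate`
    # used in the parameter updates ...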
Alexander Botev
2017-05-31 17:44:09 UTC
It depends on which performance you are measuring and which optimizer you
are using. Is it training or validation/test performance, and are you using
an adaptive method (RMSProp, Adam, etc.)?
Ramana Subramanyam
2017-06-20 12:28:23 UTC
Hi,
Sorry this is a bit late, but for anyone coming to this thread in the
future, you can have a look at Jan Schlüter's gist:
https://gist.github.com/f0k/f3190ebba6c53887d598d03119ca2066#file-wgan_mnist-py-L283-L285
There are many different ways to do the decay, but they are all just a
small edit to Jan's code.

Ramana
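
The general idea (the usual Theano/Lasagne pattern, not a verbatim copy of
the gist) is to keep the learning rate in a shared variable that the update
rule reads, and to reset it once per epoch. A small illustrative sketch,
with made-up values for initial_eta and num_epochs and a linear anneal
standing in for whatever decay you prefer:

import theano
import lasagne

initial_eta = 1e-3   # illustrative starting learning rate
num_epochs = 100     # illustrative

# The compiled update rule reads the learning rate from this shared
# variable, so it can be changed between epochs without recompiling.
eta = theano.shared(lasagne.utils.floatX(initial_eta))
# updates = lasagne.updates.adam(loss, params, learning_rate=eta)  # loss/params defined elsewhere

for epoch in range(num_epochs):
    # ... run one epoch of training with the compiled function ...
    # then anneal eta, e.g. linearly towards zero over the course of training:
    eta.set_value(lasagne.utils.floatX(initial_eta * (1 - float(epoch + 1) / num_epochs)))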