Discussion:
[theano-users] Gradient Problem (always 0)
Mohamed Akrout
2017-06-28 23:12:06 UTC
Hi all,

I am running a neuroscience experiment with a recurrent neural network model in Theano:



def rnn(u_t, x_tm1, r_tm1, Wrec):
    x_t = ( (1 - alpha)*x_tm1 + alpha*(T.dot(r_tm1, Wrec) + brec + u_t[:, Nin:]) )
    r_t = f_hidden(x_t)


Then I define the scan function to iterate over the time steps:

[x, r], _ = theano.scan(fn=rnn,
                        outputs_info=[x0_, f_hidden(x0_)],
                        sequences=u,
                        non_sequences=[Wrec])

Wrec and brec are learnt by stochastic gradient descent: g = T.grad(cost, [Wrec, brec]), where cost is the cost function T.sum(f_loss(z, target[:,:,:Nout])), with z = f_output(T.dot(r, Wout_.T) + bout).

Up to this point, everything works well.
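(For concreteness, here is a self-contained toy version of this setup, with made-up sizes, tanh standing in for f_hidden, a plain squared error standing in for f_loss and f_output, no batch dimension, and an explicit return at the end of rnn:)

import numpy as np
import theano
import theano.tensor as T

floatX = theano.config.floatX
rng = np.random.RandomState(0)
N, Nin, Nout = 4, 2, 3                       # made-up sizes
alpha = np.asarray(0.1, dtype=floatX)

Wrec = theano.shared(0.1 * rng.randn(N, N).astype(floatX), name='Wrec')
brec = theano.shared(np.zeros(N, dtype=floatX), name='brec')
Wout_ = theano.shared(0.1 * rng.randn(Nout, N).astype(floatX), name='Wout_')
bout = theano.shared(np.zeros(Nout, dtype=floatX), name='bout')
x0_ = theano.shared(np.zeros(N, dtype=floatX), name='x0_')

u = T.matrix('u')            # (time, Nin + N), no batch dimension here
target = T.matrix('target')  # (time, Nout)
f_hidden = T.tanh

def rnn(u_t, x_tm1, r_tm1, Wrec):
    x_t = (1 - alpha) * x_tm1 + alpha * (T.dot(r_tm1, Wrec) + brec + u_t[Nin:])
    r_t = f_hidden(x_t)
    return x_t, r_t

[x, r], _ = theano.scan(fn=rnn,
                        outputs_info=[x0_, f_hidden(x0_)],
                        sequences=u,
                        non_sequences=[Wrec])

z = T.dot(r, Wout_.T) + bout                 # f_output = identity here
cost = T.sum((z - target) ** 2)              # f_loss = squared error here
g_Wrec, g_brec = T.grad(cost, [Wrec, brec])

f = theano.function([u, target], [cost, g_Wrec.sum(), g_brec.sum()])
print(f(rng.randn(6, Nin + N).astype(floatX),
        rng.randn(6, Nout).astype(floatX)))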



Now I want to add two new vectors, let's call them u and v, so that the initial rnn function becomes:


def rnn(u_t, x_tm1, r_tm1, Wrec, u, v):
    x_t = ( (1 - alpha)*x_tm1 + alpha*(T.dot(r_tm1, Wrec + T.dot(u, v)) + brec + u_t[:, Nin:]) )
    r_t = f_hidden(x_t)

[x, r], _ = theano.scan(fn=rnn,
                        outputs_info=[x0_, f_hidden(x0_)],
                        sequences=u,
                        non_sequences=[Wrec, m, n])

m and n are the variables corresponding to u and v in the main function.

and suddenly the gradients T.grad(cost, m) and T.grad(cost, n) are zero.

I have been stuck on this problem for two weeks now. I verified that the values are not integers by using dtype=theano.config.floatX everywhere in the definitions of the variables.

As you can see, the link between the cost and m (or n) is: the cost function depends on z, z depends on r, and r is one of the outputs of the rnn function that uses m and n in its equation.
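To make that chain concrete, here is a minimal, self-contained sketch (hypothetical shapes, a toy cost, and m and n taken as plain vectors). Passing disconnected_inputs='raise' to T.grad also makes Theano complain explicitly if the cost were truly disconnected from m in the graph:

import numpy as np
import theano
import theano.tensor as T

floatX = theano.config.floatX
rng = np.random.RandomState(0)

# Hypothetical stand-ins: m and n as plain vectors, a toy Wrec and a toy cost.
m = theano.shared(rng.randn(5).astype(floatX), name='m')
n = theano.shared(rng.randn(5).astype(floatX), name='n')
Wrec = theano.shared(rng.randn(5, 5).astype(floatX), name='Wrec')
r = T.vector('r')

cost = T.sum(T.dot(r, Wrec + T.dot(m, n)))

# With disconnected_inputs='raise', Theano raises an error instead of silently
# returning zeros if the cost is in fact not connected to m in the graph.
g_m = T.grad(cost, m, disconnected_inputs='raise')
print(theano.function([r], g_m)(np.ones(5, dtype=floatX)))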

Do you have any idea why this does not work?

Any idea is welcome. I hope I can get past this problem soon.
Thank you!
Frédéric Bastien
2017-06-29 12:34:08 UTC
I don't know, but you can use theano.printing.debugprint([cost, grads...])
to see the gradient graph. Maybe it will help you understand what is
going on.
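For example, a tiny self-contained sketch of that call (toy variables and a toy cost):

import theano
import theano.tensor as T

# Toy stand-ins, just to illustrate the call.
m = T.vector('m')
n = T.vector('n')
cost = T.sum(m * n)
g_m, g_n = T.grad(cost, [m, n])

# Prints the graph of the cost and of both gradients in one go, so the
# forward part can be compared with the gradient part.
theano.printing.debugprint([cost, g_m, g_n])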

Don't forget that m and n are non-sequences. This means the dot will be lifted out
of the loop by Theano; only the extra addition will be done at each
iteration.

Fred
Mohamed Akrout
2017-06-29 13:05:12 UTC
Yes, I printed the gradient graph of m, but it is extremely big and I find it
unreadable (file attached). I don't know how this tree will help me find
the problem. There are nodes that are Alloc and second, but I don't know how
to change and/or control them.

When you say "only the extra addition will be done at each iteration",
which extra addition are you talking about?

Thank you Fred.

Med

Regarding your remark: if m and n are non-sequences, Theano will not update…
Frédéric Bastien
2017-06-29 13:22:19 UTC
The "+" in "+ T.dot(u, v)".

The debugprint command I gave you will help separate the forward
computation from the grad computation.

The grad of a dot is another dot. So what would explain a zero output would
be too many (or only) zeros in the inputs. Can you verify the values of m and
n? Make sure there are no zeros in them.
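For example, assuming m and n are shared variables (hypothetical shapes here), their current contents can be inspected with get_value():

import numpy as np
import theano

floatX = theano.config.floatX
rng = np.random.RandomState(0)

# Hypothetical shared variables standing in for m and n.
m = theano.shared(rng.randn(5).astype(floatX), name='m')
n = theano.shared(rng.randn(5).astype(floatX), name='n')

m_val, n_val = m.get_value(), n.get_value()
print("zeros in m:", int((m_val == 0).sum()), "| zeros in n:", int((n_val == 0).sum()))
print("m:", m_val)
print("n:", n_val)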
Frédéric Bastien
2017-06-29 13:23:44 UTC
You can also add names to your intermediate variables; theano.grad() will
use them to create names for the gradient nodes. This will help you understand
what is going on. Maybe the debugprint parameter stop_on_name=True could
also help make that graph more readable.
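A small self-contained sketch of the idea (toy variables and a toy cost):

import theano
import theano.tensor as T

u = T.vector('u')                 # named inputs
v = T.vector('v')
d = T.dot(u, v)
d.name = 'my_dot'                 # named intermediate result
cost = d ** 2
cost.name = 'cost'

g_u = T.grad(cost, u)

# stop_on_name=True stops expanding the graph below named variables,
# which keeps the printout readable.
theano.printing.debugprint([cost, g_u], stop_on_name=True)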
Mohamed Akrout
2017-06-29 14:36:26 UTC
Yes, I changed the values of m and n by initialising them with different
distributions or randomly.

I changed the "+" to theano.tensor.sum:
x_t = ( (1 - alpha)*x_tm1 + alpha*(T.dot(r_tm1, T.sum(Wrec, T.dot(u, v))) + brec + u_t[:,Nin:]) )

But this does not work either and gives the following error:
TypeError: TensorType does not support iteration. Maybe you are using
builtin.sum instead of theano.tensor.sum? (Maybe .max?)
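(Looking at it again, T.sum seems to reduce a single tensor along an axis, so the second argument gets interpreted as the axis, hence the TypeError; the element-wise sum of two tensors is just "+", or equivalently T.add:)

import theano.tensor as T

Wrec = T.matrix('Wrec')
uv = T.matrix('uv')        # hypothetical stand-in for T.dot(u, v)

total = T.sum(Wrec)        # T.sum reduces ONE tensor (optionally along an axis)
s1 = Wrec + uv             # element-wise sum of two tensors
s2 = T.add(Wrec, uv)       # same thing, written explicitly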

I never thought that having T.dot as an argument of another T.dot
could be problematic.
For now I am still stuck; if I find the solution I will tell you what it is :(

Med
Frédéric Bastien
2017-06-30 13:23:30 UTC
You can try MonitorMode to step through all the computations of the function:

http://deeplearning.net/software/theano/tutorial/debug_faq.html#how-do-i-step-through-a-compiled-function

This will help you see what is going on during the execution. If you add
names to your u, v and the output of the dot, you will have access to them
during the execution.

u.name = 'u'
v.name = 'v'
d = T.dot(u,v)
d.name='my_dot'

then use d in the code.
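A minimal self-contained sketch of that setup, following the pattern from the linked tutorial (toy function; the callbacks print every node's inputs and outputs as the compiled function runs):

import numpy as np
import theano
import theano.tensor as T

def inspect_inputs(i, node, fn):
    print(i, node, "input(s):", [inp[0] for inp in fn.inputs])

def inspect_outputs(i, node, fn):
    print(" output(s):", [out[0] for out in fn.outputs])

u = T.vector('u')
v = T.vector('v')
d = T.dot(u, v)
d.name = 'my_dot'

f = theano.function([u, v], d,
                    mode=theano.compile.MonitorMode(pre_func=inspect_inputs,
                                                    post_func=inspect_outputs))
f(np.ones(3, dtype=theano.config.floatX),
  np.ones(3, dtype=theano.config.floatX))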
Pascal Lamblin
2017-06-30 22:58:49 UTC
I'm assuming m and n were defined as T.vector(), and that the last line of
the "def rnn(...)" function is actually "return x_t, r_t"; is that correct?

Do you have a non-zero gradient for Wrec?
Can you monitor what the value of theano.grad(cost, Wrec).sum() is?
Normally, the sum of the gradient wrt Wrec should be equal to the gradient
wrt dot(u, v). So if the gradient wrt Wrec is not zero everywhere, but its
sum is zero, then that would explain the result.
If we backprop manually, we can see that the gradient of the cost wrt
u is equivalent to grad(cost, Wrec).sum() * v (and the gradient wrt v
should be equivalent to grad(cost, Wrec).sum() * u). Can you monitor those
values?
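Here is a self-contained toy sketch of those checks under that assumption (made-up sizes, tanh standing in for the real nonlinearity and loss; with u and v as plain vectors, T.dot(u, v) is a scalar broadcast over Wrec):

import numpy as np
import theano
import theano.tensor as T

floatX = theano.config.floatX
rng = np.random.RandomState(0)

Wrec = theano.shared(rng.randn(4, 4).astype(floatX), name='Wrec')
u = theano.shared(rng.randn(4).astype(floatX), name='u')
v = theano.shared(rng.randn(4).astype(floatX), name='v')
r = T.vector('r')

cost = T.sum(T.tanh(T.dot(r, Wrec + T.dot(u, v))))
g_W, g_u, g_v = T.grad(cost, [Wrec, u, v])

# g_u should match g_W.sum() * v, and g_v should match g_W.sum() * u;
# if g_W is non-zero but g_W.sum() is (numerically) zero, both vanish.
check = theano.function([r], [g_W.sum(), g_u, g_W.sum() * v, g_v, g_W.sum() * u])
for val in check(np.ones(4, dtype=floatX)):
    print(val)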