Discussion:
[theano-users] Theano gradient of subtensor
Daniel Hernandez
2017-09-27 21:25:16 UTC
Hi,

I was wondering if someone here had an answer to this unsolved question
over on Stack Overflow:

https://stackoverflow.com/questions/37545325/theano-gradient-of-subtensor

Basically, how do you compute gradients w.r.t. a subtensor?

The question arises in the context of large tensors, say Y and X, where it
is known that each entry in Y depends only on a small subset of the entries
of X. Taking T.grad(Y, X) is computationally expensive, since it will
compute every possible gradient, so one would like to be able to compute,
e.g., T.grad(Y, X[i]). Here is some basic code illustrating the problem.

import theano.tensor as T

X = T.matrix()
Y = T.sum(X**2)

full_grad = T.grad(Y, X) # This works

X0 = X[0]
test = T.grad(Y, X0) # This pukes a DisconnectedInputError

Silencing the DisconnectedInputError can be done in grad (via its
disconnected_inputs argument), but of course that doesn't solve anything:
evaluating the gradients only results in a bunch of 0s. So, is there a way
of taking these gradients with respect to a subtensor?
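
A minimal sketch of the disconnection, using the disconnected_inputs
argument that grad accepts: X[0] builds a new Subtensor node that the graph
computing Y never used, so the silenced gradient is just zeros.

import theano.tensor as T

X = T.matrix()
Y = T.sum(X**2)

# X[0] creates a new Subtensor node; Y was built without it, so Y is
# genuinely disconnected from X0.
X0 = X[0]

# Silencing the error returns zeros with X0's shape, not a useful gradient.
g0 = T.grad(Y, X0, disconnected_inputs='ignore')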
Pascal Lamblin
2017-09-28 19:32:10 UTC
Maybe the following can help you:

http://deeplearning.net/software/theano/tutorial/faq_tutorial.html#how-to-update-a-subset-of-weights
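
For reference, here is a minimal sketch of that FAQ's pattern (the names W,
idx, and cost are illustrative): take the subtensor in the forward pass,
differentiate with respect to it, and write the update back with
inc_subtensor.

import numpy as np
import theano
import theano.tensor as T

# Hypothetical setup: only the rows of the shared matrix W listed in idx
# participate in the cost.
W = theano.shared(np.zeros((100, 10)), name='W')
idx = T.ivector('idx')
W_sub = W[idx]                 # subtensor taken in the forward pass

cost = T.sum(W_sub ** 2)       # cost built from the subtensor
g = T.grad(cost, wrt=W_sub)    # gradient w.r.t. the subtensor only

# Update only the selected rows of W.
updates = [(W, T.inc_subtensor(W_sub, -0.01 * g))]
f = theano.function([idx], cost, updates=updates)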

Also, if you take a subtensor of the gradient itself, some optimizations
may apply that avoid computing the full gradient.

For instance, with your example, the "subtensor" and "* 2" operations get
swapped by the optimizer, so the multiplication is applied only to the
selected row:
grad0 = full_grad[0]
g0 = theano.function([X], grad0)
theano.printing.debugprint(g0)
Elemwise{mul,no_inplace} [id A] ''   1
 |TensorConstant{(1,) of 2.0} [id B]
 |Subtensor{int64} [id C] ''   0
 | |<TensorType(float64, matrix)> [id D]
 | |Constant{0} [id E]
--
Pascal Lamblin
dhern
2017-10-03 03:35:19 UTC
Thanks for the reply.

Right, but that method seems to address the issue only for gradients with
respect to shared variables. I am interested, as in the code above, in
taking symbolic gradients with respect to subarrays of Theano tensors; that
doesn't seem to be possible, correct? I will look more closely into taking
a subtensor of the gradient, although I am not sure it will reduce
computation time in my actual code, since that is what I did to begin with
and it is still very time consuming.
Frédéric Bastien
2017-10-11 12:32:21 UTC
You need to take the subtensor in the forward pass to save all that
computation. Removing useless computation caused by a subtensor at the end
of the graph is a very hard problem, and we cover very few of the needed
optimizations. So move the subtensor into the forward computation.
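
A minimal sketch of this for the example above (names are illustrative):
build the cost from the subtensor itself, so the variable you differentiate
with respect to is actually part of the graph.

import theano
import theano.tensor as T

X = T.matrix('X')
X0 = X[0]              # take the subtensor in the forward pass
Y0 = T.sum(X0 ** 2)    # build the cost from the subtensor

g0 = T.grad(Y0, X0)    # X0 is now in the graph of Y0, so this works
f = theano.function([X], g0)  # computes only the row-0 gradient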

Fred