Discussion: Gradient with respect to subtensor
Justin Brody
2014-06-28 23:01:46 UTC
Hello,
I've been trying for many days to properly understand how shared variables
and symbolic variables interact in Theano, but sadly I don't think I'm
there. My ignorance is quite probably reflected in this question but I
would still be very grateful for any guidance.

I'm trying to implement a "deconvolutional network"; specifically, I have a
3-tensor of inputs (each input is a 2D image) and a 4-tensor of codes: for
the ith input, codes[i] represents the set of codewords that together code
for input i.
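(For concreteness, the kind of shapes involved might look something like the
following; the numbers are purely illustrative:)

import numpy as np

n_inputs, n_codewords = 10, 8
# inputs: one 28x28 image per example -> dims (input #, row #, col #)
initial_inputs = np.random.randn(n_inputs, 28, 28).astype('float32')
# codes: one stack of codeword maps per example -> dims (input #, code #, row #, col #)
initial_codes = np.random.randn(n_inputs, n_codewords, 32, 32).astype('float32')
# dicts: one 5x5 dictionary filter per codeword -> dims (code #, row #, col #)
initial_dicts = np.random.randn(n_codewords, 5, 5).astype('float32')
# with a 'valid' convolution, 32 - 5 + 1 = 28, so each reconstruction matches its image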

I've been having a lot of trouble figuring out how to do gradient descent
on the codewords. Here are the relevant parts of my code:

codes = shared(initial_codes, name="codes")  # Shared 4-tensor w/ dims (input #, code #, row #, col #)
idx = T.lscalar()
pre_loss_conv = conv2d(input=codes[idx].dimshuffle('x', 0, 1, 2),
                       filters=dicts.dimshuffle('x', 0, 1, 2),
                       border_mode='valid')
loss_conv = pre_loss_conv.reshape((pre_loss_conv.shape[2], pre_loss_conv.shape[3]))
loss_in = inputs[idx]
loss = T.sum(1./2.*(loss_in - loss_conv)**2)

del_codes = T.grad(loss, codes[idx])
delc_fn = function([idx], del_codes)
train_codes = function([input_index], loss, updates=[
    [codes, T.set_subtensor(codes[input_index],
                            codes[input_index] - learning_rate*del_codes[input_index])]])

(here codes and dicts are shared tensor variables). Theano is unhappy with
this, specifically with defining

del_codes = T.grad(loss, codes[idx])

The error message I'm getting is:

theano.gradient.DisconnectedInputError: grad method was asked to compute
the gradient with respect to a variable that is not part of the
computational graph of the cost, or is used only by a non-differentiable
operator: Subtensor{int64}.0

I'm guessing that it wants a symbolic variable instead of codes[idx], but
then I'm not sure how to get everything connected so it produces the
intended effect. I suspect I'll need to change the final line to something like

learning_rate*del_codes) ]])

Can someone give me some pointers as to how to define this function
properly? I think I'm probably missing something basic about working with
Theano but I'm not sure what.

Thanks in advance!

-Justin
Justin Brody
2014-06-29 01:20:41 UTC
For whatever it's worth, I think my fundamental misunderstanding is about
how symbolic variables get "bound". For example, I tried changing my code
to:
current_codes = T.tensor3('current_codes')
del_codes = T.grad(loss, current_codes)
delc_fn = function([idx], del_codes,
                   givens=[[current_codes, codes[idx]]])

and then calling the delc_fn in the updates part of my training function.
Theano complains that current_codes is not part of the computational graph
of loss. In my mind, it will *become* part of the computational graph when
it gets bound to codes[idx]. So this is the tension I'm having trouble
resolving: I want to use the code I just wrote to get the loss defined
with respect to a specific variable (rather than a subtensor of a specific
variable) but I want to use T.grad(loss, codes[idx]) to express what I'm
really trying to do.
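(A note on what givens actually does: it substitutes one expression for a
variable at compile time, but the variable being replaced must already appear
in the graph of the function's outputs. So T.grad(loss, current_codes) fails
here simply because loss was never built from current_codes in the first
place. A minimal sketch of how this givens-based approach can be made to
work, assuming codes, dicts and inputs are the shared variables from the
first message and learning_rate is a plain float:)

import theano
import theano.tensor as T
from theano.tensor.nnet import conv2d   # in older Theano this lived in theano.tensor.nnet.conv

idx = T.lscalar('idx')
current_codes = T.tensor3('current_codes')   # placeholder that will be bound to codes[idx]

# build the loss from current_codes instead of codes[idx]
pre_loss_conv = conv2d(input=current_codes.dimshuffle('x', 0, 1, 2),
                       filters=dicts.dimshuffle('x', 0, 1, 2),
                       border_mode='valid')
loss_conv = pre_loss_conv.reshape((pre_loss_conv.shape[2], pre_loss_conv.shape[3]))
loss = T.sum(1./2.*(inputs[idx] - loss_conv)**2)

# current_codes really is part of the graph of loss now, so this gradient is defined
del_current = T.grad(loss, current_codes)

# bind current_codes to codes[idx] only when compiling
delc_fn = theano.function([idx], del_current,
                          givens=[(current_codes, codes[idx])])
train_codes = theano.function(
    [idx], loss,
    updates=[(codes, T.set_subtensor(codes[idx],
                                     codes[idx] - learning_rate*del_current))],
    givens=[(current_codes, codes[idx])])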
Justin Brody
2014-06-30 17:09:46 UTC
In case anyone else runs into a similar problem, I got a good answer on
Stack Overflow:
http://stackoverflow.com/questions/24468482/defining-a-gradient-with-respect-to-a-subtensor-in-theano
Olivier Delalleau
2014-07-03 01:31:37 UTC
When you write

grad(loss, codes[idx])

the codes[idx] expression creates a new symbolic variable that you have not
used anywhere in the computational graph of loss, which is why
Theano complains.

The correct way to write it is grad(loss, codes)[idx] (and if everything
goes well Theano will be able to figure out by itself if it can avoid
computing the full grad(loss, codes))
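(Concretely, combining that with the update rule from the original post, a
sketch of the corrected code, assuming the same loss, codes, idx and
learning_rate as above, would be:)

# gradient with respect to the whole shared tensor, then take the slice we need;
# as noted above, Theano's optimizer may be able to avoid materializing the full gradient
del_codes = T.grad(loss, codes)[idx]

delc_fn = function([idx], del_codes)
train_codes = function(
    [idx], loss,
    updates=[(codes, T.set_subtensor(codes[idx],
                                     codes[idx] - learning_rate*del_codes))])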

-=- Olivier
Qiang Cui
2017-07-20 16:02:36 UTC
Hi Olivier,

Thank you very much.

grad(loss, codes)[idx] solves my problem. It is the right way to update only
the subtensor, and it avoids computing the whole gradient for every item
in codes.