Discussion: [theano-users] Unused input error with chained OpFromGraph ops
n***@gmail.com
2017-07-11 09:32:50 UTC
Hi,

I am trying to split a computation over two ops in order to avoid spurious
computations when computing the gradient.
My current attempt uses a first op which returns the desired result for the
forward pass plus some extra intermediate results. The second op just forwards
the desired result, but its grad is overridden to compute the gradient based
on the intermediate results.

In this configuration, Theano complains about unused inputs in the forward
computation, because the intermediate results are not used in the forward
method of the second op.

Is this expected behaviour or a bug?

----

import numpy as np
import theano.tensor as T
import theano


def make_ops():
    x_var = T.vector()
    m_var = T.bvector()

    r = m_var.sum().astype('floatX')
    z = x_var * m_var / r

    def grad_op1(inputs, output_gradients):
        return [
            output_gradients[0],  # computation delegated to op2
            theano.gradient.DisconnectedType()()
        ]

    op1 = theano.OpFromGraph(
        inputs=[x_var, m_var],
        outputs=[z, r],
        grad_overrides=grad_op1)

    z_var = T.vector()
    r_var = T.scalar()

    def grad_op2(inputs, output_gradients):
        _, m_, r_ = inputs
        return [
            m_ * r_,
            theano.gradient.DisconnectedType()(),
            theano.gradient.DisconnectedType()()
        ]

    op2 = theano.OpFromGraph(
        inputs=[z_var, m_var, r_var],
        outputs=[z_var],
        grad_overrides=grad_op2)

    return op1, op2


op1, op2 = make_ops()
x_var = T.vector()
m_var = T.bvector()
z_, r = op1(x_var, m_var)
z = op2(z_, m_var, r)

print(z_.eval({x_var: np.array([1., .3, .0, .2], dtype=np.float32),
               m_var: np.array([1, 0, 1, 1], dtype=np.int8)}))

f = theano.function([x_var, m_var], [z], on_unused_input='ignore')  # raises anyway

print(f(np.array([1., .3, .0, .2], dtype=np.float32),
        np.array([1, 0, 1, 1], dtype=np.int8)))

# g = theano.grad(T.sum(z), wrt=x_var)
# print(g.eval({x_var: np.array([1., .3, .0, .2], dtype=np.float32),
#               m_var: np.array([1, 0, 1, 1], dtype=np.int8)}))
Pascal Lamblin
2017-07-12 00:46:49 UTC
That error actually comes from the function compilation that happens inside
op2.
One option is to add "inline=True" to the call to OpFromGraph that returns
op2. In that case the gradient still gets overridden, but the inlined subgraph
ends up in the global graph and OpFromGraph will not build an inner function.
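A sketch of that first option, reusing the op2 definition from the code above
with only inline=True added:

op2 = theano.OpFromGraph(
    inputs=[z_var, m_var, r_var],
    outputs=[z_var],
    grad_overrides=grad_op2,
    inline=True)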
The other option would be to add "on_unused_input='ignore'" to the call to
OpFromGraph:

op2 = theano.OpFromGraph(
    inputs=[z_var, m_var, r_var],
    outputs=[z_var],
    grad_overrides=grad_op2,
    on_unused_input='ignore')

With it, the program worked, but I still got an error in g.eval(...):

TypeError: Cannot convert Type DisconnectedType (of Variable
<DisconnectedType>) into Type TensorType(float64, scalar). You can try to
manually convert <DisconnectedType> into a TensorType(float64, scalar).

I'm not sure where that comes from; please let us know if you do not figure
it out.
n***@gmail.com
2017-07-13 11:03:28 UTC
Hi,

Thank you for the suggestion; inlining actually makes more sense for what I
am trying to do.

However, a casting issue arises when computing the derivative with respect to
the continuous input. If I understood correctly, DisconnectedType should be
returned as the gradient for integer inputs (or for inputs with respect to
which I don't need the derivative), right?

Below is the slightly modified code that illustrates this new issue (a
variant that returns explicit zero gradients instead is sketched after it):

import numpy as np
import theano.tensor as T
import theano


def make_ops():
    x_var = T.vector()
    m_var = T.bvector()

    r = m_var.sum().astype('floatX')
    z = x_var * m_var / r

    def grad_op1(inputs, output_gradients):
        return [
            output_gradients[0],  # computation delegated to op2
            theano.gradient.DisconnectedType()(),
        ]

    op1 = theano.OpFromGraph(
        inputs=[x_var, m_var],
        outputs=[z, r],
        grad_overrides=grad_op1,
        inline=True)

    z_var = T.vector()
    r_var = T.scalar()

    def grad_op2(inputs, output_gradients):
        _, m_, r_ = inputs
        return [
            m_ * r_,
            theano.gradient.DisconnectedType()(),
            theano.gradient.DisconnectedType()()
        ]

    op2 = theano.OpFromGraph(
        inputs=[z_var, m_var, r_var],
        outputs=[z_var],
        grad_overrides=grad_op2,
        inline=True)

    return op1, op2


op1, op2 = make_ops()
x_var = T.vector()
m_var = T.bvector()
z_, r = op1(x_var, m_var)
z = op2(z_, m_var, r)

g = theano.grad(T.sum(z), wrt=x_var)
print(g.eval({x_var: np.array([1., .3, .0, .2], dtype=np.float32),
              m_var: np.array([1, 0, 1, 1], dtype=np.int8)}))
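For comparison, here is an untested variant of the override (a sketch only,
not something verified in this thread) that returns an explicit zero gradient
instead of DisconnectedType for the mask input; grad_op2 could be changed the
same way for its last two entries:

def grad_op1_zeros(inputs, output_gradients):
    # Hypothetical alternative to grad_op1: declare the mask input as having
    # a zero gradient rather than a disconnected one. zeros_like() keeps the
    # int8 dtype, so cast to floatX to return a floating-point expression.
    x_, m_ = inputs
    return [
        output_gradients[0],                # still delegated to op2
        m_.zeros_like().astype('floatX'),   # zero gradient for the mask
    ]

Whether this sidesteps the DisconnectedType conversion error is not confirmed
here; it also changes the declared connectivity of the op (a zero gradient is
"connected", a DisconnectedType is not).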
n***@gmail.com
2017-07-17 12:04:45 UTC
Hello,

I still haven't managed to trace the error down. Below is a shorter example
that triggers it. It seems Theano tries to create a variable for the output
gradient of a node through which I do not backpropagate; at some point it
hits a DisconnectedType instance and raises an error.

import numpy as np
import theano.tensor as T
import theano


def make_ops():
    x_var = T.vector()
    m_var = T.bvector()

    r = m_var.sum().astype('floatX')
    z = x_var * m_var / r

    def grad_op1(inputs, output_gradients):
        return [
            output_gradients[0],  # computation delegated to op2
            theano.gradient.DisconnectedType()()
        ]

    op1 = theano.OpFromGraph(
        inputs=[x_var, m_var],
        outputs=[z, r],
        grad_overrides=grad_op1,
        inline=True,
        name="op1")

    return op1


op1 = make_ops()
x_var = T.vector()
m_var = T.bvector()
z, r = op1(x_var, m_var)

g = theano.grad(T.sum(z), wrt=x_var)
print(g.eval({x_var: np.array([1., .3, .0, .2], dtype=np.float32),
              m_var: np.array([1, 0, 1, 1], dtype=np.int8)}))

output:

TypeError: Cannot convert Type DisconnectedType (of Variable
<DisconnectedType>) into Type TensorType(float32, scalar). You can try to
manually convert <DisconnectedType> into a TensorType(float32, scalar).

Process finished with exit code 1
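As a point of comparison (a minimal check, assuming floatX=float32 as in the
example above), the same expression differentiates fine as a plain Theano
graph with no OpFromGraph involved, which is consistent with the failure
being tied to the disconnected output gradient of the extra output r rather
than to the expression itself:

x = T.vector()
m = T.bvector()
r_plain = m.sum().astype('floatX')
z_plain = x * m / r_plain

g_ref = theano.grad(T.sum(z_plain), wrt=x)
print(g_ref.eval({x: np.array([1., .3, .0, .2], dtype=np.float32),
                  m: np.array([1, 0, 1, 1], dtype=np.int8)}))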