Discussion:
[theano-users] Split Op (OpFromGraph) to save intermediate results for grad
Frédéric Bastien
2017-08-08 18:55:49 UTC
I don't understand what you mean by "forward the precomputed output".

What I would recommend is to make one op for the forward pass. The intermediate
values that can be reused for the gradient, make them extra outputs of that op.
Don't use them in the forward, but you can reuse them in your grad override.
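
Something like this untested sketch (reusing the x, m, r and z from your
example; the exact gradient formula is only for illustration):

import theano
import theano.tensor as T

def make_single_op():
    x = T.vector()
    m = T.bvector()
    r = m.sum().astype('floatX')  # intermediate value, exposed as an extra output
    z = x * m / r                 # final result

    def grad_override(inputs, output_gradients):
        _, m_ = inputs
        # Rebuild the intermediate from the op inputs; since r is also an output
        # of the op, the idea is that the optimizer can reuse the value computed
        # in the forward pass instead of recomputing it here.
        r_ = m_.sum().astype('floatX')
        dz = output_gradients[0]  # gradient of the cost wrt z
        return [dz * m_ / r_,                          # gradient wrt x
                theano.gradient.DisconnectedType()()]  # m is integral, no gradient

    return theano.OpFromGraph(
        inputs=[x, m],
        outputs=[z, r],  # z is the real result, r is only there for the gradient
        grad_overrides=grad_override,
        name="single_op")

Callers then use only the first output of the op in the rest of the graph.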

Frédéric
Post by n***@gmail.com
I am trying to build an Op with a custom/optimized gradient formula. To
override the automatic differentiation, I'm trying to use OpFromGraph.
The gradient formula can reuse intermediate results from the feed-forward
pass, so I have tried to split the Op in two: Op1 computes the intermediate
and final results and passes all of them to Op2; Op2 forwards the final result
and takes care of the gradient computation given all the necessary values.
Note that the gradient of the loss wrt the intermediate results is never
needed.
Below is what I believe to be a minimal working example of my problem;
it exhibits a strange conversion error related to the gradient computation
with the intermediate values. Please take note of the presence of an
integral variable.
import numpy as np
import theano.tensor as T
import theano


def make_ops():
    x = T.vector()
    m = T.bvector()
    r = m.sum().astype('floatX')  # intermediate value
    z = x * m / r                 # final result

    def grad_op1(inputs, output_gradients):
        return [
            output_gradients[0],  # gradient computation delegated to op2
            theano.gradient.DisconnectedType()()  # variable has integral type
            # T.zeros_like(inputs[1])
        ]

    op1 = theano.OpFromGraph(
        inputs=[x, m],
        outputs=[z, m, r],
        grad_overrides=grad_op1,
        inline=True,
        name="op1")

    z = T.vector()
    r_forwarded = T.scalar()

    def grad_op2(inputs, output_gradients):
        _, m_, r_ = inputs
        dm_ = theano.gradient.DisconnectedType()(name="dm_")
        # I think the error could be around here <<<<<<<<<<------------------
        # dr_ = theano.gradient.DisconnectedType()(name="dr_")
        dr_ = T.zeros_like(r_)
        return [m_ / r_, dm_, dr_]

    op2 = theano.OpFromGraph(
        inputs=[z, m, r_forwarded],
        outputs=[z],  # op2 forwards the precomputed output
        grad_overrides=grad_op2,
        inline=True,
        name="op2")

    return op1, op2


def main():
    op1, op2 = make_ops()
    x = T.vector(name="x")
    m = T.bvector(name="m")
    z_intermediate, m_forwarded, r = op1(x, m)
    z = op2(z_intermediate, m, r)
    g = theano.grad(T.sum(z), wrt=x)
    print(g.eval({x: np.array([1., .3, .0, .2], dtype=np.float32),
                  m: np.array([1, 0, 1, 1], dtype=np.int8)}))


main()
(Note: I had tried to hijack my previous question thread with this problem
but it went unnoticed, sorry for double posting)
Thank you
n***@gmail.com
2017-08-09 10:02:48 UTC
"forward the precomputed output" means that Op1 already computed the final
output, therefore Op2 just has to behaves as identity in the forward pass

The intermediate value is already an output of Op1 as shown in the example
code, sorry if that wasn't clear.
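
For reference, this is the value I expect g.eval(...) to print once the error
is sorted out; a plain NumPy check with the same inputs as in the example:

import numpy as np

x = np.array([1., .3, .0, .2], dtype=np.float32)
m = np.array([1, 0, 1, 1], dtype=np.int8)
r = m.sum().astype(np.float32)   # intermediate value, r == 3
expected_grad = m / r            # d sum(x * m / r) / d x  ==  m / r
print(expected_grad)             # approximately [1/3, 0, 1/3, 1/3]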

Nicolas
Post by Frédéric Bastien
I don't understand what you mean by "forward the precomputed output".
What I would recommend is to make one op for the forward pass. The intermediate
values that can be reused for the gradient, make them extra outputs of that op.
Don't use them in the forward, but you can reuse them in your grad override.
Frédéric
Frédéric Bastien
2017-08-09 21:47:27 UTC
Sorry, but I'm not able to answer this grad question. Hopefully someone else
who understands that part better can answer.

Fred