2016-05-29 21:33:37 UTC
I'm getting an error computing gradients through a scan in which some
intermediate values of the scan function have different sizes in different
iterations (the inputs and outputs always have the same size). Here's a
minimal example:
import numpy as np
import theano
import theano.tensor as T

d = 11
h = 7
W1 = theano.shared(name='W1', value=np.random.uniform(-0.1, 0.1, (d,h)))
W2 = theano.shared(name='W2', value=np.random.uniform(-0.1, 0.1, (h,)))
n = T.lscalar('n')
vecs = T.matrix('vecs')
inds = T.lmatrix('inds')

def recurrence(t, vecs, inds, W1, W2):
    cur_inds = inds[T.eq(inds[:,0], t).nonzero()]
    cur_vecs = vecs[cur_inds[:,1]]
    hidden_layers = T.tanh(cur_vecs.dot(W1))
    scores = hidden_layers.dot(W2)
    return T.sum(scores)

results, _ = theano.scan(
    fn=recurrence, sequences=[T.arange(n)], outputs_info=[None],
    non_sequences=[vecs, inds, W1, W2], strict=True)
obj = T.sum(results)
grads = T.grad(obj, [W1, W2])
f = theano.function(inputs=[n, vecs, inds], outputs=grads)

vecs_in = np.ones((10, d))
inds_in = np.array([[0, 0], [1, 1], [1, 2], [2, 3], [3, 4], [3, 5], [3, 6],
                    [3, 7], [4, 8], [4, 9]])
print f(5, vecs_in, inds_in)
Running this code results in the following error message (tried on 0.7.0,
0.8.2, and 0.9.0dev1.dev-0044349fdf4244c5b616994bf16ad2ff1ff8ce8a):
Traceback (most recent call last):
File "edge_scores.py", line 33, in <module>
print f(5, vecs_in, inds_in)
line 912, in __call__
storage_map=getattr(self.fn, 'storage_map', None))
File "/usr/local/lib/python2.7/dist-packages/theano/gof/link.py", line
314, in raise_with_op
reraise(exc_type, exc_value, exc_trace)
line 899, in __call__
self.fn() if output_subset is None else\
line 951, in rval
r = p(n, [x[0] for x in i], o)
line 940, in <lambda>
self, node)
File "theano/scan_module/scan_perform.pyx", line 547, in
ValueError: could not broadcast input array from shape (11,4) into shape
Apply node that caused the error: forall_inplace,cpu,grad_of_scan_fn}(n,
Alloc.0, Elemwise{eq,no_inplace}.0, Alloc.0, n, n, W1, W2, vecs, inds,
Toposort index: 47
Inputs types: [TensorType(int64, scalar), TensorType(float64, col),
TensorType(int8, matrix), TensorType(float64, matrix), TensorType(int64,
scalar), TensorType(int64, scalar), TensorType(float64, matrix),
TensorType(float64, vector), TensorType(float64, matrix), TensorType(int64,
matrix), TensorType(float64, row)]
Inputs shapes: [(), (5, 1), (5, 10), (2, 7), (), (), (11, 7), (7,), (10,
11), (10, 2), (1, 7)]
Inputs strides: [(), (8, 8), (10, 1), (56, 8), (), (), (56, 8), (8,), (88,
8), (16, 8), (56, 8)]
Inputs values: [array(5), array([[ 1.],
[ 1.],
[ 1.],
[ 1.],
[ 1.]]), 'not shown', 'not shown', array(5), array(5), 'not shown',
'not shown', 'not shown', 'not shown', 'not shown']
Outputs clients: [[Subtensor{int64}(forall_inplace,cpu,grad_of_scan_fn}.0,
HINT: Re-running with most Theano optimization disabled could give you a
back-trace of when this node was created. This can be done with by setting
the Theano flag 'optimizer=fast_compile'. If that does not work, Theano
optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and
storage map footprint of this apply node.
A couple of observations:
- There's no error if I turn off optimizations (theano.config.optimizer =
- There's no error if I have a single layer and no hidden layer (i.e. if
scores = cur_vecs.dot(W) for W of the appropriate shape).
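For cross-checking, here is a rough NumPy-only sketch of what the scan above should compute. It is not part of the Theano code: the function name `reference`, the fixed seed, and the hand-derived gradient formulas are assumptions made for illustration. It also records how many rows each step selects. With the inputs above, step t = 3 selects 4 rows, which seems to match the 4 in the reported shape (11,4), though that link is only an inference.

```python
import numpy as np

d, h, n = 11, 7, 5
rng = np.random.RandomState(0)  # fixed seed so the sketch is repeatable
W1 = rng.uniform(-0.1, 0.1, (d, h))
W2 = rng.uniform(-0.1, 0.1, (h,))
vecs_in = np.ones((10, d))
inds_in = np.array([[0, 0], [1, 1], [1, 2], [2, 3], [3, 4], [3, 5], [3, 6],
                    [3, 7], [4, 8], [4, 9]])

def reference(W1, W2, vecs, inds, n):
    """Return (objective, dobj/dW1, dobj/dW2, rows selected per step)."""
    obj = 0.0
    g1 = np.zeros_like(W1)
    g2 = np.zeros_like(W2)
    step_rows = []
    for t in range(n):
        cur_inds = inds[inds[:, 0] == t]
        cur_vecs = vecs[cur_inds[:, 1]]     # shape (k_t, d); k_t varies with t
        hidden = np.tanh(cur_vecs.dot(W1))  # shape (k_t, h)
        obj += hidden.dot(W2).sum()
        # Gradient w.r.t. W2: sum of the hidden rows.
        g2 += hidden.sum(axis=0)
        # Gradient w.r.t. W1: chain rule through tanh, W2 broadcast over rows.
        g1 += cur_vecs.T.dot((1.0 - hidden ** 2) * W2)
        step_rows.append(cur_vecs.shape[0])
    return obj, g1, g2, step_rows

obj, g1, g2, step_rows = reference(W1, W2, vecs_in, inds_in, n)
print(step_rows)  # [1, 2, 1, 4, 2]
```

To compare against Theano, one would pass W1.get_value() and W2.get_value() to `reference` and check g1 and g2 against the output of f in a configuration where f runs without the error.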
You received this message because you are subscribed to the Google Groups "theano-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to theano-users+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.