佐藤優
2017-08-09 08:50:12 UTC
I wonder why the code below is invalid.
from numpy import *
import theano.tensor as T
x = T.dmatrix("x")
mx = x[...,None,:]
a = T.ones((1,3))
T.grad(mx[...,0].dot(a).sum(), a).eval({x:ones((5,10)).astype(float32)})
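As a side note (my own check, not part of the error report), the symbolic broadcast pattern introduced by the None index can be inspected directly; this is what distinguishes this case from the tensor3 version further down:

# Minimal sketch, assuming the same imports and variables as above.
print(mx.broadcastable)          # should print (False, True, False): the None axis is broadcastable
print(mx[..., 0].broadcastable)  # should print (False, True): indexing the last axis keeps the broadcast flag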
When the last line is evaluated, the following error is raised:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/home/yu/anaconda3/lib/python3.5/site-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
    883             outputs =\
--> 884                 self.fn() if output_subset is None else\
    885                 self.fn(output_subset=output_subset)

ValueError: Shape mismatch: A.shape[1] != x.shape[0]

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-74-52410617594a> in <module>()
      3 mx = x[...,None,:]
      4 a = T.ones((1,3))
----> 5 T.grad(mx[...,0].dot(a).sum(), a).eval({x:ones((5,10)).astype(float32)})

/home/yu/anaconda3/lib/python3.5/site-packages/theano/gof/graph.py in eval(self, inputs_to_values)
    517         args = [inputs_to_values[param] for param in inputs]
    518
--> 519         rval = self._fn_cache[inputs](*args)
    520
    521         return rval

/home/yu/anaconda3/lib/python3.5/site-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
    896                     node=self.fn.nodes[self.fn.position_of_error],
    897                     thunk=thunk,
--> 898                     storage_map=getattr(self.fn, 'storage_map', None))
    899             else:
    900                 # old-style linkers raise their own exceptions

/home/yu/anaconda3/lib/python3.5/site-packages/theano/gof/link.py in raise_with_op(node, thunk, exc_info, storage_map)
    323         # extra long error message in that case.
    324         pass
--> 325     reraise(exc_type, exc_value, exc_trace)
    326
    327

/home/yu/anaconda3/lib/python3.5/site-packages/six.py in reraise(tp, value, tb)
    683             value = tp()
    684         if value.__traceback__ is not tb:
--> 685             raise value.with_traceback(tb)
    686         raise value
    687

/home/yu/anaconda3/lib/python3.5/site-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
    882         try:
    883             outputs =\
--> 884                 self.fn() if output_subset is None else\
    885                 self.fn(output_subset=output_subset)
    886         except Exception:

ValueError: Shape mismatch: A.shape[1] != x.shape[0]
Apply node that caused the error: CGemv{inplace}(AllocEmpty{dtype='float64'}.0, TensorConstant{1.0}, InplaceDimShuffle{1,0}.0, Rebroadcast{0}.0, TensorConstant{0.0})
Toposort index: 7
Inputs types: [TensorType(float64, vector), TensorType(float64, scalar), TensorType(float64, matrix), TensorType(float64, vector), TensorType(float64, scalar)]
Inputs shapes: [(3,), (), (3, 5), (1,), ()]
Inputs strides: [(8,), (), (8, 24), (80,), ()]
Inputs values: [array([ 0.00000000e+000, 4.94065646e-324, 9.88131292e-324]), array(1.0), 'not shown', array([ 1.]), array(0.0)]
Inputs type_num: [12, 12, 12, 12, 12]
Outputs clients: [[InplaceDimShuffle{x,0}(CGemv{inplace}.0)]]
Debugprint of the apply node:
CGemv{inplace} [id A] <TensorType(float64, vector)> ''
|AllocEmpty{dtype='float64'} [id B] <TensorType(float64, vector)> ''
| |TensorConstant{3} [id C] <TensorType(int64, scalar)>
|TensorConstant{1.0} [id D] <TensorType(float64, scalar)>
|InplaceDimShuffle{1,0} [id E] <TensorType(float64, matrix)> ''
| |Alloc [id F] <TensorType(float64, matrix)> ''
| |TensorConstant{(1, 1) of 1.0} [id G] <TensorType(float64, (True, True))>
| |Shape_i{0} [id H] <TensorType(int64, scalar)> ''
| | |x [id I] <TensorType(float64, matrix)>
| |TensorConstant{3} [id C] <TensorType(int64, scalar)>
|Rebroadcast{0} [id J] <TensorType(float64, vector)> ''
| |Subtensor{int8, ::, int64} [id K] <TensorType(float64, (True,))> ''
| |InplaceDimShuffle{0,x,1} [id L] <TensorType(float64, (False, True, False))> ''
| | |x [id I] <TensorType(float64, matrix)>
| |Constant{0} [id M] <int8>
| |Constant{0} [id N] <int64>
|TensorConstant{0.0} [id O] <TensorType(float64, scalar)>
Storage map footprint:
- x, Input, Shape: (5, 10), ElemSize: 8 Byte(s), TotalSize: 400 Byte(s)
- InplaceDimShuffle{0,x,1}.0, Shape: (5, 1, 10), ElemSize: 8 Byte(s), TotalSize: 400 Byte(s)
- Alloc.0, Shape: (5, 3), ElemSize: 8 Byte(s), TotalSize: 120 Byte(s)
- InplaceDimShuffle{1,0}.0, Shape: (3, 5), ElemSize: 8 Byte(s), TotalSize: 120 Byte(s)
- AllocEmpty{dtype='float64'}.0, Shape: (3,), ElemSize: 8 Byte(s), TotalSize: 24 Byte(s)
- Subtensor{int8, ::, int64}.0, Shape: (1,), ElemSize: 8 Byte(s), TotalSize: 8 Byte(s)
- Shape_i{0}.0, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- TensorConstant{1.0}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- TensorConstant{0.0}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Constant{0}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- Rebroadcast{0}.0, Shape: (1,), ElemSize: 8 Byte(s), TotalSize: 8 Byte(s)
- TensorConstant{3}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- TensorConstant{(1, 1) of 1.0}, Shape: (1, 1), ElemSize: 8 Byte(s), TotalSize: 8 Byte(s)
- Constant{0}, Shape: (), ElemSize: 1 Byte(s), TotalSize: 1.0 Byte(s)
TotalSize: 593.0 Byte(s) 0.000 GB
TotalSize inputs: 441.0 Byte(s) 0.000 GB
HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
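Following that hint, a minimal sketch of how the flag can be set (the script name below is a placeholder of mine):

# From the shell, for a single run:
#   THEANO_FLAGS='optimizer=fast_compile' python myscript.py
# Or from Python, before building and evaluating the graph:
import theano
theano.config.optimizer = 'fast_compile'   # or 'None' to disable optimizations entirely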
I thought the broadcasted operation in the script above was the cause, so I tried the same computation with no broadcasting before the gradient operation, as follows:
x = T.tensor3("x")
mx = x
a = T.ones((1,3))
T.grad(mx[...,0].dot(a).sum(), a).eval({x:ones((5,1,10)).astype(float32)})
This ran successfully and produced the following result:
array([[ 5., 5., 5.]], dtype=float32)
But why is the former case invalid?
Is the gradient with broadcasting mathematically invalid?
Why does the shape mismatch happen in the gradient computation?
Could someone explain this to me?
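For what it's worth, here is a small NumPy sketch (my own sanity check, not from the Theano run) of the same expression and its hand-computed gradient; mathematically the broadcast case seems well defined:

import numpy as np
xv = np.ones((5, 10))
av = np.ones((1, 3))
m = xv[:, None, :][..., 0]                # shape (5, 1), all ones
print(m.dot(av).sum())                    # 15.0: the forward value is fine
# The objective is sum_{i,j} m[i,0] * a[0,j], so d/da[0,j] = sum_i m[i,0] = 5
print(m.sum(axis=0) * np.ones((1, 3)))    # [[ 5.  5.  5.]], matching the tensor3 result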