David Anderson
2017-07-18 09:07:36 UTC
Hi there!
I'm implementing a convolutional operation, and I'm getting an unexpected
error when I try to perform a convolution on a binomially sampled tensor.
The error is:
RuntimeError: GpuCorrMM forward encountered an error running gemm: 5
The error can be recreated with the following code (at least, it can on my
machine):
import numpy as np
import theano as th
from theano import tensor as T
from theano.tensor.shared_randomstreams import RandomStreams
rng = np.random.RandomState()
theano_rng = RandomStreams(rng.randint(2 ** 30))
th_input = T.tensor4()
th_filter = T.tensor4()
th_sampled = theano_rng.binomial(size=th_input.shape, n=1, p=th_input)
th_output = T.nnet.conv2d(th_sampled, th_filter)
op = th.function(
    inputs=[th_input, th_filter],
    outputs=th_output
)
input_sample = np.random.rand(1, 1, 28, 28)
kernel = np.random.rand(1, 1, 6, 6)
op(input_sample, kernel)
Interestingly, the error is NOT shown for samples from other distributions,
like theano_rng.normal(), which produces a RandomFunction{normal}.1 node
instead of RandomFunction{binomial}.1.
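(Possibly relevant: the debugprint below shows the binomial sample arriving at
GpuCorrMM as int64, whereas, if I'm reading it right, the normal sample comes
back as floatX. A quick dtype check, for whatever it's worth:)

print(th_sampled.dtype)                              # 'int64' for the binomial sample
print(theano_rng.normal(size=th_input.shape).dtype)  # floatX ('float64' here) for a normal sample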
For what it's worth, my THEANO_FLAGS are as follows:
floatX=float64,device=cuda,nvcc.flags=-D_FORCE_INLINES,exception_verbosity=high
The rest of the stack trace is as follows:
Traceback (most recent call last):
  File "tmp2.py", line 23, in <module>
    op(input_sample, kernel)
  File "/home/dave/miniconda2/lib/python2.7/site-packages/theano/compile/function_module.py", line 898, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/home/dave/miniconda2/lib/python2.7/site-packages/theano/gof/link.py", line 325, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/home/dave/miniconda2/lib/python2.7/site-packages/theano/compile/function_module.py", line 884, in __call__
    self.fn() if output_subset is None else\
RuntimeError: GpuCorrMM forward encountered an error running gemm: 5
Apply node that caused the error: GpuCorrMM{valid, (1, 1), (1, 1)}(GpuContiguous.0, GpuContiguous.0)
Toposort index: 11
Inputs types: [GpuArrayType<None>(int64, (False, False, False, False)), GpuArrayType<None>(float64, (False, False, False, False))]
Inputs shapes: [(1, 1, 28, 28), (1, 1, 6, 6)]
Inputs strides: [(6272, 6272, 224, 8), (288, 288, 48, 8)]
Inputs values: ['not shown', 'not shown']
Inputs type_num: [7, 12]
Outputs clients: [[HostFromGpu(gpuarray)(GpuCorrMM{valid, (1, 1), (1, 1)}.0)]]
Debugprint of the apply node:
GpuCorrMM{valid, (1, 1), (1, 1)} [id A] <GpuArrayType<None>(int64, (False, False, False, False))> ''
|GpuContiguous [id B] <GpuArrayType<None>(int64, (False, False, False, False))> ''
| |GpuFromHost<None> [id C] <GpuArrayType<None>(int64, (False, False, False, False))> ''
| |RandomFunction{binomial}.1 [id D] <TensorType(int64, 4D)> ''
| |<RandomStateType> [id E] <RandomStateType>
| |MakeVector{dtype='int64'} [id F] <TensorType(int64, vector)> ''
| | |Shape_i{0} [id G] <TensorType(int64, scalar)> ''
| | | |<TensorType(float64, 4D)> [id H] <TensorType(float64, 4D)>
| | |Shape_i{1} [id I] <TensorType(int64, scalar)> ''
| | | |<TensorType(float64, 4D)> [id H] <TensorType(float64, 4D)>
| | |Shape_i{2} [id J] <TensorType(int64, scalar)> ''
| | | |<TensorType(float64, 4D)> [id H] <TensorType(float64, 4D)>
| | |Shape_i{3} [id K] <TensorType(int64, scalar)> ''
| | |<TensorType(float64, 4D)> [id H] <TensorType(float64, 4D)>
| |TensorConstant{1} [id L] <TensorType(int8, scalar)>
| |<TensorType(float64, 4D)> [id H] <TensorType(float64, 4D)>
|GpuContiguous [id M] <GpuArrayType<None>(float64, (False, False, False, False))> ''
|GpuFromHost<None> [id N] <GpuArrayType<None>(float64, (False, False, False, False))> ''
|Subtensor{::, ::, ::int64, ::int64} [id O] <TensorType(float64, 4D)> ''
|<TensorType(float64, 4D)> [id P] <TensorType(float64, 4D)>
|Constant{-1} [id Q] <int64>
|Constant{-1} [id Q] <int64>
Storage map footprint:
- GpuContiguous.0, Shape: (1, 1, 28, 28), ElemSize: 8 Byte(s), TotalSize: 6272 Byte(s)
- <TensorType(float64, 4D)>, Input, Shape: (1, 1, 28, 28), ElemSize: 8 Byte(s), TotalSize: 6272 Byte(s)
- GpuContiguous.0, Shape: (1, 1, 6, 6), ElemSize: 8 Byte(s), TotalSize: 288 Byte(s)
- <TensorType(float64, 4D)>, Input, Shape: (1, 1, 6, 6), ElemSize: 8 Byte(s), TotalSize: 288 Byte(s)
- Constant{-1}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- TensorConstant{1}, Shape: (), ElemSize: 1 Byte(s), TotalSize: 1.0 Byte(s)
TotalSize: 13129.0 Byte(s) 0.000 GB
TotalSize inputs: 6569.0 Byte(s) 0.000 GB
Am I doing something wrong here? Any idea how I might get around it? It
works if I split the code into two functions: one that does the sampling and
returns the sampled tensor, and one that takes that result and does the
convolution (roughly as sketched below). But it seems really wasteful to copy
the value back from GPU RAM to CPU RAM just to get around this...
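For reference, the two-function version that works looks roughly like this
(just a sketch from memory; the explicit astype() cast is my addition, to make
the int64 sample line up with the float64 conv input):

# Function 1: just the sampling step.
sample_fn = th.function(inputs=[th_input], outputs=th_sampled)

# Function 2: the convolution, fed from a fresh symbolic input.
conv_in = T.tensor4()
conv_fn = th.function(
    inputs=[conv_in, th_filter],
    outputs=T.nnet.conv2d(conv_in, th_filter)
)

sampled = sample_fn(input_sample)
result = conv_fn(sampled.astype('float64'), kernel)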
Any advice would be hugely appreciated!
Cheers,
Dave