David Anderson
2017-07-18 09:07:36 UTC
Hi there!
I'm implementing a convolutional operation, and I'm getting an unexpected
error when I try to perform a convolution on a binomially sampled tensor.
The error is:
RuntimeError: GpuCorrMM forward encountered an error running gemm: 5
The error can be recreated with the following code (at least, it can on my
machine):
import numpy as np
import theano as th
from theano import tensor as T
from theano.tensor.shared_randomstreams import RandomStreams
rng = np.random.RandomState()
theano_rng = RandomStreams(rng.randint(2 ** 30))
th_input = T.tensor4()
th_filter = T.tensor4()
th_sampled = theano_rng.binomial(size=th_input.shape, n=1, p=th_input)
th_output = T.nnet.conv2d(th_sampled, th_filter)
op = th.function(
    inputs=[th_input, th_filter],
    outputs=th_output
)
input_sample = np.random.rand(1, 1, 28, 28)
kernel = np.random.rand(1, 1, 6, 6)
op(input_sample, kernel)
Interestingly, the error is NOT shown for samples from other distributions,
like theano_rng.normal(), which produces a RandomFunction{normal}.1 node
instead of RandomFunction{binomial}.1.
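(Possibly relevant: the debugprint below shows the binomial sample arriving at
GpuCorrMM as int64, whereas, if I'm reading it right, the normal sample comes
back as floatX. A quick dtype check, for whatever it's worth:)

print(th_sampled.dtype)                              # 'int64' for the binomial sample
print(theano_rng.normal(size=th_input.shape).dtype)  # floatX ('float64' here) for a normal sample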
For what it's worth, my THEANO_FLAGS are as follows:
floatX=float64,device=cuda,nvcc.flags=-D_FORCE_INLINES,exception_verbosity=high
The rest of the stack trace is as follows:
Traceback (most recent call last):
  File "tmp2.py", line 23, in <module>
    op(input_sample, kernel)
  File "/home/dave/miniconda2/lib/python2.7/site-packages/theano/compile/function_module.py", line 898, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/home/dave/miniconda2/lib/python2.7/site-packages/theano/gof/link.py", line 325, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/home/dave/miniconda2/lib/python2.7/site-packages/theano/compile/function_module.py", line 884, in __call__
    self.fn() if output_subset is None else\
RuntimeError: GpuCorrMM forward encountered an error running gemm: 5
Apply node that caused the error: GpuCorrMM{valid, (1, 1), (1, 1)}(GpuContiguous.0, GpuContiguous.0)
Toposort index: 11
Inputs types: [GpuArrayType<None>(int64, (False, False, False, False)), GpuArrayType<None>(float64, (False, False, False, False))]
Inputs shapes: [(1, 1, 28, 28), (1, 1, 6, 6)]
Inputs strides: [(6272, 6272, 224, 8), (288, 288, 48, 8)]
Inputs values: ['not shown', 'not shown']
Inputs type_num: [7, 12]
Outputs clients: [[HostFromGpu(gpuarray)(GpuCorrMM{valid, (1, 1), (1, 1)}.0)]]
Debugprint of the apply node:
GpuCorrMM{valid, (1, 1), (1, 1)} [id A] <GpuArrayType<None>(int64, (False, False, False, False))> ''
|GpuContiguous [id B] <GpuArrayType<None>(int64, (False, False, False, False))> ''
| |GpuFromHost<None> [id C] <GpuArrayType<None>(int64, (False, False, False, False))> ''
| |RandomFunction{binomial}.1 [id D] <TensorType(int64, 4D)> ''
| |<RandomStateType> [id E] <RandomStateType>
| |MakeVector{dtype='int64'} [id F] <TensorType(int64, vector)> ''
| | |Shape_i{0} [id G] <TensorType(int64, scalar)> ''
| | | |<TensorType(float64, 4D)> [id H] <TensorType(float64, 4D)>
| | |Shape_i{1} [id I] <TensorType(int64, scalar)> ''
| | | |<TensorType(float64, 4D)> [id H] <TensorType(float64, 4D)>
| | |Shape_i{2} [id J] <TensorType(int64, scalar)> ''
| | | |<TensorType(float64, 4D)> [id H] <TensorType(float64, 4D)>
| | |Shape_i{3} [id K] <TensorType(int64, scalar)> ''
| | |<TensorType(float64, 4D)> [id H] <TensorType(float64, 4D)>
| |TensorConstant{1} [id L] <TensorType(int8, scalar)>
| |<TensorType(float64, 4D)> [id H] <TensorType(float64, 4D)>
|GpuContiguous [id M] <GpuArrayType<None>(float64, (False, False, False, False))> ''
|GpuFromHost<None> [id N] <GpuArrayType<None>(float64, (False, False, False, False))> ''
|Subtensor{::, ::, ::int64, ::int64} [id O] <TensorType(float64, 4D)> ''
|<TensorType(float64, 4D)> [id P] <TensorType(float64, 4D)>
|Constant{-1} [id Q] <int64>
|Constant{-1} [id Q] <int64>
Storage map footprint:
- GpuContiguous.0, Shape: (1, 1, 28, 28), ElemSize: 8 Byte(s), TotalSize: 6272 Byte(s)
- <TensorType(float64, 4D)>, Input, Shape: (1, 1, 28, 28), ElemSize: 8 Byte(s), TotalSize: 6272 Byte(s)
- GpuContiguous.0, Shape: (1, 1, 6, 6), ElemSize: 8 Byte(s), TotalSize: 288 Byte(s)
- <TensorType(float64, 4D)>, Input, Shape: (1, 1, 6, 6), ElemSize: 8 Byte(s), TotalSize: 288 Byte(s)
- Constant{-1}, Shape: (), ElemSize: 8 Byte(s), TotalSize: 8.0 Byte(s)
- TensorConstant{1}, Shape: (), ElemSize: 1 Byte(s), TotalSize: 1.0 Byte(s)
TotalSize: 13129.0 Byte(s) 0.000 GB
TotalSize inputs: 6569.0 Byte(s) 0.000 GB
Am I doing something wrong here? Any idea how I might get around it? It
works if I split the code into two functions: one that does the sampling and
returns the sampled tensor, and one that takes that result and does the
convolution (roughly as sketched below). But it seems really wasteful to copy
the value back from GPU RAM to CPU RAM just to get around this...
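For reference, the two-function version that works looks roughly like this
(just a sketch from memory; the explicit astype() cast is my addition, to make
the int64 sample line up with the float64 conv input):

# Function 1: just the sampling step.
sample_fn = th.function(inputs=[th_input], outputs=th_sampled)

# Function 2: the convolution, fed from a fresh symbolic input.
conv_in = T.tensor4()
conv_fn = th.function(
    inputs=[conv_in, th_filter],
    outputs=T.nnet.conv2d(conv_in, th_filter)
)

sampled = sample_fn(input_sample)
result = conv_fn(sampled.astype('float64'), kernel)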
Any advice would be hugely appreciated!
Cheers,
Dave