Discussion:
[theano-users] CUDNN_STATUS_EXECUTION_FAILED when using dnn.conv.algo_bwd_filter=deterministic
Fabian Stemmer
2017-06-07 15:19:31 UTC
Hi,

I'm using theano.tensor.nnet.conv2d in my model and I want to set
dnn.conv.algo_bwd_filter=deterministic to make this run deterministically
on GPUs. I work on three different GPU architectures (K10, M40, P6000) and
setting the mentioned flag works well on the K10, but fails with error
message CUDNN_STATUS_EXECUTION_FAILED on the other two. I have tried
several combinations of theano, nvidia driver and cuDNN versions, but none
fix the issue.

Below are details about the respective GPU configurations I tried and the
full error message. Any help you can give me is greatly appreciated.

Thanks
Fabian


*Shared setup (all GPUs):*
Theano 0.8.2 / 0.9.0 / 0.10.0.dev1 (commit
6b59449186b04225484b98951192c5867e0719ca, the latest at the time of writing)
CUDA 8.0
cuDNN 5105
THEANO_FLAGS=mode=FAST_RUN,floatX=float32,lib.cnmem=1,*dnn.conv.algo_bwd_filter=deterministic*,device=cuda
(device=gpu for Theano 0.8.2)
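
(Editorial sketch, not part of the original setup: the same flags can also be set from Python before Theano is imported, since Theano reads THEANO_FLAGS from the environment at import time.)

```python
import os

# Mirror the THEANO_FLAGS string above; this must run before
# `import theano`, because Theano reads the environment at import time.
os.environ["THEANO_FLAGS"] = ",".join([
    "mode=FAST_RUN",
    "floatX=float32",
    "lib.cnmem=1",
    "dnn.conv.algo_bwd_filter=deterministic",
    "device=cuda",  # use device=gpu for Theano 0.8.2
])
# import theano  # import only after the flags are in place
```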

*GPU and Nvidia driver:*
Tesla K10 (Driver 361.93.03)
Tesla M40 (Driver 375.26)
Quadro P6000 (Driver 375.26)

Alternative driver versions (all tested on Tesla M40):

1. 361.93.03 - current production driver on K10/K20/K80 servers - no
difference; the application fails on the M40 node
2. 375.26 - current production driver on M40/P100/P6000 servers - app fails
3. 375.51 - most recent driver with a CUDA repo equivalent - app fails
4. 375.66 - most recent official driver for Quadro/Tesla cards - app fails

I also tried upgrading to cuDNN 6.0 and still got the same error.


*Full error message (on Quadro P6000, using Theano 0.10.0.dev1):*

Using cuDNN version 5105 on context None
Mapped name None to device cuda: Quadro P6000 (0000:04:00.0)
Traceback (most recent call last):
File
"/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/py/bin/n3lu_train",
line 9, in <module>
load_entry_point('n3lu', 'console_scripts', 'n3lu_train')()
File
"/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/n3lu/n3lu/training.py",
line 507, in main
valid_error, test_error = exp.run()
File
"/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/n3lu/n3lu/training.py",
line 475, in run
return self.run_one(self.train_corpus, self.valid_corpus)
File
"/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/n3lu/n3lu/training.py",
line 384, in run_one
learner.run()
File
"/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/n3lu/n3lu/learning.py",
line 448, in run
train_outputs = self.train(*batch)
File
"/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/py/lib/python2.7/site-packages/theano/compile/function_module.py",
line 898, in __call__
storage_map=getattr(self.fn, 'storage_map', None))
File
"/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/py/lib/python2.7/site-packages/theano/gof/link.py",
line 325, in raise_with_op
reraise(exc_type, exc_value, exc_trace)
File
"/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/py/lib/python2.7/site-packages/theano/compile/function_module.py",
line 884, in __call__
self.fn() if output_subset is None else\
*RuntimeError: error doing operation: CUDNN_STATUS_EXECUTION_FAILED*
Apply node that caused the error: GpuDnnConvGradW{algo='deterministic',
inplace=True}(GpuContiguous.0, GpuContiguous.0,
GpuAllocEmpty{dtype='float32', context_name=None}.0,
GpuDnnConvDesc{border_mode=(1, 0), subsample=(1, 1), conv_mode='cross',
precision='float32'}.0, Constant{1.0}, Constant{0.0})
Toposort index: 234
Inputs types: [GpuArrayType<None>(float32, (True, True, False, False)),
GpuArrayType<None>(float32, (True, False, False, False)),
GpuArrayType<None>(float32, (False, True, False, False)),
<theano.gof.type.CDataType object at 0x7ff56926a090>, Scalar(float32),
Scalar(float32)]
Inputs shapes: [(1, 1, 541211, 10), (1, 50, 541211, 1), (50, 1, 3, 10), 'No
shapes', (), ()]
Inputs strides: [(21648440, 21648440, 40, 4), (108242200, 2164844, 4, 4),
(120, 120, 40, 4), 'No strides', (), ()]
Inputs values: ['not shown', 'not shown', 'not shown', <capsule object NULL
at 0x7ff55d00fe10>, 1.0, 0.0]
Outputs clients: [[GpuIncSubtensor{Inc;::, ::, ::,
int64:int64:}(GpuAlloc<None>{memset_0=True}.0,
GpuDnnConvGradW{algo='deterministic', inplace=True}.0, Constant{0},
Constant{10})]]
--
---
You received this message because you are subscribed to the Google Groups "theano-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to theano-users+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Pascal Lamblin
2017-06-19 19:39:17 UTC
Hi,

Unfortunately, it looks like a runtime issue in cuDNN rather than something
in the Theano wrapper, but I could be wrong.
A recent PR introduced more algorithms that you can specify for
dnn.conv.algo_bwd_filter. In particular,
dnn.conv.algo_bwd_filter=fft_tiling should be deterministic as well.
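
(Editorial aside, not from the thread: the nondeterminism these algorithm choices avoid comes from the order of floating-point accumulation, e.g. atomic adds in the backward-filter kernel. Since float addition is not associative, a varying accumulation order gives varying results; a tiny sketch in plain Python:)

```python
# Float addition is not associative: at 2**53 the spacing between
# adjacent doubles is 2.0, so adding 1.0 twice is lost entirely,
# while adding 2.0 once is exact. Accumulations whose order varies
# between runs (as with GPU atomics) can therefore differ.
a = 2.0 ** 53
left = (a + 1.0) + 1.0   # each +1.0 rounds back down to a
right = a + (1.0 + 1.0)  # +2.0 is exactly representable
print(left == right)     # prints False
```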

Does it work with an input and kernel that are smaller than 541211 on that
dimension?
Does it work using corrMM instead of cuDNN?
Pascal Lamblin
2017-06-19 20:02:22 UTC
Post by Pascal Lamblin
Hi,
Unfortunately, it looks like a runtime issue in cuDNN rather than
something in the Theano wrapper, but I could be wrong.
A recent PR introduced more algorithms that you can specify for
dnn.conv.algo_bwd_filter. In particular,
dnn.conv.algo_bwd_filter=fft_tiling should be deterministic as well.
Actually, I just realized the value gets rejected by the configuration, but
if we bypass it in theano/configdefaults.py it should work. This should be
fixed soon.
Frédéric Bastien
2017-06-19 22:15:31 UTC
Try cuDNN v6. The GPUs that have the problem are the more recent ones; maybe
that case was not implemented in v5.
Fabian Stemmer
2017-06-20 09:28:34 UTC
I tried using cuDNN v6, but still got the same error.

I also added 'fft_tiling' to SUPPORTED_DNN_CONV_ALGO_RUNTIME in
configdefaults.py to be able to test it, but still got the same cuDNN error.

I then added 'optimizer_excluding=conv_dnn' to my THEANO_FLAGS, which gave
me GpuCorrMM nodes in the computational graph. This runs without errors.

GpuCorrMM gives me deterministic results, so I can use it as an alternative
to the deterministic cuDNN algorithm.
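
(Editorial sketch of the workaround just described, using the same environment-variable mechanism as above; not part of the original message:)

```python
import os

# Excluding the conv_dnn optimizer makes Theano compile conv2d to
# GpuCorrMM nodes instead of cuDNN ops, which run deterministically.
os.environ["THEANO_FLAGS"] = (
    "mode=FAST_RUN,floatX=float32,device=cuda,"
    "optimizer_excluding=conv_dnn"
)
# import theano  # import only after the flags are set
```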

Thanks for your help.
Pascal Lamblin
2017-06-20 15:40:42 UTC
Thanks for the update.
I managed to reproduce the issue with cuDNN v6 as well, with a simple
script (below).
- with 'deterministic' it fails with CUDNN_STATUS_EXECUTION_FAILED
- with 'fft_tiling' it fails with CUDNN_STATUS_NOT_SUPPORTED
- with 'fft', surprisingly, it works. 'fft' is supposed to be deterministic
as well, so you could also use that one.

Thanks for the report, we'll forward that to Nvidia.

```
import theano
import numpy as np

x = theano.shared(np.ones((1, 1, 541211, 10), 'f'))
y = theano.shared(np.ones((1, 50, 541211, 1), 'f'))
z = theano.tensor.nnet.abstract_conv.conv2d_grad_wrt_weights(
    x, y, filter_shape=(50, 1, 3, 10), border_mode=(1, 0), filter_flip=False)
f = theano.function([], z)
f()
```