Fabian Stemmer
2017-06-07 15:19:31 UTC
Hi,
I'm using theano.tensor.nnet.conv2d in my model and I want to set
dnn.conv.algo_bwd_filter=deterministic so that training runs deterministically
on GPUs. I work on three different GPU architectures (K10, M40, P6000).
Setting the flag works well on the K10, but fails with the error
CUDNN_STATUS_EXECUTION_FAILED on the other two. I have tried several
combinations of Theano, NVIDIA driver, and cuDNN versions, but none of them
fixes the issue.
Below are details about the respective GPU configurations I tried and the
full error message. Any help you can give me is greatly appreciated.
Thanks
Fabian
*Shared setup (all GPUs):*
Theano 0.8.2 / 0.9.0 / 0.10.0.dev1 (commit 6b59449186b04225484b98951192c5867e0719ca, the latest at the time of writing)
CUDA 8.0
cuDNN 5105
THEANO_FLAGS=mode=FAST_RUN,floatX=float32,lib.cnmem=1,*dnn.conv.algo_bwd_filter=deterministic*,device=cuda (device=gpu for Theano 0.8.2)
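For completeness, this is equivalent to how I set the flags from a small Python launcher before Theano is imported; this is just a sketch of the same configuration, not my actual launch script:

import os
# Equivalent to exporting THEANO_FLAGS on the command line; must be set
# before theano is imported (use device=gpu instead of device=cuda for
# Theano 0.8.2).
os.environ['THEANO_FLAGS'] = ('mode=FAST_RUN,floatX=float32,lib.cnmem=1,'
                              'dnn.conv.algo_bwd_filter=deterministic,'
                              'device=cuda')
import theano
print(theano.config.dnn.conv.algo_bwd_filter)  # expect 'deterministic'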
*GPU and Nvidia driver:*
Tesla K10 architecture (Driver 361.93.03)
Tesla M40 architecture (Driver 375.26)
Quadro P6000 (Driver 375.26)
Alternative driver versions (all tested on Tesla M40):
1. 361.93.03 - Current Production Driver on K10/K20/K80 servers - No
difference. Application fails on the M40 node
2. 375.26 - Current Production driver on M40/P100/P6000 servers - App
fails
3. 375.51 - Most recent driver with CUDA Repo equivalent - App fails
4. 375.66 - Most recent official driver for Quadro/Tesla cards - App
fails
I also tried upgrading to cuDNN 6.0 and still got the same error.
*Full error message (on Quadro P6000, using Theano 0.10.0.dev1):*
Using cuDNN version 5105 on context None
Mapped name None to device cuda: Quadro P6000 (0000:04:00.0)
Traceback (most recent call last):
  File "/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/py/bin/n3lu_train", line 9, in <module>
    load_entry_point('n3lu', 'console_scripts', 'n3lu_train')()
  File "/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/n3lu/n3lu/training.py", line 507, in main
    valid_error, test_error = exp.run()
  File "/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/n3lu/n3lu/training.py", line 475, in run
    return self.run_one(self.train_corpus, self.valid_corpus)
  File "/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/n3lu/n3lu/training.py", line 384, in run_one
    learner.run()
  File "/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/n3lu/n3lu/learning.py", line 448, in run
    train_outputs = self.train(*batch)
  File "/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/py/lib/python2.7/site-packages/theano/compile/function_module.py", line 898, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/py/lib/python2.7/site-packages/theano/gof/link.py", line 325, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/gpfs/hcnlp/data/users/fabian_stemmer/n3lu/environments/n3lu_0.5.2/py/lib/python2.7/site-packages/theano/compile/function_module.py", line 884, in __call__
    self.fn() if output_subset is None else\
*RuntimeError: error doing operation: CUDNN_STATUS_EXECUTION_FAILED*
Apply node that caused the error: GpuDnnConvGradW{algo='deterministic',
inplace=True}(GpuContiguous.0, GpuContiguous.0,
GpuAllocEmpty{dtype='float32', context_name=None}.0,
GpuDnnConvDesc{border_mode=(1, 0), subsample=(1, 1), conv_mode='cross',
precision='float32'}.0, Constant{1.0}, Constant{0.0})
Toposort index: 234
Inputs types: [GpuArrayType<None>(float32, (True, True, False, False)),
GpuArrayType<None>(float32, (True, False, False, False)),
GpuArrayType<None>(float32, (False, True, False, False)),
<theano.gof.type.CDataType object at 0x7ff56926a090>, Scalar(float32),
Scalar(float32)]
Inputs shapes: [(1, 1, 541211, 10), (1, 50, 541211, 1), (50, 1, 3, 10), 'No
shapes', (), ()]
Inputs strides: [(21648440, 21648440, 40, 4), (108242200, 2164844, 4, 4),
(120, 120, 40, 4), 'No strides', (), ()]
Inputs values: ['not shown', 'not shown', 'not shown', <capsule object NULL
at 0x7ff55d00fe10>, 1.0, 0.0]
Outputs clients: [[GpuIncSubtensor{Inc;::, ::, ::,
int64:int64:}(GpuAlloc<None>{memset_0=True}.0,
GpuDnnConvGradW{algo='deterministic', inplace=True}.0, Constant{0},
Constant{10})]]
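In case it helps with reproducing: below is a minimal sketch of the kind of conv2d call that leads to the failing GpuDnnConvGradW node (shapes copied from the "Inputs shapes" line above; the variable names and random data are placeholders rather than my actual model code, and I have not verified this standalone snippet separately):

import numpy as np
import theano
import theano.tensor as T

# Input of shape (1, 1, 541211, 10) and 50 filters of shape (1, 3, 10),
# matching the shapes reported in the Apply node above.
x = T.tensor4('x')
w = theano.shared(np.random.randn(50, 1, 3, 10).astype('float32'), name='w')

conv = T.nnet.conv2d(x, w, border_mode=(1, 0), subsample=(1, 1))
cost = conv.sum()
grad_w = T.grad(cost, w)  # the backward-filter pass that uses algo_bwd_filter

f = theano.function([x], [cost, grad_w])
f(np.random.randn(1, 1, 541211, 10).astype('float32'))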