Discussion:
[theano-users] Implementing a GPU op
Christopher Bourez
2017-07-11 16:12:03 UTC
Hi,

I'm trying to implement a simple GPU op, but it always gives me a
segmentation fault during compilation, with no other message.

For example:
import theano
from theano.gpuarray.basic_ops import GpuEye

x = theano.tensor.iscalar('x')
y = theano.tensor.iscalar('y')
z = GpuEye(dtype='float32', context_name=None)(x, y, theano.tensor.constant(0))

theano.printing.debugprint(z)
print("Compiling")
f = theano.function([x, y], z)
theano.printing.debugprint(f)
print("Results")
print(f(3, 3))

I've also tried with the softmax GPU function. Is there something I'm
missing?

I copied the file, created a completely new op, and the segmentation fault
appears when I define a Kernel in the gpu_kernels() method of the op.

Thank you very much for your help.
--
---
You received this message because you are subscribed to the Google Groups "theano-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to theano-users+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Pascal Lamblin
2017-07-12 00:48:44 UTC
Does it work if you do not modify the source for GpuEye at all?
If it does, then maybe sharing your new source would get you more help.
Christopher Bourez
2017-07-12 07:56:08 UTC
I don't know what you mean by "not modifying" the source for GpuEye:
- In this example, I'm importing an unmodified GpuEye op from Theano's
basic ops
- If I use theano.tensor.eye, then it does not use GpuEye

Also, are you sure this test
https://github.com/Theano/Theano/blob/2625464534147fd70da60a3a3ddcb63ed8e5a416/theano/gpuarray/tests/test_basic_ops.py#L401
works well?
Christopher Bourez
2017-07-12 07:58:34 UTC
I've also tried to create an example with theano.gpuarray.nnet.GpuSoftmax,
but after compilation it got replaced by another implementation,
GpuDnnSoftmax:

Elemwise{mul,no_inplace} [id A] ''
 |HostFromGpu(gpuarray) [id B] ''
 | |GpuSoftmax [id C] ''
 |   |GpuFromHost<dev0> [id D] ''
 |     |x [id E]
 |InplaceDimShuffle{x,x} [id F] ''
   |TensorConstant{2} [id G]
Compiling
HostFromGpu(gpuarray) [id A] '' 5
 |GpuElemwise{Mul}[(0, 1)]<gpuarray> [id B] '' 4
   |GpuArrayConstant{[[ 2.]]} [id C]
   |InplaceGpuDimShuffle{0,1} [id D] '' 3
     |GpuDnnSoftmax{mode='channel', algo='accurate'} [id E] '' 2
       |GpuContiguous [id F] '' 1
         |InplaceGpuDimShuffle{0,1,x,x} [id G] '' 0
           |<GpuArrayType<dev0>(float32, (False, False))> [id H]

I'm looking for a good example with a GPU Kernel.
Christopher Bourez
2017-07-12 08:05:30 UTC
A second thing that is not clear to me in the Theano documentation is
how you specify a C implementation and a GPU implementation of the same
custom op. Thank you
Christopher Bourez
2017-07-12 08:13:43 UTC
What surprises me is getting seg faults while building the theano
function; I would have expected them to occur during evaluation on actual
values...
Pascal Lamblin
2017-07-15 23:43:40 UTC
Post by Christopher Bourez
- In this example, I'm importing an unmodified GpuEye op from Theano's
basic ops
- If I'm using theano.tensor.eye, then it does not use the GpuEye
OK, I assumed that you had started from the implementation of GpuEye to
implement a new GPU Op.
Your original example seems to work for me, though, so it may have to do
with your setup:

In [3]: import theano
...: from theano.gpuarray.basic_ops import GpuEye
...:
...: x = theano.tensor.iscalar('x')
...: y = theano.tensor.iscalar('y')
...: z = GpuEye(dtype='float32', context_name=None)(x, y, theano.tensor.constant(0))
...:
...: theano.printing.debugprint(z)
...: print("Compiling")
...: f = theano.function( [x,y], z)
...: theano.printing.debugprint(f)
...: print("Results")
...: print(f(3, 3))
...:
GpuEye{dtype='float32', context_name=None} [id A] ''
|x [id B]
|y [id C]
|TensorConstant{0} [id D]
Compiling
GpuEye{dtype='float32', context_name=None} [id A] '' 0
|x [id B]
|y [id C]
|TensorConstant{0} [id D]
Results
[[ 1. 0. 0.]
[ 0. 1. 0.]
[ 0. 0. 1.]]

Post by Christopher Bourez
Also, are you sure this test
https://github.com/Theano/Theano/blob/2625464534147fd70da60a3a3ddcb63ed8e5a416/theano/gpuarray/tests/test_basic_ops.py#L401
works well?
Yes, it gets tested in our daily buildbot and on several pull requests per
week, by our continuous integration systems. I also just launched it
manually:
$ theano-nose theano/gpuarray/tests/test_basic_ops.py:test_gpueye
Can not use cuDNN on context None: Disabled by dnn.enabled flag
Mapped name None to device cuda: GeForce GTX 580 (0000:02:00.0)
.............................................
----------------------------------------------------------------------
Ran 45 tests in 21.645s

OK


Post by Christopher Bourez
I've also tried to create an example with theano.gpuarray.nnet.GpuSoftmax but
after compilation it got replaced by another implementation, GpuDnnSoftmax:
Yes, there is an optimization that does that if cuDNN is available. You
should be able to disable it with `optimizer_excluding=local_softmax_dnn`.
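For reference, here is how that flag could be set, either for one run or persistently (a sketch; only the flag name comes from the line above, the file layout is the standard Theano config format, and the script name is illustrative):

```ini
# one-off, on the command line:
#   THEANO_FLAGS="optimizer_excluding=local_softmax_dnn" python t.py

# or persistently, in ~/.theanorc:
[global]
optimizer_excluding = local_softmax_dnn
```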

Post by Christopher Bourez
A second thing that is not clear to me in the documentation of Theano is
how you specify a C implementation and a GPU implementation of the same
custom op.
You do not specify C and GPU implementations for the same Op, what we have
in general is two different Ops, one that has CPU inputs and outputs, and
computes on CPU, and another one with GPU inputs and outputs, that computes
on GPU.
This is necessary because the Variables in Theano are strongly typed, and
the device is part of the type.
There are optimizations that replace CPU Ops by GPU ones, inserting
transfer Ops (GpuFromHost, HostFromGpu) if necessary.
GPU Ops, like CPU ones, can have C (using CUDA) or Python implementations
(or both).
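The mechanism can be sketched conceptually in plain Python (this is NOT Theano's actual API; every class and function name below is illustrative): a rewrite function is registered with the optimizer, matches on the Op instance, and returns a replacement subgraph with explicit transfer Ops around it.

```python
# Conceptual sketch only -- not Theano's real classes or registration API.

class Node:
    """A node in a toy computation graph."""
    def __init__(self, op, inputs):
        self.op, self.inputs = op, inputs

class CpuEye: pass        # computes on CPU, CPU inputs/outputs
class GpuEye: pass        # GPU counterpart, GPU inputs/outputs
class GpuFromHost: pass   # transfer op: host -> device
class HostFromGpu: pass   # transfer op: device -> host

OPTIMIZERS = []           # registry of rewrites, applied at compile time

def register_opt(fn):
    OPTIMIZERS.append(fn)
    return fn

@register_opt
def local_eye_to_gpu(node):
    """Replace a CPU op with its GPU counterpart, inserting transfers."""
    if isinstance(node.op, CpuEye):          # dispatch by Op class
        gpu_in = [Node(GpuFromHost(), [i]) for i in node.inputs]
        return Node(HostFromGpu(), [Node(GpuEye(), gpu_in)])
    return None

def optimize(node):
    """Apply the first registered rewrite that matches."""
    for opt in OPTIMIZERS:
        replacement = opt(node)
        if replacement is not None:
            return replacement
    return node

graph = Node(CpuEye(), ["x", "y"])
optimized = optimize(graph)
print(type(optimized.op).__name__)             # HostFromGpu
print(type(optimized.inputs[0].op).__name__)   # GpuEye
```

In this sketch the dispatch is by Op class rather than by any string in the name; the registration itself is what ties the rewrite into the optimizer.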

Post by Christopher Bourez
What surprises me is to get seg faults in the theano function, while I
would have expected them to occur during evaluation on values...
It is strange indeed. It is possible that some GPU operations are
executed on the GPU during the compilation phase, for constant folding
(constant propagation), for instance.
Does it happen as well with the latest master from GitHub?
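The idea behind constant folding can be sketched with a toy graph in plain Python (not Theano's internals): any subgraph whose inputs are all constants is evaluated once, during compilation.

```python
# Toy constant-folding pass -- illustrative only, not Theano code.

class Const:
    def __init__(self, value):
        self.value = value

class Add:
    def __init__(self, a, b):
        self.a, self.b = a, b

def fold(node):
    """Recursively replace Add(Const, Const) with a precomputed Const."""
    if isinstance(node, Const):
        return node
    a, b = fold(node.a), fold(node.b)
    if isinstance(a, Const) and isinstance(b, Const):
        return Const(a.value + b.value)  # executed at "compile" time
    return Add(a, b)

graph = Add(Const(2), Add(Const(3), Const(4)))
folded = fold(graph)
print(folded.value)  # 9
```

If the folding of such an op runs a broken GPU kernel, the crash would then show up inside theano.function() rather than at f(3, 3).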
Christopher Bourez
2017-07-16 10:48:53 UTC
Post by Pascal Lamblin
Your original example seems to work for me, though, so it may have to do
with your setup:
I got it to work when I removed the device and context flags from my
.theanorc config file and used the command

THEANO_FLAGS="init_gpu_device=cuda" python t.py

If I add the device flag set to cuda or cuda0, it gives me a seg fault.
I found this information when running the test below: "If you want
GPU-related tests to run on a specific GPU device, and not the default one,
you should use the init_gpu_device theano flag."

What does that mean for my configuration? What should I change?
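For reference, the two .theanorc variants being compared look roughly like this (device name illustrative; the two [global] blocks are alternatives, not meant to coexist):

```ini
# variant that gives me a seg fault:
[global]
device = cuda0

# variant that works:
[global]
init_gpu_device = cuda0
```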

Post by Pascal Lamblin
Yes, it gets tested in our daily buildbot and on several pull requests per
week, by our continuous integration systems. I also just launched it
manually:
$ theano-nose theano/gpuarray/tests/test_basic_ops.py:test_gpueye
Can not use cuDNN on context None: Disabled by dnn.enabled flag
Mapped name None to device cuda: GeForce GTX 580 (0000:02:00.0)
.............................................
----------------------------------------------------------------------
Ran 45 tests in 21.645s
OK
I get:

ImportError: No module named test_basic_ops


When I run

THEANO_FLAGS="init_gpu_device=cuda" theano-nose
/usr/local/lib/python2.7/dist-packages/theano/gpuarray/tests/test_basic_ops.py:test_gpueye


I get


if hasattr(theano.tests, "TheanoNoseTester"):
AttributeError: 'module' object has no attribute 'tests'
Post by Pascal Lamblin
You do not specify C and GPU implementations for the same Op, what we have
in general is two different Ops, one that has CPU inputs and outputs, and
computes on CPU, and another one with GPU inputs and outputs, that computes
on GPU.
This is necessary because the Variables in Theano are strongly typed, and
the device is part of the type.
There are optimizations that replace CPU Ops by GPU ones, inserting
transfer Ops (GpuFromHost, HostFromGpu) if necessary.
GPU Ops, like CPU ones, can have C (using CUDA) or Python implementations
(or both).
Are the rules name-based? E.g., is it whether the string "Gpu" appears in
the name? Or is there a registration mechanism, as in other frameworks?
Thanks a lot for the clarification on the optimization rules.
Post by Pascal Lamblin
It is strange indeed. It may be possible that some GPU operations are
executed on GPU during the compilation phase, for constant folding
(constant propagation) for instance.
Does it happen as well with the latest master from GitHub?
Installing the latest dev version from GitHub did not improve the results.
Christopher Bourez
2017-07-16 11:05:09 UTC
I found another server on which the code works, as it does on your computer:

Using cuDNN version 5110 on context None
Preallocating 9151/11439 Mb (0.800000) on cuda0
Mapped name None to device cuda0: Tesla K80 (0000:83:00.0)
GpuEye{dtype='float32', context_name=None} [id A] ''
|x [id B]
|y [id C]
Compiling
GpuEye{dtype='float32', context_name=None} [id A] '' 0
|x [id B]
|y [id C]
Results
[[ 1. 0. 0.]
[ 0. 1. 0.]
[ 0. 0. 1.]]



To reproduce the segmentation fault:

conda install pygpu

Fetching package metadata .........
Solving package specifications: .

Package plan for installation in environment /root/miniconda2:

The following packages will be UPDATED:

libgpuarray: 0.6.4-0 --> 0.6.8-0
pygpu: 0.6.4-py27_1 --> 0.6.8-py27_0

Proceed ([y]/n)? y


And then you can run again:

import theano
from theano.gpuarray.basic_ops import GpuEye

x = theano.tensor.iscalar('x')
y = theano.tensor.iscalar('y')
z = GpuEye(dtype='float32', context_name=None)(x, y, theano.tensor.constant(0))

theano.printing.debugprint(z)
print("Compiling")
f = theano.function([x, y], z)
theano.printing.debugprint(f)
print("Results")
print(f(3, 3))

Using cuDNN version 5110 on context None
Preallocating 9151/11439 Mb (0.800000) on cuda0
Mapped name None to device cuda0: Tesla K80 (0000:83:00.0)
GpuEye{dtype='float32', context_name=None} [id A] ''
|x [id B]
|y [id C]
Compiling
Segmentation fault (core dumped)
Christopher Bourez
2017-07-16 11:15:51 UTC
Moreover, if you install Theano from scratch:

conda install theano pygpu

Fetching package metadata .........
Solving package specifications: .

Package plan for installation in environment /root/miniconda2:

The following NEW packages will be INSTALLED:

libgpuarray: 0.6.8-0
pygpu: 0.6.8-py27_0
theano: 0.9.0-py27_0


you need to apply the following downgrade afterwards:

conda install libgpuarray=0.6.4-0 pygpu=0.6.4


And now that works on the server I was initially working on.
John Coolidge
2017-07-19 22:52:49 UTC
Oh man, so glad I randomly clicked on your message! I just wrote a
post about the same problem, except I was getting the segmentation fault
while simply trying to run some old code on the GPU with the new GPU
backend. I also use Anaconda, and using the init_gpu_device flag instead
of the ordinary device flag, as well as downgrading pygpu and libgpuarray,
also solved my problem. Not sure if the problem lies with Theano, but it
seems like this could affect a fair number of people.