Discussion:
[theano-users] Theano >= 1.0.1: Cannot initialize GPU if already initialized in parent process
Mathias Müller
2018-05-24 10:24:44 UTC
Hi,

A recent commit concerning libgpuarray/pygpu:

https://github.com/Theano/Theano/commit/073288a377d70f59962f81561251ec52304c07ab

creates a problem when using Nematus (a popular machine translation system
based on Theano):

RuntimeError("You can't initialize the GPU in a subprocess if the parent
process already did it")

There is indeed a parent Python training process, and it spawns a child
Python process for validation. Training and validation need to be on
different devices, and this is realized with THEANO_FLAGS.

I investigated this issue a bit more, and it turns out this is not a
problem for Theano==1.0.0, only for Theano>=1.0.1.

Why is this change necessary?

Thanks and regards
Mathias

P.S. I was not sure whether to open a GitHub issue or post here. If GitHub
is better, please let me know.
--
---
You received this message because you are subscribed to the Google Groups "theano-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to theano-users+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Arnaud Bergeron
2018-05-24 18:54:08 UTC
Even if you use a different device, forking a process after the GPU has been initialized leads to all sorts of strange behaviour if you're not lucky. This limitation comes from CUDA so it's not something we can fix.

We added this check to stop people from doing this. Judging by the number of reports we've seen, it seems lots of people were simply lucky in the past.

The "proper" way to do this is to spawn the validation process before importing theano, and then communicate with it to tell it when to validate.

Arnaud

(Also this is the proper channel for this type of question.)
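
A minimal sketch of the approach described above, not taken from Nematus or Theano: the parent starts a long-lived validation process before it imports theano, then sends it checkpoint names over a pipe. The helper names (start_validator, validate, stop_validator), the device name "cuda1", the checkpoint file name and the inline child script are all assumptions made for illustration; a real child would import theano and run the actual validation.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for a validation script. A real one would import
# theano at the top (initializing its own GPU) and score each checkpoint.
CHILD_CODE = r"""
import os, sys
for line in iter(sys.stdin.readline, ""):
    checkpoint = line.strip()
    print("validated", checkpoint, "with", os.environ["THEANO_FLAGS"], flush=True)
"""


def start_validator(device):
    """Start the validation process. Call this before importing theano."""
    # The child gets its own THEANO_FLAGS; the parent's environment is untouched.
    env = dict(os.environ, THEANO_FLAGS="device=" + device)
    return subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        env=env,
        text=True,
    )


def validate(proc, checkpoint):
    """Ask the running validator to score one checkpoint and wait for its reply."""
    proc.stdin.write(checkpoint + "\n")
    proc.stdin.flush()
    return proc.stdout.readline().strip()


def stop_validator(proc):
    """Close the pipe (EOF ends the child's loop) and return its exit code."""
    proc.stdin.close()
    code = proc.wait()
    proc.stdout.close()
    return code


if __name__ == "__main__":
    validator = start_validator("cuda1")
    # Only now would the parent import theano (e.g. on cuda0) and start training.
    print(validate(validator, "model.iter1000.npz"))
    stop_validator(validator)
```

The point of the design is ordering: the child already exists by the time the parent initializes its GPU, so nothing is forked from a process that holds a CUDA context. Later validation requests reuse the same child instead of spawning a new one.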