Discussion:
[theano-users] MemoryError: alloc failed and Segmentation fault
roman.foell via theano-users
2017-09-05 07:17:16 UTC
Hello,

I am running two Python/Theano programs where the data set, with around
65000 entries, is quite large.
For one of these programs I get the error below:

Apply node that caused the error: Alloc(TensorConstant{(1, 1, 1) of 0.0},
TensorConstant{65530}, TensorConstant{150}, TensorConstant{150})
Toposort index: 26
Inputs types: [TensorType(float64, (True, True, True)), TensorType(int64,
scalar), TensorType(int64, scalar), TensorType(int64, scalar)]
Inputs shapes: [(1, 1, 1), (), (), ()]
Inputs strides: [(8, 8, 8), (), (), ()]
Inputs values: [array([[[ 0.]]]), array(65530), array(150), array(150)]
Outputs clients: [[IncSubtensor{InplaceInc;int64}(Alloc.0, Elemwise{
Composite{(i0 * (((i1 + i2) * i3) - i4) * i5 * i6)}}[(0, 3)].0, Constant{-1
}), IncSubtensor{Inc;int64}(Alloc.0, Elemwise{Composite{(i0 * (((i1 + i2) *
i3) - i4) * i5 * i6)}}[(0, 4)].0, Constant{-1})]]

Backtrace when the node is created(use Theano flag traceback.limit=N to
make it longer):
File
"/home/flo9fe/anaconda2/lib/python2.7/site-packages/theano/gradient.py",
line 1272, in access_grad_cache
term = access_term_cache(node)[idx]
File
"/home/flo9fe/anaconda2/lib/python2.7/site-packages/theano/gradient.py",
line 967, in access_term_cache
output_grads = [access_grad_cache(var) for var in node.outputs]
File
"/home/flo9fe/anaconda2/lib/python2.7/site-packages/theano/gradient.py",
line 1272, in access_grad_cache
term = access_term_cache(node)[idx]
File
"/home/flo9fe/anaconda2/lib/python2.7/site-packages/theano/gradient.py",
line 967, in access_term_cache
output_grads = [access_grad_cache(var) for var in node.outputs]
File
"/home/flo9fe/anaconda2/lib/python2.7/site-packages/theano/gradient.py",
line 1272, in access_grad_cache
term = access_term_cache(node)[idx]
File
"/home/flo9fe/anaconda2/lib/python2.7/site-packages/theano/gradient.py",
line 967, in access_term_cache
output_grads = [access_grad_cache(var) for var in node.outputs]
File
"/home/flo9fe/anaconda2/lib/python2.7/site-packages/theano/gradient.py",
line 1272, in access_grad_cache
term = access_term_cache(node)[idx]
File
"/home/flo9fe/anaconda2/lib/python2.7/site-packages/theano/gradient.py",
line 1108, in access_term_cache
new_output_grads)



and on another machine

Segmentation fault


Actually, in the other program, which works fine, I have a theano.scan loop
of the same size, which should also produce a tensor of shape
(65000, 150, 150).
I'm working on a machine with 64 GB of RAM, so I thought the alloc should
not be a problem.
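As a back-of-the-envelope check, the zero-filled float64 buffer that the
failing Alloc node asks for (shape (65530, 150, 150), from the error above)
is already sizeable on its own:

```python
# Size of the float64 buffer the failing Alloc node creates:
# shape (65530, 150, 150), 8 bytes per float64 element.
n_steps, rows, cols = 65530, 150, 150
nbytes = n_steps * rows * cols * 8
print(nbytes, round(nbytes / 1024**3, 1))  # ~11 GiB for this one buffer
```

And gradient computation may need more than one buffer of this size alive
at the same time.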

I also tried setting ulimit -s unlimited, which hasn't worked so far.

The code which I think produces the error is of the form:

EPhiTPhi = np.zeros((150, 150))
loop = np.int32(-1)

def EPhiTPhi_loop(..):
    EPhiTPhi = some calculations to produce a 150 x 150 matrix
    return EPhiTPhi

result, _ = theano.scan(EPhiTPhi_loop,
                        outputs_info=[EPhiTPhi],
                        n_steps=65000,
                        non_sequences=[...])

EPhiTPhi_out = result[-1][-1]
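For comparison, here is the same accumulation pattern in plain NumPy (a
sketch only; `step` is a hypothetical stand-in for the real per-step
calculation). The forward pass only ever needs the running accumulator,
while theano.scan also records every per-step output, and the gradient then
allocates the matching (n_steps, 150, 150) buffer seen in the error:

```python
import numpy as np

def accumulate(step, init, n_steps):
    """Run `step` n_steps times, keeping only the running accumulator.
    Peak memory is a single matrix, independent of n_steps."""
    acc = init
    for _ in range(n_steps):
        acc = step(acc)
    return acc

# Hypothetical stand-in for the real per-step calculation:
step = lambda acc: acc + np.eye(3)
out = accumulate(step, np.zeros((3, 3)), 10)
```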
--
---
You received this message because you are subscribed to the Google Groups "theano-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to theano-users+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Frédéric Bastien
2017-09-08 14:10:43 UTC
For the alloc, Theano tries to allocate 5 GB for a single node in the graph.
So even if you have 64 GB total on the computer, you will need far more
than 5 GB overall.

Try using smaller minibatches or lowering memory usage elsewhere. You are
really using too much memory.

You could use scan_checkpoint, which is designed to lower memory usage by
duplicating some computation. It is particularly useful for very long
sequences like yours:

http://deeplearning.net/software/theano/library/scan.html#reducing-scan-s-memory-usage
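The idea behind checkpointing, sketched in plain Python (this is just the
concept, not Theano's actual implementation; `step` and the helper names
here are made up for illustration): keep only every N-th intermediate
state, and recompute the states in between from the nearest checkpoint when
they are needed again, trading extra compute for memory.

```python
def checkpointed_states(step, init, n_steps, save_every):
    """Forward pass that stores only every `save_every`-th state."""
    checkpoints = {0: init}
    state = init
    for t in range(1, n_steps + 1):
        state = step(state)
        if t % save_every == 0:
            checkpoints[t] = state
    return checkpoints

def recompute_state(step, checkpoints, t, save_every):
    """Rebuild the state at step t from the nearest earlier checkpoint."""
    base = (t // save_every) * save_every
    state = checkpoints[base]
    for _ in range(t - base):
        state = step(state)
    return state

step = lambda x: x + 1  # hypothetical stand-in for a real step function
cps = checkpointed_states(step, 0, 100, save_every=10)
# 11 stored states instead of 101; any step is rebuilt on demand:
state_37 = recompute_state(step, cps, 37, save_every=10)
```

With 65000 steps, storing every 100th state cuts the saved history by
roughly a factor of 100, at the cost of re-running up to 99 steps during
the backward pass.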

The segfault could be caused by running out of memory: we hit that problem
in another place in the code, but don't handle it well there. Updating
Theano could help fix that if you are using Theano 0.9; use the dev
version. Lowering the memory usage could also fix the segfault, if my
assumptions are right.

Frédéric
