marcos quintana
2018-06-13 09:22:16 UTC
Hello,
I am trying to train a neural network on a large amount of data (around
20 GB split across several pickle files), but my CPU usage does not go above
5-10% and training is very slow. I have tested different
theanorc settings, including multi-core options and time_once for dnn.conv, but
performance has not improved. This is my system description:
SW:
Ubuntu 16.04
Theano 1.0.0
Cuda 8.0
Cudnn 7
Pygpu 0.7.6
HW
CPU:
Intel(R) Core(TM) i7-5930K CPU @ 3.50GHz
Socket(s): 1
Core(s) per socket: 6
GPU:
GeForce GTX 970
Initially I tried: python `python -c "import os, theano; print os.path.dirname(theano.__file__)"`/misc/check_blas.py
With the following result:
Using cuDNN version 7101 on context None
Preallocating 3227/4034 Mb (0.800000) on cuda0
Mapped name None to device cuda0: GeForce GTX 970 (0000:01:00.0)
Using cuDNN version 7101 on context None
Preallocating 3227/4034 Mb (0.800000) on cuda0
Mapped name None to device cuda0: GeForce GTX 970 (0000:01:00.0)
Some results that you can compare against. They were 10 executions
of gemm in float64 with matrices of shape 2000x2000 (M=N=K=2000).
All memory layout was in C order.
CPU tested: Xeon E5345(2.33Ghz, 8M L2 cache, 1333Mhz FSB),
Xeon E5430(2.66Ghz, 12M L2 cache, 1333Mhz FSB),
Xeon E5450(3Ghz, 12M L2 cache, 1333Mhz FSB),
Xeon X5560(2.8Ghz, 12M L2 cache, hyper-threads?)
Core 2 E8500, Core i7 930(2.8Ghz, hyper-threads enabled),
Core i7 950(3.07GHz, hyper-threads enabled)
Xeon X5550(2.67GHz, 8M l2 cache?, hyper-threads enabled)
Libraries tested:
* numpy with ATLAS from distribution (FC9) package (1 thread)
* manually compiled numpy and ATLAS with 2 threads
* goto 1.26 with 1, 2, 4 and 8 threads
* goto2 1.13 compiled with multiple threads enabled
Xeon Xeon Xeon Core2 i7 i7 Xeon Xeon
lib/nb threads E5345 E5430 E5450 E8500 930 950 X5560 X5550
numpy 1.3.0 blas 775.92s
numpy_FC9_atlas/1 39.2s 35.0s 30.7s 29.6s 21.5s 19.60s
goto/1 18.7s 16.1s 14.2s 13.7s 16.1s 14.67s
numpy_MAN_atlas/2 12.0s 11.6s 10.2s 9.2s 9.0s
goto/2 9.5s 8.1s 7.1s 7.3s 8.1s 7.4s
goto/4 4.9s 4.4s 3.7s - 4.1s 3.8s
goto/8 2.7s 2.4s 2.0s - 4.1s 3.8s
openblas/1 14.04s
openblas/2 7.16s
openblas/4 3.71s
openblas/8 3.70s
mkl 11.0.083/1 7.97s
mkl 10.2.2.025/1 13.7s
mkl 10.2.2.025/2 7.6s
mkl 10.2.2.025/4 4.0s
mkl 10.2.2.025/8 2.0s
goto2 1.13/1 14.37s
goto2 1.13/2 7.26s
goto2 1.13/4 3.70s
goto2 1.13/8 1.94s
goto2 1.13/16 3.16s
Test time in float32. There were 10 executions of gemm in
float32 with matrices of shape 5000x5000 (M=N=K=5000)
All memory layout was in C order.
cuda version 8.0 7.5 7.0
gpu
M40 0.45s 0.47s
k80 0.92s 0.96s
K6000/NOECC 0.71s 0.69s
P6000/NOECC 0.25s
Titan X (Pascal) 0.28s
GTX Titan X 0.45s 0.45s 0.47s
GTX Titan Black 0.66s 0.64s 0.64s
GTX 1080 0.35s
GTX 980 Ti 0.41s
GTX 970 0.66s
GTX 680 1.57s
GTX 750 Ti 2.01s 2.01s
GTX 750 2.46s 2.37s
GTX 660 2.32s 2.32s
GTX 580 2.42s
GTX 480 2.87s
TX1 7.6s (float32 storage and computation)
GT 610 33.5s
Some Theano flags:
blas.ldflags= -L/usr/local/lib -lopenblas -lopenblas
compiledir= /home/mqg/.theano/compiledir_Linux-4.4--generic-x86_64-with-Ubuntu-16.04-xenial-x86_64-2.7.12-64
floatX= float32
device= cuda0
Some OS information:
sys.platform= linux2
sys.version= 2.7.12 (default, Dec 4 2017, 14:50:18) [GCC 5.4.0 20160609]
sys.prefix= /usr
Some environment variables:
MKL_NUM_THREADS= None
OMP_NUM_THREADS= None
GOTO_NUM_THREADS= None
Numpy config: (used when the Theano flag "blas.ldflags" is empty)
lapack_opt_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/usr/local/lib']
define_macros = [('HAVE_CBLAS', None)]
language = c
blas_opt_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/usr/local/lib']
define_macros = [('HAVE_CBLAS', None)]
language = c
openblas_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/usr/local/lib']
define_macros = [('HAVE_CBLAS', None)]
language = c
blis_info:
NOT AVAILABLE
openblas_lapack_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/usr/local/lib']
define_macros = [('HAVE_CBLAS', None)]
language = c
lapack_mkl_info:
NOT AVAILABLE
blas_mkl_info:
NOT AVAILABLE
Numpy dot module: numpy.core.multiarray
Numpy location: /home/mqg/.local/lib/python2.7/site-packages/numpy/__init__.pyc
Numpy version: 1.14.2
Then I followed "How to test that Theano works properly" here:
http://deeplearning.net/software/theano/troubleshooting.html#test-blas
with the following results:
1. python -c "import numpy; numpy.test()". OK
Running unit tests for numpy
NumPy version 1.14.2
NumPy relaxed strides checking option: True
NumPy is installed in /home/mqg/.local/lib/python2.7/site-packages/numpy
Python version 2.7.12 (default, Dec 4 2017, 14:50:18) [GCC 5.4.0 20160609]
nose version 1.3.7
Ran 6806 tests in 21.776s
OK (KNOWNFAIL=13, SKIP=24)
2. python -c "import scipy; scipy.test()"
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/mqg/.local/lib/python2.7/site-packages/scipy/__init__.py", line 61, in <module>
from numpy import show_config as show_numpy_config
File "/home/mqg/.local/lib/python2.7/site-packages/numpy/__init__.py", line 142, in <module>
from . import add_newdocs
File "/home/mqg/.local/lib/python2.7/site-packages/numpy/add_newdocs.py", line 13, in <module>
from numpy.lib import add_newdoc
File "/home/mqg/.local/lib/python2.7/site-packages/numpy/lib/__init__.py", line 8, in <module>
from .type_check import *
File "/home/mqg/.local/lib/python2.7/site-packages/numpy/lib/type_check.py", line 11, in <module>
import numpy.core.numeric as _nx
File "/home/mqg/.local/lib/python2.7/site-packages/numpy/core/__init__.py", line 74, in <module>
from numpy.testing import _numpy_tester
File "/home/mqg/.local/lib/python2.7/site-packages/numpy/testing/__init__.py", line 12, in <module>
from . import decorators as dec
File "/home/mqg/.local/lib/python2.7/site-packages/numpy/testing/decorators.py", line 6, in <module>
from .nose_tools.decorators import *
File "/home/mqg/.local/lib/python2.7/site-packages/numpy/testing/nose_tools/decorators.py", line 20, in <module>
from .utils import SkipTest, assert_warns
File "/home/mqg/.local/lib/python2.7/site-packages/numpy/testing/nose_tools/utils.py", line 15, in <module>
from tempfile import mkdtemp, mkstemp
File "/usr/lib/python2.7/tempfile.py", line 32, in <module>
import io as _io
File "io/__init__.py", line 97, in <module>
from .matlab import loadmat, savemat, whosmat, byteordercodes
File "io/matlab/__init__.py", line 13, in <module>
from .mio import loadmat, savemat, whosmat
File "io/matlab/mio.py", line 12, in <module>
from .miobase import get_matfile_version, docfiller
File "io/matlab/miobase.py", line 22, in <module>
from scipy.misc import doccer
File "/home/mqg/.local/lib/python2.7/site-packages/scipy/misc/__init__.py", line 64, in <module>
from .common import *
File "/home/mqg/.local/lib/python2.7/site-packages/scipy/misc/common.py", line 8, in <module>
from numpy import arange, newaxis, hstack, product, array, frombuffer
ImportError: cannot import name arange
3. THEANO_FLAGS=''; python -c "import theano; theano.test()"
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/theano/__init__.py", line 88, in <module>
from theano.configdefaults import config
File "/usr/local/lib/python2.7/dist-packages/theano/configdefaults.py", line 6, in <module>
import numpy as np
File "/home/mqg/.local/lib/python2.7/site-packages/numpy/__init__.py", line 142, in <module>
from . import add_newdocs
File "/home/mqg/.local/lib/python2.7/site-packages/numpy/add_newdocs.py", line 13, in <module>
from numpy.lib import add_newdoc
File "/home/mqg/.local/lib/python2.7/site-packages/numpy/lib/__init__.py", line 8, in <module>
from .type_check import *
File "/home/mqg/.local/lib/python2.7/site-packages/numpy/lib/type_check.py", line 11, in <module>
import numpy.core.numeric as _nx
File "/home/mqg/.local/lib/python2.7/site-packages/numpy/core/__init__.py", line 74, in <module>
from numpy.testing import _numpy_tester
File "/home/mqg/.local/lib/python2.7/site-packages/numpy/testing/__init__.py", line 12, in <module>
from . import decorators as dec
File "/home/mqg/.local/lib/python2.7/site-packages/numpy/testing/decorators.py", line 6, in <module>
from .nose_tools.decorators import *
File "/home/mqg/.local/lib/python2.7/site-packages/numpy/testing/nose_tools/decorators.py", line 20, in <module>
from .utils import SkipTest, assert_warns
File "/home/mqg/.local/lib/python2.7/site-packages/numpy/testing/nose_tools/utils.py", line 15, in <module>
from tempfile import mkdtemp, mkstemp
File "/usr/lib/python2.7/tempfile.py", line 35, in <module>
from random import Random as _Random
File "random/__init__.py", line 99, in <module>
from .mtrand import *
File "numpy.pxd", line 92, in init mtrand (numpy/random/mtrand/mtrand.c:37726)
AttributeError: 'module' object has no attribute 'dtype'
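A note on the two tracebacks above: the frames `io/__init__.py` and `random/__init__.py` are relative paths, and they import `.matlab` and `.mtrand`. That suggests the commands were run from inside the scipy and numpy package directories, so the standard-library `io` and `random` modules were shadowed by local packages. The following is only a minimal sketch for checking this, not part of Theano or NumPy; the helper name `shadowing_dirs` and its default name list are hypothetical.

```python
from __future__ import print_function
import os


def shadowing_dirs(names=("io", "random"), search_dir=None):
    """Return the names found as a package or .py file in search_dir.

    search_dir defaults to the current working directory, which
    `python -c` places at the front of sys.path.
    """
    search_dir = os.getcwd() if search_dir is None else search_dir
    hits = []
    for name in names:
        pkg = os.path.join(search_dir, name, "__init__.py")
        mod = os.path.join(search_dir, name + ".py")
        if os.path.isfile(pkg) or os.path.isfile(mod):
            hits.append(name)
    return hits


if __name__ == "__main__":
    print("cwd:", os.getcwd())
    print("possible shadowing:", shadowing_dirs())
```

If the list is not empty, re-running the tests from a neutral directory (for example the home directory) may give a different result.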
And this is my current theanorc:
[global]
mode = FAST_RUN
device = cuda0
floatX = float32
OMP_NUM_THREADS = 6
openmp = True
optimizer_including = cudnn
[dnn.conv]
algo_fwd = time_once
algo_bwd_data = time_once
algo_bwd_filter = time_once
[cuda]
root = /usr/local/cuda-8.0
[gpuarray]
preallocate = 0.8
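One observation on the config above: the check_blas.py output lists OMP_NUM_THREADS, MKL_NUM_THREADS and GOTO_NUM_THREADS under "Some environment variables" and reports all three as None, so the `OMP_NUM_THREADS = 6` line in theanorc does not appear to reach the environment. A minimal sketch of setting them from Python instead, assuming they need to be in place before the numerical libraries are loaded; the helper name `set_blas_threads` is hypothetical.

```python
import os


def set_blas_threads(n_threads, env=None):
    """Set the thread-count variables reported by check_blas.py.

    env defaults to os.environ; pass a dict to try it without side effects.
    """
    env = os.environ if env is None else env
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "GOTO_NUM_THREADS"):
        env[var] = str(n_threads)
    return env


# Assumption: this runs before numpy or theano are imported, since these
# variables are typically read when the BLAS/OpenMP runtime starts up.
set_blas_threads(6)
```

The same effect can be had from the shell by exporting the variable before starting Python. This only affects CPU-side BLAS and OpenMP code; with device = cuda0 most of the computation is expected to run on the GPU anyway.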
I would really appreciate it if you could help me speed up the training process.
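To narrow down where the time goes, one option is to time data loading and the training call separately. The sketch below is only an illustration under assumptions: `load_batches` stands for whatever iterates over the unpickled slices and `train_step` for the compiled training function; both names are hypothetical and neither is shown in this post.

```python
from __future__ import print_function
import time


def time_phases(load_batches, train_step):
    """Return (seconds waiting for data, seconds inside train_step).

    load_batches: an iterable yielding batches (e.g. unpickled slices).
    train_step:   a callable taking one batch.
    """
    load_s = 0.0
    step_s = 0.0
    it = iter(load_batches)
    while True:
        t0 = time.time()
        try:
            batch = next(it)  # time spent producing the next batch
        except StopIteration:
            break
        t1 = time.time()
        train_step(batch)     # time spent in the training call
        t2 = time.time()
        load_s += t1 - t0
        step_s += t2 - t1
    return load_s, step_s


if __name__ == "__main__":
    fake_batches = ([i] * 1000 for i in range(5))    # stand-in for pickle slices
    load_s, step_s = time_phases(fake_batches, sum)  # stand-in for training
    print("load: %.4fs  step: %.4fs" % (load_s, step_s))
```

If the load time dominates, the bottleneck is more likely reading and unpickling the data than the Theano configuration. One caveat: if the training call returns before the GPU work has finished, some compute time may be attributed to the following load.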
--
---
You received this message because you are subscribed to the Google Groups "theano-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to theano-users+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.