Discussion:
[theano-users] im2col function?
David Leon
2018-07-31 03:57:44 UTC
Permalink
Hi Bastien, I'm trying out `theano.tensor.nnet.neighbours.images2neibs()`
for building a network model. However, this op does not support gradient
computation. Is this expected? Or is there another way to implement
`im2col` with gradient support?
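For context, im2col is a pure indexing operation: every output element copies exactly one input pixel, so a gradient exists in principle (the transposed scatter-add, i.e. col2im). A minimal numpy sketch of the forward computation; `im2col` here is a made-up toy function, not the Theano API:

```python
import numpy as np

def im2col(img, psize):
    """Extract every (psize x psize) patch of a 2-D image as a row.

    Pure indexing: each output element copies one input pixel, so the
    gradient is just the transposed scatter-add (col2im).
    """
    rows, cols = img.shape
    out_r, out_c = rows - psize + 1, cols - psize + 1
    patches = np.empty((out_r * out_c, psize * psize), dtype=img.dtype)
    for i in range(out_r):
        for j in range(out_c):
            patches[i * out_c + j] = img[i:i + psize, j:j + psize].ravel()
    return patches

img = np.arange(16.0).reshape(4, 4)
cols = im2col(img, 3)  # 4 overlapping patches, 9 values each
```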
I implemented the ignore_borders mode on the GPU. Here is the new timing:
https://github.com/Theano/Theano/pull/1466
$ CUDA_LAUNCH_BLOCKING=1 THEANO_FLAGS=device=gpu,floatX=float32 python speed_im2col.py
Using gpu device 0: GeForce GTX 470
Convolution-based method: 0.0039279460907
Neighbors-based method: 5.21579003334
New Neighbors-based ignore border method: 0.00154495239258
New Neighbors-based valid method: 0.00151705741882
Neigh faster than conv. Speedup: 2.58918749018x
Fred
Hi,
thanks for the profile. First, sadly, someone referred you to
theano/sandbox/neighbourhoods.py instead of theano/sandbox/neighbours.py.
It was not in the doc of the first file, but the first is an old
implementation that has only Python code. So yes, it is very slow. It also
does not implement GPU code.
I will document clearly in that file that it is very slow and that we don't
recommend it. But we keep it, as it is more general: it can do things that
the one in neighbours.py can't do.
So, go see the documentation of the "fast" version, which is at
http://deeplearning.net/software/theano/library/sandbox/neighbours.html
The mode "ignore_border" isn't implemented on the GPU, but the mode
'valid' is. If the window size divides the image size evenly, they are
equivalent.
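When the window divides the image exactly, extracting non-overlapping windows reduces to a pure reshape, which is why the two modes agree in that case. A numpy sketch of what both modes compute then (`neibs_nonoverlap` is a made-up name, not the Theano op):

```python
import numpy as np

def neibs_nonoverlap(img, p):
    """Non-overlapping p x p patches via a pure reshape; matches what
    images2neibs computes when p divides both image dimensions exactly."""
    r, c = img.shape
    rr, cc = (r // p) * p, (c // p) * p  # drop any leftover border pixels
    tiles = img[:rr, :cc].reshape(rr // p, p, cc // p, p)
    return tiles.transpose(0, 2, 1, 3).reshape(-1, p * p)

img = np.arange(36.0).reshape(6, 6)
patches = neibs_nonoverlap(img, 3)  # 4 non-overlapping patches, 9 values each
```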
I attach a modified version of your timing script. The original script
always included in the timing the transfer of the input to the GPU. Is
that what you want? Normally people already have the data on the GPU, so
the transfer should not be there. The transfer was taking 90% of the
execution time of the fast version on the GPU, so I removed it. Here are
the new timings:
# On the CPU
python speed_im2col.py
Convolution-based method: 0.0772249698639
Neighbors-based method: 5.10480904579
New Neighbors-based ignore border method: 0.0276699066162
New Neighbors-based valid method: 0.0298180580139
Neigh faster than conv. Speedup: 2.5898725473x
# On the GPU
$ THEANO_FLAGS=device=gpu,floatX=float32 python speed_im2col.py
Using gpu device 0: GeForce GTX 470
Convolution-based method: 0.00376892089844
Neighbors-based method: 5.11869096756
New Neighbors-based ignore border method: 0.0278849601746
New Neighbors-based valid method: 0.00154709815979
Neigh faster than conv. Speedup: 2.43612266913x
On the GPU, we see that the valid mode is much faster than the
ignore_border mode, as valid is implemented on the GPU. The ignore_border
mode incurs a memory transfer plus a run on the CPU.
Also, we see that the new neighbours version is always the fastest, on
both the CPU and the GPU.
Do you need the ignore_border mode? If so, I could add it. It would not
take very long to do.
Fred
Quick update... re-examining my code, it looks like I never tried the
scan implementation (I accidentally used the for-loop implementation twice).
In trying it out, my scan implementation runs out of memory for some reason
with patch sizes >= 7 and image size 3x500x300. It also runs slower than
the for-loop-based implementation when patch sizes < 7. Here's my code so far:
import math
import numpy as np
import time
import theano as th
import theano.tensor as T


def build_im2col_loop(psize):
    # For-loop implementation: unrolls psize * psize shift-and-reshape
    # steps into the graph.
    im = T.tensor3("img", dtype=th.config.floatX)
    (n_channels, rows, cols) = (im.shape[0], im.shape[1], im.shape[2])
    # Pad the image up to a multiple of psize in both spatial dimensions.
    im_pad = T.zeros((n_channels,
                      T.cast(T.ceil(1.0 * rows / psize) * psize, "int32"),
                      T.cast(T.ceil(1.0 * cols / psize) * psize, "int32")))
    im_pad = T.set_subtensor(im_pad[:, 0:rows, 0:cols], im)
    final = T.zeros((n_channels, im_pad.shape[1], im_pad.shape[2], psize,
                     psize))
    for x in range(psize):
        for y in range(psize):
            # Shift the padded image by (x, y), then carve it into
            # non-overlapping psize x psize blocks.
            im_shift = T.concatenate((im_pad[:, x:], im_pad[:, :x]), axis=1)
            im_shift = T.concatenate((im_shift[:, :, y:], im_shift[:, :, :y]),
                                     axis=2)
            im_shift = T.reshape(
                im_shift, (n_channels, im_pad.shape[1] / psize, psize,
                           im_pad.shape[2] / psize, psize))
            im_shift = im_shift.dimshuffle((0, 1, 3, 2, 4))
            final = T.set_subtensor(final[:, x::psize, y::psize], im_shift)
    final = th.Out(th.sandbox.cuda.basic_ops.gpu_from_host(
        final[:, 0:rows - psize + 1, 0:cols - psize + 1]), borrow=True)
    return th.function([im], final)


def build_im2col_scan(psize):
    # Scan implementation: the same shifts, but driven by scan over the
    # (x, y) offset pairs instead of an unrolled Python loop.
    im = T.tensor3("img", dtype=th.config.floatX)
    (n_channels, rows, cols) = (im.shape[0], im.shape[1], im.shape[2])
    im_pad = T.zeros((n_channels,
                      T.cast(T.ceil(1.0 * rows / psize) * psize, "int32"),
                      T.cast(T.ceil(1.0 * cols / psize) * psize, "int32")))
    im_pad = T.set_subtensor(im_pad[:, 0:rows, 0:cols], im)
    final = T.zeros((n_channels, im_pad.shape[1], im_pad.shape[2], psize,
                     psize))
    xrng, yrng = [a.flatten() for a in
                  np.meshgrid(range(psize), range(psize))]
    xrng = th.shared(xrng)
    yrng = th.shared(yrng)
    final = th.scan(lambda x, y, i, f, p, n: scan_func(x, y, i, f, p, n),
                    sequences=[xrng, yrng], outputs_info=[final],
                    non_sequences=[im_pad, psize, n_channels])[0]
    return th.function([im],
                       th.Out(th.sandbox.cuda.basic_ops.gpu_from_host(
                           final[:, 0:rows - psize + 1, 0:cols - psize + 1]),
                           borrow=True))


def scan_func(x, y, final, im_pad, psize, n_channels):
    # One scan step: shift by (x, y) and write the blocks into `final`.
    im_shift = T.concatenate((im_pad[:, x:], im_pad[:, :x]), axis=1)
    im_shift = T.concatenate((im_shift[:, :, y:], im_shift[:, :, :y]),
                             axis=2)
    im_shift = T.reshape(
        im_shift, (n_channels, im_pad.shape[1] / psize, psize,
                   im_pad.shape[2] / psize, psize))
    im_shift = im_shift.dimshuffle((0, 1, 3, 2, 4))
    return T.set_subtensor(final[:, x::psize, y::psize], im_shift)
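The shift-and-reshape trick above can be sketched in plain numpy, which also shows where the psize * psize graph steps come from. This is a rough analogue with a made-up name (`im2col_shift`), not the Theano code itself:

```python
import numpy as np

def im2col_shift(img, p):
    """Numpy analogue of the shift-and-reshape loop: p * p iterations,
    which is why the unrolled Theano graph (and its compile time) grows
    roughly quadratically in the patch size."""
    c, r, cl = img.shape
    pr, pc = -(-r // p) * p, -(-cl // p) * p  # pad up to multiples of p
    pad = np.zeros((c, pr, pc), dtype=img.dtype)
    pad[:, :r, :cl] = img
    final = np.zeros((c, pr, pc, p, p), dtype=img.dtype)
    for x in range(p):
        for y in range(p):
            # Shift by (x, y), then carve into non-overlapping p x p blocks.
            sh = np.roll(np.roll(pad, -x, axis=1), -y, axis=2)
            sh = sh.reshape(c, pr // p, p, pc // p, p).transpose(0, 1, 3, 2, 4)
            final[:, x::p, y::p] = sh
    return final[:, :r - p + 1, :cl - p + 1]

img = np.arange(12.0).reshape(1, 3, 4)
out = im2col_shift(img, 2)  # one (2 x 2) patch per valid position
```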
Thanks!
Cheers,
Karthik
Hi Fred, Pascal,
Checking the source, it looks like the code is implemented in C, but it
does not make use of vectorized ops. I have a custom-built version of numpy
integrated with MKL, which seems to make things run a bit faster. The
code I timed:
import time
import theano as th
import theano.tensor as T
from theano.sandbox.neighbourhoods import NeighbourhoodsFromImages

x = T.dtensor3()
neighs = NeighbourhoodsFromImages(1, (5, 5), ignore_border=True)(x)
f = th.function([x], neighs)
tic = time.time()
f(im)
print time.time() - tic
This ends up being roughly 4x slower than the code in
http://stackoverflow.com/questions/10896841/find-a-3x3-sliding-window-over-an-image/10906015#10906015.
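The stackoverflow answer's approach can also be reproduced with numpy stride tricks as a zero-copy view. A rough sketch (the `sliding_window` name is mine, not from the post):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def sliding_window(img, p):
    """Zero-copy p x p sliding-window view of a 2-D image: the same
    strides are reused for the window axes, so no data is duplicated."""
    r, c = img.shape
    sr, sc = img.strides
    return as_strided(img, shape=(r - p + 1, c - p + 1, p, p),
                      strides=(sr, sc, sr, sc))

img = np.arange(25.0).reshape(5, 5)
win = sliding_window(img, 3)  # view of shape (3, 3, 3, 3)
```

Since it is only a view, writing to `win` would alias the original image, so copy before mutating.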
As per Pascal's suggestion, I've rewritten the code in the
stackoverflow post to incorporate the GPU via Theano. This consistently
runs 6-10x faster on my Tesla M2070 (you guys are awesome!).
Unfortunately, however, the code now takes a really long time to compile
(~2 mins with 5x5 patches).
Compile times appear to grow quadratically in the patch size; I've
tried using for-loops as well as scan in my implementations, but both take
a long time to compile. Is there a way to reduce the compile time? I've also
tried pickling the compiled functions as suggested in another post (so I
can compile just once and reload later), but it looks like unpickling them
causes Theano to recompile the functions.
I'll be sure to post my code here once it's finalized. :)
Thanks,
Karthik
Hi,
This function is implemented in C, so it should be fast. Which case did
you see that was slow?
Also, I made a PR to add it to the documentation on the web page:
https://github.com/Theano/Theano/pull/1424
Fred
On Wed, Jun 19, 2013 at 11:43 AM, Pascal Lamblin <
You can also try to express symbolically in Theano the function proposed in
http://stackoverflow.com/questions/10896841/find-a-3x3-sliding-window-over-an-image/10906015#10906015
since it only uses operations that have an equivalent in Theano (you can
use tensor.dimshuffle instead of swapaxes, and tensor.join instead of
vstack/column_stack).
And
final = set_subtensor(final[x::3, y::3], ...)
instead of
final[x::3, y::3] = ...
--
Pascal
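For reference, set_subtensor returns a new variable rather than writing in place, which is what a symbolic graph requires. A rough numpy analogue of the semantics Pascal describes (the helper name is made up):

```python
import numpy as np

def set_subtensor(dest, index, values):
    """Numpy analogue of Theano's set_subtensor: returns a fresh array
    with the slice overwritten, instead of mutating `dest` in place."""
    out = dest.copy()
    out[index] = values
    return out

final = np.zeros((9, 9))
final = set_subtensor(final, np.s_[1::3, 2::3], 1.0)  # writes a 3 x 3 grid of ones
```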
--
---
You received this message because you are subscribed to the Google
Groups "theano-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to theano-users+***@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.