[theano-users] Are grouped convolutions slower than regular convolutions?
p***@gmail.com
2017-10-05 21:28:43 UTC
Hello,

I wanted to reduce parameter count by switching the regular 2d convolution
below:
input: (b, ch_in, h, w)
kernel: (ch_out, ch_in, k_h, k_w)
output: (b, ch_out, h, w) ('same' convolution)
fn used: theano.tensor.nnet.conv2d
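
For concreteness, a minimal sketch of this baseline (the sizes are made-up
examples I picked for illustration; border_mode='half' gives the 'same'
output size for odd kernels):

    import numpy as np
    import theano
    import theano.tensor as T
    from theano.tensor.nnet import conv2d

    b, ch_in, ch_out, h, w, k = 8, 32, 64, 64, 64, 3   # example sizes only

    x = T.tensor4('x')                                  # (b, ch_in, h, w)
    W = theano.shared(np.random.randn(ch_out, ch_in, k, k).astype('float32'))

    # border_mode='half' pads by k//2, i.e. a 'same' convolution for odd k
    y = conv2d(x, W, border_mode='half')                # (b, ch_out, h, w)
    f_regular = theano.function([x], y)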

to a depthwise separable convolution, i.e. a depthwise spatial convolution followed by a pointwise 1x1 convolution:
input: (b, ch_in, h, w)
kernel_spatial: (ch_in, 1, k_h, k_w)
intermediate: (b, ch_in, h, w) ('same' convolution)
kernel_1x1: (ch_out, ch_in, 1, 1)
output: (b, ch_out, h, w)
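
The parameter saving that motivated this, worked out with the shapes above
(the concrete numbers are just an example I chose):

    # regular conv:     ch_out * ch_in * k_h * k_w   weights
    # depthwise + 1x1:  ch_in * k_h * k_w  +  ch_out * ch_in
    ch_in, ch_out, k_h, k_w = 32, 64, 3, 3
    regular   = ch_out * ch_in * k_h * k_w             # 18432
    separable = ch_in * k_h * k_w + ch_out * ch_in     # 288 + 2048 = 2336
    print(regular / float(separable))                  # ~7.9x fewer parameters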

I implemented the latter in two ways (both sketched below):

1. theano.tensor.nnet.conv2d with num_groups=ch_in for spatial conv
and theano.tensor.nnet.conv2d again for 1x1 conv

2. theano.tensor.nnet.abstract_conv.separable_conv2d which performs both
convolutions in one function
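
Roughly, the two variants look like this (a sketch, not my exact code; I am
assuming Theano 1.0-style keyword names such as num_groups and num_channels,
and 'half' padding for the 'same' spatial convolution):

    import numpy as np
    import theano
    import theano.tensor as T
    from theano.tensor.nnet import conv2d
    from theano.tensor.nnet.abstract_conv import separable_conv2d

    b, ch_in, ch_out, h, w, k = 8, 32, 64, 64, 64, 3    # example sizes only
    x  = T.tensor4('x')
    Wd = theano.shared(np.random.randn(ch_in, 1, k, k).astype('float32'))       # depthwise
    Wp = theano.shared(np.random.randn(ch_out, ch_in, 1, 1).astype('float32'))  # pointwise 1x1

    # 1. grouped spatial conv (one group per input channel), then a 1x1 conv
    inter = conv2d(x, Wd, border_mode='half', num_groups=ch_in)   # (b, ch_in, h, w)
    y1    = conv2d(inter, Wp)                                     # (b, ch_out, h, w)

    # 2. separable_conv2d does the depthwise and pointwise steps in one call
    y2 = separable_conv2d(x, Wd, Wp, num_channels=ch_in, border_mode='half')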

In both cases, separable convolution was roughly 3x slower than the regular
convolution. I tried both cuDNN 5 and cuDNN 6; the results were even slower
with cuDNN 6.

Thinking there might be non-cuDNN ops slowing things down, I profiled
implementation #2 above; the excerpt below suggests the slowdown is within
the cuDNN ops themselves:

<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
  42.0%   42.0%     192.444s     1.52e-02s     C     12680      40   theano.gpuarray.dnn.GpuDnnConvGradW
  20.7%   62.7%      94.781s     7.67e-03s     C     12363      39   theano.gpuarray.dnn.GpuDnnConvGradI
  15.8%   78.5%      72.418s     5.71e-03s     C     12680      40   theano.gpuarray.dnn.GpuDnnConv
   9.0%   87.5%      41.199s     2.89e-03s     C     14265      45   theano.gpuarray.dnn.GpuDnnReduction
   4.7%   92.2%      21.666s     1.29e-04s     C    167693     529   theano.gpuarray.elemwise.GpuElemwise
   1.9%   94.1%       8.878s     1.27e-03s     C      6974      22   theano.gpuarray.dnn.GpuDnnBatchNormGrad
   1.9%   96.0%       8.556s     1.23e-03s     C      6974      22   theano.gpuarray.dnn.GpuDnnBatchNorm
   1.4%   97.4%       6.295s     1.99e-02s     Py      317       1   theano.tensor.subtensor.AdvancedIncSubtensor
   0.6%   98.0%       2.620s     4.13e-03s     C       634       2   theano.gpuarray.elemwise.GpuCAReduceCuda
   0.4%   98.4%       1.987s     2.85e-04s     C      6974      22   theano.gpuarray.dnn.GpuDnnBatchNormInference
   0.4%   98.8%       1.728s     5.45e-03s     C       317       1   theano.tensor.basic.Alloc
   0.2%   99.0%       1.090s     3.44e-03s     Py      317       1   theano.tensor.basic.ARange
   0.2%   99.2%       0.769s     6.07e-04s     C      1268       4   theano.gpuarray.basic_ops.GpuFromHost
   0.2%   99.3%       0.754s     1.19e-03s     C       634       2   theano.gpuarray.basic_ops.HostFromGpu
   0.2%   99.5%       0.713s     5.62e-04s     C      1268       4   theano.gpuarray.basic_ops.GpuJoin
   0.1%   99.6%       0.633s     4.99e-04s     C      1268       4   theano.gpuarray.dnn.GpuDnnPoolGrad
   0.1%   99.7%       0.528s     5.91e-06s     C     89394     282   theano.gpuarray.basic_ops.GpuContiguous
   0.1%   99.8%       0.283s     2.23e-04s     C      1268       4   theano.gpuarray.dnn.GpuDnnPool
   0.0%   99.8%       0.142s     8.15e-07s     C    174350     550   theano.tensor.subtensor.Subtensor
   0.0%   99.9%       0.142s     2.04e-05s     C      6974      22   theano.gpuarray.rng_mrg.GPUA_mrg_uniform
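
(For reference, the numbers above are the per-Class summary from Theano's
built-in profiler. As far as I know, a minimal self-contained way to get the
same kind of output is the following; alternatively, running with
THEANO_FLAGS=profile=True prints the summary at process exit.)

    import numpy as np
    import theano
    import theano.tensor as T
    from theano.tensor.nnet import conv2d

    x = T.tensor4('x')
    W = theano.shared(np.random.randn(16, 16, 3, 3).astype('float32'))
    f = theano.function([x], conv2d(x, W, border_mode='half'), profile=True)

    for _ in range(10):
        f(np.random.randn(4, 16, 32, 32).astype('float32'))
    f.profile.summary()   # prints per-Op / per-Class timings like the table above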

Is this a cuDNN issue or a Theano issue?

Thank you!