Adi Renduchintala
2017-07-17 19:04:27 UTC
I have large data arrays X and Y representing the inputs and outputs of a
classifier. I need to use mini-batch gradient descent to train the classifier
(say, logistic regression).
I have stored the parameter weights as a theano.shared variable. Instead
of copying small batches of data to the GPU, I also place all the data on
the GPU as a theano.shared variable and only pass indexes to the GPU.
The scan function accepts batches of indexes fr into X_shared and
Y_shared and then computes a loss.
*I want to call a gradient update function inside scan so that every scan
iteration uses an updated version of the weights. Is this possible/efficient? If
so, how can I accomplish this?*
I essentially want to do mini-batch gradient descent entirely on the GPU.
I have seen examples where the CPU sends the GPU random indexes, but I want
to know if I can avoid that!
This is my scan function:
#imports
import numpy as np
import theano
import theano.tensor as T

#shared variable for the weights
weights = theano.shared(np.float32(np.random.rand(data_shape)), 'weights')
#shared data
X_shared = theano.shared(np.float32(MY_DATA_X), 'X')
Y_shared = theano.shared(np.float32(MY_DATA_Y), 'Y')

def scan_shared_loss_acc(fr, total_mean_loss):
    #fr is a batch of random indexes into the data
    fr = T.cast(fr, 'int32')
    x = T.cast(X_shared[fr], 'int32')
    y = T.cast(Y_shared[fr], 'int32')
    y_pred = get_prediction(x, y, weights)
    loss = T.nnet.categorical_crossentropy(y_pred, y).mean()
    #---- I want to do a gradient update here ----#
    #--- is the update below efficient? ----#
    #grad_weights = T.grad(loss, weights)
    #weights = weights - learning_rate * grad_weights
    total_mean_loss = total_mean_loss + loss
    return total_mean_loss
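
Here is a rough, self-contained sketch of what I think this would look like, with toy
sizes and a plain softmax standing in for get_prediction (all of the names and sizes
below are just placeholders I made up for the example). Is returning an updates
dictionary from the step function like this the right/efficient way to do it?

import numpy as np
import theano
import theano.tensor as T

n_examples, n_features, n_classes = 1000, 20, 5       # hypothetical sizes
learning_rate = np.float32(0.1)

X_shared = theano.shared(np.float32(np.random.rand(n_examples, n_features)), 'X')
Y_shared = theano.shared(np.int32(np.random.randint(n_classes, size=n_examples)), 'Y')
weights = theano.shared(np.float32(np.random.rand(n_features, n_classes)), 'weights')

def step(fr, total_mean_loss):
    x = X_shared[fr]                                   # (batch, n_features), stays on the device
    y = Y_shared[fr]                                   # (batch,) int32 class labels
    y_pred = T.nnet.softmax(T.dot(x, weights))         # stand-in for get_prediction
    loss = T.nnet.categorical_crossentropy(y_pred, y).mean()
    grad_weights = T.grad(loss, weights)
    # Returning (output, updates): scan applies the weight update at the end of
    # this iteration, so the next mini-batch already sees the new weights.
    return total_mean_loss + loss, {weights: weights - learning_rate * grad_weights}

batch_indexes = T.imatrix('batch_indexes')             # one row of indexes per mini-batch
totals, updates = theano.scan(step,
                              sequences=batch_indexes,
                              outputs_info=T.as_tensor_variable(np.float32(0.0)))
# The updates returned by scan must be passed to theano.function, otherwise the
# final weights are never written back to the shared variable.
train_epoch = theano.function([batch_indexes],
                              totals[-1] / batch_indexes.shape[0],
                              updates=updates)

# Usage: one call runs a whole epoch of mini-batch SGD on the device.
idx = np.random.randint(n_examples, size=(50, 32)).astype(np.int32)   # 50 batches of 32
print(train_epoch(idx))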
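
And to avoid the CPU sending random indexes at all, I am guessing I could sample them
inside the graph with MRG_RandomStreams instead (reusing the definitions from the
sketch above; n_batches and batch_size are again made-up values). Would something
like this keep everything on the GPU?

from theano.sandbox.rng_mrg import MRG_RandomStreams

srng = MRG_RandomStreams(seed=1234)
n_batches, batch_size = 50, 32                          # hypothetical epoch layout
# uniform samples in [0, 1) scaled to [0, n_examples) and truncated to int indexes
rand_idx = T.cast(srng.uniform((n_batches, batch_size)) * n_examples, 'int32')

totals, updates = theano.scan(step, sequences=rand_idx,
                              outputs_info=T.as_tensor_variable(np.float32(0.0)))
# MRG_RandomStreams keeps its state in its own shared variables via default
# updates, so only the weight update from scan needs to be passed explicitly.
train_epoch_rand = theano.function([], totals[-1] / np.float32(n_batches),
                                   updates=updates)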