window_grad {CNN} |
R Documentation |
Description

This is AdaGrad, but with a moving-window weighted average, so the gradient is not accumulated over the entire history of the run. It is also referred to as "Idea #1" in Zeiler's paper on AdaDelta.
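The windowed accumulation described above can be sketched as an exponentially weighted moving average of squared gradients (the scheme Zeiler calls "Idea #1"). The sketch below is illustrative only; the names `lr`, `eps`, `state`, and `window_grad_step` are assumptions for the example, not part of this function's signature:

```python
import math

def window_grad_step(param, grad, state, lr=0.01, ro=0.95, eps=1e-8):
    """One parameter update using a moving-window (exponentially
    weighted) average of squared gradients instead of AdaGrad's
    full-history sum. 'state' holds the running average E[g^2]."""
    # decay the old accumulator and mix in the current squared gradient
    state = ro * state + (1.0 - ro) * grad * grad
    # scale the step by the root of the windowed average, as in AdaGrad
    param = param - lr * grad / (math.sqrt(state) + eps)
    return param, state

# toy usage: minimize f(x) = x^2 (gradient 2x) for a few steps
x, acc = 5.0, 0.0
for _ in range(100):
    x, acc = window_grad_step(x, 2.0 * x, acc, lr=0.1)
```

Because old squared gradients decay geometrically at rate `ro`, the step size does not shrink monotonically over the whole run the way plain AdaGrad's does.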
Usage
window_grad(batch.size,
l2.decay = 0.001,
ro = 0.95);
Arguments
batch.size
the size of each training mini-batch [as integer]
l2.decay
the L2 regularization (weight decay) factor [as double]
ro
the decay rate of the moving-window weighted average of the gradient [as double]
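As a rough guide, a geometric window with decay rate ro averages over about 1/(1 - ro) recent steps, so the default ro = 0.95 corresponds to an effective window of roughly 20 updates. The helper below is an illustrative sketch, not part of the package:

```python
def effective_window(ro):
    # the geometric weights ro^k sum to 1/(1 - ro); treat that sum as
    # the effective number of recent gradients in the moving average
    return 1.0 / (1.0 - ro)

print(effective_window(0.95))  # about 20 recent updates
```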
Details
Authors
MLkit
Value
This function returns a data object of type TrainerAlgorithm, a CLR value class.
Examples
[Package CNN version 1.0.0.0 Index]