Suppose you have a model that depends on real-valued parameters, and that you would like to constrain these parameters to be non-negative. For simplicity, suppose the model has a single parameter $w$. Let $E(w)$ denote the error function. To constrain $w$ to be non-negative, parameterise $w$ as the square of a real-valued parameter $v$:

$$w = v^2.$$
We can now minimise $E$ by choosing $v$ without constraints, e.g. by using gradient descent. Let $\eta > 0$ be the learning rate. We have

$$\frac{\partial E}{\partial v} = \frac{\partial E}{\partial w} \cdot \frac{\partial w}{\partial v} = 2 v \, \frac{\partial E}{\partial w}$$

by the chain rule. Thus the gradient descent update is

$$v \leftarrow v - \eta \frac{\partial E}{\partial v} = v \left( 1 - 2 \eta \frac{\partial E}{\partial w} \right),$$

and squaring both sides gives

$$w \leftarrow v^2 \left( 1 - 2 \eta \frac{\partial E}{\partial w} \right)^2 = w \left( 1 - 2 \eta \frac{\partial E}{\partial w} \right)^2.$$
Thus we’ve obtained a multiplicative update rule for $w$ that is expressed in terms of $\frac{\partial E}{\partial w}$ only. In particular, we don’t need $v$ anymore!
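As a minimal sketch of the update rule in action (the error function $E(w) = (w - 3)^2$ and all numbers below are illustrative choices, not from the derivation above), the following applies the multiplicative rule directly to $w$, never materialising $v$:

```python
# Minimise the example error E(w) = (w - 3)^2 subject to w >= 0,
# using the multiplicative update w <- w * (1 - 2*eta*dE/dw)^2.

def dE_dw(w):
    # Gradient of the illustrative error E(w) = (w - 3)^2.
    return 2.0 * (w - 3.0)

eta = 0.01  # learning rate
w = 0.5     # any positive start works; note w = 0 is a fixed point

for _ in range(1000):
    w = w * (1.0 - 2.0 * eta * dE_dw(w)) ** 2

print(w)  # approaches the constrained optimum w = 3
```

Because the update multiplies $w$ by a squared (hence non-negative) factor, a non-negative $w$ stays non-negative at every step, which is exactly the constraint we wanted.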