Re-parameterising for non-negativity yields multiplicative updates

Suppose you have a model that depends on real-valued parameters, and that you would like to constrain these parameters to be non-negative. For simplicity, suppose the model has a single parameter $w$. Let $E$ denote the error function. To constrain $w$ to be non-negative, parameterise $w$ as the square of a real-valued parameter $v$:

   $$w = v^2, \qquad v \in \mathbb{R}.$$

We can now minimise $E$ by choosing $v$ without constraints, e.g. by using gradient descent. Let $\lambda > 0$ be the learning rate. We have

   $$\frac{\partial E}{\partial v} = \frac{\partial E}{\partial w} \cdot \frac{\partial w}{\partial v} = 2v \, \frac{\partial E}{\partial w}$$

by the chain rule. Thus

   $$v \leftarrow v - \lambda \frac{\partial E}{\partial v} = v \left(1 - 2\lambda \frac{\partial E}{\partial w}\right), \qquad \text{and hence} \qquad w = v^2 \leftarrow w \left(1 - 2\lambda \frac{\partial E}{\partial w}\right)^2.$$
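As a quick sanity check with illustrative numbers (chosen here for exposition, not from the derivation above): take $w = 4$ (so $v = 2$), $\lambda = 0.1$ and $\frac{\partial E}{\partial w} = 1$. The additive update gives $v \leftarrow 2 \cdot (1 - 0.2) = 1.6$, and indeed $v^2 = 2.56 = 4 \cdot (1 - 0.2)^2$, which is exactly the multiplicative update applied to $w$ directly.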

Thus we’ve obtained a multiplicative update rule for $w$ that is in terms of $w$ only. In particular, we don’t need $v$ anymore!
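To make this concrete, here is a minimal numerical sketch. The toy error $E(w) = (w - 3)^2$, the starting point, and the name `grad_E` are illustrative assumptions, not part of the derivation above. It runs gradient descent on $v$ and, in parallel, the multiplicative update on $w$ directly, and checks that the two agree.

```python
# A sketch, assuming a toy error E(w) = (w - 3)**2 with minimiser w = 3 >= 0.

def grad_E(w):
    # dE/dw for the illustrative error E(w) = (w - 3)**2
    return 2.0 * (w - 3.0)

lam = 0.05   # learning rate, lambda in the text
steps = 100

# Path 1: unconstrained gradient descent on v, where w = v**2.
v = 0.5
for _ in range(steps):
    v -= lam * 2.0 * v * grad_E(v ** 2)   # chain rule: dE/dv = 2v * dE/dw

# Path 2: the multiplicative update on w directly; v never appears.
w = 0.5 ** 2
for _ in range(steps):
    w *= (1.0 - 2.0 * lam * grad_E(w)) ** 2

print(v ** 2, w)   # both approach 3.0, and w stayed non-negative throughout
```

Starting both paths from the same $w_0 = v_0^2$, the two sequences coincide step for step (up to floating-point rounding), by induction on the update derived above.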