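Suppose you have a model that depends on real-valued parameters, and that you would like to constrain these parameters to be non-negative. For simplicity, suppose the model has a single parameter $a \in \mathbb{R}$. Let $E$ denote the error function. To constrain $a$ to be non-negative, parameterise $a$ as the square of a real-valued parameter $\alpha \in \mathbb{R}$:

$$a = \alpha^2.$$

We can now minimise $E$ by choosing $\alpha$ without constraints, e.g. by using gradient descent. Let $\lambda > 0$ be the learning rate. We have

$$\alpha^{\text{new}} = \alpha - \lambda \frac{\partial E}{\partial \alpha} = \alpha - 2 \lambda \alpha \frac{\partial E}{\partial a}$$

by the chain rule, since $\frac{\partial a}{\partial \alpha} = 2\alpha$. Thus

$$a^{\text{new}} = \left(\alpha^{\text{new}}\right)^2 = \alpha^2 \left(1 - 2 \lambda \frac{\partial E}{\partial a}\right)^2 = a \left(1 - 2 \lambda \frac{\partial E}{\partial a}\right)^2.$$

So we’ve obtained a multiplicative update rule for $a$ that is expressed in terms of $a$ only. In particular, we don’t need $\alpha$ anymore!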
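To see the rule in action, here is a minimal sketch in Python. It writes `a` for the non-negative parameter and `lr` for the learning rate, and applies the update "multiply `a` by the square of one minus twice the learning rate times the derivative of the error with respect to `a`". The quadratic error function, the starting value, and all function names are assumptions chosen purely for illustration; they are not part of the derivation above.

```python
def multiplicative_step(a, grad_a, lr):
    """One multiplicative update: a <- a * (1 - 2 * lr * dE/da)**2."""
    return a * (1.0 - 2.0 * lr * grad_a) ** 2


def squared_param_step(alpha, grad_a_fn, lr):
    """One plain gradient-descent step on alpha, where a = alpha**2.

    By the chain rule, dE/dalpha = 2 * alpha * dE/da.
    """
    return alpha - lr * 2.0 * alpha * grad_a_fn(alpha ** 2)


def minimise(grad_a_fn, a0, lr=0.05, steps=200):
    """Iterate the multiplicative update starting from a0 >= 0."""
    a = a0
    for _ in range(steps):
        a = multiplicative_step(a, grad_a_fn(a), lr)
    return a


# Hypothetical error function E(a) = (a - 3)**2, so dE/da = 2 * (a - 3).
def grad_example(a):
    return 2.0 * (a - 3.0)


# The two formulations agree after one step from the same starting point.
a_from_rule = multiplicative_step(1.0, grad_example(1.0), 0.05)
a_from_alpha = squared_param_step(1.0, grad_example, 0.05) ** 2

# Running the multiplicative rule to convergence.
a_final = minimise(grad_example, a0=1.0)
```

Because the multiplier is a square, `a` can never become negative. Note also that `a = 0` is a fixed point of the update, so in this sketch the starting value should be strictly positive.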