On Tensor Networks and the Nature of Non-Linearity

What does this have to do with calculus? Well, let's convert $f(x)=x\cdot x \cdot x$ into a multi-linear function, i.e. $f(x,y,z) = x\cdot y \cdot z$, except that we know $x = y = z$ (or that they're highly correlated). The function is multi-linear because, if we assume the inputs are independent, it is linear with respect to each variable. For example, $f(5+3,y,z) = 8yz$, which is the same as $f(5,y,z) + f(3,y,z) = 5yz + 3yz = 8yz$. But since we know our data is not independent, this function technically isn't linear, because we can't control each variable independently. Let's pretend for a moment that we didn't know our input data is correlated in this way.
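To make the per-slot linearity concrete, here is a minimal sketch using sympy (the library choice is my own assumption; any CAS or even plain floats would do) that checks the $f(5+3,y,z) = f(5,y,z) + f(3,y,z)$ identity symbolically:

```python
import sympy as sp

x, y, z = sp.symbols("x y z")
f = x * y * z  # multi-linear stand-in for x**3

# Linearity in the first slot: f(5+3, y, z) == f(5, y, z) + f(3, y, z)
lhs = f.subs(x, 5 + 3)
rhs = f.subs(x, 5) + f.subs(x, 3)
print(sp.simplify(lhs - rhs))  # prints 0: the two sides agree
```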

Recall from school that the derivative of $f(x)=x^3$ is $\frac{df}{dx} = 3x^2$. We can't take "the derivative" of our new multi-linear function anymore; we can only take partial derivatives with respect to each variable. To take the partial derivative with respect to $x$, we hold the other variables $y,z$ constant, and the function becomes a simple linear one. We do the same for $y$ and $z$.
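As a quick illustration (again sketched with sympy, an assumption on my part), each partial of $f(x,y,z)=xyz$ is just the product of the other two variables:

```python
import sympy as sp

x, y, z = sp.symbols("x y z")
f = x * y * z

# Each partial treats the other two variables as constants,
# so the result is linear in the remaining variables.
print(sp.diff(f, x))  # y*z
print(sp.diff(f, y))  # x*z
print(sp.diff(f, z))  # x*y
```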

But wait, we can't really hold the other variables constant, because we know they're perfectly correlated (actually equal). What we're getting at is a case of the total derivative (see https://en.wikipedia.org/wiki/Total_derivative), which is how you take the derivative of a multivariable function with interacting variables. In this particular case, since $x = y = z$, we just need to combine (sum) all the partial derivatives and then make all the variables the same, say $x$: $\frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} + \frac{\partial f}{\partial z} = yz + xz + xy \rightarrow 3x^2$.
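Here is that summing-of-partials step as a sketch (sympy again, by assumption), recovering the familiar $3x^2$:

```python
import sympy as sp

x, y, z = sp.symbols("x y z")
f = x * y * z

# Total derivative under x = y = z: sum the partials,
# then identify all three variables with x.
total = sp.diff(f, x) + sp.diff(f, y) + sp.diff(f, z)  # y*z + x*z + x*y
print(total.subs({y: x, z: x}))  # 3*x**2, matching d(x**3)/dx
```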

This is exactly what we get through the traditional route. Or consider a slightly more complicated function:

$$
f(x) = 3x^3+x^2 \rightarrow \frac{df}{dx} = 9x^2 + 2x
$$

$$
f(a,b,c,d,e) = 3abc+de \rightarrow \frac{\partial f}{\partial a} + \frac{\partial f}{\partial b} + \frac{\partial f}{\partial c} + \frac{\partial f}{\partial d} + \frac{\partial f}{\partial e} = 3bc+3ac+3ab+e+d \rightarrow 9x^2 + 2x
$$

where the last arrow sets $a=b=c=d=e=x$.
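And the same check for the more complicated example (still a hedged sympy sketch): sum the five partials of $f(a,b,c,d,e)=3abc+de$, set every variable to $x$, and compare against the single-variable derivative:

```python
import sympy as sp

a, b, c, d, e, x = sp.symbols("a b c d e x")
f = 3 * a * b * c + d * e

# Sum the five partials, then set every variable equal to x.
total = sum(sp.diff(f, v) for v in (a, b, c, d, e))
print(sp.expand(total.subs({a: x, b: x, c: x, d: x, e: x})))  # 9*x**2 + 2*x

# Sanity check against the traditional single-variable route.
print(sp.diff(3 * x**3 + x**2, x))  # 9*x**2 + 2*x
```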

Hence viewing copying as the source of non-linearity even makes computing derivatives more intuitive, at least for me.