Interpretation of matrix calculus with Fréchet derivative


After spending some time learning the Fréchet derivative, I went back to the problem of computing the derivative of $y = \Vert A x - b \Vert^2$ with respect to the variable $x$. I have known the result, $2A^\mathsf{T}(Ax - b)$, since I was a second-year master's student (2015), and I have used it effectively, in combination with the product rule, in multiple applications. Unfortunately, I can no longer reconstruct how I arrived at it back then. So I revisit the problem from the viewpoint of the Fréchet derivative and derive the answer from there.

Traditionally, matrix calculus is presented as a notation for organizing partial derivatives.^[Xu’s write-up on Matrix derivative.] It collects the various partial derivatives of a single function with respect to many variables into vectors that can be treated as single entities.^[Wikipedia page on Matrix calculus.] In our case, the derivative of a scalar $y$ with respect to a vector

$$\begin{equation*} x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \end{equation*}$$

is written (in denominator layout notation) as

$$\begin{equation*} \frac{\partial y}{\partial x} = \begin{bmatrix} {\frac{\partial y}{\partial x_{1}}} \\ {\frac{\partial y}{\partial x_{2}}} \\ \vdots \\ {\frac{\partial y}{\partial x_{n}}} \end{bmatrix}. \end{equation*}$$

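Applying this formula to our problem, that is, differentiating $y = \Vert A x - b \Vert^2$ with respect to each $x_j$ and stacking the results, gives the derivative $2A^\mathsf{T}(Ax - b)$.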
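As a sanity check on the partial-derivative view, here is a minimal Python sketch that approximates each $\partial y / \partial x_j$ with a central difference and compares the stacked result with $2A^\mathsf{T}(Ax - b)$. The values of `A`, `b`, and `x`, and the helper names `residual`, `gradient_formula`, and `gradient_fd`, are made up for illustration and do not come from any library.

```python
# Hypothetical example data (not from the post): a 3x2 matrix A, a vector b, a point x.
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
b = [1.0, 0.0, -1.0]
x = [0.5, -0.25]


def residual(x):
    """Return the vector A x - b."""
    return [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]


def y(x):
    """Return the squared norm ||A x - b||^2."""
    return sum(r * r for r in residual(x))


def gradient_formula(x):
    """Return the closed-form gradient 2 A^T (A x - b)."""
    r = residual(x)
    return [2.0 * sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(x))]


def gradient_fd(x, eps=1e-6):
    """Approximate each partial derivative dy/dx_j by a central difference."""
    grad = []
    for j in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        grad.append((y(x_plus) - y(x_minus)) / (2.0 * eps))
    return grad


print("closed form:       ", gradient_formula(x))
print("finite differences:", gradient_fd(x))
```

The two printed vectors should agree up to rounding error. Since $y$ is quadratic in $x$, the central difference has no truncation error here, so any mismatch comes from floating-point arithmetic alone.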

Now, let’s interpret the problem with the help of the Fréchet derivative. Write $g(x) = A x - b$, $h(x) = x^\mathsf{T}x$, and $y = h \circ g$. Then $Dh(x)(u) = 2x^\mathsf{T}u$ and $Dg(x)(u) = Au$, so by the chain rule,

$$\begin{align*} Dy(x)(u) &= D(h\circ g)(x) (u) \\ &= \bigl(Dh(g(x))\circ Dg(x)\bigr)(u) \\ &= Dh(g(x))(Au) \\ &= 2(Ax-b)^\mathsf{T}Au, \end{align*}$$

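Since $2(Ax-b)^\mathsf{T}Au = \bigl(2A^\mathsf{T}(Ax-b)\bigr)^\mathsf{T}u$ for every $u$, the vector that represents the linear map $Dy(x)$ through the inner product is the derivative in denominator layout, and then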

$$\begin{equation*} \frac{\partial y}{\partial x} = 2 A^\mathsf{T}(Ax - b). \end{equation*}$$
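The Fréchet view can also be checked numerically. The sketch below assumes the usual defining property, $y(x + v) = y(x) + Dy(x)(v) + o(\Vert v \Vert)$, and measures the remainder along a fixed direction for shrinking step sizes. The data `A`, `b`, `x`, `u` and the helper names (`matvec`, `dot`, `Dy`, `remainder_ratio`) are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical example data (not from the post).
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
b = [1.0, 0.0, -1.0]
x = [0.5, -0.25]
u = [1.0, -2.0]  # direction along which the derivative is tested


def matvec(M, v):
    """Return the matrix-vector product M v."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]


def dot(p, q):
    """Return the inner product p^T q."""
    return sum(pi * qi for pi, qi in zip(p, q))


def residual(x):
    """Return g(x) = A x - b."""
    return [axi - bi for axi, bi in zip(matvec(A, x), b)]


def y(x):
    """Return y(x) = h(g(x)) = ||A x - b||^2."""
    r = residual(x)
    return dot(r, r)


def Dy(x, v):
    """Apply the linear map Dy(x) to v: 2 (A x - b)^T A v."""
    return 2.0 * dot(residual(x), matvec(A, v))


def remainder_ratio(t):
    """Return |y(x + t u) - y(x) - Dy(x)(t u)| / ||t u||."""
    step = [t * ui for ui in u]
    x_new = [xi + si for xi, si in zip(x, step)]
    remainder = y(x_new) - y(x) - Dy(x, step)
    return abs(remainder) / math.sqrt(dot(step, step))


# The same number computed two ways: the linear map applied to u,
# and the inner product of the gradient 2 A^T (A x - b) with u.
r = residual(x)
gradient = [2.0 * sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(x))]
print("Dy(x)(u)     =", Dy(x, u))
print("gradient^T u =", dot(gradient, u))

for t in (1e-1, 1e-2, 1e-3):
    print("t =", t, " remainder ratio =", remainder_ratio(t))
```

For this quadratic function the remainder is exactly $\Vert A v \Vert^2$, so with $v = tu$ the ratio shrinks linearly in $t$, which is the behavior the definition requires.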