Stellar Space Studies

Diagonalization

This chapter is about the diagonalization of the normal matrix, and about refining the least squares method by truncating its eigenvalues. You will also come across the term "Singular Value Decomposition" (abbreviated as SVD) for this diagonalization.

Diagonal matrix

\[ D = diag(\lambda_1,...,\lambda_m) \]


Diagonalization of symmetric matrices

Do you remember this classic result? Any real symmetric matrix is diagonalizable in a basis of orthonormal eigenvectors. If \( N \) is our normal matrix from the previous chapters, then it can be decomposed in the following form:

Diagonalization of the normal matrix

\[ N = PD\,^tP \]

\[ D = diag(\lambda_1,...,\lambda_m) \]

\[ \,^tPP = I_m \]

In the formula above:

  • \( N = \,^tAA \) is the normal matrix.
  • \( D = \,^tPNP \) is the diagonal matrix of eigenvalues \( (\lambda_i)_{i=1,m}\).
  • \( P \) is the orthogonal matrix whose columns are the orthonormal eigenvectors \( (P^{-1}=\,^tP) \).

Since \(N\) is positive semi-definite by construction, all eigenvalues are non-negative. For practical reasons, we can order the eigenvalues in descending order.

Eigenvalues

\[ \forall i \in [|1,m|], \lambda_i \ge 0 \]

\[ \lambda_1 \ge ... \ge \lambda_i \ge ... \ge \lambda_m \]
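
As a minimal numerical sketch of this decomposition (assuming NumPy and a small made-up design matrix \(A\), used here only for illustration), we can compute \(N = \,^tAA\), diagonalize it, and verify \(N = PD\,^tP\) and \(\,^tPP = I_m\):

```python
import numpy as np

# Made-up example data: a small design matrix A (n observations, m parameters).
rng = np.random.default_rng(0)
n, m = 10, 3
A = rng.normal(size=(n, m))

# Normal matrix N = tA A: symmetric and positive semi-definite.
N = A.T @ A

# np.linalg.eigh handles symmetric matrices; eigenvalues come back in ascending order.
lam, P = np.linalg.eigh(N)

# Reorder eigenvalues (and the matching eigenvectors) in descending order, as in the text.
order = np.argsort(lam)[::-1]
lam, P = lam[order], P[:, order]
D = np.diag(lam)

# Check the decomposition N = P D tP and the orthonormality tP P = I_m.
assert np.allclose(N, P @ D @ P.T)
assert np.allclose(P.T @ P, np.eye(m))
```

Note that `np.linalg.eigh` is meant for symmetric matrices and returns the eigenvalues in ascending order, hence the reordering step.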


Change of coordinates

The diagonal matrix and the normal matrix represent the same endomorphism (or the same quadratic form) in two different bases of the parameter space: the canonical basis and the eigenvector basis. We can therefore study the problem in the new basis made of the eigenvectors of the normal matrix. Let's call \( \mathcal{B}_m \) the canonical basis and \( \mathcal{B}'_m \) the eigenvector basis. We have:

  • \( X = mat_{\mathcal{B}_m}(p - p_0) \)
  • \( X' = mat_{\mathcal{B}'_m}(p - p_0) \)

The transition from one system of coordinates to the other is given by:

  • \( X = PX' \)
  • \( S = PS' \)

We can then notice that:

  • \( \,^tXNX = \,^tX'\,^tPNPX' = \,^tX'DX' \)
  • \( \,^tSX= \,^tS'\,^tPX = \,^tS'X' \)

And we can rewrite the expression of \(Q(p) = \,^tXNX - 2\,^tSX + \alpha \) into:

Change of coordinates

\[ Q(p) = \,^tX'DX' - 2\,^tS'X' + \alpha \]
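
As a quick numerical check (a sketch with made-up data, where \(S\) is assumed to take the usual least squares form \(\,^tAB\) from the previous chapters), the quadratic and linear terms of \(Q\) are indeed the same in both bases:

```python
import numpy as np

# Made-up example data (design matrix A and observation vector B).
rng = np.random.default_rng(0)
n, m = 10, 3
A = rng.normal(size=(n, m))
B = rng.normal(size=n)

N, S = A.T @ A, A.T @ B          # normal matrix and second member (assumed form)
lam, P = np.linalg.eigh(N)
D = np.diag(lam)

X = rng.normal(size=m)           # some displacement p - p0 in the canonical basis
Xp = P.T @ X                     # X' = tP X : the same displacement in the eigenvector basis
Sp = P.T @ S                     # S' = tP S

# The quadratic and linear terms of Q are unchanged by the change of basis.
assert np.allclose(X @ N @ X, Xp @ D @ Xp)
assert np.allclose(S @ X, Sp @ Xp)
```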

This time the matrix of the quadratic term is diagonal, so the development is straightforward:

Development

\[ Q(p) = \sum_{i=1}^{m}{\lambda_i x'^2_i} - 2 \sum_{i=1}^{m}{s'_i x'_i} + \alpha \]

What do we notice?

On each axis \( x'_i \), we have the equation of a parabola. All cross terms of degree 2 have disappeared in the new system of coordinates, which makes things simple and which, of course, was the purpose of the diagonalization. The function is a paraboloid, and its principal directions are given by the eigenvectors.

We can now rewrite the equation to make the minimum along each axis self-evident.

Rewriting

\[ Q(p) = \sum_{i=1}^{m}{ \lambda_i \left(x'_i - \frac{s'_i}{\lambda_i} \right)^2 } - \sum_{i=1}^{m} { \frac{ s'^2_i}{\lambda_i} } + \alpha \]

The first sum contains terms that are either positive or zero (eigenvalues multiplying squares), while the last two terms are constants that do not depend on the coordinates. We easily conclude that the minimum is reached when all the squares are zero.

We also notice that the leading coefficient of degree 2 along each axis \(i\), i.e. the "steepness" of each parabola, is given by the eigenvalue \( \lambda_i \). The higher the value of \( \lambda_i \), the steeper the curve; the lower the value, the flatter.


Minimum

The minimum of \(Q\) is easily obtained for \( (x'_i)_{i=1,m} = \left( \frac{s'_i}{\lambda_i} \right)_{i=1,m} \), i.e. \(X' = D^{-1}S' \) (assuming all eigenvalues are strictly positive, so that \(D\) is invertible).

Minimum
Eigenvector basis

\[ X'_{min} = mat_{\mathcal{B}'_m}(p_{min} - p_0) \]

\[ X'_{min} = D^{-1}S' = \left(\frac{s'_i}{\lambda_i}\right)_{i=1,m} \]

We can easily go back to the original system of coordinates and reconstruct \(X_{min} \) by the multiplication \( X_{min} = PX'_{min} \).

Minimum
Canonical basis

\[ X_{min} = mat_{\mathcal{B}_m}(p_{min} - p_0) \]

\[ X_{min} = PX'_{min} \]
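
Here is a short sketch (made-up data again, with all eigenvalues strictly positive so that \(D\) is invertible) showing that the minimum reconstructed through the eigenvector basis coincides with the direct solution of the normal equations \(NX = S\):

```python
import numpy as np

# Made-up example data.
rng = np.random.default_rng(0)
n, m = 10, 3
A = rng.normal(size=(n, m))
B = rng.normal(size=n)

N, S = A.T @ A, A.T @ B
lam, P = np.linalg.eigh(N)       # with this random A, all eigenvalues are strictly positive
Sp = P.T @ S                     # S' in the eigenvector basis

Xp_min = Sp / lam                # X'_min = D^{-1} S', one division per axis
X_min = P @ Xp_min               # back to the canonical basis: X_min = P X'_min

# Same result as solving the normal equations N X = S directly.
assert np.allclose(X_min, np.linalg.solve(N, S))
```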

The solution parameter \(p_{min} \) we found in this part of the course is exactly the same as the one we found in the first part of the course. It is the linearized least squares minimum. Thus far, there is no difference in the algorithm; the only difference lies in the basis in which we express the problem (canonical basis or eigenvector basis).


Improvement in residuals

We can also easily figure out the improvement in residuals from \(Q(p_0)\) to \(Q_{min}\), expressed with the new coordinates. In the case of \(p_0\), \( x'_i = 0 \); in the case of \(p_{min}\), \(x'_i = s'_i/\lambda_i \). We conclude that:

Improvement in residuals

\[ \Delta Q = - \sum_{i=1}^{m}{ \frac{ s'^2_i}{\lambda_i} } \]
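
A small sketch (same kind of made-up data) can check this improvement against the quadratic form directly, since the constant \( \alpha \) cancels in the difference:

```python
import numpy as np

# Made-up example data.
rng = np.random.default_rng(0)
n, m = 10, 3
A = rng.normal(size=(n, m))
B = rng.normal(size=n)

N, S = A.T @ A, A.T @ B
lam, P = np.linalg.eigh(N)
Sp = P.T @ S

def Q(X, alpha=0.0):
    # Quadratic cost tX N X - 2 tS X + alpha; alpha cancels when taking differences.
    return X @ N @ X - 2 * S @ X + alpha

X_min = P @ (Sp / lam)           # minimum reconstructed in the canonical basis
delta_Q = -np.sum(Sp**2 / lam)   # predicted improvement in residuals

# The prediction matches Q(p_min) - Q(p_0), where X(p_0) = 0.
assert np.allclose(Q(X_min) - Q(np.zeros(m)), delta_Q)
```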


Summary

Let's summarize all the main elements of the problem expressed in the basis of eigenvectors.

Summary

\[ Q(p) = \sum_{i=1}^{m}{\lambda_i x'^2_i} - 2 \sum_{i=1}^{m}{s'_i x'_i} + \alpha \]

\[ X'_{min} = \left(\frac{s'_i}{\lambda_i}\right)_{i=1,m} \]

\[ \Delta Q = - \sum_{i=1}^{m}{ \frac{ s'^2_i}{\lambda_i} } \]

\[ X_{min} = PX'_{min} \]

Advantages

The main advantage of this formulation is that the eigenvectors represent the principal directions of the paraboloid. Therefore, the expressions are simpler (no cross terms of degree 2), and furthermore, we can process each direction independently.

It is visually easy to understand the formula. We can easily figure out what it means to go to the minimum of the paraboloid. It means to go to the minimum of each parabola along every axis. We can also easily imagine what it would mean, should we decide to do so, to go to the minimum of the parabolas only along certain selected axes. That's the topic for the next chapter.

In any case, whatever we choose to do (minimum or partial minimum), we can always reconstruct the solution parameter in the original basis by the simple multiplication \( X_{min} = PX'_{min} \).
