Diagonalization

# Truncation

This chapter is about the truncation of eigenvalues. That is where the main improvement lies. This chapter explains the main advantage over the traditional least squares method.

##### Truncated solution

\[ \forall \, i \ge k+1, \, x'_i = 0 \]

## Motivation for the truncation

### Summary

In the previous page, we expressed the problem in the new basis of eigenvectors. Let's summarize all the elements one more time:

This is the expression of the paraboloid \(Q\):

##### Paraboloid expressed in the eigenvectors basis

\[ Q(p) = \,^tX'DX' - 2\,^tS'X' + \alpha \]

\[ Q(p) = \sum_{i=1}^{m}{\lambda_i x'^2_i} - 2 \sum_{i=1}^{m}{s'_i x'_i} + \alpha \]

These are the correction \(X'_{min}\) on the parameter, i.e. the "total horizontal move", and the gain \(\Delta Q\) in the residuals, i.e. the "total vertical move":

##### Total horizontal and vertical moves

###### Towards the paraboloid minimum

\[ X'_{min} = D^{-1} S' = \left(\frac{s'_i}{\lambda_i}\right)_{i=1,m} \]

\[ \Delta Q = - \sum_{i=1}^{m}{ \frac{ s'^2_i}{\lambda_i} } \]

These are the "individual horizontal moves" \( (\Delta p_i) \) along each axis \(i\), and the "individual vertical moves" \( (\Delta Q_i) \) for each axis \(i\).

##### Individual horizontal and vertical moves

###### Towards the paraboloid minimum

\[ \Delta p_i = x'_i = \frac{ s'_i}{\lambda_i} \]

\[ \Delta Q_i = - \frac{ s'^2_i}{\lambda_i} = -\lambda_i \left(\Delta p_i\right)^2 \]

### Analysis of the horiontal and vertical moves

Let's look at the amplitudes of the horizontal and vertical moves in function of the eigenvalue \( \lambda_i\).

What happens when you go to the minimum?

The eigenvalue \(\lambda_i\) is the leading coefficient of each parabola. When the parabola is steep, you make a small horizontal move and a large vertical move. When the parabola is flat, on the opposite, you make a large horizontal move and a small vertical move.

In other words, when \(\lambda_i\) is big, it only takes a small correction on the parameters to provide a large improvement in the residuals. And when \(\lambda_i\) is small, it takes a huge correction on the parameters to provide only a small improvement in the residuals.

Which one would you say is the most efficient?

### The (forgotten) problem of linearity

Let's also take into account the following. Do you remember the assumption we left aside in the very beginning? We must not wander too far away from \( p_0 \) if we want our function \(Q\) to remain relevant.

What do we do when we go to the minimum of the parabolas along flat axes? We make a large horizontal move (proportional to \(1/\lambda_i\)), and we take the risk of moving away from the linearity zone. Besides, we do all this without achieving any significant improvement in the residuals. This may not be such a good idea. It may even be a very bad idea.

### The case for leaving some axes out

Remember that our initial problem was to find the minimum of \( \mathcal{S} \), not the minimum of \(Q\). Therefore, there is no reason to be obsessed with the minimum of \(Q\). It is perfectly reasonable to make a selection of the best axes, i.e. the steepest axes, where the improvement in residuals is large and the risk of getting outside of the linear validity zone is none. Along those axes, it makes perfect sense to go to the minimum. Along the other axes, it is not necessary to move . Let's just stay where we are.

### An illustration with gravity field

The application case will show you exactly the nature of the problem. In the case of gravity field maps, we will see in the next chapter that all the noise gathers in the bottom eigenvectors (the ones associated with the lowest eigenvalues). Going to the minimum along those flat axes only adds a *very big* amount of *very bad* noise (the correction is a noisy eigenvector multiplied by a large number, proportional to \( 1/\lambda_i \)). And the result is an insignificant improvement in residuals. This is suboptimal, not to say stupid.

### What next?

So it is wiser to select the axes we want to minimize. But with this new information, we have just created a new problem for ourselves. Which axes do we keep in, and which ones do we leave out?

## Truncation

### Mathematically speaking

As we explained above, the idea of the truncation is to make moves to the minimum only along the selected directions. How does it work mathematically? This is simple. The only thing we need to do is to "neutralize" the unwanted moves, i.e. nullify the corrections.

The axes we want to keep are the ones with the highest eigenvalues (the steepest parabolas). Since the eigenvalues are ordered by descending order, the problem simply consists of selecting a rank \(k\) after which we declare the corrections equal to zero.

This is what we call "truncation".

### The truncated solution

Let's call \( X'_{trunc} \) the new solution. We will have two expressions for \( x'_i \), the usual one until the rank \(k\) (we go to the minimum of the parabola), and *zero* after the rank \( k \) (we stay where we are).

##### Truncated solution

\[ X'_{trunc} = (x'_i)_{i=1,m} \]

Before the threshold, we have:

##### Truncated solution

###### Before the threshold k

\[ x'_i = \frac{ s'_i} { \lambda_i }, \forall \, i \in [|1,k|] \]

And after the threshold, we have:

##### Truncated solution

###### After the threshold k

\[ x'_i = 0, \forall \, i \in [|k+1,m|] \]

### Matrix formulation

The process is called "truncation of eigenvalues" because it is equivalent to truncating the diagonal inverse matrix \(D^{-1}\) in the solution.

##### Matrix formulation

\[ D^{-1}_{trunc} = diag(\frac{1}{\lambda_1}, ..., \frac{1}{\lambda_k}, 0, ..., 0) \]

\[ X'_{trunc} = D^{-1}_{trunc} \, S' \]

### Improvement in residuals

The improvement in residuals is given by the same usual formula, except that only the contributions until rank \(k\) are kept. This is the advantage of diagonalization, each direction can be processed independently without any effect on the others.

##### Improvement in the residuals

\[ \Delta Q_{trunc} = - \sum_{i=1}^{k}{ \frac{ s'^2_i }{ \lambda_i } } \]

### Reconstruction in the canonical basis

After we built our solution vector in the eigenvector basis, we can reconstruct the solution vector in the original canonical basis with the usual multiplication.

Using the usual notations:

- \( X'_{trunc} = mat_{\mathcal{B}'_m}(p_{trunc} - p_0) \)
- \( X_{trunc} = mat_{\mathcal{B}_m}(p_{trunc} - p_0) \)

We have:

##### Reconstruction

\[ X_{trunc} = PX'_{trunc} \]

### Linear combination

Note that the vector \( X'_{trunc} \) has non null coordinates only on the first \(k\) components, whereas the vector \(X_{trunc} \) has non null coordinates on all its components. In other words, the solution \( p_{trunc} \) is a linear combination of the \(k\) first eigenvectors, but it is a linear combination of all of the \(m\) canonical basis vectors.

## Choice of the truncation threshold

So how do you perform the truncation? How do you choose the truncation level?

Despite everything we've done so far, all the craft actually resides here. We will describe four options for you. However, the art to finely tune the recipe to your application case will have to remain... your own field of experiment!

### Fixed number of eigenvalues

You can choose a truncation threshold as a fixed number of eigenvalues \(k\).

For example, the spherical harmonic coefficients up to degree 90 represent a space of dimension 8281. Let's suppose your problem and your normal matrix are in that space. The spherical harmonic coefficients up to degree 50 represent a space of dimension 2601. For the truncation, you could for example choose the subspace generated by the first 2601 eigenvectors of the degree 90 normal matrix (a subspace of the bigger space). The dimension would be exactly the same as the one of the space of spherical harmonics up to degree 50. So you could try that threshold (dimension 2601) and compare the results between the two experiments. If you're curious about the result, we did it for you in the application chapter (with 80 instead of 90).

This method would mean a fixed threshold of \(k=2601\) for all your normal matrices.

### Percentage of the total improvement in residuals

You can also choose a threshold as a percentage \( \beta \) of the total improvement in residuals (if you go the minimum along all axes). The index \(k\) would be the rank when the combined vertical move of the first \( k \) corrections reaches \( \beta \% \) of the total vertical move.

##### Percentage \( \beta \) of the total improvement in residuals

###### Threshold k such as

\[ \sum_{i=1}^{k}{ \frac{ s'^2_i}{\lambda_i} } \le \beta \sum_{i=1}^{m}{ \frac{ s'^2_i}{\lambda_i} } \lt \sum_{i=1}^{k+1}{ \frac{ s'^2_i}{\lambda_i} } \]

For example, you could select a fixed percentage of \( \beta = 97\% \). Then the number of eigenvalues would vary for each normal matrix.

### Inflexion point

You can also plot the curve of eigenvalues, and truncate the diagonal matrix when you see an inflexion point, i.e. in the middle of the "S curve". Both the percentage and the number of eigenvalues would change for each normal matrix.

### Gradual truncation

You can also imagine techniques where you don't brutally cut eigenvalues at one threshold, but instead gradually attenuate the eigenvalues between two thresholds \(k\) and \(l\). To be exactly precise, it is not the eigenvalues that you gradually attenuate to zero, but the inverse of the eigenvalues (the \(D^{-1}\) matrix between ranks \(k\) and \(l\)). You could imagine any kind of attenuation law (linear, parabolic, exponential). If we use the letter \(\gamma\) for the attenuation law, we get the following solution:

##### Gradual truncation

###### Two thresholds at k and l (1<k<l<m)

\[ D^{-1}_{trunc} = diag(\frac{1}{\lambda_1}, ..., \frac{1}{\lambda_k}, \gamma \left(\frac{1}{\lambda_{k+1}}\right), ..., \gamma \left(\frac{1}{\lambda_l}\right), 0, ..., 0) \]

\[ X'_{trunc} = D^{-1}_{trunc} \, S' \]

This can also be written as:

##### Gradual truncation

###### Two thresholds at k and l (1<k<l<m)

\[ x'_i = \frac{s'_i}{\lambda_i}, \forall \, i \in [|1,k|] \]

\[ x'_i = s'_i \times \gamma \left( \frac{ 1 }{ \lambda_i } \right), \forall \, i \in [|k+1,l|] \]

\[ x'_i = 0, \forall \, i \in [|l+1,m|] \]

### Reconstruction

In all cases, you reconstruct your solution in the original basis by the multiplication \( X_{trunc} = PX'_{trunc} \).

##### Reconstruction

\[ X_{trunc} = PX'_{trunc} \]

That's it!

## Application

And now, to finish off the course, we prepared some concrete (and entertaining) illustrations on gravity field determination for you.