Least squares

# Iterations

This chapter details the iteration process mentioned at the end of the previous chapter.

```
// Iterations
for (let i = 0; i <= 2147483647; i++) {
iterate()
}
```

## Iterations

At this point, we have found the minimum of \( Q \). This is not yet the minimum of \( \mathcal{S} \). However, we can try an iteration process. If \( \mathcal{S} \) has a minimum (local or global), the iteration process might converge, and it might provide a satisfactory solution.

Let's dive into it.

As we already said, \( \mathcal{S}(p_{min}) \) is not \( \mathcal{S}_{min}\). The parameter \(p_{min}\) only provides the minimum for \(Q\). This means that you can theoretically find a better value for \(\mathcal{S}\) *if* you find a better \( p_{min} \).

How to find a better \(p_{min}\)? Well, if you find a better \(Q\) (a better approximation of \(\mathcal{S}\)), you might find a better \(p_{min}\) as a consequence. How to find a better \(Q\)? Since \(p_{min}\) is closer to the minimum of \( \mathcal{S} \) than \( p_0 \), you might want to linearize \(f \) around \(p_{min}\) instead of \(p_0\). That new linearization might give you a better \(Q\). This new \(Q\) will give you a new \(p_{min}\), and the odds are very strong that the new \( \mathcal{S}(p_{min}) \) will be better than the old one.

You see where we are going. Iterations. If you build a series of quadratic approximations \( (Q_k) \) of \( \mathcal{S}\) closer and closer to the minimum of \( \mathcal{S}\), your results \( (p_k) \) will likely get better and better. If the process converges, i.e. if after a certain rank \(k\), successive results always point to the same point, then you might just have found the minimum of \( \mathcal{S} \).

## First iteration

Let's reproduce all the steps from the previous chapter, but this time explicitly give the subscript \( _0 \) to all quantities that depend on \( p_0 \) the initial parameter (the a-priori gravity field model).

- \( \ell_0 \) is the linear approximation of \( f \) in \( p_0 \)
- \( Q_0 \) is the quadratic approximation of \( \mathcal{S} \) in \( p_0 \)
- \( A_0 \) is the observation matrix (the partial derivatives of \( f \) in \( p_0 \))
- \( N_0 = \,^tA_0A_0 \) is the normal matrix
- \( S_0 = \,^tA_0B_0 \)
- \( B_0 = y - f(p_0) \) is the vector of residuals obtained in \( p_0 \)
- \( X_0 = p - p_0 \) is the correction to \( p_0 \)
- \( \alpha_0 = \,^tB_0B_0 = \mathcal{S}(p_0) \) is the sum of the squares of the initial residuals in \( p_0 \)

##### First approximation of \( \mathcal{S}\), around \( p_0 \)

\[ Q_0(p) = \,^tX_0N_0X_0 - 2\,^tS_0 X_0 + \alpha_0 \]

Just as we did in the previous chapter, we can easily determine the minimum of \(Q_0\). Let's call it \( p_1 \).

##### Iteration 1

\[ p_{min} = p_1 \]

\[ Q_0(p_1) = Q_{0_{min}} = \min_p Q_0(p) \]

\[ \Delta_0(p) = p_1 - p_0 = N_0^{-1} S_0 \]

\[ \Delta Q_0 = - \,^tS_0 N_0^{-1} S_0 \]

## Second iteration

We have now obtained a corrected model \( p_1 \), which is better than \( p_0 \). If we linearize \(f\) around \( p_1 \) this time, we may obtain a better approximation of \(\mathcal{S}\) around its expected minimum. Let's call it \( Q_1 \) and find its minimum.

##### Second approximation of \( \mathcal{S}\), around \( p_1 \)

\[ Q_1(p) = \,^tX_1N_1X_1 - 2\,^t S_1 X_1 + \alpha_1 \]

This is the same paraboloid situation. Let's call \(p_2\) the minimum of \( Q_1 \).

##### Iteration 2

\[ p_{min} = p_2 \]

\[ Q_1(p_2) = Q_{1_{min}} = \min_p Q_1(p) \]

\[ \Delta_1(p) = p_2 - p_1 = N_1^{-1} S_1 \]

\[ \Delta Q_1 = - \,^t S_1 N_1^{-1} S_1 \]

## Next iterations

We can reproduce the process to infinity.

##### Iteration k

\[ Q_k(p) = \,^tX_kN_kX_k - 2\,^t S_k X_k + \alpha_k \]

\[ p_{min} = p_{k+1} \]

\[Q_k(p_{k+1}) = Q_{k_{min}} = \min_p Q_k(p) \]

\[ \Delta_{k} (p) = p_{k+1} - p_k = N_k^{-1 }S_k \]

\[ \Delta Q_k = - \,^tS_kN_k^{-1}S_k \]

## Convergence

If \( \mathcal{S} \) is regular enough, the successive quadratic approximations should bring us closer and closer to a minimum. We should therefore obtain a series \( (p_k) \) such as \( \mathcal{S}(p_{k+1}) \lt \mathcal{S}(p_k) \).

But when does the process stop?

If after several steps, improvements in residuals from one iteration to the other become minimal, we can decide to put an end to the process, and declare a victor.

For example, we can fix an arbitrary threshold \( \varepsilon \), and declare that as soon as \( | \mathcal{S}(p_{k+1}) - \mathcal{S}(p_k) | \lt \varepsilon \), we stop the process. Or similarly, we can apply the \( \varepsilon\) criterion to \(Q\), i.e. \( | \Delta Q_k | \lt \varepsilon \) (the decrease along the paraboloid instead of the actual function). It means that the improvement in residuals is now too small to justify the effort of another iteration. We could also apply the criterion to the norm of the correction on the vector, i.e. \( || p_{k+1} - p_k || \lt \varepsilon \), essentially observing that our minimum doesn't move anymore.

Once, after several iterations, our criterion is attained, we declare that "convergence" is achieved.

##### Convergence criterion

###### k such as

\[ |\mathcal{S}(p_{k+1}) - \mathcal{S}(p_k)| \lt \varepsilon \]

## Non-convergence

What are the other possibilities?

It is always possible (even practically recommended) to fix a maximum number of iterations by default, for example to prevent infinite loops or to limit computing time. In that case, if convergence is not achieved when the maximum number of iterations is reached, we can stop the process and decide to use the result of the last iteration. We typically call this situation "convergence unachieved".

We can also encounter divergence, depending on the characteristics of \( \mathcal{S} \). For example, if we find a \( k \) such as \( \mathcal{S}(p_{k+1}) \gt \mathcal{S}(p_k) \), then the process is diverging. In this case, we can either keep the result of the last iteration, or declare the situation fishy and discard the result. In this case, we may have to find another method to find a solution to our problem.

## Edison's algorithm

The inventor Thomas A. Edison knew a lot about iterations. It is said that he made 10,000 attempts before inventing the electric bulb. Here is what he had to say about his algorithm:

##### Iterations in practice

###### (Thomas A. Edison)

« The most certain way to succeed is always to try just one more time »

In coding implementation, this is an infinite loop.