Definitions - Least squares revisited

This chapter is an introduction to the least squares method, using a simplified case of gravity field determination as an illustration. Since least squares are well known and familiar to many, you're welcome, if you feel inclined to do so, to skip the texts and read only the framed results. They say, using mathematical language, the very same thing in a much more concise manner.

Sum of the squares of the residuals

\[ \mathcal{S}(p) = \sum_{ i=1 }^{ n } \left( y_i - f(p)_i \right)^2 \]

Parameters (or model)

Let's suppose we want to calculate the orbit of a satellite. We need a model of the gravity field of the Earth in order to calculate the attraction felt by the satellite. If we consider that the Earth is a single point, the model will only consist of the the gravitational constant \( G \) and the mass of the Earth \( M \). If we wish to be more sophisticated, we can develop the Earth's potential in a sum of spherical harmonics. The model will consist of a development in spherical harmonic coefficients up to a certain degree. The coefficients \( C_{nm} \) and \( S_{nm} \) will constitute our model (\( n \) and \( m \) being respectively the degree and order).

In terms of vocabulary, we also use the name parameter (or parameters) to refer to the model. Indeed, the model is an input parameter of the calculation, and, as such, will directly impact its results. The model is part of the theoretical side of the problem. We feed the values of the model (for example the mass of the Earth) to the software, which uses it the application of the laws of physics to calculate (or predict) the satellite's orbit. The initial model available for computations has a certain level of quality. Later, we may want to improve this quality by adding corrections to the model, derived from new or better observations. For example, the mass of the Earth or the spherical harmonic coefficients may be different than what we thought at first, and maybe we now have measurements of some sort that could allow us to obtain more precise values. This is actually the purpose of the least squares method: to improve the initial model using the available measurements.

Let's call \( m \) the number of variables in the model. We can represent the model by a vector \( p \) of dimension \( m \). For example, if we develop the gravity field model in spherical harmonics up to degree 90, that represents 8281 coefficients, so \(m = 8281\).

Parameters

(or model)

\[ p = (p_j)_{j=1,m} \in \mathbb{R}^m \]

Measurements (or observations)

On the other side of the equation, we have the measurements. Measurements come from observation and represent reality. They come from instruments: the human eye, a thermometer, a graduated scale, a weight scale, an accelerometer, etc. Measurements are a good sampling of reality, but they are not exactly reality, because instruments and are not perfect. Measurements inevitably contain some level of imprecision.

In the case the GRACE and GRACE Follow-On missions, which are used to recover Earth's gravity field (the spherical harmonic coefficients), there are two main types of measurements used in the process: the GPS measurements and the KBR measurements (GPS stands for Global Positioning System and KBR stands for K-Band Ranging). The GPS measurements are observations of the position of the satellite in space (this is an oversimplified statement, but it is enough for our demonstration), and the KBR measurements represent the distance between the two twin GRACE satellites.

Let's call \( n \) the number of measurements that we have at our disposal in a given problem. We can represent the measurements by a vector \( y \) of dimension \( n \). In our example, we have around 80,000 GPS measurements per day, and around 20,000 KBR measurements per day. If we cumulate this information over 30 days to calculate a monthly gravity field, that makes around 3 million measurements per month, \(n=3{,}000{,}000\).

Measurements

(or observations)

\[ y = (y_i)_{i=1,n} \in \mathbb{R}^n \]

Theoretical measurements

Measurements are simple to understand. They represent a physical quantity measured by an instrument at a given instant (for example a GPS position or the KBR distance between two satellites). They are exact, to the extent of the instrumental accuracy.

Theoretical measurements represent the same physical quantities, but this time, predicted by the model and the laws of physics, instead of measured. Let's suppose we have a model of Earth's gravity potential in the form of a series of spherical harmonic coefficients. We can apply the laws of physics and predict the positions of the satellite over time. We will then get a series of numbers that are comparable to the real measurements. They are called "theoretical measurements".

Let's call \( f \) the function that transforms the parameters into the physical quantities (in our case, the positions in space and the intersatellite distances).

\( f \) represents the laws of physics as implemented in the software.
\( f(p) \) is a vector of size \( n \), which represents the theoretical measurements.

In the case of orbit determination, \( f \) is not a simple function, in the "usual sense" of the term. It is not analytical, and there is no formula for it. Instead, it is a long process of numerical integration and orbit propagation. It usually takes around one hour and a half to compute the orbit determination for each daily arc (with all the partial derivatives of gravity field). After this one hour and a half, you get all your \(f(p)\) values for the \(p\) you provided in input.

Theoretical measurements

(obtained from the model, the laws of physics and the software)

\[ f : \mathbb{R}^m \rightarrow \mathbb{R}^n \] \[ p \mapsto f(p) \] \[ (p_j)_{j=1,m} \mapsto (f(p)_i)_{i=1,n} \]

\( f(p) \) is comparable to \( y \). They both represent the same physical quantities. One is obtained by the software from the application of theory and the other by observation.

Of course, the first idea that comes to mind is: "Can we compare them?".

Residuals

So let's compare the theoretical measurements and the real measurements, but let's first take note of the following (cold hard) facts:

The measurements are not perfectly precise (there are technical flaws in our instruments).
The model is not perfectly precise (our knowledge of reality is not accurate).
The theory is not perfectly precise (we thought Newton was right until Einstein came around).
The application of the theory by the software is not perfectly precise (could there be bugs?).

It should therefore be no surprise to anyone that we obtain differences between the real measurements and the theoretical measurements. Let's give a name to those differences. We call them "measurement residuals". Let's use the letter \( b \) as a vector of \( \mathbb{R}^n \) to describe them.

Measurement residuals

(differences between measurements and theoretical measurements)

\[ b = y - f(p) \]

\[ (b_i)_{i=1,n} = ( y_i - f(p)_i )_{i=1,n} \in \mathbb{R}^n \]

Let's make two comments.

First, if everything was perfect (theory, software, model and instruments), theoretical measurements and real measurements would be exactly the same, and the residuals would be zero. There would be no reason for this course to exist, and we would all give up our pens and computers and go fishing for the afternoon.

Second, what is the physical meaning of residuals?

Small residuals indicate that the predictions agree with the observations. It must mean that the model is good. If we have a good gravity field model in input, then the calculated GPS positions and KBR distances will be close to the measured ones, and the residuals will be small. In other words, if \(p\) is good, \(f(p)\) will be close to \(y\), and \(y-f(p)\) will be small.

As a conclusion, the level of residuals give us a good estimation of the quality of the model. The smaller the residuals, the better the model.

Of course this is only true if the theory, the software and the instruments do not represent a main source of error in the process. If there's a bug in the code, a flaw in the theory or a completely unreliable instrument, the residuals do not mean anything. But this case is a rare exception and can usually be solved quickly. There are other caveats. For example, you can artificially lower the level of residuals by adding a high number of additional empirical parameters to the system. Theoretical measurements will be improved but this will not reflect the quality of the "main" model. As always, the deeper you dive into a discipline, the more nuances and sophistications you encounter. But as a general rule for this introduction, let's keep it simple and say that « the smaller the residuals, the better the model ».

Sum of the squares of the residuals

There are as many residuals as there are measurements. That's a lot of them. So we would be happy with a single unique number to quantify the agreement between the model and the reality.

Intuitively enough, the sum of the squares of the residuals represents a good candidate for the job.

Sum of the squares of the residuals

(one single quantitative number)

\[ \mathcal{S}(p) = \sum_{ i=1 }^{ n } \left( y_i - f(p)_i \right)^2 = || y - f(p) ||^2 \]

This is none other than the square of the euclidian distance between the theoretical measurements vector and the measurements vector.

Let's have a closer look at that function. The sum of the squares of the residuals is a function from \( \mathbb{R}^m \) to \( \mathbb{R} \). It a takes a parameter and provides a number. This function could be plotted from the space of parameters in a "horizontal plane" \( (\mathbb{R}^m) \), towards the space of real numbers along the vertical axis \( (\mathbb{R}) \). This is easy to visualize if \( m \le 2 \), and of course, not so much if \( m = 8281 \). But we can always try to use our imagination.

The least squares minimum

The remaining step is straightforward. If "smaller means better, then "smallest means best". The best model \(p\) is thus the one that minimizes the sum of the squares of the residuals \(\mathcal{S}(p)\).

Model that best fits the observations

\[ p_{min} \]

\[ \mathcal{S}(p_{min}) = \mathcal{S}_{min} = \min_p \sum_{ i=1 }^{ n } \left( y_i - f(p)_i \right)^2 \]

Illustration case: let's suppose that our original gravity field model is \(p_0\), and that we are interested in the gravity field of the Earth for a specific given month. We can gather the 3,000,000 measurements for that given month, and apply a minimization algorithm to find the model that minimizes the sum of the squares of the measurement residuals. We will obtain a new gravity field model \(p_{min}\) that has the best possible agreement with the measurements for that given month. We have just improved our initial model.

Corrections

After the minimization process is achieved, we will have made a correction on the parameter \( \Delta p = p_{min} - p_0 \), and the new parameter will have improved the residuals by \( \Delta \mathcal{S} = \mathcal{S}(p_{min}) - \mathcal{S}(p_0) \).

Corrections

\[ \Delta p = p_{min} - p_0 \in \mathbb{R}^m \]

\[ \Delta \mathcal{S} = \mathcal{S}(p_{min}) - \mathcal{S}(p_0) \in \mathbb{R}^{-} \]

What next?

So now that we have laid out our problem, the main task still remains. How do we find this minimum?