
Section A.1 Linear Algebra

Subsection A.1.1 Normal Equation

Definition A.1.1. Normal Equation.

Let \(\mathcal{V}\subseteq\mathbb{R}^n\) be a subspace with orthogonal complement \(\mathcal{V}^\perp\text{.}\) Then the normal equation of the linear system \(A\vec{x}=\vec{b}\) is defined as:
\begin{equation*} A^T A \vec{x} = A^T \vec{b} \end{equation*}
where we assume \(A\in\mathbb{R}^{n\times k}\) and \(\vec{b}\in\mathbb{R}^n\text{.}\)
Note A.1.2.
Since \(A^TA\) is \(k\times k\) and \(A^T\vec{b}\) is \(k\times 1\text{,}\) any solution \(\vec{x}\) of the normal equation is a \(k\times 1\) column vector.
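As a minimal sketch of this definition, the following pure-Python code forms \(A^TA\) and \(A^T\vec{b}\) and solves the resulting \(k\times k\) system. The helper names (`transpose`, `matmul`, `solve`, `normal_equation_solution`) and the sample data are illustrative assumptions, not part of the text, and the sketch assumes \(A^TA\) is invertible.

```python
# Illustrative sketch: solve A^T A x = A^T b without external libraries.
# All names below are hypothetical helpers chosen for this example.

def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    Qt = transpose(Q)
    return [[sum(p * q for p, q in zip(row, col)) for col in Qt] for row in P]

def solve(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting.

    Assumes M is square and invertible.
    """
    k = len(M)
    # Augmented matrix [M | rhs] in floating point.
    aug = [list(map(float, M[i])) + [float(rhs[i])] for i in range(k)]
    for col in range(k):
        # Swap in the row with the largest pivot candidate.
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    z = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, k))
        z[i] = (aug[i][k] - tail) / aug[i][i]
    return z

def normal_equation_solution(A, b):
    """Return x solving A^T A x = A^T b for an n-by-k matrix A."""
    At = transpose(A)
    AtA = matmul(At, A)                                        # k x k
    Atb = [row[0] for row in matmul(At, [[bi] for bi in b])]   # length k
    return solve(AtA, Atb)

# Sample data (assumed for illustration): n = 3 equations, k = 2 unknowns.
A = [[1, 0], [1, 1], [1, 2]]
b = [6, 0, 0]
x_star = normal_equation_solution(A, b)  # [5.0, -3.0]
```

Here \(A^TA\) is \(2\times 2\) and the solution has \(k=2\) entries, matching the note above.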

Subsection A.1.2 Linear Regression

Subsubsection A.1.2.1 Scalar Arithmetic Notation & Derivation

Definition A.1.3. Line of Best Fit.
Given data vectors \(\vec{x}=\left(x_1, \dots, x_n\right)\) and \(\vec{y}=\left(y_1, \dots, y_n\right)\text{,}\) we seek the line of best fit:
\begin{equation*} y=mx+b \, . \end{equation*}
This line is obtained from the normal equation, written below as a single chain of matrix operations in which each underbrace shows the product it evaluates to:
\begin{equation*} \underbrace{ \begin{bmatrix} x_1 & x_2 & \dots & x_n \\ 1 & 1 & \dots & 1 \\ \end{bmatrix} \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix} }_{ \begin{bmatrix} s_2 & s_1 \\ s_1 & n \end{bmatrix} } \begin{bmatrix} m \\ b \end{bmatrix} = \underbrace{ \begin{bmatrix} x_1 & x_2 & \dots & x_n \\ 1 & 1 & \dots & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} }_{ \begin{bmatrix} c_{xy} \\ c_y \end{bmatrix} } \, . \end{equation*}
Here, the scalar sums are defined as:
\begin{equation*} s_1 = \sum_{i=1}^n x_i \, , \quad s_2 = \sum_{i=1}^n x_i^2 \, , \quad c_{xy} = \sum_{i=1}^n x_i y_i \, , \quad c_y = \sum_{i=1}^n y_i \end{equation*}
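The \(2\times 2\) system above can be solved in closed form by inverting its coefficient matrix, which gives \(m = (n\,c_{xy} - s_1 c_y)/(n s_2 - s_1^2)\) and \(b = (s_2 c_y - s_1 c_{xy})/(n s_2 - s_1^2)\) whenever \(n s_2 - s_1^2 \neq 0\text{.}\) The sketch below computes these quantities directly from the scalar sums; the function name `line_of_best_fit` and the sample data are assumptions made for illustration.

```python
# Illustrative sketch: slope and intercept from the scalar sums
# s_1, s_2, c_xy, c_y defined above. Names and data are hypothetical.

def line_of_best_fit(xs, ys):
    """Return (m, b) for the least squares line y = m*x + b."""
    n = len(xs)
    s1 = sum(xs)                                # sum of x_i
    s2 = sum(x * x for x in xs)                 # sum of x_i^2
    c_xy = sum(x * y for x, y in zip(xs, ys))   # sum of x_i * y_i
    c_y = sum(ys)                               # sum of y_i

    # Determinant of [[s2, s1], [s1, n]]; zero when all x_i coincide.
    det = n * s2 - s1 * s1
    if det == 0:
        raise ValueError("all x values are equal; the slope is not determined")

    m = (n * c_xy - s1 * c_y) / det
    b = (s2 * c_y - s1 * c_xy) / det
    return m, b

# Sample data (assumed): s1 = 10, s2 = 30, c_xy = 57, c_y = 19, n = 4.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
m, b = line_of_best_fit(xs, ys)  # m = 1.9, b = 0.0
```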

Subsubsection A.1.2.2 Least Squares Solution

Definition A.1.4.
A least squares solution to a system \(A\vec{x}=\vec{b}\) that has no exact solution is a vector \(\vec{x}^*\) satisfying:
\begin{equation*} A^T\underbrace{ A\vec{x}^* }_{ \mathord{\text{Proj}_{C(A)} \, \vec{b}} } = A^T \vec{b} \end{equation*}
Remark A.1.5.
Importantly,
\begin{equation*} A\vec{x}^*=\text{Proj}_{C(A)}\vec{b} \end{equation*}
so \(\vec{x}=\vec{x}^*\) minimizes the squared \(L^2\) norm of the residual over all \(\vec{x}\text{:}\)
\begin{equation*} \|A\vec{x}-\vec{b}\|^2\text{.} \end{equation*}
Note A.1.6.
By definition, \(A\vec{x}^*-\vec{b} \perp C(A)\text{.}\) Therefore,
\begin{equation*} A\vec{x}^*-\vec{b} \in C(A)^\perp \implies A\vec{x}^*-\vec{b} \in N(A^T)\text{.} \end{equation*}
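The orthogonality of the residual to the columns of \(A\) can be checked numerically. The sketch below uses an assumed \(3\times 2\) example whose least squares solution is \(\vec{x}^*=(5,-3)\text{;}\) it confirms that \(A^T(A\vec{x}^*-\vec{b})=\vec{0}\) and that a perturbed candidate gives a larger squared residual. The data and the names `apply`, `sq_norm`, and `x_other` are illustrative assumptions.

```python
# Illustrative check that the least squares residual lies in N(A^T).
# The matrix, right-hand side, and helper names are hypothetical.

A = [[1, 0], [1, 1], [1, 2]]
b = [6, 0, 0]
x_star = [5, -3]   # solves A^T A x = A^T b for this A and b

def apply(M, v):
    """Return the matrix-vector product M v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def sq_norm(v):
    """Return the squared L2 norm of v."""
    return sum(v_i * v_i for v_i in v)

# Residual A x* - b.
residual = [p - q for p, q in zip(apply(A, x_star), b)]   # [-1, 2, -1]

# A^T (A x* - b): dot the residual with each column of A.
At = [list(col) for col in zip(*A)]
At_residual = apply(At, residual)                         # [0, 0]

# A perturbed candidate has a strictly larger squared residual.
x_other = [5, -2]
best = sq_norm(residual)                                               # 6
worse = sq_norm([p - q for p, q in zip(apply(A, x_other), b)])         # 11
```

Because every entry of `At_residual` is zero, the residual is orthogonal to each column of \(A\), which is the statement \(A\vec{x}^*-\vec{b}\in N(A^T)\text{.}\)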