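Consider a linear equation system of $M$ equations and $N$
variables:
$$\sum_{j=1}^N a_{ij}\,x_j = b_i,\qquad i=1,\dots,M \tag{1}$$
which can be expressed in matrix form:
$$\mathbf{A}\mathbf{x} = \mathbf{b} \tag{2}$$
where
$$\mathbf{A}=\begin{bmatrix} a_{11} & \cdots & a_{1N}\\ \vdots & \ddots & \vdots\\ a_{M1} & \cdots & a_{MN}\end{bmatrix},\qquad \mathbf{x}=[x_1,\dots,x_N]^T,\qquad \mathbf{b}=[b_1,\dots,b_M]^T \tag{3}$$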
In general, the existence and uniqueness of the solution
$\mathbf{x}$ can be determined by the
fundamental theorem of linear algebra
based on the rank of the coefficient matrix
$\mathbf{A}$. Here we only consider some special cases:
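We now consider solving the over-determined linear system, with
more equations than variables ($M>N$), in the last case. The error
or residual of the $i$th equation is defined as
$r_i = b_i - \sum_{j=1}^N a_{ij}x_j$,
or in matrix form
$\mathbf{r} = \mathbf{b} - \mathbf{A}\mathbf{x}$
with $\mathbf{r} = [r_1,\dots,r_M]^T$. The total sum-of-squares error
of the system is:
$$\varepsilon(\mathbf{x}) = \|\mathbf{r}\|^2 = \sum_{i=1}^M r_i^2 = (\mathbf{b}-\mathbf{A}\mathbf{x})^T(\mathbf{b}-\mathbf{A}\mathbf{x})$$
To find the optimal solution $\mathbf{x}^*$ that minimizes
$\varepsilon(\mathbf{x})$, we set its derivative with respect to
$\mathbf{x}$ to zero:
$$\frac{d}{d\mathbf{x}}\,\varepsilon(\mathbf{x}) = -2\,\mathbf{A}^T(\mathbf{b}-\mathbf{A}\mathbf{x}) = \mathbf{0}$$
and solve this matrix equation to get
$$\mathbf{x}^* = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\mathbf{b} = \mathbf{A}^{+}\mathbf{b} \tag{7}$$
where
$\mathbf{A}^{+} = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T$
is the pseudo-inverse of the non-square matrix $\mathbf{A}$, which
exists in this form when $\mathbf{A}$ has full column rank so that
$\mathbf{A}^T\mathbf{A}$ is invertible.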
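As a minimal sketch of the least-squares solution in Eq. (7), the NumPy snippet below solves a small over-determined system. The matrix `A`, the vector `b`, and the helper name `least_squares` are hypothetical values chosen only for illustration. The normal equations are solved directly instead of forming the inverse explicitly, and the result is compared with NumPy's own pseudo-inverse.

```python
import numpy as np

def least_squares(A, b):
    """Minimize ||b - A x||^2 by solving the normal equations A^T A x = A^T b."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    # Solving the linear system avoids computing (A^T A)^{-1} explicitly.
    return np.linalg.solve(A.T @ A, A.T @ b)

# Hypothetical over-determined system: M = 4 equations, N = 2 variables.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.1, 2.9, 5.2, 6.8])

x_star = least_squares(A, b)          # solution of the normal equations
x_pinv = np.linalg.pinv(A) @ b        # same solution via the pseudo-inverse
residual = b - A @ x_star             # r = b - A x*

print("x* =", x_star)
print("A^T r =", A.T @ residual)      # should vanish up to round-off
```

At the minimum the residual is orthogonal to the columns of `A`, which is the zero-derivative condition written as $\mathbf{A}^T\mathbf{r}=\mathbf{0}$.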
When the equations are barely independent, the matrix
$\mathbf{A}^T\mathbf{A}$ may be near singular, with some
eigenvalues close to zero, and correspondingly its inverse
$(\mathbf{A}^T\mathbf{A})^{-1}$ may have some huge eigenvalues.
Consequently the system may be ill-conditioned or ill-posed,
in the sense that a small change in the system due to noise
may cause a large change in the solution $\mathbf{x}^*$.
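The sensitivity described above can be illustrated with a small numerical experiment. This is only a sketch: the nearly parallel columns of `A`, the right-hand side `b`, and the perturbation `db` are hypothetical values picked to make the effect visible.

```python
import numpy as np

def normal_eq_solution(A, b):
    """Least-squares solution via the normal equations A^T A x = A^T b."""
    return np.linalg.solve(A.T @ A, A.T @ b)

# Hypothetical system whose two columns are almost parallel,
# so the equations are barely independent.
A = np.array([[1.0, 1.0],
              [1.0, 1.0001],
              [1.0, 0.9999]])
b = np.array([2.0, 2.0, 2.0])
db = np.array([0.0, 1e-3, -1e-3])     # small perturbation standing in for noise

x_clean = normal_eq_solution(A, b)
x_noisy = normal_eq_solution(A, b + db)

cond = np.linalg.cond(A.T @ A)        # ratio of largest to smallest singular value
amplification = np.linalg.norm(x_noisy - x_clean) / np.linalg.norm(db)

print("condition number of A^T A:", cond)
print("change in x per unit change in b:", amplification)
```

A perturbation of order $10^{-3}$ in the right-hand side moves the solution by an amount of order $10$ here, because the perturbation lies along the direction associated with the near-zero eigenvalue.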
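This problem can be addressed by regularization, by which the
solution is controlled so that it does not take unreasonably
large values. Specifically, we construct an objective function
$J(\mathbf{x})$ that contains a penalty term
$\lambda\|\mathbf{x}\|^2$ (with $\lambda>0$) for large
$\|\mathbf{x}\|$, as well as the error term
$\|\mathbf{b}-\mathbf{A}\mathbf{x}\|^2$:
$$J(\mathbf{x}) = \|\mathbf{b}-\mathbf{A}\mathbf{x}\|^2 + \lambda\,\|\mathbf{x}\|^2 \tag{8}$$
By minimizing $J(\mathbf{x})$, we can obtain a solution
$\mathbf{x}^*$ of small norm $\|\mathbf{x}^*\|$
as well as low error
$\|\mathbf{b}-\mathbf{A}\mathbf{x}^*\|^2$. As before, the solution
can be obtained by setting the derivative of $J(\mathbf{x})$ to
zero
$$\frac{d}{d\mathbf{x}}\,J(\mathbf{x}) = -2\,\mathbf{A}^T(\mathbf{b}-\mathbf{A}\mathbf{x}) + 2\lambda\,\mathbf{x} = \mathbf{0} \tag{9}$$
and solving the resulting equation to get:
$$\mathbf{x}^* = (\mathbf{A}^T\mathbf{A} + \lambda\,\mathbf{I})^{-1}\mathbf{A}^T\mathbf{b} \tag{10}$$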
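The regularized solution in Eq. (10) can be sketched in a few lines of NumPy. The function name `regularized_least_squares`, the data `A` and `b`, and the value of `lam` are assumptions made for this illustration. The snippet also evaluates the derivative in Eq. (9) at the computed solution as a sanity check.

```python
import numpy as np

def regularized_least_squares(A, b, lam):
    """Minimize ||b - A x||^2 + lam * ||x||^2, assuming lam > 0."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n_vars = A.shape[1]
    # Regularized normal equations: (A^T A + lam I) x = A^T b.
    return np.linalg.solve(A.T @ A + lam * np.eye(n_vars), A.T @ b)

# Hypothetical over-determined system and penalty weight.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.1, 2.9, 5.2, 6.8])
lam = 0.5

x_reg = regularized_least_squares(A, b, lam)
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]   # unregularized reference

# Derivative of the objective at x_reg; it should vanish up to round-off.
grad = -2.0 * A.T @ (b - A @ x_reg) + 2.0 * lam * x_reg

print("regularized x*:", x_reg, " norm:", np.linalg.norm(x_reg))
print("unregularized x*:", x_ls, " norm:", np.linalg.norm(x_ls))
```

The regularized solution has a smaller norm than the unregularized one, at the cost of a somewhat larger residual.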
We see that even if
$\mathbf{A}^T\mathbf{A}$ is near singular,
the matrix
$\mathbf{A}^T\mathbf{A}+\lambda\,\mathbf{I}$ is not, due to the
additional term
$\lambda\,\mathbf{I}$.
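A short worked derivation makes this explicit. The eigenvalues $\mu_i$ and the orthogonal eigenvector matrix $\mathbf{V}$ are notation assumed here for the symmetric, positive semi-definite matrix $\mathbf{A}^T\mathbf{A}$:

$$\begin{aligned}
\mathbf{A}^T\mathbf{A} &= \mathbf{V}\,\mathrm{diag}(\mu_1,\dots,\mu_N)\,\mathbf{V}^T, \qquad \mu_i \ge 0,\\
\mathbf{A}^T\mathbf{A} + \lambda\,\mathbf{I} &= \mathbf{V}\,\mathrm{diag}(\mu_1+\lambda,\dots,\mu_N+\lambda)\,\mathbf{V}^T,\\
(\mathbf{A}^T\mathbf{A} + \lambda\,\mathbf{I})^{-1} &= \mathbf{V}\,\mathrm{diag}\!\left(\frac{1}{\mu_1+\lambda},\dots,\frac{1}{\mu_N+\lambda}\right)\mathbf{V}^T, \qquad \frac{1}{\mu_i+\lambda} \le \frac{1}{\lambda}.
\end{aligned}$$

Every eigenvalue is shifted up by $\lambda$, so for $\lambda>0$ none of them can be close to zero, and no eigenvalue of the inverse can exceed $1/\lambda$.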
By adjusting the hyperparameter $\lambda$, we can make a
proper tradeoff between accuracy and stability.
- Small $\lambda$: the solution fits the given equations more
accurately but is also more prone to noise and therefore less
stable, i.e., the variance error may be large. This is called
overfitting;
- Large $\lambda$: the solution is more stable as it
is less affected by noise, but it may be less accurate.
This is called underfitting.
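The tradeoff listed above can be explored with a simple sweep over the hyperparameter. The snippet below is a sketch under stated assumptions: the nearly collinear matrix `A`, the reference solution `x_true`, the fixed `noise` vector, and the grid `lambdas` are all hypothetical choices, not values from the text.

```python
import numpy as np

# Hypothetical ill-conditioned system with a known reference solution.
A = np.array([[1.0, 1.0],
              [1.0, 1.0001],
              [1.0, 0.9999]])
x_true = np.array([1.0, 1.0])
noise = np.array([0.0, 1e-3, -1e-3])   # fixed perturbation standing in for noise
b = A @ x_true + noise

lambdas = [1e-8, 1e-6, 1e-4, 1e-2, 1.0, 100.0]
norms, residuals, errors = [], [], []

for lam in lambdas:
    # Regularized normal equations: (A^T A + lam I) x = A^T b.
    x = np.linalg.solve(A.T @ A + lam * np.eye(2), A.T @ b)
    norms.append(np.linalg.norm(x))                # size of the solution
    residuals.append(np.linalg.norm(b - A @ x))    # fit to the noisy equations
    errors.append(np.linalg.norm(x - x_true))      # distance to the reference
    print(f"lambda={lam:8.0e}  |x|={norms[-1]:8.4f}  "
          f"residual={residuals[-1]:.2e}  error={errors[-1]:.2e}")
```

In this setup the solution norm shrinks and the residual grows as $\lambda$ increases, while the distance to the reference solution is large at both ends of the grid and smallest for an intermediate value, mirroring the overfitting and underfitting regimes.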
The issue of overfitting versus underfitting
will be more formally addressed in a
later chapter.