1. Linear regression
1.1 Preliminaries: linear algebra & matrix calculus
- Vectors: $a = (a_1, \dots, a_n)^\top \in \mathbb{R}^n$ (vector space)
- Inner product: $\langle a, b \rangle = a^\top b = \sum_{i=1}^n a_i b_i$ (Euclidean space)
- Norm: $\|a\| = \sqrt{\langle a, a \rangle} = \big(\sum_{i=1}^n a_i^2\big)^{1/2}$
- Distance: $d(a, b) = \|a - b\|$
- Matrix: $A = (a_{ij}) \in \mathbb{R}^{n \times m}$. If $A \in \mathbb{R}^{n \times m}$ and $B \in \mathbb{R}^{m \times k}$, we can define $AB \in \mathbb{R}^{n \times k}$ with $(AB)_{ij} = \sum_{l=1}^m a_{il} b_{lj}$.
- Square matrix ($n = m$): we can define the determinant $\det(A)$, with $\det(AB) = \det(A)\det(B)$, and the trace $\operatorname{tr}(A) = \sum_{i=1}^n a_{ii}$, with $\operatorname{tr}(AB) = \operatorname{tr}(BA)$.
- Symmetric matrix ($A = A^\top$): there exists an orthogonal matrix $Q$ satisfying $Q^\top Q = Q Q^\top = I$ such that $A = Q \Lambda Q^\top$, where $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$ [spectral decomposition].
(i) The $\lambda_i$ are called eigenvalues of $A$.
(ii) → $A$ is called positive definite (p.d.) if the $\lambda_i$ are all positive, or equivalently $a^\top A a > 0$ for all nonzero $a \in \mathbb{R}^n$.
→ $A$ is called positive semi-definite (p.s.d.) if the $\lambda_i$ are all nonnegative, or equivalently $a^\top A a \ge 0$ for all $a \in \mathbb{R}^n$.
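A minimal NumPy sketch of these facts (the matrix $A$ below is an arbitrary example constructed to be symmetric p.d.; it is not from the notes):

```python
# Check the spectral decomposition A = Q diag(lambda) Q^T and the p.d. criterion.
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B @ B.T + np.eye(4)                 # symmetric and p.d. by construction

lam, Q = np.linalg.eigh(A)              # eigenvalues (ascending) and orthonormal eigenvectors
print(np.allclose(Q @ np.diag(lam) @ Q.T, A))   # spectral decomposition holds
print(np.allclose(Q.T @ Q, np.eye(4)))          # Q is orthogonal
print((lam > 0).all())                          # all eigenvalues positive => p.d.

a = rng.standard_normal(4)
print(a @ A @ a > 0)                    # quadratic form positive for a nonzero a
```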
- Inverse of a partitioned matrix.
- Let $M = \begin{pmatrix} E & F \\ G & H \end{pmatrix}$. Suppose that $H$ is invertible. Then
$$M = \begin{pmatrix} I & FH^{-1} \\ 0 & I \end{pmatrix} \begin{pmatrix} E - FH^{-1}G & 0 \\ 0 & H \end{pmatrix} \begin{pmatrix} I & 0 \\ H^{-1}G & I \end{pmatrix},$$
where $M/H := E - FH^{-1}G$ is called the Schur complement of $H$ with respect to (w.r.t.) $M$.
Taking inverses on both sides, we obtain that
$$M^{-1} = \begin{pmatrix} I & 0 \\ -H^{-1}G & I \end{pmatrix} \begin{pmatrix} (M/H)^{-1} & 0 \\ 0 & H^{-1} \end{pmatrix} \begin{pmatrix} I & -FH^{-1} \\ 0 & I \end{pmatrix} = \begin{pmatrix} (M/H)^{-1} & -(M/H)^{-1}FH^{-1} \\ -H^{-1}G(M/H)^{-1} & H^{-1} + H^{-1}G(M/H)^{-1}FH^{-1} \end{pmatrix}.$$
- Similarly, if $E$ is invertible, then
$$M^{-1} = \begin{pmatrix} E^{-1} + E^{-1}F(M/E)^{-1}GE^{-1} & -E^{-1}F(M/E)^{-1} \\ -(M/E)^{-1}GE^{-1} & (M/E)^{-1} \end{pmatrix},$$
where $M/E := H - GE^{-1}F$ is the Schur complement of $E$ w.r.t. $M$.
If both $E$ and $H$ are invertible, then comparing the top-left blocks of the two expressions for $M^{-1}$, we have
$$(E - FH^{-1}G)^{-1} = E^{-1} + E^{-1}F(H - GE^{-1}F)^{-1}GE^{-1}.$$
In particular, let $E = A$, $F = u$, $G = v^\top$, $H = -1$, with $u, v \in \mathbb{R}^n$. Then
$$(A + uv^\top)^{-1} = A^{-1} - \frac{A^{-1}uv^\top A^{-1}}{1 + v^\top A^{-1}u}$$
(rank-one update of an inverse matrix, the Sherman–Morrison formula).
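A sketch verifying the rank-one update formula numerically (the particular $A$, $u$, $v$ are arbitrary test inputs, chosen only so that $A$ and $A + uv^\top$ are invertible):

```python
# Compare a direct inverse of A + u v^T with the Sherman–Morrison update.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5)) + 5 * np.eye(5)   # well-conditioned, invertible
u = rng.standard_normal(5)
v = rng.standard_normal(5)

A_inv = np.linalg.inv(A)
direct = np.linalg.inv(A + np.outer(u, v))
update = A_inv - (A_inv @ np.outer(u, v) @ A_inv) / (1 + v @ A_inv @ u)
print(np.allclose(direct, update))                # True up to floating-point error
```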
- Matrix calculus: for a differentiable $f: \mathbb{R}^n \to \mathbb{R}$, its gradient is $\nabla f(x) = \big(\partial f/\partial x_1, \dots, \partial f/\partial x_n\big)^\top$. We have that
$$\nabla_x (a^\top x) = a, \qquad \nabla_x (x^\top A x) = (A + A^\top)x \;(= 2Ax \text{ when } A = A^\top).$$
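A finite-difference sketch of the two gradient identities (the vectors and matrix below are arbitrary; `num_grad` is a hypothetical helper written only for this check):

```python
# Numerically check grad(a^T x) = a and grad(x^T A x) = (A + A^T) x.
import numpy as np

rng = np.random.default_rng(2)
n = 4
a, x = rng.standard_normal(n), rng.standard_normal(n)
A = rng.standard_normal((n, n))

def num_grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

print(np.allclose(num_grad(lambda z: a @ z, x), a, atol=1e-5))
print(np.allclose(num_grad(lambda z: z @ A @ z, x), (A + A.T) @ x, atol=1e-4))
```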
1.2 Linear models and least squares estimators
- Regression: the goal is to predict the response $Y \in \mathbb{R}$ based on the input/feature vector $X = (X_1, \dots, X_p)^\top \in \mathbb{R}^p$.
- A linear model assumes that
$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \varepsilon, \qquad (1)$$
where $\beta_0, \beta_1, \dots, \beta_p$ are unknown parameters, and $\varepsilon$ is an error term.
- Let $X = (1, X_1, \dots, X_p)^\top$ and $\beta = (\beta_0, \beta_1, \dots, \beta_p)^\top$. Then model (1) can be written as $Y = X^\top \beta + \varepsilon$.
- Given an estimate $\hat\beta$, our prediction for $Y$ at $X = x$ is $\hat Y = x^\top \hat\beta$.
- We can define a loss function $L(Y, \hat Y)$ which measures the penalty paid for predicting $\hat Y$ when the true value is $Y$, e.g., the squared-error loss $L(Y, \hat Y) = (Y - \hat Y)^2$.
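A small sketch of prediction and squared-error loss under the linear model (the numbers for $\hat\beta$, $x$, and $y$ are made up for illustration):

```python
# Predict with x^T beta_hat and evaluate the squared-error loss.
import numpy as np

beta_hat = np.array([1.0, 2.0, -0.5])   # (beta_0, beta_1, beta_2)
x = np.array([1.0, 0.3, 1.2])           # input with leading 1 for the intercept
y = 2.0                                  # observed response

y_hat = x @ beta_hat                     # prediction  Y_hat = x^T beta_hat
loss = (y - y_hat) ** 2                  # squared-error loss L(Y, Y_hat)
print(y_hat, loss)
```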
- Idea: find an estimator $\hat\beta$ such that the expected loss $\mathbb{E}\,L(Y, X^\top \beta) = \mathbb{E}\big[(Y - X^\top \beta)^2\big]$ is minimized at $\beta = \hat\beta$.
- Now suppose we are given a labeled training set $\{(x_i, y_i)\}_{i=1}^n$, whose elements are sampled i.i.d. from the joint distribution of $(X, Y)$.
- By the law of large numbers (LLN), $\mathbb{E}\big[(Y - X^\top \beta)^2\big]$ can be approximated by the empirical average
$$\frac{1}{n} \sum_{i=1}^n (y_i - x_i^\top \beta)^2.$$
- Let $y = (y_1, \dots, y_n)^\top$ and $X = (x_1, \dots, x_n)^\top \in \mathbb{R}^{n \times (p+1)}$ (the design matrix, whose $i$-th row is $x_i^\top$). Then
$$\frac{1}{n} \sum_{i=1}^n (y_i - x_i^\top \beta)^2 = \frac{1}{n} \|y - X\beta\|^2.$$
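A sketch checking that the averaged sum of squares equals $\frac{1}{n}\|y - X\beta\|^2$ when the rows of $X$ are the $x_i^\top$ (the simulated data below are arbitrary):

```python
# Loop form vs. vectorized norm form of the empirical squared error.
import numpy as np

rng = np.random.default_rng(3)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p))])   # n x (p+1) design matrix
beta = rng.standard_normal(p + 1)
y = X @ beta + rng.standard_normal(n)

loop_form = np.mean([(y[i] - X[i] @ beta) ** 2 for i in range(n)])
norm_form = np.linalg.norm(y - X @ beta) ** 2 / n
print(np.allclose(loop_form, norm_form))   # True
```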
- In other words, our estimator $\hat\beta$ should be such that $X\hat\beta$ is closest to $y$. That is,
$$\hat\beta \in \operatorname*{arg\,min}_{\beta \in \mathbb{R}^{p+1}} \|y - X\beta\|^2.$$
- Such a $\hat\beta$ best explains the data (in the squared-error sense) and is called a least squares estimator (LSE).
Theorem 1.
(i) $\hat\beta$ is an LSE iff it satisfies the normal equations $X^\top X \hat\beta = X^\top y$.
(ii) If $X$ has full column rank, then the LSE is unique and is given by $\hat\beta = (X^\top X)^{-1} X^\top y$.
Part (ii) follows directly from part (i): when $X$ has full column rank, $X^\top X$ is invertible, so the normal equations have the unique solution $\hat\beta = (X^\top X)^{-1} X^\top y$.
The 1st proof of part (i): by the matrix calculus rules in Section 1.1, $\nabla_\beta \|y - X\beta\|^2 = \nabla_\beta\big(y^\top y - 2\beta^\top X^\top y + \beta^\top X^\top X \beta\big) = -2X^\top y + 2X^\top X\beta$. Since $X^\top X$ is p.s.d., the objective is convex in $\beta$, so $\hat\beta$ is a minimizer iff the gradient vanishes at $\hat\beta$, i.e., iff $X^\top X\hat\beta = X^\top y$.
The 2nd proof of part (i):
(Sufficiency $\Leftarrow$) Let $\hat\beta$ satisfy $X^\top X \hat\beta = X^\top y$, i.e., $X^\top (y - X\hat\beta) = 0$. Then, for any $\beta$,
$$\|y - X\beta\|^2 = \|(y - X\hat\beta) + X(\hat\beta - \beta)\|^2 = \|y - X\hat\beta\|^2 + \|X(\hat\beta - \beta)\|^2 \ge \|y - X\hat\beta\|^2,$$
since the cross term $2(\hat\beta - \beta)^\top X^\top (y - X\hat\beta)$ vanishes as $X^\top (y - X\hat\beta) = 0$. Hence $\hat\beta$ is an LSE.
(Necessity $\Rightarrow$) Let $\tilde\beta$ be another LSE, that is, it also minimizes $\|y - X\beta\|^2$ (a solution $\hat\beta$ of the normal equations always exists, since $X^\top y \in \operatorname{col}(X^\top) = \operatorname{col}(X^\top X)$). Then we must have $\|y - X\tilde\beta\|^2 = \|y - X\hat\beta\|^2$, which by the display above implies that $\|X(\tilde\beta - \hat\beta)\|^2 = 0$, i.e., $X\tilde\beta = X\hat\beta$, and thus $X^\top X \tilde\beta = X^\top X \hat\beta = X^\top y$.
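A sketch illustrating Theorem 1 on simulated data (the data-generating $\beta$ and noise level are arbitrary choices; `np.linalg.lstsq` is used only as an independent reference solver):

```python
# Compute the LSE via the normal equations and check both parts of Theorem 1.
import numpy as np

rng = np.random.default_rng(4)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p))])   # full column rank (w.h.p.)
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# Part (ii): closed form, since X^T X is invertible here.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Independent reference: generic least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(beta_hat, beta_lstsq))          # same minimizer
print(np.allclose(X.T @ X @ beta_hat, X.T @ y))   # normal equations hold (part (i))
```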