This notebook goes with the CS229 Linear Algebra Review and Reference.
I put this notebook together as I read the document linked above. I had three goals...
-Eric Silberstein / Klaviyo
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
%matplotlib inline
Let's create an example matrix $A\in{\mathbb{R}^{3 \times 2}}$
A = np.arange(6).reshape(3,2)
A
$a_{22}$, the entry at the 2nd row and 2nd column
A[1,1]
$a^1$, the first column (alternative notation: $a_1$)
A[:,0]
$a^T_2$, the second row (alternative notation: $\bar{a}_2$)
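A[1]  # the second row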
Given two example vectors $x, y\in{\mathbb{R}^n}$, let's compute the dot product $x^Ty$
x = np.array([2,3])
x
y = np.array([4,5])
y
np.dot(x.T, y), np.dot(y.T, x), np.dot(x, y), np.dot(y, x)
x[0]*y[0] + x[1]*y[1]
Now let's compute the outer product $xy^T$
np.outer(x, y)
np.array([
    [x[0]*y[0], x[0]*y[1]],
    [x[1]*y[0], x[1]*y[1]],
])
x = x.reshape(2,1)
x
y = y.reshape(2,1)
y
np.dot(x,y.T)
Outer product example: use $x\boldsymbol{1}^T$, where $\boldsymbol{1}\in{\mathbb{R}^3}$, to compute a matrix in which each column is $x$
np.outer(x, np.ones(3))
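Since x is currently a column vector of shape (2, 1), broadcasting gives the same result:
x * np.ones(3)  # (2, 1) broadcast against (3,) yields a (2, 3) matrix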
A = np.arange(4).reshape(2,2) + 2
A
x = np.array([6,7])
x
Let numpy calculate $Ax$
np.dot(A,x)
Calculate $Ax$ element-wise
np.array([A[0,0] * x[0] + A[0,1] * x[1], A[1,0] * x[0] + A[1,1] * x[1]])
Calculate $Ax$ by $ \begin{bmatrix} a^T_1x \\ a^T_2x \\ \end{bmatrix} $
np.array([np.dot(A[0].T, x), np.dot(A[1].T, x)])
Calculate $Ax$ by $ \begin{bmatrix} a^1 \end{bmatrix} x_1 + \begin{bmatrix} a^2 \end{bmatrix} x_2 $
A[:,0] * x[0] + A[:,1] * x[1]
A = np.arange(9).reshape(3,3)
A
B = A * 2
B
Calculate AB using numpy
A.dot(B)
Calculate AB thinking of A as row vectors and B as column vectors
$ \begin{bmatrix} a^T_1b^1 & a^T_1b^2 & a^T_1b^3 \\ a^T_2b^1 & a^T_2b^2 & a^T_2b^3 \\ a^T_3b^1 & a^T_3b^2 & a^T_3b^3 \\ \end{bmatrix} $
np.array([
    [np.dot(A[0], B[:,0]), np.dot(A[0], B[:,1]), np.dot(A[0], B[:,2])],
    [np.dot(A[1], B[:,0]), np.dot(A[1], B[:,1]), np.dot(A[1], B[:,2])],
    [np.dot(A[2], B[:,0]), np.dot(A[2], B[:,1]), np.dot(A[2], B[:,2])],
])
Calculate AB thinking of A as column vectors and B as row vectors
$ \displaystyle\sum_{i=1}^{n}a^ib^T_i $
np.outer(A[:,0], B[0]) + np.outer(A[:,1], B[1]) + np.outer(A[:,2], B[2])
Calculate AB thinking of A as a matrix and B as column vectors
$ \begin{bmatrix} Ab^1 & Ab^2 & Ab^3 \end{bmatrix} $
np.column_stack([np.dot(A, B[:,0]), np.dot(A, B[:,1]), np.dot(A, B[:,2])])
Calculate AB thinking of A as row vectors and B as a matrix
$ \begin{bmatrix} a^T_1B \\ a^T_2B \\ a^T_3B \end{bmatrix} $
np.stack([np.dot(A[0], B), np.dot(A[1], B), np.dot(A[2], B)])
Example of matrix multiplication being associative: $(AB)C = A(BC)$
C = np.random.rand(3,2)
C
np.allclose(np.dot(np.dot(A, B), C), np.dot(A, np.dot(B, C)))
Example of matrix multiplication being distributive: $A(B + C) = AB + AC$
C = np.random.rand(3,3)
np.allclose(np.dot(A, B + C), np.dot(A, B) + np.dot(A, C))
Matrix multiplication is not commutative: in general, $AB \ne BA$
np.allclose(np.dot(A,B), np.dot(B,A))
Example of multiplication of non-square matrices:
np.dot(np.random.rand(2,3), np.random.rand(3,4))
Identity matrix example
I = np.eye(3,3)
I
Example of $AI = A = IA$
A
np.dot(A, I)
np.dot(I, A)
Diagonal matrix example
np.diag([3,4,5])
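np.diag works in both directions: applied to a matrix, it extracts the diagonal instead:
np.diag(A)  # diagonal entries of A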
A
A.T
$(A^T)^T = A$
A.T.T
$(AB)^T = B^TA^T$
np.dot(A, B).T
np.dot(B.T, A.T)
$(A + B)^T = A^T + B^T$
(A + B).T
A.T + B.T
Example of a symmetric matrix
S = np.array([[1,2,3],[2,4,5],[3,5,6]])
S
S.T
Example of an anti-symmetric matrix
AS = np.array([[0,2,3],[-2,0,5],[-3,-5,0]])
AS
-AS.T
Form a symmetric matrix by $A + A^T, A\in{\mathbb{R}^{n \times n}}$
S = A + A.T
S
np.allclose(S, S.T)
Form an anti-symmetric matrix by $A - A^T$
AS = A - A.T
AS
np.allclose(AS, -AS.T)
Example of $A\in{\mathbb{R}^{n \times n}}$ represented as a sum of a symmetric matrix and an anti-symmetric matrix
$A = \frac{1}{2}(A + A^T) + \frac{1}{2}(A - A^T)$
np.allclose(A, S/2 + AS/2)
Sum of the diagonal elements
$ trA = \displaystyle\sum_{i=1}^nA_{ii} $
A
np.trace(A)
Example of $trAB = trBA$
np.trace(np.dot(A, B))
np.trace(np.dot(B, A))
Euclidean ($l_2$) norm of a vector
$ \|x\|_2 = \sqrt{\displaystyle\sum_{i=1}^{n}x_i^2} $
x = np.arange(3)
x
np.linalg.norm(x, ord=2)
np.sqrt(np.sum(x ** 2))
$l_1$ norm of a vector
$ \|x\|_1 = \displaystyle\sum_{i=1}^{n}\left|x_i\right| $
np.linalg.norm(x, ord=1)
np.sum(np.abs(x))
Frobenius norm of a matrix
$ \|A\|_F = \sqrt{\displaystyle\sum_{i = 1}^m\displaystyle\sum_{j = 1}^na_{ij}^2} $
np.linalg.norm(A, ord='fro')
np.sqrt(np.sum(A ** 2))
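Equivalently, using $\|A\|_F = \sqrt{trA^TA}$:
np.sqrt(np.trace(np.dot(A.T, A)))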
Let's look at a matrix with 3 columns and rank 2
A
np.linalg.matrix_rank(A)
$-a^1 + 2a^2 = a^3$
-A[:,0] + 2 * A[:,1]
$-a^T_1 + 2a^T_2 = a^T_3$
-A[0] + 2 * A[1]
Let's look at a full rank matrix
R = np.random.rand(3, 3)
R
np.linalg.matrix_rank(R)
A is not full rank so it's not invertible
A = np.arange(9).reshape(3,3)
A
np.linalg.matrix_rank(A)
try:
    np.linalg.inv(A)
except np.linalg.LinAlgError as err:
    print(f"Can't invert A due to {err}")
Let's create an A that is invertible
A = np.array([[0,1,2], [3,4,5], [6,7,9]])
A
np.linalg.matrix_rank(A)
np.linalg.inv(A)
$A^{-1}A = I = AA^{-1}$
np.allclose(np.dot(np.linalg.inv(A), A), np.eye(3))
np.allclose(np.dot(A, np.linalg.inv(A)), np.eye(3))
$(A^{-1})^{-1} = A$
np.linalg.inv(np.linalg.inv(A))
$(AB)^{-1} = B^{-1}A^{-1}$
B = np.random.rand(3, 3)
B
np.linalg.matrix_rank(B)
np.allclose(np.linalg.inv(np.dot(A, B)), np.dot(np.linalg.inv(B), np.linalg.inv(A)))
$(A^{-1})^T = (A^T)^{-1}$
np.linalg.inv(A).T
np.linalg.inv(A.T)
Example of using the inverse to solve a system of equations
A = np.array([[1,2],[3,4]])
A
x = np.array([5,6])
x
b = np.dot(A, x)
b
$ A\in{\mathbb{R}^{2 \times 2}} \\ x,b\in{\mathbb{R}^2} $
Say we know A and b and want to solve for x in $Ax = b$, we can hand calculate x by...
$ x_1 + 2x_2 = 17 \\ 3x_1 + 4x_2 = 39 $
$ x_1 = 5 \\ x_2 = 6 $
...or we can compute $x = A^{-1}b$
np.dot(np.linalg.inv(A), b)
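In practice we'd use np.linalg.solve, which solves the system without explicitly forming the inverse (faster and more numerically stable):
np.linalg.solve(A, b)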
Example of orthogonal vectors
x = np.array([1, 1])
x
y = np.array([1, -1])
y
np.dot(x, y)
Form an orthogonal matrix from x and y
x = x / np.linalg.norm(x)
x
y = y / np.linalg.norm(y)
y
$\|x\|_2 = 1$ and $\|y\|_2 = 1$ (x and y are normalized)
np.linalg.norm(x), np.linalg.norm(y)
U = np.column_stack((x, y))
U
$U^TU = I = UU^T$
np.dot(U.T, U)
np.dot(U, U.T)
Operating on a vector with an orthogonal matrix will not change its Euclidean norm
$ \|Uz\|_2 = \|z\|_2$ for $z\in{\mathbb{R}^n}, U\in{\mathbb{R}^{n \times n}}$ orthogonal
z = np.random.rand(2)
z
np.linalg.norm(z)
np.linalg.norm(np.dot(U, z))
Let's do a projection example, visualized in 3D: projecting the vector y onto the xy-plane.
y = np.array([1,1,1])
y
a1 = np.array([1,0,0]) # x axis
a2 = np.array([0,1,0]) # y axis
A = np.column_stack((a1, a2))
A
$ Proj(y; A) = A(A^TA)^{-1}A^Ty $
y_projection = np.dot(np.dot(np.dot(A,np.linalg.inv(np.dot(A.T, A))), A.T), y)
y_projection
ax = plt.axes(projection='3d')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
ax.set_xlim([0, 1])
ax.set_ylim([0, 1])
ax.set_zlim([0, 1])
ax.set_xticks([0,1])
ax.set_yticks([0,1])
ax.set_zticks([0,1])
origin = [0,0,0]
ax.quiver(*origin, *y, color='blue', label='y')
ax.quiver(*origin, *a1, color='green', label='a1')
ax.quiver(*origin, *a2, color='green', label='a2')
ax.quiver(*origin, *y_projection, color='red', label='projection')
ax.legend(loc='upper left')
plt.show()
Now let's project y onto a different plane
a1 = np.array([1,0,0])
a2 = np.array([0,1,0.5])
A = np.column_stack((a1, a2))
A
y_projection = np.dot(np.dot(np.dot(A,np.linalg.inv(np.dot(A.T, A))), A.T), y)
y_projection
ax = plt.axes(projection='3d')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
ax.set_xlim([0, 1])
ax.set_ylim([0, 1])
ax.set_zlim([0, 1])
ax.set_xticks([0,0.5,1])
ax.set_yticks([0,0.5,1])
ax.set_zticks([0,0.5,1])
origin = [0,0,0]
ax.quiver(*origin, *y, color='blue', label='y')
ax.quiver(*origin, *a1, color='green', label='a1')
ax.quiver(*origin, *a2, color='green', label='a2')
ax.plot_surface(
    np.column_stack([np.zeros(2), A[0]]),
    np.column_stack([np.zeros(2), A[1]]),
    np.column_stack([np.zeros(2), A[2]]),
    color='green',
    alpha=0.3)
ax.quiver(*origin, *y_projection, color='red', label='projection')
ax.legend(loc='upper left')
plt.show()
When A contains only a single column $a$, we get the special case of projecting a vector onto a line:
$ Proj(y;a) = \displaystyle\frac{aa^T}{a^Ta}y $
a = np.array([1, 0])
a
y = np.array([.5, .5])
y
y_projection = np.dot(np.outer(a, a) / np.dot(a, a), y)
y_projection
origin = np.array([0,0])
plt.plot(*zip(origin, y), color='blue', label='y')
plt.plot(*zip(origin, a), color='green', label='a')
plt.plot(*zip(origin, y_projection), color='red', alpha=0.8, label='projection')
plt.legend()
plt.show()
Another example
a = np.array([1, 1])
a
y = np.array([.5, .2])
y
y_projection = np.dot(np.outer(a, a) / np.dot(a, a), y)
y_projection
origin = np.array([0,0])
plt.plot(*zip(origin, y), color='blue', label='y')
plt.plot(*zip(origin, a), color='green', label='a')
plt.plot(*zip(origin, y_projection), color='red', alpha=0.8, label='projection')
plt.legend()
plt.show()
Example of a vector in the nullspace
A = np.array([[1,0,0],[0,1,0],[0,0,0]])
A
x = np.array([0,0,5])
x
np.dot(A, x)
A = np.array([[1,3],[3,2]])
A
np.linalg.det(A)
Let's plot some of the linear combinations of the row vectors with coefficients $0 \leq \alpha_1 \leq 1$ and $0 \leq \alpha_2 \leq 1$
a1 = A[0]
a2 = A[1]
a1, a2
combinations = []
for alpha_1 in np.linspace(0, 1, 10):
    for alpha_2 in np.linspace(0, 1, 10):
        combinations.append(alpha_1 * a1 + alpha_2 * a2)
ax = plt.axes()
ax.set_xlim([0, 6])
ax.set_ylim([0, 6])
ax.scatter(*zip(*combinations))
origin = np.array([0,0])
ax.plot(*zip(origin, a1), color='red')
ax.plot(*zip(origin, a2), color='red')
plt.show()
Show and plot an example of $\left|I\right| = 1$
I = np.eye(2)
np.linalg.det(I)
combinations = []
for alpha_1 in np.linspace(0, 1, 10):
    for alpha_2 in np.linspace(0, 1, 10):
        combinations.append(alpha_1 * I[0] + alpha_2 * I[1])
ax = plt.axes()
ax.set_xlim([-1, 2])
ax.set_ylim([-1, 2])
ax.scatter(*zip(*combinations))
origin = np.array([0,0])
ax.plot(*zip(origin, I[0]), color='red')
ax.plot(*zip(origin, I[1]), color='red')
plt.show()
Example that if we multiply a single row of A by a scalar t, the new determinant is $t\left|A\right|$
A
t = 5
np.linalg.det(np.vstack([A[0]*t, A[1]]))
t * np.linalg.det(A)
Example that if we exchange two rows of A, the new determinant is $-\left|A\right|$
np.linalg.det(A)
B = np.roll(A, 1, axis=0)
B
np.linalg.det(B)
Example that if $A\in{\mathbb{R}^{n \times n}}, \left|A\right| = \left|A^T\right|$
np.linalg.det(A.T)
Example that if $A,B\in{\mathbb{R}^{n \times n}}, \left|AB\right| = \left|A\right|\left|B\right|$
B = np.random.rand(2,2)
B
np.linalg.det(A.dot(B))
np.linalg.det(A) * np.linalg.det(B)
Example that if A is singular, $|A| = 0$
A = np.array([[1,2],[2,4]])
A
np.linalg.det(A)
Example that if A is non-singular, $\left|A^{-1}\right| = \displaystyle\frac{1}{\left|A\right|}$
A = np.array([[1,2],[3,4]])
A
np.linalg.det(np.linalg.inv(A))
1 / np.linalg.det(A)
Let's write a recursive function that matches the general definition of the determinant:
$ \left|A\right| = \displaystyle\sum_{i=1}^{n}(-1)^{i+j}a_{ij}\left|A_{\backslash{i},\backslash{j}}\right| \quad \text{(for any fixed } j\text{)} $
delete_ith_row_jth_column = lambda A, i, j: np.delete(np.delete(A, i, 0), j, 1)
delete_ith_row_jth_column(np.array([[1,2],[3,4]]), 0, 0)
def my_det(A):
    if A.shape == (1, 1):
        return A[0][0]
    else:
        j = 0  # expand along the first column, for clarity
        det_sum = 0
        for i in range(A.shape[0]):
            det_sum += (-1) ** (i + j) * A[i][j] * my_det(delete_ith_row_jth_column(A, i, j))
        return det_sum
A = np.array([[1,2,3],[4,5,6],[7,8,10]])
A
my_det(A)
np.linalg.det(A)
Let's write a function for the adjoint:
$(adj(A))_{ij} = (-1)^{(i+j)}\left|A_{\backslash{j},\backslash{i}}\right|$
def my_adj(A):
    adj = np.zeros(A.shape)
    for i in range(A.shape[0]):
        for j in range(A.shape[0]):
            adj[i, j] = (-1) ** (i + j) * np.linalg.det(delete_ith_row_jth_column(A, j, i))
    return adj
A
my_adj(A)
Example of
$ A^{-1} = \displaystyle\frac{1}{|A|}adj(A) $
np.allclose(np.linalg.inv(A), 1/np.linalg.det(A) * my_adj(A))
Example of the quadratic form $x^TAx$
A = np.array([[1,2],[3,4]])
A
x = np.array([5,6])
x
np.dot(np.dot(x,A),x)
$A_{11}x_1x_1 + A_{12}x_1x_2 + A_{21}x_2x_1 + A_{22}x_2x_2$
1*5*5 + 2*5*6 + 3*6*5 + 4*6*6
Example of a symmetric matrix that is indefinite (because there exist $x_1, x_2\in{\mathbb{R}^n}$ such that $x^T_1Ax_1 > 0$ and $x^T_2Ax_2 < 0$)
A = np.array([[1,2],[2,1]])
A
np.allclose(A, A.T)
x = np.array([1,0])
np.dot(x,np.dot(A,x))
x = np.array([1,-1])
np.dot(x,np.dot(A,x))
Example of a positive definite matrix
A = np.array([[2,0],[0,2]])
A
A is positive definite since
$x^TAx = x_1(2x_1 + 0x_2) + x_2(0x_1 + 2x_2) = 2(x_1^2 + x_2^2)$
which is positive for all $x_1, x_2\in{\mathbb{R}}$ except $x = 0$.
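As a quick numeric spot check (not a proof), the quadratic form should come out positive for random nonzero vectors:
xs = np.random.randn(1000, 2)
np.all(np.sum(np.dot(xs, A) * xs, axis=1) > 0)  # x^T A x for each row x of xs
A positive definite matrix is always full rank: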
np.linalg.matrix_rank(A)
Example of an eigenvalue $\lambda$ and an eigenvector $x$ of $A$
$ Ax = \lambda x $
A = np.array([[1,2],[3,4]])
A
eigenvalues, eigenvectors = np.linalg.eig(A)
eigenvalues, eigenvectors
lambd = eigenvalues[0]
lambd
x = eigenvectors[:,0]
x
np.linalg.norm(x)
np.dot(A, x)
lambd * x
$ (\lambda I - A)x = 0 $
np.dot(lambd * np.eye(2) - A, x)
$ \left|\lambda I - A\right| = 0 $
np.linalg.det(lambd * np.eye(2) - A)
my_det(lambd * np.eye(2) - A)
Example of a matrix whose eigenvalues and eigenvectors are complex
A = np.array([[1,2],[-3,4]])
A
np.linalg.eig(A)
The trace of A is equal to the sum of its eigenvalues
$ trA = \displaystyle\sum_{i=1}^{n} \lambda_{i} $
A = np.array([[1,2],[3,4]])
A
eigenvalues, eigenvectors = np.linalg.eig(A)
eigenvalues, eigenvectors
np.trace(A)
np.sum(eigenvalues)
The determinant of A is equal to the product of its eigenvalues
$ |A| = \displaystyle\prod_{i=1}^{n} \lambda_{i} $
np.linalg.det(A)
np.prod(eigenvalues)
Suppose A is non-singular with eigenvalue $\lambda$ and an associated eigenvector x. Then $1/\lambda$ is an eigenvalue of $A^{-1}$ with the same associated eigenvector x, i.e.,
$ A^{-1} x = (1/\lambda)x $
np.dot(np.linalg.inv(A), x)
(1/lambd) * x
(To prove this, take the eigenvector equation, $Ax = \lambda x$, and left-multiply each side by $A^{-1}$.)
$ Ax = \lambda x $
$ A^{-1}Ax = A^{-1}\lambda x $
$ x = A^{-1}\lambda x $
$ (1/\lambda) x = A^{-1} x $
$ A^{-1} x = (1/\lambda) x $
The eigenvalues of a diagonal matrix $D = diag(d_1, . . . d_n)$ are just the diagonal entries $d_1,...d_n$.
D = np.diag([3,4,5])
D
np.linalg.eig(D)
Throughout this section, A is a symmetric real matrix.
A = np.array([[2,3],[3,2]])
A
np.array_equal(A, np.transpose(A))
np.linalg.eigvals(A)
There exists a set of eigenvectors $u_1, . . . , u_n$ such that a) for all $i, u_i$ is an eigenvector with eigenvalue $\lambda_i$ and b) $u_1, . . . , u_n$ are unit vectors and orthogonal to each other.
np.linalg.eig(A)[1]
u1, u2 = np.linalg.eig(A)[1].T
u1, u2
np.linalg.norm(u1)
np.linalg.norm(u2)
u1.dot(u2)
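Side note: for symmetric matrices, NumPy also provides the specialized routine np.linalg.eigh, which returns real eigenvalues (in ascending order) and orthonormal eigenvectors:
np.linalg.eigh(A)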
Example of $AU = U\Lambda$
U = np.linalg.eig(A)[1]
U
Lambda = np.diag(np.linalg.eig(A)[0])
Lambda
np.dot(A,U)
np.dot(U, Lambda)
Example of the diagonalization of A
$A = AUU^T = U\Lambda U^T$
np.dot(np.dot(U, Lambda), U.T)
Example of representing a vector x with respect to another basis U
U
x = np.array([6,7])
x
x_hat = np.dot(U.T, x)
x_hat
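Converting back with $x = U\hat{x}$ recovers the original vector:
np.dot(U, x_hat)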
Example of computing Ax by $\Lambda\hat{x}$ and then converting back
Lambda
Lambda_x_hat = np.dot(Lambda, x_hat)
Lambda_x_hat
np.dot(U, Lambda_x_hat)
np.dot(A, x)
Let's calculate AAAx directly and by using the eigenvalues of A and converting back
np.dot(np.dot(np.dot(A,A),A), x)
Lambda
np.dot(U, np.array([5 ** 3 * x_hat[0], (-1) ** 3 * x_hat[1]]))  # eigenvalues of A are 5 and -1
np.dot(U, np.dot(Lambda ** 3, x_hat))
np.dot(A ** 3, x)  # NOT equivalent: A ** 3 is element-wise, not a matrix power
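NumPy's actual matrix power, np.linalg.matrix_power, matches the direct calculation above:
np.dot(np.linalg.matrix_power(A, 3), x)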
Let's see a graphical representation of $x$ and $\hat{x}$.
To make this easier to see, let's make U a 45-degree rotation and put x just a bit off the x-axis.
U = np.array([[np.sqrt(1/2), -np.sqrt(1/2)],[np.sqrt(1/2), np.sqrt(1/2)]])
U
x = np.array([2,0.3])
x
x_hat = np.dot(U.T, x)  # \hat{x} = U^T x, the same convention as above
x_hat
origin = [0,0]
plt.quiver(*origin, [0,1], [1,0], angles='xy', scale_units='xy', scale=1, color='blue', label='X and Y axes')
plt.quiver(*origin, x[0], x[1], angles='xy', scale_units='xy', scale=1, color='dodgerblue', label='x')
plt.quiver(*origin, U[0], U[1], angles='xy', scale_units='xy', scale=1, color='green', label='U axes')
plt.quiver(*origin, x_hat[0], x_hat[1], angles='xy', scale_units='xy', scale=1, color='limegreen', label='x hat')
plt.xlim(-1, 3)
plt.ylim(-2, 3)
plt.legend()
plt.show()
Let's plot the two column vectors of a 2x2 symmetric matrix and its two eigenvectors
A = np.array([[2,3],[3,2]])
A
U = np.linalg.eig(A)[1]
U
origin = [0,0]
plt.quiver(*origin, A[0], A[1], angles='xy', scale_units='xy', scale=1, color='blue', label='A')
plt.quiver(*origin, U[0], U[1], angles='xy', scale_units='xy', scale=1, color='green', label='U')
plt.xlim(-1, 3)
plt.ylim(-1, 3)
plt.legend()
plt.show()
Example of a non-symmetric matrix (note that its eigenvectors are not orthogonal)
A = np.array([[1,2],[3,4]])
A
U = np.linalg.eig(A)[1]
U
origin = [0,0]
plt.quiver(*origin, A[0], A[1], angles='xy', scale_units='xy', scale=1, color='blue', label='A')
plt.quiver(*origin, U[0], U[1], angles='xy', scale_units='xy', scale=1, color='green', label='U')
plt.xlim(-1, 4)
plt.ylim(-1, 4)
plt.legend()
plt.show()
Example where the eigenvectors and eigenvalues are complex
A = np.array([[-1,-2],[7,4]])
A
np.linalg.eig(A)
“Diagonalizing” quadratic form: compute $x^TAx$
A = np.array([[2,3],[3,2]])
A
x = np.array([3,4])
x
U = np.linalg.eig(A)[1]
U
lambda_1, lambda_2 = np.linalg.eig(A)[0]
lambda_1, lambda_2
x_hat = np.dot(np.transpose(U), x)
x_hat
np.dot(np.dot(np.transpose(x), A), x)
$ \displaystyle\sum_{i=1}^{n} \lambda_i\hat{x}_i^2 $
lambda_1 * x_hat[0] ** 2 + lambda_2 * x_hat[1] ** 2
To see that
$ \max_{x\in{\mathbb{R}^n}} \; x^TAx \quad \text{subject to} \quad \|x\|_2 = 1 $
is the largest eigenvalue, let's plot $x_1$ on the X axis and $x^TAx$ on the Y axis where $x_1$ is the first component of x and the second component is computed such that the norm of x is 1.
x2 = lambda x1: np.sqrt(1 - x1 ** 2)  # upper unit semicircle; since the quadratic form is even, this covers all values on the circle
x1_vals = np.linspace(-1,1,1000)
y_vals = []
for x1 in x1_vals:
    x = np.array([x1, x2(x1)])
    y_vals.append(np.dot(np.dot(np.transpose(x), A), x))
plt.plot(x1_vals, y_vals)
plt.show()
np.linalg.eig(A)
np.max(y_vals)
x1_vals[np.argmax(y_vals)], x2(x1_vals[np.argmax(y_vals)])
np.min(y_vals)
x1_vals[np.argmin(y_vals)], x2(x1_vals[np.argmin(y_vals)])
Let's work through an example using a function of a vector:
$ x\in{\mathbb{R}^2}\\ f(x) = x_1^2 + x_2^2 + x_1x_2\\ \nabla_xf(x) = \begin{bmatrix} \frac{\partial{f(x)}}{\partial{x_1}} \\ \frac{\partial{f(x)}}{\partial{x_2}} \end{bmatrix} = \begin{bmatrix} 2x_1 + x_2\\ 2x_2 + x_1 \end{bmatrix} $
So if $ x = \begin{bmatrix} 3 \\ 4 \end{bmatrix} , \nabla_xf(x) = \begin{bmatrix} 10 \\ 11 \end{bmatrix} $
f = lambda x1, x2: x1**2 + x2**2 + x1*x2
epsilon = 0.01
(f(3+epsilon, 4) - f(3, 4)) / epsilon, (f(3, 4+epsilon) - f(3, 4)) / epsilon
x1, x2 = np.meshgrid(np.linspace(-5,5,10), np.linspace(-5,5,10))
ax = plt.axes(projection='3d')
ax.set_xlabel('x1')
ax.set_ylabel('x2')
ax.set_zlabel('f')
ax.plot_surface(x1, x2, f(x1, x2), color='blue', alpha=0.5)
ax.scatter([3], [4], [f(3,4)], color='black')
ax.plot([3-2, 3+2], [4, 4], [f(3,4)-2*10, f(3,4)+2*10], color='red')  # slope 10 along x1
ax.plot([3, 3], [4-2, 4+2], [f(3,4)-2*11, f(3,4)+2*11], color='green')  # slope 11 along x2
plt.show()
Let's continue with the same function as above and add the Hessian:
$ x\in{\mathbb{R}^2}\\ f(x) = x_1^2 + x_2^2 + x_1x_2\\ \nabla_xf(x) = \begin{bmatrix} \frac{\partial{f(x)}}{\partial{x_1}} \\ \frac{\partial{f(x)}}{\partial{x_2}} \end{bmatrix} = \begin{bmatrix} 2x_1 + x_2\\ 2x_2 + x_1 \end{bmatrix}\\ \nabla_x^2f(x) = \begin{bmatrix} \frac{\partial^2{f(x)}}{\partial{x_1}\partial{x_1}} & \frac{\partial^2{f(x)}}{\partial{x_1}\partial{x_2}} \\ \frac{\partial^2{f(x)}}{\partial{x_2}\partial{x_1}} & \frac{\partial^2{f(x)}}{\partial{x_2}\partial{x_2}} \end{bmatrix} = \begin{bmatrix} 2 & 1\\ 1 & 2 \end{bmatrix} $
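Let's sanity-check the Hessian numerically with central second differences (exact here, since f is quadratic):
f = lambda x1, x2: x1**2 + x2**2 + x1*x2  # same f as above
epsilon = 0.01
d2_11 = (f(3+epsilon, 4) - 2*f(3, 4) + f(3-epsilon, 4)) / epsilon**2
d2_22 = (f(3, 4+epsilon) - 2*f(3, 4) + f(3, 4-epsilon)) / epsilon**2
d2_12 = (f(3+epsilon, 4+epsilon) - f(3+epsilon, 4-epsilon)
         - f(3-epsilon, 4+epsilon) + f(3-epsilon, 4-epsilon)) / (4 * epsilon**2)
np.array([[d2_11, d2_12], [d2_12, d2_22]])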
Let's try an example of the quadratic function $f(x) = x^TAx$ for $A\in{\mathbb{S}^2}$
$\nabla_x x^TAx = 2Ax$
A = np.array([[2,3],[3,2]])
A
f = lambda x: np.dot(np.dot(x, A), x)
x = np.array([3,4])
gradient = 2 * np.dot(A, x)
gradient
epsilon = 0.01
(f(x + np.array([epsilon, 0])) - f(x)) / epsilon, (f(x + np.array([0, epsilon])) - f(x)) / epsilon
x1, x2 = np.meshgrid(np.linspace(-5, 5, 10), np.linspace(-5, 5, 10))
ax = plt.axes(projection='3d')
ax.set_xlabel('x1')
ax.set_ylabel('x2')
ax.set_zlabel('f')
ax.plot_surface(x1, x2, np.apply_along_axis(f, 0, np.stack([x1, x2])), color='blue', alpha=0.5)
ax.scatter(*x, [f(x)], color='black')
ax.plot([x[0]-2, x[0]+2], [x[1], x[1]], [f(x)-2*gradient[0], f(x)+2*gradient[0]], color='red')
ax.plot([x[0], x[0]], [x[1]-2, x[1]+2], [f(x)-2*gradient[1], f(x)+2*gradient[1]], color='green')
plt.show()
Using the second projection example from 3.9 above, let's calculate the x that minimizes $\|Ax - b\|_2^2$ via the closed form $x = (A^TA)^{-1}A^Tb$, and also estimate x by grid search.
A = np.array([[1 , 0], [0, 1], [0, 0.5]])
A
b = np.array([1,1,1])
b
x = np.dot(np.dot(np.linalg.inv(np.dot(A.T, A)), A.T), b)
x
np.dot(A, x)
xs = np.array(np.meshgrid(np.linspace(-2, 2, 100), np.linspace(-2, 2, 100))).T.reshape(-1, 2)
distances = np.apply_along_axis(lambda x: np.linalg.norm(np.dot(A, x) - b), 1, xs)
x = xs[np.argmin(distances)]
x
np.dot(A, x)
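NumPy's least-squares routine solves the same problem directly:
np.linalg.lstsq(A, b, rcond=None)[0]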
Compute the gradient of $\|Ax - b\|_2^2$, showing one interim step not shown in the review:
$ \nabla_x(x^TA^TAx - 2b^TAx + b^Tb) = \\ \nabla_xx^TA^TAx - \nabla_x2b^TAx + \nabla_xb^Tb = \\ \nabla_xx^T(A^TA)x - 2\nabla_x(A^Tb)^Tx + \nabla_xb^Tb = \\ 2A^TAx - 2A^Tb + 0 = \\ 2A^TAx - 2A^Tb $
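Let's verify this gradient numerically with central differences on a random instance (the specific A, b, and x here are arbitrary, just for the check):
A = np.random.rand(3, 2)
b = np.random.rand(3)
x = np.random.rand(2)
f = lambda x: np.sum((np.dot(A, x) - b) ** 2)  # ||Ax - b||_2^2
epsilon = 1e-6
numeric = np.array([
    (f(x + epsilon * np.eye(2)[i]) - f(x - epsilon * np.eye(2)[i])) / (2 * epsilon)
    for i in range(2)
])
analytic = 2 * np.dot(np.dot(A.T, A), x) - 2 * np.dot(A.T, b)
np.allclose(numeric, analytic)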
Let's try an example where we're trying to fit a line to 5 points.
A = np.random.rand(5,1)
A
# b is 2A with random noise added in
b = A * 2 + (np.random.rand(5,1) - 0.5)/2
b
x = np.dot(np.dot(np.linalg.inv(np.dot(A.T, A)), A.T), b)
x
plt.scatter(A, b)
plt.plot([0, 1], [0, x[0]], color='green')
plt.show()
Let's go through an example showing $\nabla_A\left|A\right| = \left|A\right|A^{-T}$ when $A\in{\mathbb{R}^{n \times n}}$ is non-singular
A = np.random.rand(2,2)
A
f = lambda A: np.linalg.det(A) # for convenience
epsilon = 0.0001
np.array([
    [
        f(A + np.array([[epsilon, 0], [0, 0]])) - f(A),
        f(A + np.array([[0, epsilon], [0, 0]])) - f(A),
    ],
    [
        f(A + np.array([[0, 0], [epsilon, 0]])) - f(A),
        f(A + np.array([[0, 0], [0, epsilon]])) - f(A),
    ],
]) / epsilon
np.dot(f(A), np.linalg.inv(A).T)
Now let's go through an example showing $\nabla_A\log\left|A\right| = A^{-1}$ when $A\in{\mathbb{S}^n_{++}}$
A = np.array([[3,2],[2,3]])
A
f = lambda A: np.log(np.linalg.det(A))
epsilon = 0.0001
np.array([
    [
        f(A + np.array([[epsilon, 0], [0, 0]])) - f(A),
        f(A + np.array([[0, epsilon], [0, 0]])) - f(A),
    ],
    [
        f(A + np.array([[0, 0], [epsilon, 0]])) - f(A),
        f(A + np.array([[0, 0], [0, epsilon]])) - f(A),
    ],
]) / epsilon
np.linalg.inv(A)