Linear Algebra Review

This notebook goes with the CS229 Linear Algebra Review and Reference notes

I put this notebook together as I read the document linked above. I had three goals...

  • learn/review by playing with examples using real numbers
  • improve my fluency at translating between math notation and numpy
  • learn and practice LaTeX

-Eric Silberstein / Klaviyo

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
%matplotlib inline

1 Basic Concepts and Notation

1.1 Basic Notation

Let's create an example matrix $A\in{\mathbb{R}^{3 \times 2}}$

In [2]:
A = np.arange(6).reshape(3,2)
A
Out[2]:
array([[0, 1],
       [2, 3],
       [4, 5]])

$a_{22}$, the entry at the 2nd row and 2nd column

In [3]:
A[1,1]
Out[3]:
3

$a^1$, the first column (alternative notation: $a_1$)

In [4]:
A[:,0]
Out[4]:
array([0, 2, 4])

$a^T_2$, the second row (alternative notation: $\bar{a}_2$)
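
A minimal sketch of grabbing that row, assuming A is still the 3x2 matrix from In [2]:

A[1]     # the second row -> array([2, 3])
A[1, :]  # equivalent slicing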

2 Matrix Multiplication

2.1 Vector-Vector Products

Given two example vectors $x, y\in{\mathbb{R}^n}$, let's compute the dot product $x^Ty$

In [5]:
x = np.array([2,3])
x
Out[5]:
array([2, 3])
In [6]:
y = np.array([4,5])
y
Out[6]:
array([4, 5])
In [7]:
np.dot(x.T, y), np.dot(y.T, x), np.dot(x, y), np.dot(y, x)
Out[7]:
(23, 23, 23, 23)
In [8]:
x[0]*y[0] + x[1]*y[1]
Out[8]:
23

Now let's compute the outer product $xy^T$

In [9]:
np.outer(x, y)
Out[9]:
array([[ 8, 10],
       [12, 15]])
In [10]:
np.array([
    [x[0]*y[0], x[0]*y[1]],
    [x[1]*y[0], x[1]*y[1]],
])
Out[10]:
array([[ 8, 10],
       [12, 15]])
In [11]:
x = x.reshape(2,1)
x
Out[11]:
array([[2],
       [3]])
In [12]:
y = y.reshape(2,1)
y
Out[12]:
array([[4],
       [5]])
In [13]:
np.dot(x,y.T)
Out[13]:
array([[ 8, 10],
       [12, 15]])

Outer product example: use $x\boldsymbol{1}^T$, where $\boldsymbol{1}\in{\mathbb{R}^3}$, to compute a matrix in which each column is $x$

In [14]:
np.outer(x, np.ones(3))
Out[14]:
array([[2., 2., 2.],
       [3., 3., 3.]])

2.2 Matrix-Vector Products

In [15]:
A = np.arange(4).reshape(2,2) + 2
A
Out[15]:
array([[2, 3],
       [4, 5]])
In [16]:
x = np.array([6,7])
x
Out[16]:
array([6, 7])

Let numpy calculate $Ax$

In [17]:
np.dot(A,x)
Out[17]:
array([33, 59])

Calculate $Ax$ element-wise

In [18]:
np.array([A[0,0] * x[0] + A[0,1] * x[1], A[1,0] * x[0] + A[1,1] * x[1]])
Out[18]:
array([33, 59])

Calculate $Ax$ by $ \begin{bmatrix} a^T_1x \\ a^T_2x \\ \end{bmatrix} $

In [19]:
np.array([np.dot(A[0].T, x), np.dot(A[1].T, x)])
Out[19]:
array([33, 59])

Calculate $Ax$ by $ \begin{bmatrix} a^1 \end{bmatrix} x_1 + \begin{bmatrix} a^2 \end{bmatrix} x_2 $

In [20]:
np.dot(A[:,0], x[0]) + np.dot(A[:,1], x[1])
Out[20]:
array([33, 59])

2.3 Matrix-Matrix Products

In [21]:
A = np.arange(9).reshape(3,3)
A
Out[21]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
In [22]:
B = A * 2
B
Out[22]:
array([[ 0,  2,  4],
       [ 6,  8, 10],
       [12, 14, 16]])

Calculate AB using numpy

In [23]:
A.dot(B)
Out[23]:
array([[ 30,  36,  42],
       [ 84, 108, 132],
       [138, 180, 222]])

Calculate AB thinking of A as row vectors and B as column vectors

$ \begin{bmatrix} a^T_1b^1 & a^T_1b^2 & a^T_1b^3 \\ a^T_2b^1 & a^T_2b^2 & a^T_2b^3 \\ a^T_3b^1 & a^T_3b^2 & a^T_3b^3 \\ \end{bmatrix} $

In [24]:
np.array(
    [
        [np.dot(A[0], B[:,0]), np.dot(A[0], B[:,1]), np.dot(A[0], B[:,2])],
        [np.dot(A[1], B[:,0]), np.dot(A[1], B[:,1]), np.dot(A[1], B[:,2])],
        [np.dot(A[2], B[:,0]), np.dot(A[2], B[:,1]), np.dot(A[2], B[:,2])],
    ]
)
Out[24]:
array([[ 30,  36,  42],
       [ 84, 108, 132],
       [138, 180, 222]])

Calculate AB thinking of A as column vectors and B as row vectors

$ \displaystyle\sum_{i=1}^{n}a^ib^T_i $

In [25]:
np.outer(A[:,0], B[0]) + np.outer(A[:,1], B[1]) + np.outer(A[:,2], B[2])
Out[25]:
array([[ 30,  36,  42],
       [ 84, 108, 132],
       [138, 180, 222]])

Calculate AB thinking of A as a matrix and B as column vectors

$ \begin{bmatrix} Ab^1 & Ab^2 & Ab^3 \end{bmatrix} $

In [26]:
np.column_stack([np.dot(A, B[:,0]), np.dot(A, B[:,1]), np.dot(A, B[:,2])])
Out[26]:
array([[ 30,  36,  42],
       [ 84, 108, 132],
       [138, 180, 222]])

Calculate AB thinking of A as row vectors and B as a matrix

$ \begin{bmatrix} a^T_1B \\ a^T_2B \\ a^T_3B \end{bmatrix} $

In [27]:
np.stack([np.dot(A[0], B), np.dot(A[1], B), np.dot(A[2], B)])
Out[27]:
array([[ 30,  36,  42],
       [ 84, 108, 132],
       [138, 180, 222]])

Example of matrix multiplication being associative: $(AB)C = A(BC)$

In [28]:
C = np.random.rand(3,2)
C
Out[28]:
array([[0.48016108, 0.78511536],
       [0.14709867, 0.23049283],
       [0.63444509, 0.77271909]])
In [29]:
np.allclose(np.dot(np.dot(A, B), C), np.dot(A, np.dot(B, C)))
Out[29]:
True

Example of matrix multiplication being distributive: $A(B + C) = AB + AC$

In [30]:
C = np.random.rand(3,3)
In [31]:
np.allclose(np.dot(A, B + C), np.dot(A, B) + np.dot(A, C))
Out[31]:
True

Matrix multiplication is in general not commutative: $AB \ne BA$. In this particular example, though, $B = 2A$, so $AB = 2A^2 = BA$ and the check below returns True; see the sketch after the cell for two matrices that do not commute.

In [32]:
np.allclose(np.dot(A,B), np.dot(B,A))
Out[32]:
True
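
A minimal sketch with two small matrices that do not commute:

M = np.array([[1, 1], [0, 1]])
N = np.array([[1, 0], [1, 1]])
np.dot(M, N)  # array([[2, 1], [1, 1]])
np.dot(N, M)  # array([[1, 1], [1, 2]])
np.allclose(np.dot(M, N), np.dot(N, M))  # False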

Example of multiplication of non-square matrices:

In [33]:
np.dot(np.random.rand(2,3), np.random.rand(3,4))
Out[33]:
array([[0.57285884, 0.45617738, 0.17109287, 0.227468  ],
       [1.0131327 , 0.57271021, 0.33625262, 0.73263324]])

3 Operations and Properties

3.1 The Identity Matrix and Diagonal Matrices

Identity matrix example

In [34]:
I = np.eye(3,3)
I
Out[34]:
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

Example of $AI = A = IA$

In [35]:
A
Out[35]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
In [36]:
np.dot(A, I)
Out[36]:
array([[0., 1., 2.],
       [3., 4., 5.],
       [6., 7., 8.]])
In [37]:
np.dot(I, A)
Out[37]:
array([[0., 1., 2.],
       [3., 4., 5.],
       [6., 7., 8.]])

Diagonal matrix example

In [38]:
np.diag([3,4,5])
Out[38]:
array([[3, 0, 0],
       [0, 4, 0],
       [0, 0, 5]])

3.2 The Transpose

In [39]:
A
Out[39]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
In [40]:
A.T
Out[40]:
array([[0, 3, 6],
       [1, 4, 7],
       [2, 5, 8]])

$(A^T)^T = A$

In [41]:
A.T.T
Out[41]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

$(AB)^T = B^TA^T$

In [42]:
np.dot(A, B).T
Out[42]:
array([[ 30,  84, 138],
       [ 36, 108, 180],
       [ 42, 132, 222]])
In [43]:
np.dot(B.T, A.T)
Out[43]:
array([[ 30,  84, 138],
       [ 36, 108, 180],
       [ 42, 132, 222]])

$(A + B)^T = A^T + B^T$

In [44]:
(A + B).T
Out[44]:
array([[ 0,  9, 18],
       [ 3, 12, 21],
       [ 6, 15, 24]])
In [45]:
A.T + B.T
Out[45]:
array([[ 0,  9, 18],
       [ 3, 12, 21],
       [ 6, 15, 24]])

3.3 Symmetric Matrices

Example of a symmetric matrix

In [46]:
S = np.array([[1,2,3],[2,4,5],[3,5,6]])
S
Out[46]:
array([[1, 2, 3],
       [2, 4, 5],
       [3, 5, 6]])
In [47]:
S.T
Out[47]:
array([[1, 2, 3],
       [2, 4, 5],
       [3, 5, 6]])

Example of an anti-symmetric matrix

In [48]:
AS = np.array([[0,2,3],[-2,0,5],[-3,-5,0]])
AS
Out[48]:
array([[ 0,  2,  3],
       [-2,  0,  5],
       [-3, -5,  0]])
In [49]:
-AS.T
Out[49]:
array([[ 0,  2,  3],
       [-2,  0,  5],
       [-3, -5,  0]])

Form a symmetric matrix by $A + A^T, A\in{\mathbb{R}^{n \times n}}$

In [50]:
S = A + A.T
S
Out[50]:
array([[ 0,  4,  8],
       [ 4,  8, 12],
       [ 8, 12, 16]])
In [51]:
np.allclose(S, S.T)
Out[51]:
True

Form an anti-symmetric matrix by $A - A^T$

In [52]:
AS = A - A.T
AS
Out[52]:
array([[ 0, -2, -4],
       [ 2,  0, -2],
       [ 4,  2,  0]])
In [53]:
np.allclose(AS, -AS.T)
Out[53]:
True

Example of $A\in{\mathbb{R}^{n \times n}}$ represented as a sum of a symmetric matrix and an anti-symmetric matrix

$A = \frac{1}{2}(A + A^T) + \frac{1}{2}(A - A^T)$

In [54]:
np.allclose(A, S/2 + AS/2)
Out[54]:
True

3.4 The Trace

Sum of the diagonal elements

$ trA = \displaystyle\sum_{i=1}^nA_{ii} $

In [55]:
A
Out[55]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
In [56]:
np.trace(A)
Out[56]:
12

Example of $trAB = trBA$

In [57]:
np.trace(np.dot(A, B))
Out[57]:
360
In [58]:
np.trace(np.dot(B, A))
Out[58]:
360
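
The trace is also linear; a quick sketch using the same A and B as above:

np.trace(A + B), np.trace(A) + np.trace(B)  # (36, 36)
np.trace(5 * A), 5 * np.trace(A)            # (60, 60)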

3.5 Norms

Euclidean ($l_2$) norm of a vector

$ \|x\|_2 = \sqrt{\displaystyle\sum_{i=1}^{n}x_i^2} $

In [59]:
x = np.arange(3)
x
Out[59]:
array([0, 1, 2])
In [60]:
np.linalg.norm(x, ord=2)
Out[60]:
2.23606797749979
In [61]:
np.sqrt(np.sum(x ** 2))
Out[61]:
2.23606797749979

$l_1$ norm of a vector

$ \|x\|_1 = \displaystyle\sum_{i=1}^{n}\left|x_i\right| $

In [62]:
np.linalg.norm(x, ord=1)
Out[62]:
3.0
In [63]:
np.sum(np.abs(x))
Out[63]:
3

Frobenius norm of a matrix

$ \|A\|_F = \sqrt{\displaystyle\sum_{i = 1}^m\displaystyle\sum_{j = 1}^na_{ij}^2} $

In [64]:
np.linalg.norm(A, ord='fro')
Out[64]:
14.2828568570857
In [65]:
np.sqrt(np.sum(A ** 2))
Out[65]:
14.2828568570857
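
The Frobenius norm can also be written as $\sqrt{tr(A^TA)}$; a quick check using the same A:

np.sqrt(np.trace(np.dot(A.T, A)))  # 14.2828..., same value as above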

3.6 Linear Independence and Rank

Let's look at a matrix with 3 columns and rank 2

In [66]:
A
Out[66]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
In [67]:
np.linalg.matrix_rank(A)
Out[67]:
2

$-a^1 + 2a^2 = a^3$

In [68]:
-A[:,0] + 2 * A[:,1]
Out[68]:
array([2, 5, 8])

$-a^T_1 + 2a^T_2 = a^T_3$

In [69]:
-A[0] + 2 * A[1]
Out[69]:
array([6, 7, 8])

Let's look at a full rank matrix

In [70]:
R = np.random.rand(3, 3)
R
Out[70]:
array([[0.82830024, 0.46109313, 0.27433051],
       [0.85200938, 0.26364719, 0.32397499],
       [0.31054344, 0.50805488, 0.26092114]])
In [71]:
np.linalg.matrix_rank(R)
Out[71]:
3

3.7 The Inverse of a Square Matrix

A is not full rank so it's not invertible

In [72]:
A = np.arange(9).reshape(3,3)
A
Out[72]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
In [73]:
np.linalg.matrix_rank(A)
Out[73]:
2
In [74]:
try:
    np.linalg.inv(A)
except np.linalg.LinAlgError as err:
    print(f"Can't invert A due to {err}")
Can't invert A due to Singular matrix

Let's create an A that is invertible

In [75]:
A = np.array([[0,1,2], [3,4,5], [6,7,9]])
A
Out[75]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 9]])
In [76]:
np.linalg.matrix_rank(A)
Out[76]:
3
In [77]:
np.linalg.inv(A)
Out[77]:
array([[-0.33333333, -1.66666667,  1.        ],
       [-1.        ,  4.        , -2.        ],
       [ 1.        , -2.        ,  1.        ]])

$A^{-1}A = I = AA^{-1}$

In [78]:
np.allclose(np.dot(np.linalg.inv(A), A), np.eye(3))
Out[78]:
True
In [79]:
np.allclose(np.dot(A, np.linalg.inv(A)), np.eye(3))
Out[79]:
True

$(A^{-1})^{-1} = A$

In [80]:
np.linalg.inv(np.linalg.inv(A))
Out[80]:
array([[-0.,  1.,  2.],
       [ 3.,  4.,  5.],
       [ 6.,  7.,  9.]])

$(AB)^{-1} = B^{-1}A^{-1}$

In [81]:
B = np.random.rand(3, 3)
B
Out[81]:
array([[0.84735242, 0.0950299 , 0.23436798],
       [0.00574613, 0.2061367 , 0.87375284],
       [0.75017117, 0.32154513, 0.74314387]])
In [82]:
np.linalg.matrix_rank(B)
Out[82]:
3
In [83]:
np.allclose(np.linalg.inv(np.dot(A, B)), np.dot(np.linalg.inv(B), np.linalg.inv(A)))
Out[83]:
True

$(A^{-1})^T = (A^T)^{-1}$

In [84]:
np.linalg.inv(A).T
Out[84]:
array([[-0.33333333, -1.        ,  1.        ],
       [-1.66666667,  4.        , -2.        ],
       [ 1.        , -2.        ,  1.        ]])
In [85]:
np.linalg.inv(A.T)
Out[85]:
array([[-0.33333333, -1.        ,  1.        ],
       [-1.66666667,  4.        , -2.        ],
       [ 1.        , -2.        ,  1.        ]])

Example of using the inverse to solve a system of equations

In [86]:
A = np.array([[1,2],[3,4]])
A
Out[86]:
array([[1, 2],
       [3, 4]])
In [87]:
x = np.array([5,6])
x
Out[87]:
array([5, 6])
In [88]:
b = np.dot(A, x)
b
Out[88]:
array([17, 39])

$ A\in{\mathbb{R}^{2 \times 2}} \\ x,b\in{\mathbb{R}^2} $

Say we know A and b and want to solve for x in $Ax = b$. We can calculate x by hand...

$ x_1 + 2x_2 = 17 \\ 3x_1 + 4x_2 = 39 $

$ x_1 = 5 \\ x_2 = 6 $

...or we can compute $x = A^{-1}b$

In [89]:
np.dot(np.linalg.inv(A), b)
Out[89]:
array([5., 6.])
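
In practice it's usually better (numerically) to use a solver rather than forming $A^{-1}$ explicitly; a quick sketch:

np.linalg.solve(A, b)  # array([5., 6.])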

3.8 Orthogonal Matrices

Example of orthogonal vectors

In [90]:
x = np.array([1, 1])
x
Out[90]:
array([1, 1])
In [91]:
y = np.array([1, -1])
y
Out[91]:
array([ 1, -1])
In [92]:
np.dot(x, y)
Out[92]:
0

Form an orthogonal matrix from x and y

In [93]:
x = x / np.linalg.norm(x)
x
Out[93]:
array([0.70710678, 0.70710678])
In [94]:
y = y / np.linalg.norm(y)
y
Out[94]:
array([ 0.70710678, -0.70710678])

$\|x\|_2 = 1$ and $\|y\|_2 = 1$ (x and y are normalized)

In [95]:
np.linalg.norm(x), np.linalg.norm(y)
Out[95]:
(0.9999999999999999, 0.9999999999999999)
In [96]:
U = np.column_stack((x, y))
U
Out[96]:
array([[ 0.70710678,  0.70710678],
       [ 0.70710678, -0.70710678]])

$U^TU = I = UU^T$

In [97]:
np.dot(U.T, U)
Out[97]:
array([[1., 0.],
       [0., 1.]])
In [98]:
np.dot(U, U.T)
Out[98]:
array([[1., 0.],
       [0., 1.]])
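
Note that $UU^T = I$ relies on U being square; for a tall matrix with orthonormal columns only $U^TU = I$ holds. A minimal sketch:

V = np.array([[1., 0.], [0., 1.], [0., 0.]])  # 3x2 with orthonormal columns
np.dot(V.T, V)  # the 2x2 identity
np.dot(V, V.T)  # NOT the 3x3 identity (the last diagonal entry is 0)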

Operating on a vector with an orthogonal matrix will not change its Euclidean norm

$ \|Uz\|_2 = \|z\|_2$ for $z\in{\mathbb{R}^n}, U\in{\mathbb{R}^{n \times n}}$ orthogonal

In [99]:
z = np.random.rand(2)
z
Out[99]:
array([0.80175368, 0.52946705])
In [100]:
np.linalg.norm(z)
Out[100]:
0.9608039912644135
In [101]:
np.linalg.norm(np.dot(U, z))
Out[101]:
0.9608039912644135

3.9 Range and Nullspace of a Matrix

Let's do a projection example that we can visualize in 3D: projecting the vector y onto the xy-plane.

In [102]:
y = np.array([1,1,1])
y
Out[102]:
array([1, 1, 1])
In [103]:
a1 = np.array([1,0,0]) # x axis
a2 = np.array([0,1,0]) # y axis
A = np.column_stack((a1, a2))
A
Out[103]:
array([[1, 0],
       [0, 1],
       [0, 0]])

$ Proj(y; A) = A(A^TA)^{-1}A^Ty $

In [104]:
y_projection = np.dot(np.dot(np.dot(A,np.linalg.inv(np.dot(A.T, A))), A.T), y)
y_projection
Out[104]:
array([1., 1., 0.])
In [105]:
ax = plt.axes(projection='3d')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
ax.set_xlim([0, 1])
ax.set_ylim([0, 1])
ax.set_zlim([0, 1])
ax.set_xticks([0,1])
ax.set_yticks([0,1])
ax.set_zticks([0,1])
origin = [0,0,0]
ax.quiver(*origin, *y, color='blue', label='y')
ax.quiver(*origin, *a1, color='green', label='a1')
ax.quiver(*origin, *a2, color='green', label='a2')
ax.quiver(*origin, *y_projection, color='red', label='projection')
ax.legend(loc='upper left')
Out[105]:
<matplotlib.legend.Legend at 0x111a18f98>

Now let's project y onto a different plane

In [106]:
a1 = np.array([1,0,0])
a2 = np.array([0,1,0.5])
A = np.column_stack((a1, a2))
A
Out[106]:
array([[1. , 0. ],
       [0. , 1. ],
       [0. , 0.5]])
In [107]:
y_projection = np.dot(np.dot(np.dot(A,np.linalg.inv(np.dot(A.T, A))), A.T), y)
y_projection
Out[107]:
array([1. , 1.2, 0.6])
In [108]:
ax = plt.axes(projection='3d')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
ax.set_xlim([0, 1])
ax.set_ylim([0, 1])
ax.set_zlim([0, 1])
ax.set_xticks([0,0.5,1])
ax.set_yticks([0,0.5,1])
ax.set_zticks([0,0.5,1])
origin = [0,0,0]
ax.quiver(*origin, *y, color='blue', label='y')
ax.quiver(*origin, *a1, color='green', label='a1')
ax.quiver(*origin, *a2, color='green', label='a2')
ax.plot_surface(
    np.column_stack([np.zeros(2), A[0]]),
    np.column_stack([np.zeros(2), A[1]]),
    np.column_stack([np.zeros(2), A[2]]),
    color='green',
    alpha=0.3)
ax.quiver(*origin, *y_projection, color='red', label='projection')
ax.legend(loc='upper left')
plt.show()

When A contains only a single column a, this gives the special case of projecting a vector onto a line:

$ Proj(y;a) = \displaystyle\frac{aa^T}{a^Ta}y $

In [109]:
a = np.array([1, 0])
a
Out[109]:
array([1, 0])
In [110]:
y = np.array([.5, .5])
y
Out[110]:
array([0.5, 0.5])
In [111]:
y_projection = np.dot(np.outer(a, a) / np.dot(a, a), y)
y_projection
Out[111]:
array([0.5, 0. ])
In [112]:
origin = np.array([0,0])
plt.plot(*zip(origin, y), color='blue', label='y')
plt.plot(*zip(origin, a), color='green', label='a')
plt.plot(*zip(origin, y_projection), color='red', alpha=0.8, label='projection')
plt.legend()
plt.show()

Another example

In [113]:
a = np.array([1, 1])
a
Out[113]:
array([1, 1])
In [114]:
y = np.array([.5, .2])
y
Out[114]:
array([0.5, 0.2])
In [115]:
y_projection = np.dot(np.outer(a, a) / np.dot(a, a), y)
y_projection
Out[115]:
array([0.35, 0.35])
In [116]:
origin = np.array([0,0])
plt.plot(*zip(origin, y), color='blue', label='y')
plt.plot(*zip(origin, a), color='green', label='a')
plt.plot(*zip(origin, y_projection), color='red', alpha=0.8, label='projection')
plt.legend()
plt.show()

Example of a vector in the nullspace

In [117]:
A = np.array([[1,0,0],[0,1,0],[0,0,0]])
A
Out[117]:
array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 0]])
In [118]:
x = np.array([0,0,5])
x
Out[118]:
array([0, 0, 5])
In [119]:
np.dot(A, x)
Out[119]:
array([0, 0, 0])
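
A sketch of how we might find a basis for the nullspace of this A numerically, using the SVD (the rows of $V^T$ whose singular values are zero span the nullspace):

_, s, Vh = np.linalg.svd(A)
s              # array([1., 1., 0.])
Vh[s < 1e-10]  # rows spanning the nullspace; here +/-[0, 0, 1]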

3.10 The Determinant

In [120]:
A = np.array([[1,3],[3,2]])
A
Out[120]:
array([[1, 3],
       [3, 2]])
In [121]:
np.linalg.det(A)
Out[121]:
-7.000000000000001
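
For a 2x2 matrix this is just $a_{11}a_{22} - a_{12}a_{21}$:

A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]  # 1*2 - 3*3 = -7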

Let's plot some of the linear combinations of the row vectors with coefficients $0 \leq \alpha_1 \leq 1$ and $0 \leq \alpha_2 \leq 1$

In [122]:
a1 = A[0]
a2 = A[1]
a1, a2
Out[122]:
(array([1, 3]), array([3, 2]))
In [123]:
combinations = []
for alpha_1 in np.linspace(0,1,10):
    for alpha_2 in np.linspace(0,1,10):
        combinations.append(alpha_1 * a1 + alpha_2 * a2)
ax = plt.axes()
ax.set_xlim([0, 6])
ax.set_ylim([0, 6])
ax.scatter(*zip(*combinations))
origin = np.array([0,0])
ax.plot(*zip(origin, a1), color='red')
ax.plot(*zip(origin, a2), color='red')
plt.show()

Show and plot an example of $\left|I\right| = 1$

In [124]:
I = np.eye(2)
np.linalg.det(I)
Out[124]:
1.0
In [125]:
combinations = []
for alpha_1 in np.linspace(0,1,10):
    for alpha_2 in np.linspace(0,1,10):
        combinations.append(alpha_1 * I[0] + alpha_2 * I[1])
ax = plt.axes()
ax.set_xlim([-1, 2])
ax.set_ylim([-1, 2])
ax.scatter(*zip(*combinations))
origin = np.array([0,0])
ax.plot(*zip(origin, I[0]), color='red')
ax.plot(*zip(origin, I[1]), color='red')
plt.show()

Example that if we multiply a single row of A by a scalar t, then the new determinant is $t\left|A\right|$

In [126]:
A
Out[126]:
array([[1, 3],
       [3, 2]])
In [127]:
t = 5
In [128]:
np.linalg.det(np.row_stack([A[0]*t, A[1]]))
Out[128]:
-35.00000000000001
In [129]:
t * np.linalg.det(A)
Out[129]:
-35.00000000000001

Example that if we exchange two rows of A, then the new determinant is $-\left|A\right|$

In [130]:
np.linalg.det(A)
Out[130]:
-7.000000000000001
In [131]:
B = np.roll(A, 1, axis=0)
B
Out[131]:
array([[3, 2],
       [1, 3]])
In [132]:
np.linalg.det(B)
Out[132]:
7.000000000000001

Example that if $A\in{\mathbb{R}^{n \times n}}, \left|A\right| = \left|A^T\right|$

In [133]:
np.linalg.det(A.T)
Out[133]:
-7.000000000000001

Example that if $A,B\in{\mathbb{R}^{n \times n}}, \left|AB\right| = \left|A\right|\left|B\right|$

In [134]:
B = np.random.rand(2,2)
B
Out[134]:
array([[0.54705161, 0.53583754],
       [0.19440937, 0.28350404]])
In [135]:
np.linalg.det(A.dot(B))
Out[135]:
-0.3564365288548418
In [136]:
np.linalg.det(A) * np.linalg.det(B)
Out[136]:
-0.35643652885484195

Example that if A is singular, $|A| = 0$

In [137]:
A = np.array([[1,2],[2,4]])
A
Out[137]:
array([[1, 2],
       [2, 4]])
In [138]:
np.linalg.det(A)
Out[138]:
0.0

Example that if A is non-singular, $\left|A^{-1}\right| = \displaystyle\frac{1}{\left|A\right|}$

In [139]:
A = np.array([[1,2],[3,4]])
A
Out[139]:
array([[1, 2],
       [3, 4]])
In [140]:
np.linalg.det(np.linalg.inv(A))
Out[140]:
-0.4999999999999998
In [141]:
1 / np.linalg.det(A)
Out[141]:
-0.4999999999999999

Let's write a recursive function that matches the general definition of the determinant (a cofactor expansion along a fixed column $j$):

$ \left|A\right| = \displaystyle\sum_{i=1}^{n}(-1)^{i+j}a_{ij}\left|A_{\backslash{i},\backslash{j}}\right| $

In [142]:
delete_ith_row_jth_column = lambda A, i, j: np.delete(np.delete(A, i, 0), j, 1)
In [143]:
delete_ith_row_jth_column(np.array([[1,2],[3,4]]), 0, 0)
Out[143]:
array([[4]])
In [144]:
def my_det(A):
    if A.shape == (1,1):
        return A[0][0]
    else:
        j = 0 # for clarity
        det_sum = 0
        for i in range(A.shape[0]):
            det_sum += (-1) ** (i+j) * A[i][j] * my_det(delete_ith_row_jth_column(A, i, j))
        return det_sum
In [145]:
A = np.array([[1,2,3],[4,5,6],[7,8,10]])
A
Out[145]:
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8, 10]])
In [146]:
my_det(A)
Out[146]:
-3
In [147]:
np.linalg.det(A)
Out[147]:
-3.000000000000001

Let's write a function for the classical adjoint (adjugate):

$(adj(A))_{ij} = (-1)^{(i+j)}\left|A_{\backslash{j},\backslash{i}}\right|$

In [148]:
def my_adj(A):
    adj = np.zeros(A.shape)
    for i in range(A.shape[0]):
        for j in range(A.shape[0]):
            adj[i,j] = (-1) ** (i + j) * np.linalg.det(delete_ith_row_jth_column(A, j, i))
    return adj
In [149]:
A
Out[149]:
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8, 10]])
In [150]:
my_adj(A)
Out[150]:
array([[  2.,   4.,  -3.],
       [  2., -11.,   6.],
       [ -3.,   6.,  -3.]])

Example of

$ A^{-1} = \displaystyle\frac{1}{|A|}adj(A) $

In [151]:
np.allclose(np.linalg.inv(A), 1/np.linalg.det(A) * my_adj(A))
Out[151]:
True

3.11 Quadratic Forms and Positive Semidefinite Matrices

Example of the quadratic form $x^TAx$

In [152]:
A = np.array([[1,2],[3,4]])
A
Out[152]:
array([[1, 2],
       [3, 4]])
In [153]:
x = np.array([5,6])
x
Out[153]:
array([5, 6])
In [154]:
np.dot(np.dot(x,A),x)
Out[154]:
319

$A_{11}x_1x_1 + A_{12}x_1x_2 + A_{21}x_2x_1 + A_{22}x_2x_2$

In [155]:
1*5*5 + 2*5*6 + 3*6*5 + 4*6*6
Out[155]:
319

Example of a symmetric matrix that is indefinite (because there exist $x_1, x_2\in{\mathbb{R}^n}$ such that $x^T_1Ax_1 > 0$ and $x^T_2Ax_2 < 0$)

In [156]:
A = np.array([[1,2],[2,1]])
A
Out[156]:
array([[1, 2],
       [2, 1]])
In [157]:
np.allclose(A, A.T)
Out[157]:
True
In [158]:
x = np.array([1,0])
np.dot(x,np.dot(A,x))
Out[158]:
1
In [159]:
x = np.array([1,-1])
np.dot(x,np.dot(A,x))
Out[159]:
-2

Example of a positive definite matrix

In [160]:
A = np.array([[2,0],[0,2]])
A
Out[160]:
array([[2, 0],
       [0, 2]])

since

$x^TAx = x_1(2x_1 + 0x_2) + x_2(0x_1 + 2x_2) = 2(x_1^2 + x_2^2)$

which is positive for all $x_1, x_2\in{\mathbb{R}}$ except $x = 0$

In [161]:
np.linalg.matrix_rank(A)
Out[161]:
2
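
A quick numerical spot check (not a proof): the quadratic form is positive at a batch of random nonzero vectors.

all(np.dot(v, np.dot(A, v)) > 0 for v in np.random.randn(100, 2))  # True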

3.12 Eigenvalues and Eigenvectors

Example eigenvalue $\lambda$ and eigenvector $x$ of $A$

$ Ax = \lambda x $

In [162]:
A = np.array([[1,2],[3,4]])
A
Out[162]:
array([[1, 2],
       [3, 4]])
In [163]:
eigenvalues, eigenvectors = np.linalg.eig(A)
eigenvalues, eigenvectors
Out[163]:
(array([-0.37228132,  5.37228132]), array([[-0.82456484, -0.41597356],
        [ 0.56576746, -0.90937671]]))
In [164]:
lambd = eigenvalues[0]
lambd
Out[164]:
-0.3722813232690143
In [165]:
x = eigenvectors[:,0]
x
Out[165]:
array([-0.82456484,  0.56576746])
In [166]:
np.linalg.norm(x)
Out[166]:
1.0
In [167]:
np.dot(A, x)
Out[167]:
array([ 0.30697009, -0.21062466])
In [168]:
lambd * x
Out[168]:
array([ 0.30697009, -0.21062466])

$ (\lambda I - A)x = 0 $

In [169]:
np.dot(lambd * np.eye(2) - A, x)
Out[169]:
array([0., 0.])

$ |(\lambda I - A)| = 0 $

In [170]:
np.linalg.det(lambd * np.eye(2) - A)
Out[170]:
-1.250741480074576e-16
In [171]:
my_det(lambd * np.eye(2) - A)
Out[171]:
0.0

Example of a matrix whose eigenvalues and eigenvectors are complex

In [172]:
A = np.array([[1,2],[-3,4]])
A
Out[172]:
array([[ 1,  2],
       [-3,  4]])
In [173]:
np.linalg.eig(A)
Out[173]:
(array([2.5+1.93649167j, 2.5-1.93649167j]),
 array([[0.38729833-0.5j, 0.38729833+0.5j],
        [0.77459667+0.j , 0.77459667-0.j ]]))

The trace of A is equal to the sum of its eigenvalues

$ trA = \displaystyle\sum_{i=1}^{n} \lambda_{i} $

In [174]:
A = np.array([[1,2],[3,4]])
A
Out[174]:
array([[1, 2],
       [3, 4]])
In [175]:
eigenvalues, eigenvectors = np.linalg.eig(A)
eigenvalues, eigenvectors
Out[175]:
(array([-0.37228132,  5.37228132]), array([[-0.82456484, -0.41597356],
        [ 0.56576746, -0.90937671]]))
In [176]:
np.trace(A)
Out[176]:
5
In [177]:
np.sum(eigenvalues)
Out[177]:
5.0

The determinant of A is equal to the product of its eigenvalues

$ |A| = \displaystyle\prod_{i=1}^{n} \lambda_{i} $

In [178]:
np.linalg.det(A)
Out[178]:
-2.0000000000000004
In [179]:
np.prod(eigenvalues)
Out[179]:
-1.9999999999999998

Suppose A is non-singular with eigenvalue λ and an associated eigenvector x. Then $1/\lambda$ is an eigenvalue of $A^{−1}$ with an associated eigenvector x, i.e.,

$ A^{-1} x = (1/\lambda)x $

In [180]:
np.dot(np.linalg.inv(A), x)
Out[180]:
array([ 2.21489715, -1.51973099])
In [181]:
(1/lambd) * x
Out[181]:
array([ 2.21489715, -1.51973099])

(To prove this, take the eigenvector equation, $Ax = \lambda x$ and left-multiply each side by $A^{−1}$.)

$ Ax = \lambda x $

$ A^{-1}Ax = A^{-1}\lambda x $

$ x = A^{-1}\lambda x $

$ (1/\lambda) x = A^{-1} x $

$ A^{-1} x = (1/\lambda) x $

The eigenvalues of a diagonal matrix $D = diag(d_1, . . . d_n)$ are just the diagonal entries $d_1,...d_n$.

In [182]:
D = np.diag([3,4,5])
D
Out[182]:
array([[3, 0, 0],
       [0, 4, 0],
       [0, 0, 5]])
In [183]:
np.linalg.eig(D)
Out[183]:
(array([3., 4., 5.]), array([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]]))

3.13 Eigenvalues and Eigenvectors of Symmetric Matrices

Throughout this section, A is a symmetric real matrix.

In [184]:
A = np.array([[2,3],[3,2]])
A
Out[184]:
array([[2, 3],
       [3, 2]])
In [185]:
np.array_equal(A, np.transpose(A))
Out[185]:
True
In [186]:
np.linalg.eigvals(A)
Out[186]:
array([ 5., -1.])

There exists a set of eigenvectors $u_1, . . . , u_n$ such that a) for all $i, u_i$ is an eigenvector with eigenvalue $\lambda_i$ and b) $u_1, . . . , u_n$ are unit vectors and orthogonal to each other.

In [187]:
np.linalg.eig(A)[1]
Out[187]:
array([[ 0.70710678, -0.70710678],
       [ 0.70710678,  0.70710678]])
In [188]:
u1, u2 = np.linalg.eig(A)[1].T
u1, u2
Out[188]:
(array([0.70710678, 0.70710678]), array([-0.70710678,  0.70710678]))
In [189]:
np.linalg.norm(u1)
Out[189]:
1.0
In [190]:
np.linalg.norm(u2)
Out[190]:
1.0
In [191]:
u1.dot(u2)
Out[191]:
0.0
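
So stacking $u_1$ and $u_2$ as columns gives an orthogonal matrix (the same property as in 3.8); a quick check:

U_check = np.column_stack((u1, u2))
np.allclose(np.dot(U_check.T, U_check), np.eye(2))  # True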

Example of $AU = U\Lambda$

In [192]:
U = np.linalg.eig(A)[1]
U
Out[192]:
array([[ 0.70710678, -0.70710678],
       [ 0.70710678,  0.70710678]])
In [193]:
Lambda = np.diag(np.linalg.eig(A)[0])
Lambda
Out[193]:
array([[ 5.,  0.],
       [ 0., -1.]])
In [194]:
np.dot(A,U)
Out[194]:
array([[ 3.53553391,  0.70710678],
       [ 3.53553391, -0.70710678]])
In [195]:
np.dot(U, Lambda)
Out[195]:
array([[ 3.53553391,  0.70710678],
       [ 3.53553391, -0.70710678]])

Example of the diagonalization of A

$A = AUU^T = (AU)U^T = U\Lambda U^T$

In [196]:
np.dot(np.dot(U, Lambda), U.T)
Out[196]:
array([[2., 3.],
       [3., 2.]])

Example of representing a vector x with respect to another basis U

In [197]:
U
Out[197]:
array([[ 0.70710678, -0.70710678],
       [ 0.70710678,  0.70710678]])
In [198]:
x = np.array([6,7])
x
Out[198]:
array([6, 7])
In [199]:
x_hat = np.dot(U.T, x)
x_hat
Out[199]:
array([9.19238816, 0.70710678])

Example of computing Ax by $\Lambda\hat{x}$ and then converting back

In [200]:
Lambda
Out[200]:
array([[ 5.,  0.],
       [ 0., -1.]])
In [201]:
Lambda_x_hat = np.dot(Lambda, x_hat)
Lambda_x_hat
Out[201]:
array([45.96194078, -0.70710678])
In [202]:
np.dot(U, Lambda_x_hat)
Out[202]:
array([33., 32.])
In [203]:
np.dot(A, x)
Out[203]:
array([33, 32])

Let's calculate AAAx directly and by using the eigenvalues of A and converting back

In [204]:
np.dot(np.dot(np.dot(A,A),A), x)
Out[204]:
array([813, 812])
In [205]:
Lambda
Out[205]:
array([[ 5.,  0.],
       [ 0., -1.]])
In [206]:
np.dot(U, np.array([5**3 * x_hat[0], (-1)**3 * x_hat[1]]))
Out[206]:
array([813., 812.])
In [207]:
np.dot(U, np.dot(Lambda ** 3, x_hat))
Out[207]:
array([813., 812.])
In [208]:
np.dot(A ** 3, x) # NOT equivalent
Out[208]:
array([237, 218])

Let's see a graphical representation of x and $\hat{x}$

To make this easier to see, let's make U a 45-degree rotation and put x just a bit off the x axis

In [209]:
U = np.array([[np.sqrt(1/2), -np.sqrt(1/2)],[np.sqrt(1/2), np.sqrt(1/2)]])
U
Out[209]:
array([[ 0.70710678, -0.70710678],
       [ 0.70710678,  0.70710678]])
In [210]:
x = np.array([2,0.3])
x
Out[210]:
array([2. , 0.3])
In [211]:
x_hat = np.dot(U, x)
x_hat
Out[211]:
array([1.20208153, 1.6263456 ])
In [212]:
origin = [0,0]
plt.quiver(*origin, [0,1], [1,0], angles='xy', scale_units='xy', scale=1, color='blue', label='X and Y axes')
plt.quiver(*origin, x[0], x[1], angles='xy', scale_units='xy', scale=1, color='dodgerblue', label='x')
plt.quiver(*origin, U[0], U[1], angles='xy', scale_units='xy', scale=1, color='green', label='U axes')
plt.quiver(*origin, x_hat[0], x_hat[1], angles='xy', scale_units='xy', scale=1, color='limegreen', label='x hat')
plt.xlim(-1, 3)
plt.ylim(-1, 3)
plt.legend()
plt.show()

Let's plot the two column vectors of a 2x2 symmetric matrix and its two eigenvectors

In [213]:
A = np.array([[2,3],[3,2]])
A
Out[213]:
array([[2, 3],
       [3, 2]])
In [214]:
U = np.linalg.eig(A)[1]
U
Out[214]:
array([[ 0.70710678, -0.70710678],
       [ 0.70710678,  0.70710678]])
In [215]:
origin = [0,0]
plt.quiver(*origin, A[0], A[1], angles='xy', scale_units='xy', scale=1, color='blue', label='A')
plt.quiver(*origin, U[0], U[1], angles='xy', scale_units='xy', scale=1, color='green', label='U')
plt.xlim(-1, 3)
plt.ylim(-1, 3)
plt.legend()
plt.show()

Example of a non-symmetric matrix

In [216]:
A = np.array([[1,2],[3,4]])
A
Out[216]:
array([[1, 2],
       [3, 4]])
In [217]:
U = np.linalg.eig(A)[1]
U
Out[217]:
array([[-0.82456484, -0.41597356],
       [ 0.56576746, -0.90937671]])
In [218]:
origin = [0,0]
plt.quiver(*origin, A[0], A[1], angles='xy', scale_units='xy', scale=1, color='blue', label='A')
plt.quiver(*origin, U[0], U[1], angles='xy', scale_units='xy', scale=1, color='green', label='U')
plt.xlim(-1, 4)
plt.ylim(-1, 4)
plt.legend()
plt.show()

Example where the eigenvectors and eigenvalues are complex

In [219]:
A = np.array([[-1,-2],[7,4]])
A
Out[219]:
array([[-1, -2],
       [ 7,  4]])
In [220]:
np.linalg.eig(A)
Out[220]:
(array([1.5+2.78388218j, 1.5-2.78388218j]),
 array([[ 0.31497039-0.35073619j,  0.31497039+0.35073619j],
        [-0.8819171 +0.j        , -0.8819171 -0.j        ]]))

“Diagonalizing” quadratic form: compute $x^TAx$

In [221]:
A = np.array([[2,3],[3,2]])
A
Out[221]:
array([[2, 3],
       [3, 2]])
In [222]:
x = np.array([3,4])
x
Out[222]:
array([3, 4])
In [223]:
U = np.linalg.eig(A)[1]
U
Out[223]:
array([[ 0.70710678, -0.70710678],
       [ 0.70710678,  0.70710678]])
In [224]:
lambda_1, lambda_2 = np.linalg.eig(A)[0]
lambda_1, lambda_2
Out[224]:
(5.0, -1.0000000000000009)
In [225]:
x_hat = np.dot(np.transpose(U), x)
x_hat
Out[225]:
array([4.94974747, 0.70710678])
In [226]:
np.dot(np.dot(np.transpose(x), A), x)
Out[226]:
122

$ \displaystyle\sum_{i=1}^{n} \lambda_i\hat{x}_i^2 $

In [227]:
lambda_1 * x_hat[0] ** 2 + lambda_2 * x_hat[1] ** 2
Out[227]:
122.0

To see that

$ \max_{x\in{\mathbb{R}^n}} x^TAx \quad \text{subject to} \quad \|x\|_2 = 1 $

is the largest eigenvalue, let's plot $x_1$ on the X axis and $x^TAx$ on the Y axis, where $x_1$ is the first component of $x$ and the second component is chosen so that $\|x\|_2 = 1$.

In [228]:
x2 = lambda x1: np.sqrt(1 - x1 ** 2)
In [229]:
x1_vals = np.linspace(-1,1,1000)
y_vals = []
for x1 in x1_vals:
    x = np.array([x1, x2(x1)])
    y_vals.append(np.dot(np.dot(np.transpose(x), A), x))
plt.plot(x1_vals, y_vals)
plt.show()
In [230]:
np.linalg.eig(A)
Out[230]:
(array([ 5., -1.]), array([[ 0.70710678, -0.70710678],
        [ 0.70710678,  0.70710678]]))
In [231]:
np.max(y_vals)
Out[231]:
4.999995662961227
In [232]:
x1_vals[np.argmax(y_vals)], x2(x1_vals[np.argmax(y_vals)])
Out[232]:
(0.7077077077077076, 0.706505343540374)
In [233]:
np.min(y_vals)
Out[233]:
-0.9999956629612274
In [234]:
x1_vals[np.argmin(y_vals)], x2(x1_vals[np.argmin(y_vals)])
Out[234]:
(-0.7077077077077076, 0.706505343540374)

4 Matrix Calculus

4.1 The Gradient

Let's work through an example using a simple function of a vector:

$ x\in{\mathbb{R}^2}\\ f(x) = x_1^2 + x_2^2 + x_1x_2\\ \nabla_xf(x) = \begin{bmatrix} \frac{\partial{f(x)}}{\partial{x_1}} \\ \frac{\partial{f(x)}}{\partial{x_2}} \end{bmatrix} = \begin{bmatrix} 2x_1 + x_2\\ 2x_2 + x_1 \end{bmatrix} $

So if $ x = \begin{bmatrix} 3 \\ 4 \end{bmatrix} , \nabla_xf(x) = \begin{bmatrix} 10 \\ 11 \end{bmatrix} $

In [235]:
f = lambda x1, x2: x1**2 + x2**2 + x1*x2
In [236]:
epsilon = 0.01
(f(3+epsilon, 4) - f(3, 4)) / epsilon, (f(3, 4+epsilon) - f(3, 4)) / epsilon
Out[236]:
(10.009999999999764, 11.009999999999565)
In [237]:
x1, x2 = np.meshgrid(np.linspace(-5,5,10), np.linspace(-5,5,10))
ax = plt.axes(projection='3d')
ax.set_xlabel('x1')
ax.set_ylabel('x2')
ax.set_zlabel('f')
ax.plot_surface(x1, x2, f(x1, x2), color='blue', alpha=0.5)
ax.scatter([3], [4], [f(3,4)], color='black')
ax.plot([3-2, 3+2], [4, 4], [f(3,4)-2*10, f(3,4)+2*10], color='red')    # tangent along x1, slope df/dx1 = 10
ax.plot([3, 3], [4-2, 4+2], [f(3,4)-2*11, f(3,4)+2*11], color='green')  # tangent along x2, slope df/dx2 = 11
plt.show()

4.2 The Hessian

Let's write an example using the same function as above

$ x\in{\mathbb{R}^2}\\ f(x) = x_1^2 + x_2^2 + x_1x_2\\ \nabla_xf(x) = \begin{bmatrix} \frac{\partial{f(x)}}{\partial{x_1}} \\ \frac{\partial{f(x)}}{\partial{x_2}} \end{bmatrix} = \begin{bmatrix} 2x_1 + x_2\\ 2x_2 + x_1 \end{bmatrix}\\ \nabla_x^2f(x) = \begin{bmatrix} \frac{\partial^2{f(x)}}{\partial{x_1}\partial{x_1}} & \frac{\partial^2{f(x)}}{\partial{x_1}\partial{x_2}} \\ \frac{\partial^2{f(x)}}{\partial{x_2}\partial{x_1}} & \frac{\partial^2{f(x)}}{\partial{x_2}\partial{x_2}} \end{bmatrix} = \begin{bmatrix} 2 & 1\\ 1 & 2 \end{bmatrix} $
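
A quick finite-difference sanity check of that Hessian (a sketch; f and the evaluation point are re-defined here to keep it self-contained):

f = lambda x1, x2: x1**2 + x2**2 + x1*x2
eps = 1e-3
x1, x2 = 3.0, 4.0
d2_11 = (f(x1 + eps, x2) - 2*f(x1, x2) + f(x1 - eps, x2)) / eps**2
d2_22 = (f(x1, x2 + eps) - 2*f(x1, x2) + f(x1, x2 - eps)) / eps**2
d2_12 = (f(x1 + eps, x2 + eps) - f(x1 + eps, x2 - eps)
         - f(x1 - eps, x2 + eps) + f(x1 - eps, x2 - eps)) / (4 * eps**2)
np.round([d2_11, d2_12, d2_22], 3)  # approximately [2., 1., 2.]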

4.3 Gradients and Hessians of Quadratic and Linear Functions

Let's try an example of the quadratic function $f(x) = x^TAx$ for $A\in{\mathbb{S}^2}$

$\nabla_x x^TAx = 2Ax$

In [238]:
A = np.array([[2,3],[3,2]])
A
Out[238]:
array([[2, 3],
       [3, 2]])
In [239]:
f = lambda x: np.dot(np.dot(x, A), x)
In [240]:
x = np.array([3,4])
In [241]:
gradient = 2 * np.dot(A, x)
gradient
Out[241]:
array([36, 34])
In [242]:
epsilon = 0.01
(f(x + np.array([epsilon, 0])) - f(x)) / epsilon, (f(x + np.array([0, epsilon])) - f(x)) / epsilon
Out[242]:
(36.019999999999186, 34.019999999999584)
In [243]:
x1, x2 = np.meshgrid(np.linspace(-5, 5, 10), np.linspace(-5, 5, 10))
ax = plt.axes(projection='3d')
ax.set_xlabel('x1')
ax.set_ylabel('x2')
ax.set_zlabel('f')
ax.plot_surface(x1, x2, np.apply_along_axis(f, 0, np.stack([x1, x2])), color='blue', alpha=0.5)
ax.scatter(*x, [f(x)], color='black')
ax.plot([x[0]-2, x[0]+2], [x[1], x[1]], [f(x)-2*gradient[0], f(x)+2*gradient[0]], color='red')
ax.plot([x[0], x[0]], [x[1]-2, x[1]+2], [f(x)-2*gradient[1], f(x)+2*gradient[1]], color='green')
plt.show()

4.4 Least Squares

Using the second projection example from 3.9 above, let's calculate the $x$ that minimizes $\|Ax - b\|_2^2$ via $x = (A^TA)^{-1}A^Tb$, just as we did originally, and also estimate $x$ by a grid search.

In [244]:
A = np.array([[1 , 0], [0, 1], [0, 0.5]])
A
Out[244]:
array([[1. , 0. ],
       [0. , 1. ],
       [0. , 0.5]])
In [245]:
b = np.array([1,1,1])
b
Out[245]:
array([1, 1, 1])
In [246]:
x = np.dot(np.dot(np.linalg.inv(np.dot(A.T, A)), A.T), b)
x
Out[246]:
array([1. , 1.2])
In [247]:
np.dot(A, x)
Out[247]:
array([1. , 1.2, 0.6])
In [248]:
xs = np.array(np.meshgrid(np.linspace(-2, 2, 100), np.linspace(-2, 2, 100))).T.reshape(-1, 2)
distances = np.apply_along_axis(lambda x: np.linalg.norm(np.dot(A, x) - b), 1, xs)
x = xs[np.argmin(distances)]
x
Out[248]:
array([0.98989899, 1.19191919])
In [249]:
np.dot(A, x)
Out[249]:
array([0.98989899, 1.19191919, 0.5959596 ])
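
numpy's built-in least-squares solver gives the same answer; a quick check:

x_lstsq, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)
x_lstsq  # array([1. , 1.2])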

Compute the gradient of $\|Ax - b\|_2^2$, showing one interim step not shown in the review:

$ \nabla_x(x^TA^TAx - 2b^TAx + b^Tb) = \\ \nabla_xx^TA^TAx - \nabla_x2b^TAx + \nabla_xb^Tb = \\ \nabla_xx^T(A^TA)x - 2\nabla_x(A^Tb)^Tx + \nabla_xb^Tb = \\ 2A^TAx - 2A^Tb + 0 = \\ 2A^TAx - 2A^Tb $

Let's try an example where we fit a line through the origin to 5 points.

In [250]:
A = np.random.rand(5,1)
A
Out[250]:
array([[0.2108527 ],
       [0.82189899],
       [0.37643178],
       [0.82373638],
       [0.98731394]])
In [251]:
# b is 2A with random noise added in
b = A * 2 + (np.random.rand(5,1) - 0.5)/2
b
Out[251]:
array([[0.67017475],
       [1.83202091],
       [0.99319781],
       [1.80996379],
       [2.15461524]])
In [252]:
x = np.dot(np.dot(np.linalg.inv(np.dot(A.T, A)), A.T), b)
x
Out[252]:
array([[2.24219163]])
In [253]:
plt.scatter(A, b)
plt.plot([0, 1], [0, x[0]], color='green')
plt.show()

4.5 Gradients of the Determinant

Let's go through an example showing $\nabla_A|A| = |A|A^{-T}$ when $A\in{\mathbb{R}^{n\ \times\ n}}$

In [254]:
A = np.random.rand(2,2)
A
Out[254]:
array([[0.72642508, 0.9872509 ],
       [0.81603603, 0.52725954]])
In [255]:
f = lambda A: np.linalg.det(A) # for convenience
In [256]:
epsilon = 0.0001
np.array(
    [
        [
            f(A + np.array([[epsilon, 0], [0, 0]])) - f(A),
            f(A + np.array([[0, epsilon], [0, 0]])) - f(A),
        ],
        [
            f(A + np.array([[0, 0], [epsilon, 0]])) - f(A),
            f(A + np.array([[0, 0], [0, epsilon]])) - f(A),
        ]
    ]) / epsilon
Out[256]:
array([[ 0.52725954, -0.81603603],
       [-0.9872509 ,  0.72642508]])
In [257]:
np.dot(f(A), np.linalg.inv(A).T)
Out[257]:
array([[ 0.52725954, -0.81603603],
       [-0.9872509 ,  0.72642508]])

Now let's go through an example showing $\nabla_A\log\left|A\right| = A^{-1}$ when $A\in{\mathbb{S}^n_{++}}$

In [258]:
A = np.array([[3,2],[2,3]])
A
Out[258]:
array([[3, 2],
       [2, 3]])
In [259]:
f = lambda A: np.log(np.linalg.det(A))
In [260]:
epsilon = 0.0001
np.array(
    [
        [
            f(A + np.array([[epsilon, 0], [0, 0]])) - f(A),
            f(A + np.array([[0, epsilon], [0, 0]])) - f(A),
        ],
        [
            f(A + np.array([[0, 0], [epsilon, 0]])) - f(A),
            f(A + np.array([[0, 0], [0, epsilon]])) - f(A),
        ]
    ]) / epsilon
Out[260]:
array([[ 0.599982, -0.400008],
       [-0.400008,  0.599982]])
In [261]:
np.linalg.inv(A)
Out[261]:
array([[ 0.6, -0.4],
       [-0.4,  0.6]])