Linear Algebra Review

This notebook goes with the CS229 Linear Algebra Review and Reference notes

I put this notebook together as I read the document linked above. I had three goals...

  • learn/review by playing with examples using real numbers
  • improve my fluency at translating between math notation and numpy
  • learn and practice LaTeX

-Eric Silberstein / Klaviyo

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
%matplotlib inline

1 Basic Concepts and Notation

1.1 Basic Notation

Let's create an example matrix $A\in{\mathbb{R}^{3 \times 2}}$

In [2]:
A = np.arange(6).reshape(3,2)
A
Out[2]:
array([[0, 1],
       [2, 3],
       [4, 5]])

$a_{22}$, the entry at the 2nd row and 2nd column

In [3]:
A[1,1]
Out[3]:
3

$a^1$, the first column (alternative notation: $a_1$)

In [4]:
A[:,0]
Out[4]:
array([0, 2, 4])

$a^T_2$, the second row (alternative notation: $\bar{a}_2$)
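
A minimal sketch of grabbing that row, assuming A is still the 3x2 matrix from In [2]:

A[1]     # the second row -> array([2, 3])
A[1, :]  # equivalent slicing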

2 Matrix Multiplication

2.1 Vector-Vector Products

Given two example vectors $x, y\in{\mathbb{R}^n}$, let's compute the dot product $x^Ty$

In [5]:
x = np.array([2,3])
x
Out[5]:
array([2, 3])
In [6]:
y = np.array([4,5])
y
Out[6]:
array([4, 5])
In [7]:
np.dot(x.T, y), np.dot(y.T, x), np.dot(x, y), np.dot(y, x)
Out[7]:
(23, 23, 23, 23)
In [8]:
x[0]*y[0] + x[1]*y[1]
Out[8]:
23

Now let's compute the outer product $xy^T$

In [9]:
np.outer(x, y)
Out[9]:
array([[ 8, 10],
       [12, 15]])
In [10]:
np.array([
    [x[0]*y[0], x[0]*y[1]],
    [x[1]*y[0], x[1]*y[1]],
])
Out[10]:
array([[ 8, 10],
       [12, 15]])
In [11]:
x = x.reshape(2,1)
x
Out[11]:
array([[2],
       [3]])
In [12]:
y = y.reshape(2,1)
y
Out[12]:
array([[4],
       [5]])
In [13]:
np.dot(x,y.T)
Out[13]:
array([[ 8, 10],
       [12, 15]])

Outer product example: use $x\boldsymbol{1}^T$, where $\boldsymbol{1}\in{\mathbb{R}^3}$, to compute a matrix in which each column is $x$

In [14]:
np.outer(x, np.ones(3))
Out[14]:
array([[2., 2., 2.],
       [3., 3., 3.]])

2.2 Matrix-Vector Products

In [15]:
A = np.arange(4).reshape(2,2) + 2
A
Out[15]:
array([[2, 3],
       [4, 5]])
In [16]:
x = np.array([6,7])
x
Out[16]:
array([6, 7])

Let numpy calculate $Ax$

In [17]:
np.dot(A,x)
Out[17]:
array([33, 59])

Calculate $Ax$ element-wise

In [18]:
np.array([A[0,0] * x[0] + A[0,1] * x[1], A[1,0] * x[0] + A[1,1] * x[1]])
Out[18]:
array([33, 59])

Calculate $Ax$ by $ \begin{bmatrix} a^T_1x \\ a^T_2x \\ \end{bmatrix} $

In [19]:
np.array([np.dot(A[0].T, x), np.dot(A[1].T, x)])
Out[19]:
array([33, 59])

Calculate $Ax$ by $ \begin{bmatrix} a^1 \end{bmatrix} x_1 + \begin{bmatrix} a^2 \end{bmatrix} x_2 $

In [20]:
np.dot(A[:,0], x[0]) + np.dot(A[:,1], x[1])
Out[20]:
array([33, 59])

2.3 Matrix-Matrix Products

In [21]:
A = np.arange(9).reshape(3,3)
A
Out[21]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
In [22]:
B = A * 2
B
Out[22]:
array([[ 0,  2,  4],
       [ 6,  8, 10],
       [12, 14, 16]])

Calculate AB using numpy

In [23]:
A.dot(B)
Out[23]:
array([[ 30,  36,  42],
       [ 84, 108, 132],
       [138, 180, 222]])

Calculate AB thinking of A as row vectors and B as column vectors

$ \begin{bmatrix} a^T_1b^1 & a^T_1b^2 & a^T_1b^3 \\ a^T_2b^1 & a^T_2b^2 & a^T_2b^3 \\ a^T_3b^1 & a^T_3b^2 & a^T_3b^3 \\ \end{bmatrix} $

In [24]:
np.array(
    [
        [np.dot(A[0], B[:,0]), np.dot(A[0], B[:,1]), np.dot(A[0], B[:,2])],
        [np.dot(A[1], B[:,0]), np.dot(A[1], B[:,1]), np.dot(A[1], B[:,2])],
        [np.dot(A[2], B[:,0]), np.dot(A[2], B[:,1]), np.dot(A[2], B[:,2])],
    ]
)
Out[24]:
array([[ 30,  36,  42],
       [ 84, 108, 132],
       [138, 180, 222]])

Calculate AB thinking of A as column vectors and B as row vectors

$ \displaystyle\sum_{i=1}^{n}a^ib^T_i $

In [25]:
np.outer(A[:,0], B[0]) + np.outer(A[:,1], B[1]) + np.outer(A[:,2], B[2])
Out[25]:
array([[ 30,  36,  42],
       [ 84, 108, 132],
       [138, 180, 222]])

Calculate AB thinking of A as a matrix and B as column vectors

$ \begin{bmatrix} Ab^1 & Ab^2 & Ab^3 \end{bmatrix} $

In [26]:
np.column_stack([np.dot(A, B[:,0]), np.dot(A, B[:,1]), np.dot(A, B[:,2])])
Out[26]:
array([[ 30,  36,  42],
       [ 84, 108, 132],
       [138, 180, 222]])

Calculate AB thinking of A as row vectors and B as a matrix

$ \begin{bmatrix} a^T_1B \\ a^T_2B \\ a^T_3B \end{bmatrix} $

In [27]:
np.stack([np.dot(A[0], B), np.dot(A[1], B), np.dot(A[2], B)])
Out[27]:
array([[ 30,  36,  42],
       [ 84, 108, 132],
       [138, 180, 222]])

Example of matrix multiplication being associative: $(AB)C = A(BC)$

In [28]:
C = np.random.rand(3,2)
C
Out[28]:
array([[0.48016108, 0.78511536],
       [0.14709867, 0.23049283],
       [0.63444509, 0.77271909]])
In [29]:
np.allclose(np.dot(np.dot(A, B), C), np.dot(A, np.dot(B, C)))
Out[29]:
True

Example of matrix multiplication being distributive: $A(B + C) = AB + AC$

In [30]:
C = np.random.rand(3,3)
In [31]:
np.allclose(np.dot(A, B + C), np.dot(A, B) + np.dot(A, C))
Out[31]:
True

Matrix multiplication is in general not commutative: $AB \ne BA$. In this particular example, though, $B = 2A$, so $AB = 2A^2 = BA$ and the check below returns True; see the sketch after the cell for two matrices that do not commute.

In [32]:
np.allclose(np.dot(A,B), np.dot(B,A))
Out[32]:
True
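
A minimal sketch with two small matrices that do not commute:

M = np.array([[1, 1], [0, 1]])
N = np.array([[1, 0], [1, 1]])
np.dot(M, N)  # array([[2, 1], [1, 1]])
np.dot(N, M)  # array([[1, 1], [1, 2]])
np.allclose(np.dot(M, N), np.dot(N, M))  # False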

Example of multiplication of non-square matrices:

In [33]:
np.dot(np.random.rand(2,3), np.random.rand(3,4))
Out[33]:
array([[0.57285884, 0.45617738, 0.17109287, 0.227468  ],
       [1.0131327 , 0.57271021, 0.33625262, 0.73263324]])

3 Operations and Properties

3.1 The Identity Matrix and Diagonal Matrices

Identity matrix example

In [34]:
I = np.eye(3,3)
I
Out[34]:
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

Example of $AI = A = IA$

In [35]:
A
Out[35]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
In [36]:
np.dot(A, I)
Out[36]:
array([[0., 1., 2.],
       [3., 4., 5.],
       [6., 7., 8.]])
In [37]:
np.dot(I, A)
Out[37]:
array([[0., 1., 2.],
       [3., 4., 5.],
       [6., 7., 8.]])

Diagonal matrix example

In [38]:
np.diag([3,4,5])
Out[38]:
array([[3, 0, 0],
       [0, 4, 0],
       [0, 0, 5]])

3.2 The Transpose

In [39]:
A
Out[39]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
In [40]:
A.T
Out[40]:
array([[0, 3, 6],
       [1, 4, 7],
       [2, 5, 8]])

$(A^T)^T = A$

In [41]:
A.T.T
Out[41]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

$(AB)^T = B^TA^T$

In [42]:
np.dot(A, B).T
Out[42]:
array([[ 30,  84, 138],
       [ 36, 108, 180],
       [ 42, 132, 222]])
In [43]:
np.dot(B.T, A.T)
Out[43]:
array([[ 30,  84, 138],
       [ 36, 108, 180],
       [ 42, 132, 222]])

$(A + B)^T = A^T + B^T$

In [44]:
(A + B).T
Out[44]:
array([[ 0,  9, 18],
       [ 3, 12, 21],
       [ 6, 15, 24]])
In [45]:
A.T + B.T
Out[45]:
array([[ 0,  9, 18],
       [ 3, 12, 21],
       [ 6, 15, 24]])

3.3 Symmetric Matrices

Example of a symmetric matrix

In [46]:
S = np.array([[1,2,3],[2,4,5],[3,5,6]])
S
Out[46]:
array([[1, 2, 3],
       [2, 4, 5],
       [3, 5, 6]])
In [47]:
S.T
Out[47]:
array([[1, 2, 3],
       [2, 4, 5],
       [3, 5, 6]])

Example of an anti-symmetric matrix

In [48]:
AS = np.array([[0,2,3],[-2,0,5],[-3,-5,0]])
AS
Out[48]:
array([[ 0,  2,  3],
       [-2,  0,  5],
       [-3, -5,  0]])
In [49]:
-AS.T
Out[49]:
array([[ 0,  2,  3],
       [-2,  0,  5],
       [-3, -5,  0]])

Form a symmetric matrix by $A + A^T, A\in{\mathbb{R}^{n \times n}}$

In [50]:
S = A + A.T
S
Out[50]:
array([[ 0,  4,  8],
       [ 4,  8, 12],
       [ 8, 12, 16]])
In [51]:
np.allclose(S, S.T)
Out[51]:
True

Form an anti-symmetric matrix by $A - A^T$

In [52]:
AS = A - A.T
AS
Out[52]:
array([[ 0, -2, -4],
       [ 2,  0, -2],
       [ 4,  2,  0]])
In [53]:
np.allclose(AS, -AS.T)
Out[53]:
True

Example of $A\in{\mathbb{R}^{n \times n}}$ represented as a sum of a symmetric matrix and an anti-symmetric matrix

$A = \frac{1}{2}(A + A^T) + \frac{1}{2}(A - A^T)$

In [54]:
np.allclose(A, S/2 + AS/2)
Out[54]:
True

3.4 The Trace

Sum of the diagonal elements

$ trA = \displaystyle\sum_{i=1}^nA_{ii} $

In [55]:
A
Out[55]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
In [56]:
np.trace(A)
Out[56]:
12

Example of $trAB = trBA$

In [57]:
np.trace(np.dot(A, B))
Out[57]:
360
In [58]:
np.trace(np.dot(B, A))
Out[58]:
360
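
The trace is also linear; a quick sketch using the same A and B as above:

np.trace(A + B), np.trace(A) + np.trace(B)  # (36, 36)
np.trace(5 * A), 5 * np.trace(A)            # (60, 60)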

3.5 Norms

Euclidean ($l_2$) norm of a vector

$ \|x\|_2 = \sqrt{\displaystyle\sum_{i=1}^{n}x_i^2} $

In [59]:
x = np.arange(3)
x
Out[59]:
array([0, 1, 2])
In [60]:
np.linalg.norm(x, ord=2)
Out[60]:
2.23606797749979
In [61]:
np.sqrt(np.sum(x ** 2))
Out[61]:
2.23606797749979

$l_1$ norm of a vector

$ \|x\|_1 = \displaystyle\sum_{i=1}^{n}\left|x_i\right| $

In [62]:
np.linalg.norm(x, ord=1)
Out[62]:
3.0
In [63]:
np.sum(np.abs(x))
Out[63]:
3

Frobenius norm of a matrix

$ \|A\|_F = \sqrt{\displaystyle\sum_{i = 1}^m\displaystyle\sum_{j = 1}^na_{ij}^2} $

In [64]:
np.linalg.norm(A, ord='fro')
Out[64]:
14.2828568570857
In [65]:
np.sqrt(np.sum(A ** 2))
Out[65]:
14.2828568570857
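
The Frobenius norm can also be written as $\sqrt{tr(A^TA)}$; a quick check using the same A:

np.sqrt(np.trace(np.dot(A.T, A)))  # 14.2828..., same value as above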

3.6 Linear Independence and Rank

Let's look at a matrix with 3 columns and rank 2

In [66]:
A
Out[66]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
In [67]:
np.linalg.matrix_rank(A)
Out[67]:
2

$-a^1 + 2a^2 = a^3$

In [68]:
-A[:,0] + 2 * A[:,1]
Out[68]:
array([2, 5, 8])

$-a^T_1 + 2a^T_2 = a^T_3$

In [69]:
-A[0] + 2 * A[1]
Out[69]:
array([6, 7, 8])

Let's look at a full rank matrix

In [70]:
R = np.random.rand(3, 3)
R
Out[70]:
array([[0.82830024, 0.46109313, 0.27433051],
       [0.85200938, 0.26364719, 0.32397499],
       [0.31054344, 0.50805488, 0.26092114]])
In [71]:
np.linalg.matrix_rank(R)
Out[71]:
3

3.7 The Inverse of a Square Matrix

A is not full rank so it's not invertible

In [72]:
A = np.arange(9).reshape(3,3)
A
Out[72]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
In [73]:
np.linalg.matrix_rank(A)
Out[73]:
2
In [74]:
try:
    np.linalg.inv(A)
except np.linalg.LinAlgError as err:
    print(f"Can't invert A due to {err}")
Can't invert A due to Singular matrix

Let's create an A that is invertible

In [75]:
A = np.array([[0,1,2], [3,4,5], [6,7,9]])
A
Out[75]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 9]])
In [76]:
np.linalg.matrix_rank(A)
Out[76]:
3
In [77]:
np.linalg.inv(A)
Out[77]:
array([[-0.33333333, -1.66666667,  1.        ],
       [-1.        ,  4.        , -2.        ],
       [ 1.        , -2.        ,  1.        ]])

$A^{-1}A = I = AA^{-1}$

In [78]:
np.allclose(np.dot(np.linalg.inv(A), A), np.eye(3))
Out[78]:
True
In [79]:
np.allclose(np.dot(A, np.linalg.inv(A)), np.eye(3))
Out[79]:
True

$(A^{-1})^{-1} = A$

In [80]:
np.linalg.inv(np.linalg.inv(A))
Out[80]:
array([[-0.,  1.,  2.],
       [ 3.,  4.,  5.],
       [ 6.,  7.,  9.]])

$(AB)^{-1} = B^{-1}A^{-1}$

In [81]:
B = np.random.rand(3, 3)
B
Out[81]:
array([[0.84735242, 0.0950299 , 0.23436798],
       [0.00574613, 0.2061367 , 0.87375284],
       [0.75017117, 0.32154513, 0.74314387]])
In [82]:
np.linalg.matrix_rank(B)
Out[82]:
3
In [83]:
np.allclose(np.linalg.inv(np.dot(A, B)), np.dot(np.linalg.inv(B), np.linalg.inv(A)))
Out[83]:
True

$(A^{-1})^T = (A^T)^{-1}$

In [84]:
np.linalg.inv(A).T
Out[84]:
array([[-0.33333333, -1.        ,  1.        ],
       [-1.66666667,  4.        , -2.        ],
       [ 1.        , -2.        ,  1.        ]])
In [85]:
np.linalg.inv(A.T)
Out[85]:
array([[-0.33333333, -1.        ,  1.        ],
       [-1.66666667,  4.        , -2.        ],
       [ 1.        , -2.        ,  1.        ]])

Example of using the inverse to solve a system of equations

In [86]:
A = np.array([[1,2],[3,4]])
A
Out[86]:
array([[1, 2],
       [3, 4]])
In [87]:
x = np.array([5,6])
x
Out[87]:
array([5, 6])
In [88]:
b = np.dot(A, x)
b
Out[88]:
array([17, 39])

$ A\in{\mathbb{R}^{2 \times 2}} \\ x,b\in{\mathbb{R}^2} $

Say we know A and b and want to solve for x in $Ax = b$. We can calculate x by hand...

$ x_1 + 2x_2 = 17 \\ 3x_1 + 4x_2 = 39 $

$ x_1 = 5 \\ x_2 = 6 $

...or we can compute $x = A^{-1}b$

In [89]:
np.dot(np.linalg.inv(A), b)
Out[89]:
array([5., 6.])
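
In practice it's usually better (numerically) to use a solver rather than forming $A^{-1}$ explicitly; a quick sketch:

np.linalg.solve(A, b)  # array([5., 6.])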

3.8 Orthogonal Matrices

Example of orthogonal vectors

In [90]:
x = np.array([1, 1])
x
Out[90]:
array([1, 1])
In [91]:
y = np.array([1, -1])
y
Out[91]:
array([ 1, -1])
In [92]:
np.dot(x, y)
Out[92]:
0

Form an orthogonal matrix from x and y

In [93]:
x = x / np.linalg.norm(x)
x
Out[93]:
array([0.70710678, 0.70710678])
In [94]:
y = y / np.linalg.norm(y)
y
Out[94]:
array([ 0.70710678, -0.70710678])

$\|x\|_2 = 1$ and $\|y\|_2 = 1$ (x and y are normalized)

In [95]:
np.linalg.norm(x), np.linalg.norm(y)
Out[95]:
(0.9999999999999999, 0.9999999999999999)
In [96]:
U = np.column_stack((x, y))
U
Out[96]:
array([[ 0.70710678,  0.70710678],
       [ 0.70710678, -0.70710678]])

$U^TU = I = UU^T$

In [97]:
np.dot(U.T, U)
Out[97]:
array([[1., 0.],
       [0., 1.]])
In [98]:
np.dot(U, U.T)
Out[98]:
array([[1., 0.],
       [0., 1.]])
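
Note that $UU^T = I$ relies on U being square; for a tall matrix with orthonormal columns only $U^TU = I$ holds. A minimal sketch:

V = np.array([[1., 0.], [0., 1.], [0., 0.]])  # 3x2 with orthonormal columns
np.dot(V.T, V)  # the 2x2 identity
np.dot(V, V.T)  # NOT the 3x3 identity (the last diagonal entry is 0)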

Operating on a vector with an orthogonal matrix will not change its Euclidean norm

$ \|Uz\|_2 = \|z\|_2$ for $z\in{\mathbb{R}^n}, U\in{\mathbb{R}^{n \times n}}$ orthogonal

In [99]:
z = np.random.rand(2)
z
Out[99]:
array([0.80175368, 0.52946705])
In [100]:
np.linalg.norm(z)
Out[100]:
0.9608039912644135
In [101]:
np.linalg.norm(np.dot(U, z))
Out[101]:
0.9608039912644135

3.9 Range and Nullspace of a Matrix

Let's do a projection example that we can visualize in 3D: projecting the vector y onto the xy-plane.

In [102]:
y = np.array([1,1,1])
y
Out[102]:
array([1, 1, 1])
In [103]:
a1 = np.array([1,0,0]) # x axis
a2 = np.array([0,1,0]) # y axis
A = np.column_stack((a1, a2))
A
Out[103]:
array([[1, 0],
       [0, 1],
       [0, 0]])

$ Proj(y; A) = A(A^TA)^{-1}A^Ty $

In [104]:
y_projection = np.dot(np.dot(np.dot(A,np.linalg.inv(np.dot(A.T, A))), A.T), y)
y_projection
Out[104]:
array([1., 1., 0.])
In [105]:
ax = plt.axes(projection='3d')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
ax.set_xlim([0, 1])
ax.set_ylim([0, 1])
ax.set_zlim([0, 1])
ax.set_xticks([0,1])
ax.set_yticks([0,1])
ax.set_zticks([0,1])
origin = [0,0,0]
ax.quiver(*origin, *y, color='blue', label='y')
ax.quiver(*origin, *a1, color='green', label='a1')
ax.quiver(*origin, *a2, color='green', label='a2')
ax.quiver(*origin, *y_projection, color='red', label='projection')
ax.legend(loc='upper left')
Out[105]:
<matplotlib.legend.Legend at 0x111a18f98>

Now let's project y onto a different plane

In [106]:
a1 = np.array([1,0,0])
a2 = np.array([0,1,0.5])
A = np.column_stack((a1, a2))
A
Out[106]:
array([[1. , 0. ],
       [0. , 1. ],
       [0. , 0.5]])
In [107]:
y_projection = np.dot(np.dot(np.dot(A,np.linalg.inv(np.dot(A.T, A))), A.T), y)
y_projection
Out[107]:
array([1. , 1.2, 0.6])
In [108]:
ax = plt.axes(projection='3d')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
ax.set_xlim([0, 1])
ax.set_ylim([0, 1])
ax.set_zlim([0, 1])
ax.set_xticks([0,0.5,1])
ax.set_yticks([0,0.5,1])
ax.set_zticks([0,0.5,1])
origin = [0,0,0]
ax.quiver(*origin, *y, color='blue', label='y')
ax.quiver(*origin, *a1, color='green', label='a1')
ax.quiver(*origin, *a2, color='green', label='a2')
ax.plot_surface(
    np.column_stack([np.zeros(2), A[0]]),
    np.column_stack([np.zeros(2), A[1]]),
    np.column_stack([np.zeros(2), A[2]]),
    color='green',
    alpha=0.3)
ax.quiver(*origin, *y_projection, color='red', label='projection')
ax.legend(loc='upper left')
plt.show()

When A contains only a single column a, this gives the special case of projecting a vector onto a line:

$ Proj(y;a) = \displaystyle\frac{aa^T}{a^Ta}y $

In [109]:
a = np.array([1, 0])
a
Out[109]:
array([1, 0])
In [110]:
y = np.array([.5, .5])
y
Out[110]:
array([0.5, 0.5])
In [111]:
y_projection = np.dot(np.outer(a, a) / np.dot(a, a), y)
y_projection
Out[111]:
array([0.5, 0. ])
In [112]:
origin = np.array([0,0])
plt.plot(*zip(origin, y), color='blue', label='y')
plt.plot(*zip(origin, a), color='green', label='a')
plt.plot(*zip(origin, y_projection), color='red', alpha=0.8, label='projection')
plt.legend()
plt.show()

Another example

In [113]:
a = np.array([1, 1])
a
Out[113]:
array([1, 1])
In [114]:
y = np.array([.5, .2])
y
Out[114]:
array([0.5, 0.2])
In [115]:
y_projection = np.dot(np.outer(a, a) / np.dot(a, a), y)
y_projection
Out[115]:
array([0.35, 0.35])
In [116]:
origin = np.array([0,0])
plt.plot(*zip(origin, y), color='blue', label='y')
plt.plot(*zip(origin, a), color='green', label='a')
plt.plot(*zip(origin, y_projection), color='red', alpha=0.8, label='projection')
plt.legend()
plt.show()

Example of a vector in the nullspace

In [117]:
A = np.array([[1,0,0],[0,1,0],[0,0,0]])
A
Out[117]:
array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 0]])
In [118]:
x = np.array([0,0,5])
x
Out[118]:
array([0, 0, 5])
In [119]:
np.dot(A, x)
Out[119]:
array([0, 0, 0])
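
A sketch of how we might find a basis for the nullspace of this A numerically, using the SVD (the rows of $V^T$ whose singular values are zero span the nullspace):

_, s, Vh = np.linalg.svd(A)
s              # array([1., 1., 0.])
Vh[s < 1e-10]  # rows spanning the nullspace; here +/-[0, 0, 1]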

3.10 The Determinant

In [120]:
A = np.array([[1,3],[3,2]])
A
Out[120]:
array([[1, 3],
       [3, 2]])
In [121]:
np.linalg.det(A)
Out[121]:
-7.000000000000001
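
For a 2x2 matrix this is just $a_{11}a_{22} - a_{12}a_{21}$:

A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]  # 1*2 - 3*3 = -7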

Let's plot some of the linear combinations of the row vectors with coefficients $0 \leq \alpha_1 \leq 1$ and $0 \leq \alpha_2 \leq 1$

In [122]:
a1 = A[0]
a2 = A[1]
a1, a2
Out[122]:
(array([1, 3]), array([3, 2]))
In [123]:
combinations = []
for alpha_1 in np.linspace(0,1,10):
    for alpha_2 in np.linspace(0,1,10):
        combinations.append(alpha_1 * a1 + alpha_2 * a2)
ax = plt.axes()
ax.set_xlim([0, 6])
ax.set_ylim([0, 6])
ax.scatter(*zip(*combinations))
origin = np.array([0,0])
ax.plot(*zip(origin, a1), color='red')
ax.plot(*zip(origin, a2), color='red')
plt.show()

Show and plot an example of $\left|I\right| = 1$

In [124]:
I = np.eye(2)
np.linalg.det(I)
Out[124]:
1.0
In [125]:
combinations = []
for alpha_1 in np.linspace(0,1,10):
    for alpha_2 in np.linspace(0,1,10):
        combinations.append(alpha_1 * I[0] + alpha_2 * I[1])
ax = plt.axes()
ax.set_xlim([-1, 2])
ax.set_ylim([-1, 2])
ax.scatter(*zip(*combinations))
origin = np.array([0,0])
ax.plot(*zip(origin, I[0]), color='red')
ax.plot(*zip(origin, I[1]), color='red')
plt.show()

Example that if we multiply a single row of A by a scalar t, then the new determinant is $t\left|A\right|$

In [126]:
A
Out[126]:
array([[1, 3],
       [3, 2]])
In [127]:
t = 5
In [128]:
np.linalg.det(np.row_stack([A[0]*t, A[1]]))
Out[128]:
-35.00000000000001
In [129]:
t * np.linalg.det(A)
Out[129]:
-35.00000000000001

Example that if we exchange two rows of A, then the new determinant is $-\left|A\right|$

In [130]:
np.linalg.det(A)
Out[130]:
-7.000000000000001
In [131]:
B = np.roll(A, 1, axis=0)
B
Out[131]:
array([[3, 2],
       [1, 3]])
In [132]:
np.linalg.det(B)
Out[132]:
7.000000000000001

Example that if $A\in{\mathbb{R}^{n \times n}}, \left|A\right| = \left|A^T\right|$

In [133]:
np.linalg.det(A.T)
Out[133]:
-7.000000000000001

Example that if $A,B\in{\mathbb{R}^{n \times n}}, \left|AB\right| = \left|A\right|\left|B\right|$

In [134]:
B = np.random.rand(2,2)
B
Out[134]:
array([[0.54705161, 0.53583754],
       [0.19440937, 0.28350404]])
In [135]:
np.linalg.det(A.dot(B))
Out[135]:
-0.3564365288548418
In [136]:
np.linalg.det(A) * np.linalg.det(B)
Out[136]:
-0.35643652885484195

Example that if A is singular, $|A| = 0$

In [137]:
A = np.array([[1,2],[2,4]])
A
Out[137]:
array([[1, 2],
       [2, 4]])
In [138]:
np.linalg.det(A)
Out[138]:
0.0

Example that if A is non-singular, $\left|A^{-1}\right| = \displaystyle\frac{1}{\left|A\right|}$

In [139]:
A = np.array([[1,2],[3,4]])
A
Out[139]:
array([[1, 2],
       [3, 4]])
In [140]:
np.linalg.det(np.linalg.inv(A))
Out[140]:
-0.4999999999999998
In [141]:
1 / np.linalg.det(A)
Out[141]:
-0.4999999999999999

Let's write a recursive function that matches the general definition of the determinant (a cofactor expansion along a fixed column $j$):

$ \left|A\right| = \displaystyle\sum_{i=1}^{n}(-1)^{i+j}a_{ij}\left|A_{\backslash{i},\backslash{j}}\right| $

In [142]:
delete_ith_row_jth_column = lambda A, i, j: np.delete(np.delete(A, i, 0), j, 1)
In [143]:
delete_ith_row_jth_column(np.array([[1,2],[3,4]]), 0, 0)
Out[143]:
array([[4]])
In [144]:
def my_det(A):
    if A.shape == (1,1):
        return A[0][0]
    else:
        j = 0 # for clarity
        det_sum = 0
        for i in range(A.shape[0]):
            det_sum += (-1) ** (i+j) * A[i][j] * my_det(delete_ith_row_jth_column(A, i, j))
        return det_sum
In [145]:
A = np.array([[1,2,3],[4,5,6],[7,8,10]])
A
Out[145]:
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8, 10]])
In [146]:
my_det(A)
Out[146]:
-3
In [147]:
np.linalg.det(A)
Out[147]:
-3.000000000000001

Let's write a function for the classical adjoint (adjugate):

$(adj(A))_{ij} = (-1)^{(i+j)}\left|A_{\backslash{j},\backslash{i}}\right|$

In [148]:
def my_adj(A):
    adj = np.zeros(A.shape)
    for i in range(A.shape[0]):
        for j in range(A.shape[0]):
            adj[i,j] = (-1) ** (i + j) * np.linalg.det(delete_ith_row_jth_column(A, j, i))
    return adj
In [149]:
A
Out[149]:
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8, 10]])
In [150]:
my_adj(A)
Out[150]:
array([[  2.,   4.,  -3.],
       [  2., -11.,   6.],
       [ -3.,   6.,  -3.]])

Example of

$ A^{-1} = \displaystyle\frac{1}{|A|}adj(A) $

In [151]:
np.allclose(np.linalg.inv(A), 1/np.linalg.det(A) * my_adj(A))
Out[151]:
True

3.11 Quadratic Forms and Positive Semidefinite Matrices

Example of the quadratic form $x^TAx$

In [152]:
A = np.array([[1,2],[3,4]])
A
Out[152]:
array([[1, 2],
       [3, 4]])
In [153]:
x = np.array([5,6])
x
Out[153]:
array([5, 6])
In [154]:
np.dot(np.dot(x,A),x)
Out[154]:
319

$A_{11}x_1x_1 + A_{12}x_1x_2 + A_{21}x_2x_1 + A_{22}x_2x_2$

In [155]:
1*5*5 + 2*5*6 + 3*6*5 + 4*6*6
Out[155]:
319

Example of a symmetric matrix that is indefinite (because there exist $x_1, x_2\in{\mathbb{R}^n}$ such that $x^T_1Ax_1 > 0$ and $x^T_2Ax_2 < 0$)

In [156]:
A = np.array([[1,2],[2,1]])
A
Out[156]:
array([[1, 2],
       [2, 1]])
In [157]:
np.allclose(A, A.T)
Out[157]:
True
In [158]:
x = np.array([1,0])
np.dot(x,np.dot(A,x))
Out[158]:
1
In [159]:
x = np.array([1,-1])
np.dot(x,np.dot(A,x))
Out[159]:
-2

Example of a positive definite matrix

In [160]:
A = np.array([[2,0],[0,2]])
A
Out[160]:
array([[2, 0],
       [0, 2]])

since

$x^TAx = x_1(2x_1 + 0x_2) + x_2(0x_1 + 2x_2) = 2(x_1^2 + x_2^2)$

which is positive for all $x_1, x_2\in{\mathbb{R}}$ except $x = 0$

In [161]:
np.linalg.matrix_rank(A)
Out[161]:
2
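
A quick numerical spot check (not a proof): the quadratic form is positive at a batch of random nonzero vectors.

all(np.dot(v, np.dot(A, v)) > 0 for v in np.random.randn(100, 2))  # True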

3.12 Eigenvalues and Eigenvectors

Example eigenvalue $\lambda$ and eigenvector $x$ of $A$

$ Ax = \lambda x $

In [162]:
A = np.array([[1,2],[3,4]])
A
Out[162]:
array([[1, 2],
       [3, 4]])
In [163]:
eigenvalues, eigenvectors = np.linalg.eig(A)
eigenvalues, eigenvectors
Out[163]:
(array([-0.37228132,  5.37228132]), array([[-0.82456484, -0.41597356],
        [ 0.56576746, -0.90937671]]))
In [164]:
lambd = eigenvalues[0]
lambd
Out[164]:
-0.3722813232690143
In [165]:
x = eigenvectors[:,0]
x
Out[165]:
array([-0.82456484,  0.56576746])
In [166]:
np.linalg.norm(x)
Out[166]:
1.0
In [167]:
np.dot(A, x)
Out[167]:
array([ 0.30697009, -0.21062466])
In [168]:
lambd * x
Out[168]:
array([ 0.30697009, -0.21062466])

$ (\lambda I - A)x = 0 $

In [169]:
np.dot(lambd * np.eye(2) - A, x)
Out[169]:
array([0., 0.])

$ |(\lambda I - A)| = 0 $

In [170]:
np.linalg.det(lambd * np.eye(2) - A)
Out[170]:
-1.250741480074576e-16
In [171]:
my_det(lambd * np.eye(2) - A)
Out[171]:
0.0

Example of a matrix whose eigenvalues and eigenvectors are complex

In [172]:
A = np.array([[1,2],[-3,4]])
A
Out[172]:
array([[ 1,  2],
       [-3,  4]])
In [173]:
np.linalg.eig(A)
Out[173]:
(array([2.5+1.93649167j, 2.5-1.93649167j]),
 array([[0.38729833-0.5j, 0.38729833+0.5j],
        [0.77459667+0.j , 0.77459667-0.j ]]))

The trace of A is equal to the sum of its eigenvalues

$ trA = \displaystyle\sum_{i=1}^{n} \lambda_{i} $

In [174]:
A = np.array([[1,2],[3,4]])
A
Out[174]:
array([[1, 2],
       [3, 4]])
In [175]:
eigenvalues, eigenvectors = np.linalg.eig(A)
eigenvalues, eigenvectors
Out[175]:
(array([-0.37228132,  5.37228132]), array([[-0.82456484, -0.41597356],
        [ 0.56576746, -0.90937671]]))
In [176]:
np.trace(A)
Out[176]:
5
In [177]:
np.sum(eigenvalues)
Out[177]:
5.0

The determinant of A is equal to the product of its eigenvalues

$ |A| = \displaystyle\prod_{i=1}^{n} \lambda_{i} $

In [178]:
np.linalg.det(A)
Out[178]:
-2.0000000000000004
In [179]:
np.prod(eigenvalues)
Out[179]:
-1.9999999999999998

Suppose A is non-singular with eigenvalue λ and an associated eigenvector x. Then $1/\lambda$ is an eigenvalue of $A^{−1}$ with an associated eigenvector x, i.e.,

$ A^{-1} x = (1/\lambda)x $

In [180]:
np.dot(np.linalg.inv(A), x)
Out[180]:
array([ 2.21489715, -1.51973099])
In [181]:
(1/lambd) * x
Out[181]:
array([ 2.21489715, -1.51973099])

(To prove this, take the eigenvector equation, $Ax = \lambda x$ and left-multiply each side by $A^{−1}$.)

$ Ax = \lambda x $

$ A^{-1}Ax = A^{-1}\lambda x $

$ x = A^{-1}\lambda x $

$ (1/\lambda) x = A^{-1} x $

$ A^{-1} x = (1/\lambda) x $

The eigenvalues of a diagonal matrix $D = diag(d_1, . . . d_n)$ are just the diagonal entries $d_1,...d_n$.

In [182]:
D = np.diag([3,4,5])
D
Out[182]:
array([[3, 0, 0],
       [0, 4, 0],
       [0, 0, 5]])
In [183]:
np.linalg.eig(D)
Out[183]:
(array([3., 4., 5.]), array([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]]))

3.13 Eigenvalues and Eigenvectors of Symmetric Matrices

Throughout this section, A is a symmetric real matrix.

In [184]:
A = np.array([[2,3],[3,2]])
A
Out[184]:
array([[2, 3],
       [3, 2]])
In [185]:
np.array_equal(A, np.transpose(A))
Out[185]:
True
In [186]:
np.linalg.eigvals(A)
Out[186]:
array([ 5., -1.])

There exists a set of eigenvectors $u_1, . . . , u_n$ such that a) for all $i, u_i$ is an eigenvector with eigenvalue $\lambda_i$ and b) $u_1, . . . , u_n$ are unit vectors and orthogonal to each other.

In [187]:
np.linalg.eig(A)[1]
Out[187]:
array([[ 0.70710678, -0.70710678],
       [ 0.70710678,  0.70710678]])
In [188]:
u1, u2 = np.linalg.eig(A)[1].T
u1, u2
Out[188]:
(array([0.70710678, 0.70710678]), array([-0.70710678,  0.70710678]))
In [189]:
np.linalg.norm(u1)
Out[189]:
1.0
In [190]:
np.linalg.norm(u2)
Out[190]:
1.0
In [191]:
u1.dot(u2)
Out[191]:
0.0
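
So stacking $u_1$ and $u_2$ as columns gives an orthogonal matrix (the same property as in 3.8); a quick check:

U_check = np.column_stack((u1, u2))
np.allclose(np.dot(U_check.T, U_check), np.eye(2))  # True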

Example of $AU = U\Lambda$

In [192]:
U = np.linalg.eig(A)[1]
U
Out[192]:
array([[ 0.70710678, -0.70710678],
       [ 0.70710678,  0.70710678]])
In [193]:
Lambda = np.diag(np.linalg.eig(A)[0])
Lambda
Out[193]:
array([[ 5.,  0.],
       [ 0., -1.]])
In [194]:
np.dot(A,U)
Out[194]:
array([[ 3.53553391,  0.70710678],
       [ 3.53553391, -0.70710678]])
In [195]:
np.dot(U, Lambda)
Out[195]:
array([[ 3.53553391,  0.70710678],
       [ 3.53553391, -0.70710678]])

Example of the diagonalization of A

$A = AUU^T = (AU)U^T = U\Lambda U^T$

In [196]:
np.dot(np.dot(U, Lambda), U.T)
Out[196]:
array([[2., 3.],
       [3., 2.]])

Example of representing a vector x with respect to another basis U

In [197]:
U
Out[197]:
array([[ 0.70710678, -0.70710678],
       [ 0.70710678,  0.70710678]])
In [198]:
x = np.array([6,7])
x
Out[198]:
array([6, 7])
In [199]:
x_hat = np.dot(U.T, x)
x_hat
Out[199]:
array([9.19238816, 0.70710678])

Example of computing Ax by $\Lambda\hat{x}$ and then converting back

In [200]:
Lambda
Out[200]:
array([[ 5.,  0.],
       [ 0., -1.]])
In [201]:
Lambda_x_hat = np.dot(Lambda, x_hat)
Lambda_x_hat
Out[201]:
array([45.96194078, -0.70710678])
In [202]:
np.dot(U, Lambda_x_hat)
Out[202]:
array([33., 32.])
In [203]:
np.dot(A, x)
Out[203]:
array([33, 32])

Let's calculate AAAx directly and by using the eigenvalues of A and converting back

In [204]:
np.dot(np.dot(np.dot(A,A),A), x)
Out[204]:
array([813, 812])
In [205]:
Lambda
Out[205]:
array([[ 5.,  0.],
       [ 0., -1.]])
In [206]:
np.dot(U, np.array([5**3 * x_hat[0], (-1)**3 * x_hat[1]]))
Out[206]:
array([813., 812.])
In [207]:
np.dot(U, np.dot(Lambda ** 3, x_hat))
Out[207]:
array([813., 812.])
In [208]:
np.dot(A ** 3, x) # NOT equivalent
Out[208]:
array([237, 218])

Let's see a graphical representation of x and $\hat{x}$

To make this easier to see, let's make U a 45-degree rotation and put x just a bit off the x axis

In [209]:
U = np.array([[np.sqrt(1/2), -np.sqrt(1/2)],[np.sqrt(1/2), np.sqrt(1/2)]])
U
Out[209]:
array([[ 0.70710678, -0.70710678],
       [ 0.70710678,  0.70710678]])
In [210]:
x = np.array([2,0.3])
x
Out[210]:
array([2. , 0.3])
In [211]:
x_hat = np.dot(U, x)
x_hat
Out[211]:
array([1.20208153, 1.6263456 ])
In [212]:
origin = [0,0]
plt.quiver(*origin, [0,1], [1,0], angles='xy', scale_units='xy', scale=1, color='blue', label='X and Y axes')
plt.quiver(*origin, x[0], x[1], angles='xy', scale_units='xy', scale=1, color='dodgerblue', label='x')
plt.quiver(*origin, U[0], U[1], angles='xy', scale_units='xy', scale=1, color='green', label='U axes')
plt.quiver(*origin, x_hat[0], x_hat[1], angles='xy', scale_units='xy', scale=1, color='limegreen', label='x hat')
plt.xlim(-1, 3)
plt.ylim(-1, 3)
plt.legend()
plt.show()

Let's plot the two column vectors of a 2x2 symmetric matrix and its two eigenvectors

In [213]:
A = np.array([[2,3],[3,2]])
A
Out[213]:
array([[2, 3],
       [3, 2]])
In [214]:
U = np.linalg.eig(A)[1]
U
Out[214]:
array([[ 0.70710678, -0.70710678],
       [ 0.70710678,  0.70710678]])
In [215]:
origin = [0,0]
plt.quiver(*origin, A[0], A[1], angles='xy', scale_units='xy', scale=1, color='blue', label='A')
plt.quiver(*origin, U[0], U[1], angles='xy', scale_units='xy', scale=1, color='green', label='U')
plt.xlim(-1, 3)
plt.ylim(-1, 3)
plt.legend()
plt.show()

Example of a non-symmetric matrix

In [216]:
A = np.array([[1,2],[3,4]])
A
Out[216]:
array([[1, 2],
       [3, 4]])
In [217]:
U = np.linalg.eig(A)[1]
U
Out[217]:
array([[-0.82456484, -0.41597356],
       [ 0.56576746, -0.90937671]])
In [218]:
origin = [0,0]
plt.quiver(*origin, A[0], A[1], angles='xy', scale_units='xy', scale=1, color='blue', label='A')
plt.quiver(*origin, U[0], U[1], angles='xy', scale_units='xy', scale=1, color='green', label='U')
plt.xlim(-1, 4)
plt.ylim(-1, 4)
plt.legend()
plt.show()

Example where the eigenvectors and eigenvalues are complex

In [219]:
A = np.array([[-1,-2],[7,4]])
A
Out[219]:
array([[-1, -2],
       [ 7,  4]])
In [220]:
np.linalg.eig(A)
Out[220]:
(array([1.5+2.78388218j, 1.5-2.78388218j]),
 array([[ 0.31497039-0.35073619j,  0.31497039+0.35073619j],
        [-0.8819171 +0.j        , -0.8819171 -0.j        ]]))

“Diagonalizing” quadratic form: compute $x^TAx$

In [221]:
A = np.array([[2,3],[3,2]])
A
Out[221]:
array([[2, 3],
       [3, 2]])
In [222]:
x = np.array([3,4])
x
Out[222]:
array([3, 4])
In [223]:
U = np.linalg.eig(A)[1]
U
Out[223]:
array([[ 0.70710678, -0.70710678],
       [ 0.70710678,  0.70710678]])
In [224]:
lambda_1, lambda_2 = np.linalg.eig(A)[0]
lambda_1, lambda_2
Out[224]:
(5.0, -1.0000000000000009)
In [225]:
x_hat = np.dot(np.transpose(U), x)
x_hat
Out[225]:
array([4.94974747, 0.70710678])
In [226]:
np.dot(np.dot(np.transpose(x), A), x)
Out[226]:
122

$ \displaystyle\sum_{i=1}^{n} \lambda_i\hat{x}_i^2 $

In [227]:
lambda_1 * x_hat[0] ** 2 + lambda_2 * x_hat[1] ** 2
Out[227]:
122.0

To see that

$ \max_{x\in{\mathbb{R}^n}} x^TAx \quad \text{subject to} \quad \|x\|_2 = 1 $

is the largest eigenvalue, let's plot $x_1$ on the X axis and $x^TAx$ on the Y axis, where $x_1$ is the first component of $x$ and the second component is chosen so that $\|x\|_2 = 1$.

In [228]:
x2 = lambda x1: np.sqrt(1 - x1 ** 2)
In [229]:
x1_vals = np.linspace(-1,1,1000)
y_vals = []
for x1 in x1_vals:
    x = np.array([x1, x2(x1)])
    y_vals.append(np.dot(np.dot(np.transpose(x), A), x))
plt.plot(x1_vals, y_vals)
plt.show()
In [230]:
np.linalg.eig(A)
Out[230]:
(array([ 5., -1.]), array([[ 0.70710678, -0.70710678],
        [ 0.70710678,  0.70710678]]))
In [231]:
np.max(y_vals)
Out[231]:
4.999995662961227
In [232]:
x1_vals[np.argmax(y_vals)], x2(x1_vals[np.argmax(y_vals)])
Out[232]:
(0.7077077077077076, 0.706505343540374)
In [233]:
np.min(y_vals)
Out[233]:
-0.9999956629612274
In [234]:
x1_vals[np.argmin(y_vals)], x2(x1_vals[np.argmin(y_vals)])
Out[234]:
(-0.7077077077077076, 0.706505343540374)

4 Matrix Calculus

4.1 The Gradient

Let's work through an example using a simple function of a vector:

$ x\in{\mathbb{R}^2}\\ f(x) = x_1^2 + x_2^2 + x_1x_2\\ \nabla_xf(x) = \begin{bmatrix} \frac{\partial{f(x)}}{\partial{x_1}} \\ \frac{\partial{f(x)}}{\partial{x_2}} \end{bmatrix} = \begin{bmatrix} 2x_1 + x_2\\ 2x_2 + x_1 \end{bmatrix} $

So if $ x = \begin{bmatrix} 3 \\ 4 \end{bmatrix} , \nabla_xf(x) = \begin{bmatrix} 10 \\ 11 \end{bmatrix} $

In [235]:
f = lambda x1, x2: x1**2 + x2**2 + x1*x2
In [236]:
epsilon = 0.01
(f(3+epsilon, 4) - f(3, 4)) / epsilon, (f(3, 4+epsilon) - f(3, 4)) / epsilon
Out[236]:
(10.009999999999764, 11.009999999999565)
In [237]:
x1, x2 = np.meshgrid(np.linspace(-5,5,10), np.linspace(-5,5,10))
ax = plt.axes(projection='3d')
ax.set_xlabel('x1')
ax.set_ylabel('x2')
ax.set_zlabel('f')
ax.plot_surface(x1, x2, f(x1, x2), color='blue', alpha=0.5)
ax.scatter([3], [4], [f(3,4)], color='black')
ax.plot([3-2, 3+2], [4, 4], [f(3,4)-2*10, f(3,4)+2*10], color='red')    # tangent along x1, slope df/dx1 = 10
ax.plot([3, 3], [4-2, 4+2], [f(3,4)-2*11, f(3,4)+2*11], color='green')  # tangent along x2, slope df/dx2 = 11
plt.show()

4.2 The Hessian

Let's write an example using the same function as above

$ x\in{\mathbb{R}^2}\\ f(x) = x_1^2 + x_2^2 + x_1x_2\\ \nabla_xf(x) = \begin{bmatrix} \frac{\partial{f(x)}}{\partial{x_1}} \\ \frac{\partial{f(x)}}{\partial{x_2}} \end{bmatrix} = \begin{bmatrix} 2x_1 + x_2\\ 2x_2 + x_1 \end{bmatrix}\\ \nabla_x^2f(x) = \begin{bmatrix} \frac{\partial^2{f(x)}}{\partial{x_1}\partial{x_1}} & \frac{\partial^2{f(x)}}{\partial{x_1}\partial{x_2}} \\ \frac{\partial^2{f(x)}}{\partial{x_2}\partial{x_1}} & \frac{\partial^2{f(x)}}{\partial{x_2}\partial{x_2}} \end{bmatrix} = \begin{bmatrix} 2 & 1\\ 1 & 2 \end{bmatrix} $
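
A quick finite-difference sanity check of that Hessian (a sketch; f and the evaluation point are re-defined here to keep it self-contained):

f = lambda x1, x2: x1**2 + x2**2 + x1*x2
eps = 1e-3
x1, x2 = 3.0, 4.0
d2_11 = (f(x1 + eps, x2) - 2*f(x1, x2) + f(x1 - eps, x2)) / eps**2
d2_22 = (f(x1, x2 + eps) - 2*f(x1, x2) + f(x1, x2 - eps)) / eps**2
d2_12 = (f(x1 + eps, x2 + eps) - f(x1 + eps, x2 - eps)
         - f(x1 - eps, x2 + eps) + f(x1 - eps, x2 - eps)) / (4 * eps**2)
np.round([d2_11, d2_12, d2_22], 3)  # approximately [2., 1., 2.]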

4.3 Gradients and Hessians of Quadratic and Linear Functions

Let's try an example of the quadratic function $f(x) = x^TAx$ for $A\in{\mathbb{S}^2}$

$\nabla_x x^TAx = 2Ax$

In [238]:
A = np.array([[2,3],[3,2]])
A
Out[238]:
array([[2, 3],
       [3, 2]])
In [239]:
f = lambda x: np.dot(np.dot(x, A), x)
In [240]:
x = np.array([3,4])
In [241]:
gradient = 2 * np.dot(A, x)
gradient
Out[241]:
array([36, 34])
In [242]:
epsilon = 0.01
(f(x + np.array([epsilon, 0])) - f(x)) / epsilon, (f(x + np.array([0, epsilon])) - f(x)) / epsilon
Out[242]:
(36.019999999999186, 34.019999999999584)
In [243]:
x1, x2 = np.meshgrid(np.linspace(-5, 5, 10), np.linspace(-5, 5, 10))
ax = plt.axes(projection='3d')
ax.set_xlabel('x1')
ax.set_ylabel('x2')
ax.set_zlabel('f')
ax.plot_surface(x1, x2, np.apply_along_axis(f, 0, np.stack([x1, x2])), color='blue', alpha=0.5)
ax.scatter(*x, [f(x)], color='black')
ax.plot([x[0]-2, x[0]+2], [x[1], x[1]], [f(x)-2*gradient[0], f(x)+2*gradient[0]], color='red')
ax.plot([x[0], x[0]], [x[1]-2, x[1]+2], [f(x)-2*gradient[1], f(x)+2*gradient[1]], color='green')
plt.show()

4.4 Least Squares

Using the second projection example from 3.9 above, let's calculate the $x$ that minimizes $\|Ax - b\|_2^2$ via $x = (A^TA)^{-1}A^Tb$, just as we did originally, and also estimate $x$ by a grid search.

In [244]:
A = np.array([[1 , 0], [0, 1], [0, 0.5]])
A
Out[244]:
array([[1. , 0. ],
       [0. , 1. ],
       [0. , 0.5]])
In [245]:
b = np.array([1,1,1])
b
Out[245]:
array([1, 1, 1])
In [246]:
x = np.dot(np.dot(np.linalg.inv(np.dot(A.T, A)), A.T), b)
x
Out[246]:
array([1. , 1.2])
In [247]:
np.dot(A, x)
Out[247]:
array([1. , 1.2, 0.6])
In [248]:
xs = np.array(np.meshgrid(np.linspace(-2, 2, 100), np.linspace(-2, 2, 100))).T.reshape(-1, 2)
distances = np.apply_along_axis(lambda x: np.linalg.norm(np.dot(A, x) - b), 1, xs)
x = xs[np.argmin(distances)]
x
Out[248]:
array([0.98989899, 1.19191919])
In [249]:
np.dot(A, x)
Out[249]:
array([0.98989899, 1.19191919, 0.5959596 ])
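
numpy's built-in least-squares solver gives the same answer; a quick check:

x_lstsq, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)
x_lstsq  # array([1. , 1.2])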

Compute the gradient of $\|Ax - b\|_2^2$, showing one interim step not shown in the review:

$ \nabla_x(x^TA^TAx - 2b^TAx + b^Tb) = \\ \nabla_xx^TA^TAx - \nabla_x2b^TAx + \nabla_xb^Tb = \\ \nabla_xx^T(A^TA)x - 2\nabla_x(A^Tb)^Tx + \nabla_xb^Tb = \\ 2A^TAx - 2A^Tb + 0 = \\ 2A^TAx - 2A^Tb $

Let's try an example where we fit a line through the origin to 5 points.

In [250]:
A = np.random.rand(5,1)
A
Out[250]:
array([[0.2108527 ],
       [0.82189899],
       [0.37643178],
       [0.82373638],
       [0.98731394]])
In [251]:
# b is 2A with random noise added in
b = A * 2 + (np.random.rand(5,1) - 0.5)/2
b
Out[251]:
array([[0.67017475],
       [1.83202091],
       [0.99319781],
       [1.80996379],
       [2.15461524]])
In [252]:
x = np.dot(np.dot(np.linalg.inv(np.dot(A.T, A)), A.T), b)
x
Out[252]:
array([[2.24219163]])
In [253]:
plt.scatter(A, b)
plt.plot([0, 1], [0, x[0]], color='green')
plt.show()

4.5 Gradients of the Determinant

Let's go through an example showing $\nabla_A|A| = |A|A^{-T}$ when $A\in{\mathbb{R}^{n\ \times\ n}}$

In [254]:
A = np.random.rand(2,2)
A
Out[254]:
array([[0.72642508, 0.9872509 ],
       [0.81603603, 0.52725954]])
In [255]:
f = lambda A: np.linalg.det(A) # for convenience
In [256]:
epsilon = 0.0001
np.array(
    [
        [
            f(A + np.array([[epsilon, 0], [0, 0]])) - f(A),
            f(A + np.array([[0, epsilon], [0, 0]])) - f(A),
        ],
        [
            f(A + np.array([[0, 0], [epsilon, 0]])) - f(A),
            f(A + np.array([[0, 0], [0, epsilon]])) - f(A),
        ]
    ]) / epsilon
Out[256]:
array([[ 0.52725954, -0.81603603],
       [-0.9872509 ,  0.72642508]])
In [257]:
np.dot(f(A), np.linalg.inv(A).T)
Out[257]:
array([[ 0.52725954, -0.81603603],
       [-0.9872509 ,  0.72642508]])

Now let's go through an example showing $\nabla_A\log\left|A\right| = A^{-1}$ when $A\in{\mathbb{S}^n_{++}}$

In [258]:
A = np.array([[3,2],[2,3]])
A
Out[258]:
array([[3, 2],
       [2, 3]])
In [259]:
f = lambda A: np.log(np.linalg.det(A))
In [260]:
epsilon = 0.0001
np.array(
    [
        [
            f(A + np.array([[epsilon, 0], [0, 0]])) - f(A),
            f(A + np.array([[0, epsilon], [0, 0]])) - f(A),
        ],
        [
            f(A + np.array([[0, 0], [epsilon, 0]])) - f(A),
            f(A + np.array([[0, 0], [0, epsilon]])) - f(A),
        ]
    ]) / epsilon
Out[260]:
array([[ 0.599982, -0.400008],
       [-0.400008,  0.599982]])
In [261]:
np.linalg.inv(A)
Out[261]:
array([[ 0.6, -0.4],
       [-0.4,  0.6]])