Linear Algebra for Machine Learning-Part 2

Aditya Raj
8 min read · Aug 30, 2021

The introductory and general concepts of linear algebra needed for machine learning were covered in Part 1: https://aditya007.medium.com/linear-algebra-for-machine-learning-part-1-8fb1ee259547.

In this final part, we will understand eigenvalues, eigenvectors, PCA, SVD, and the geometric concept of projection.

(The ' symbol will be used to denote T, which stands for transpose, so A' means the transpose of A.)

Projection of vector

Mathematically, the projection of vector a on vector b is the component of a that lies in the direction of b. This is very intuitive and easy to visualize.

The scalar projection of a on b, written proj(a, b), can be calculated as (a · b)/|b|, i.e. |a| cos θ, where θ is the angle between a and b. Multiplying this scalar by the unit vector b/|b| gives the vector projection of a on b.
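As a quick sanity check, here is a minimal NumPy sketch (the function name `project` and the sample vectors are illustrative, not from the article) that computes both the scalar and the vector projection:

```python
import numpy as np

def project(a, b):
    """Vector projection of a onto b: ((a . b) / (b . b)) * b."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return (np.dot(a, b) / np.dot(b, b)) * b

a = np.array([3.0, 4.0])
b = np.array([1.0, 0.0])

print(np.dot(a, b) / np.linalg.norm(b))  # 3.0 -> scalar projection, |a| cos(theta)
print(project(a, b))                     # [3. 0.] -> vector projection along b
```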

Projection Matrix

A projection matrix is a matrix that maps any vector onto a given subspace (for example, onto a line or a plane).

Well, this is a very high-level definition. To understand it in depth we would need basis vectors, orthogonal projections, and related ideas, which are easy and very useful in linear algebra, but skipping them will not make much difference for the machine learning material here or ahead.

Taking an example: the matrix

P = [[1/2, 1/2],
     [1/2, 1/2]]

projects any vector onto the line y = x (multiply the matrix P with the vector [x y]').

Geometrically, the projection of a vector (or matrix) A onto the column space of B is given by Proj(B)·A, where Proj(B) is the projection matrix for B.

Without going much more in depth on this topic: the projection matrix onto the column space of a matrix A can be written as P = A(A'A)^(−1)A'.
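Here is a small NumPy sketch of this formula (the single column of A is chosen to span the line y = x, matching the earlier example; the variable names are illustrative):

```python
import numpy as np

# The column of A spans the subspace we project onto (here, the line y = x in R^2).
A = np.array([[1.0],
              [1.0]])

# P = A (A'A)^(-1) A'
P = A @ np.linalg.inv(A.T @ A) @ A.T
print(P)             # [[0.5 0.5]
                     #  [0.5 0.5]]

v = np.array([2.0, 0.0])
print(P @ v)         # [1. 1.] -> projection of v onto the line y = x
print(P @ (P @ v))   # also [1. 1.]; P is idempotent (P @ P = P)
```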

For a more in-depth discussion, you can ping me on LinkedIn or WhatsApp; I will cover the topic in a video lecture.

Eigenvectors and Eigenvalues

An eigenvector of a matrix A is a vector X such that when X is multiplied by A, the direction of the resultant vector remains the same as that of X.

It means that the vector obtained as the product of matrix A and vector X, i.e. AX, is just a scaled form of X. So AX can be represented as λX for some scalar λ.

AX = λX, and this λ is called the eigenvalue for that eigenvector, i.e. AX points in the same direction as X with its magnitude scaled by λ, its eigenvalue.

Let's understand it more simply. The matrix A is multiplied by a vector X to produce a new, transformed vector AX (A is a square n×n matrix, dim(X) = n×1, so dim(AX) = n×1; hence AX is a vector).

When a matrix is multiplied by a vector, there are two possibilities:

  1. The new transformed vector (the product of the matrix and the vector) is just a scaled form of the original vector, i.e. AX = λX.
  2. The transformed vector has no direct scalar relationship with the original vector that we multiplied with the matrix.

If the new transformed vector is just a scaled form of the original vector, then the original vector is an eigenvector of the matrix. Vectors with this characteristic are special, and they can be used to represent a large-dimensional matrix.

The value by which the newly transformed vector is scaled relative to the original vector is called the eigenvalue, and a large multi-dimensional matrix of data can be represented by its eigenvectors as features, with the importance of each feature given by its eigenvalue.

We will understand more about data, features, and their importance ahead.

Finding eigenvalues and eigenvectors

We use the defining relation (AX = λX) to find eigenvalues and eigenvectors.

A·v = λ·v ⇒ (A − λI)·v = 0. For a non-zero v to exist, the matrix (A − λI) must be singular, so its determinant must be zero.

So we solve the determinant equation |A − λI| = 0 to calculate all possible eigenvalues of the matrix.

Each unique λ obtained gives a corresponding eigenvector vᵢ of the matrix, with A·vᵢ being vᵢ scaled by that λᵢ.
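For example, take A = [[2, 1], [1, 2]]. Then |A − λI| = (2 − λ)(2 − λ) − 1·1 = λ² − 4λ + 3 = (λ − 1)(λ − 3), so the eigenvalues are λ = 1 and λ = 3.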

Determinant

The determinant is a very important concept in core linear algebra. We can understand the determinant as a function that maps every square matrix to a single number, which is used to solve many mathematical equations and matrix systems.

For a 1×1 Matrix

Let A = [a] be a matrix of order 1; then the determinant of A is defined to be equal to a.

For a 2×2 Matrix

For a 2×2 matrix (2 rows and 2 columns)

A = [[a, b],
     [c, d]]

the determinant is |A| = ad − bc.

For a 3×3 Matrix

For a 3×3 matrix (3 rows and 3 columns)

A = [[a, b, c],
     [d, e, f],
     [g, h, i]]

the determinant is |A| = a(ei − fh) − b(di − fg) + c(dh − eg).

For higher dimension matrices

The pattern continues for higher-order matrices. For a 4×4 matrix with first row [a, b, c, d], we expand along the first row:

|A| = a·|M_a| − b·|M_b| + c·|M_c| − d·|M_d|

where each M is the 3×3 minor obtained by deleting the first row and the column of the corresponding entry. Notice the + − + − pattern (+a… −b… +c… −d…).
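To make the sign pattern concrete, here is a minimal NumPy sketch (the helper name `det` is illustrative; cofactor expansion is shown for understanding, not efficiency) that expands along the first row and checks the result against `np.linalg.det`:

```python
import numpy as np

def det(M):
    """Determinant by cofactor expansion along the first row."""
    M = np.asarray(M, dtype=float)
    n = M.shape[0]
    if n == 1:
        return M[0, 0]                                         # 1x1 case: |[a]| = a
    total = 0.0
    for j in range(n):
        minor = np.delete(np.delete(M, 0, axis=0), j, axis=1)  # drop row 0 and column j
        total += ((-1) ** j) * M[0, j] * det(minor)            # the +, -, +, - pattern
    return total

A = np.array([[1, 2, 3, 4],
              [0, 1, 4, 2],
              [5, 6, 0, 1],
              [2, 3, 1, 0]])
print(det(A), np.linalg.det(A))   # both values agree (up to floating-point error)
```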

Finding eigenvectors

We discussed that eigenvalues are calculated from det(A − λI) = 0, which gives all possible unique values of λ.

After getting an eigenvalue λ, the corresponding eigenvector X can be calculated by solving:

(A-λI)X = 0
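Continuing with the same 2×2 example, here is a minimal NumPy sketch (variable names are illustrative) that computes the eigenvalues and eigenvectors and verifies AX = λX:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# NumPy solves |A - lambda*I| = 0 and (A - lambda*I)X = 0 for us.
eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)   # eigenvalues 3 and 1 (order may vary)
print(eigvecs)   # columns are the corresponding unit-length eigenvectors

# Verify A @ X = lambda * X for each eigenpair.
for lam, x in zip(eigvals, eigvecs.T):
    print(np.allclose(A @ x, lam * x))   # True for every pair
```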

Principal Component Analysis

By definition:-

Principal Component Analysis, or PCA, is a dimensionality-reduction method that is often used to reduce the dimensionality of large data sets, by transforming a large set of variables into a smaller one that still contains most of the information in the large set.

Well, this definition may not serve our purpose directly, so for followers of my lectures:

PCA can be understood as a method of finding the most important principal component vectors of a matrix, i.e. the feature vectors of a large dataset represented as a matrix (both are the same thing).

Let's break it down: finding the most important features of a matrix… how do we do that?

Well, it's simple: find all the eigenvectors and eigenvalues of the square matrix obtained, and the eigenvectors are the features or principal components, with their importance reflected by the respective eigenvalues. So the most important vector is the eigenvector with the highest eigenvalue, and so on.

But

how the hell will I get a square matrix every time?

Obviously, you will not get a square matrix every time. So, as a standard method, the matrix is first multiplied by its transpose to form a square matrix (a small sketch of this is shown below), and then all the methods are applied to get the principal component vectors.
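A tiny NumPy sketch of that trick (the shapes are illustrative):

```python
import numpy as np

A = np.random.rand(6, 4)    # a non-square 6x4 data matrix
S = A.T @ A                 # 4x4: multiplying by the transpose gives a square matrix

print(S.shape)              # (4, 4)
print(np.allclose(S, S.T))  # True -- S is also symmetric, so it has real eigenvalues
```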

So let's analyze the steps of PCA:

  1. Let's have data on m products, each having n features (n dimensions), in the form of a matrix A of dim m×n, so A' (the transpose of A) is a matrix of dim n×m.
  2. We multiply A' and A to get a matrix S = A'A; the dim of S is n×n (n×m × m×n = n×n).
  3. This S is called the covariance matrix (strictly, it is proportional to the covariance matrix once the columns of A are mean-centered); we do an eigendecomposition of S.
  4. Eigendecomposition is a very simple step: for this n×n matrix, there exist n eigenvalues and n eigenvectors (each having length n, i.e. equal to the length of a column vector).
  5. We sort all eigenvectors by their eigenvalues in decreasing order and stack them into a matrix of dim n×n (n eigenvectors, each of length n).
  6. Now, suppose we want to reduce the dim of the data from n to k; we take only the top k eigenvectors, forming an n×k matrix.
  7. To reduce the feature dim, we multiply A (dim m×n) by this newly formed n×k matrix, giving a new matrix of dim m×k (m×n × n×k).
  8. Now we have a matrix of m products with their top k features, where k < n.
  9. Remember that we sorted the eigenvectors by their eigenvalues: the greater the eigenvalue, the more important the feature, so the top k eigenvectors (having the top k eigenvalues) form the top k feature vectors.
  10. Steps 3 to 5 are called eigendecomposition, which is an important concept in PCA; we will also see its use in SVD next.

This is how PCA is used for dimensionality reduction in real-life machine learning and data science; a small Python sketch of these steps is given below. We will learn and apply this in a later part of this course.
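A minimal NumPy sketch of steps 1-8 (the function name `pca_reduce` is illustrative; the mean-centering line is an assumption added here because the covariance interpretation in step 3 relies on it):

```python
import numpy as np

def pca_reduce(A, k):
    """Reduce an m x n data matrix A to m x k using the top-k eigenvectors of A'A."""
    A = A - A.mean(axis=0)                 # mean-center each feature column
    S = A.T @ A                            # n x n matrix, proportional to the covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)   # eigh: S is symmetric; eigenvalues come in ascending order
    order = np.argsort(eigvals)[::-1]      # indices that sort eigenvalues in decreasing order
    W = eigvecs[:, order[:k]]              # n x k matrix of the top-k eigenvectors
    return A @ W                           # m x k reduced representation of the data

A = np.random.rand(100, 5)                 # 100 "products", each with 5 features
print(pca_reduce(A, 2).shape)              # (100, 2): same products, top 2 feature directions
```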

Singular Value Decomposition

In PCA we decomposed a square matrix in terms of its eigenvalues and eigenvectors. This decomposition of a square matrix in terms of its eigenvectors is called eigendecomposition.

The problem with eigendecomposition is that it can only be done for square matrices. So, for the factorization or decomposition of non-square (rectangular) matrices, we do singular value decomposition.

It is a very important and widely applicable concept, with huge use in machine learning, recommendation systems, data computation, and more.

Let's understand it in a mathematical manner:

It is the decomposition of a rectangular m×n matrix A into the product of two orthogonal square matrices and a rectangular diagonal matrix:

A = U Σ V'

where U is m×m, Σ is m×n, and V is n×n.

Here U and V are orthogonal matrices, which means U'U = I and V'V = I.

So let’s understand how it is done:-

Taking a matrix A of dim m×n:

  1. Taking the square matrix AA' (dim m×m), its eigendecomposition is done, i.e. all m eigenvectors are taken and arranged as an m×m matrix (m eigenvectors, each of length m, since the matrix dim is m×m); this matrix is called U.

  2. The same step is done with the square matrix A'A (dim n×n), which is eigendecomposed into a matrix of dim n×n, called V.

  3. The middle matrix Σ is a rectangular diagonal matrix of the same dimension as A, with its diagonal entries being the square roots of the eigenvalues of AA' or A'A (both have the same non-zero eigenvalues); these are the singular values.
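A minimal NumPy sketch of the decomposition (`np.linalg.svd` performs steps 1-3 for us; the shapes below are illustrative):

```python
import numpy as np

A = np.random.rand(4, 3)                  # a rectangular m x n matrix (m = 4, n = 3)

U, s, Vt = np.linalg.svd(A)               # A = U . Sigma . V'
Sigma = np.zeros_like(A)
Sigma[:len(s), :len(s)] = np.diag(s)      # rebuild the m x n rectangular diagonal matrix

print(np.allclose(A, U @ Sigma @ Vt))     # True: the product reproduces A
print(np.allclose(U.T @ U, np.eye(4)))    # True: U'U = I
print(np.allclose(Vt @ Vt.T, np.eye(3)))  # True: V'V = I

# The singular values are the square roots of the eigenvalues of A'A (and AA').
print(np.allclose(np.sort(s**2), np.sort(np.linalg.eigvalsh(A.T @ A))))
```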

The SVD is used for many purposes in real life, such as sentiment analysis and entity recognition.

Thanks for reading. Please share and support.

In case of any doubt, contact me on LinkedIn or WhatsApp (+918292098293).

Follow me on Twitter: https://twitter.com/AdityaR71244890?s=08 and LinkedIn: https://www.linkedin.com/in/aditya-raj-553322197
