Matrix Multiplication Optimization in C++

This is an example of dynamic programming. This requires a basic understanding of linear algebra and includes a program written in C++.

Problem

When multiplying matrices together, the dimensions of the matrices to be multiplied must be compatible. The columns of the first matrix must be equal to the rows in the second matrix. For example, consider the following compatible chain of matrices. Let A_x denote a matrix and the letters inside the brackets represent the ROWS and COLUMNS of that matrix respectively.

A_1[a][b] * A_2[b][c] * A_3[c][d]… *A_size[y][z]

When multiplying two matrices together, the resultant matrix has the ROWS dimension of the first matrix, and the COLUMNS dimension of the second matrix. For example

A_1[a][b] *A_2[b][c] = A_result[a][c]

So extending this idea, it is the case that

A_1[a][b] * A_2[b][c] * A_3[c][d]… *A_n[y][z] = A_result[a][z]

Matrix multiplication is not commutative. This is intuitive – since the columns in A_n-1Â must equal the rows in A_n, you can’t just switch the order around.

A_1 * A_2 * A_3 is probably NOT equal to A_1 * A_3 * A_2

However, Matrix multiplication is associative. Associativity signifies that the order the multiplication is done can change.

((A_1 * A_2) * A_3) == (A_1 * ( A_2 * A_3))

When multiplying a chain of matrices together, one way is generally more efficient than others. The cost (number of operations that must be done) of multiplying two matrices in the chain above can be calculated as follows:

cost(A_n-1[a][b] * A_n[b][c]) == a * b * c

Example:

Say we have the following chain of matrices to multiply together.

A_1[3][5] * A_2[5][100] * A_3[100][2]

First let’s group the array as follows:

((A_1[3][5] * A_2[5][100] ) * A_3[100][2])

multiplying A_1 with A_2 will cost 3 * 5 * 100 or 1,500, and the resultant is a 3 by 100 matrix We are then left with the following multiplication to do.

A_res1[3][100] * A_3[100][2]

This will cost 3 * 100 * 2 which is 600.

So the total cost of multiplying the matrices like this is 1,500 + 600 which is 2,100

Now let’s group the array as follows and compare it with the previous result.

(A_1[3][5] * (A_2[5][100] * A_3[100][2]))

This will cost (5*100*2) which is 1,000 + (3*5*2) which is 15. So the total cost of this multiplication is 1,015.

This is less than half as expensive as the first option. And this is not a contrived example – this is typical. Given a chain of matrices and their corresponding dimensions, it is possible to find the optimal way to multiply these matrices together.

Solution

The obvious way to find the optimal order of multiplying a chain of matrices together would be to simply calculate the cost of each possible solution and then compare these costs (brute force). This would be easy enough if there are only small chains. However, it turns out that this would require around 2n computations where n is the number of matrices, which is not practical for large n.

The reason the cost is so high is that there are many computations that are re-done. An improved solution would be to create a table which keeps track of all the sub-optimal solutions, and uses these to calculate the final cost. Using a table, the cost can be reduced to n3, which is substantially better. This is the method which is used in the below program.

The below program inputs the dimensions of a number of arrays. For example, if there is the following chain of arrays:

A_1[3][5] * A_2[5][100] * A_3[100][2]

The dimensions can be input as:

5 x 100

The program will then calculate the best way to multiply the matrix in n3 time. In the example above, it would output,

(A_1 * (A_2 * A_3))

Also, the program will output the tables that are used to calculate this optimal cost. Because of the size of the screen, the output will only be neat for small chains of matrices. Here’s the source.

/************************************************************************************************
*  Matrix multiplication optimizer
*  by Richard Lundeen - webstersProdigy.net
*
*  This program optimizes the order in which a chain of matrices are multiplied together in
*  O(n^3).  It prints out tables for documentation purposes - although these are not necessary
*  to understand the final grouping
*
**************************************************************************************************/

#include <iostream>

using namespace std;

void printdobArray(int** array, int row, int col);
void printOptimalParens(int** kArray, int i, int j);

int main()
{
   int numMat;
   cout<<"**********************************************************************"<<endl
       <<"Welcome to the matrix optimization program.  This program calculates"<<endl
       <<"the optimal way to multiply a series of matrices together"<<endl
       <<"*********************************************************************"<<endl;
   cout<<"Please enter the Number of Matricies: ";
   cin>>numMat;

   int *p;
   int psize = numMat + 1;
   p = new int[psize];

   cout<<"------------------------------------------------------------------------"<<endl
       <<"Now please enter dimensions.  \nFor example, if you were multiplying three matricies "<<endl
       <<"with the respective dimensions 3x1  1x1000 and 1000x13, you would enter '3', '1', '1000', '13'."<<endl
       <<"------------------------------------------------------------------------"<<endl<<endl;
   for(int i = 0; i < psize; i++)
   {
      cout<<"dimension "<<i<<" : ";
      cin>>p[i];
   }

   //an array which holds the cost table
   int **cost;
   cost = new int*[psize];  // sets up an array of row pointers
   for(int i =0; i < psize; i++)
   {
       cost[i] = new int[psize]; // allocates each row of the grades array
   }

   //an array which holds the splits
   int ** kArray;
   kArray = new int*[psize];
   for(int i=0; i<(psize); i++)
   {
      kArray[i]=new int[psize];
   }

   for(int length = 1; length < numMat; length++)
   {
      for(int i = 1; i<numMat-length + 1; i++)
      {

         int j = i+ length;
	 int tempCost;
	 int k = i;
	 int minCost = cost[i][k] + cost[k+1][j] + p[i-1]*p[k]*p[j];
	 int kay = i;
	 for(; k<j; k++)
	 {
	   tempCost = cost[i][k] + cost[k+1][j] + p[i-1]*p[k]*p[j];
	   if(tempCost<minCost)
	   {
	      minCost = tempCost;
	      kay = k;
	   }
	 }

	cost[i][j] = minCost;
	kArray[i][j] = kay;
      }
   }

   cout<<"\nTotal Cost Table\n************************************"<<endl;
   printdobArray(cost, psize, psize);

   cout<<"Corresponding K-Table\n***********************************"<<endl;
   printdobArray(kArray,psize,psize);

   cout<<endl<<endl;
   cout<<"This corresponds to the following multiplication chain"<<endl;
   printOptimalParens(kArray, 1, numMat);
   cout<<endl;
   return 0;
}

void printdobArray(int** array, int row, int col)
{
   for(int i = 1; i<row; i++)
   {
      for(int j=1; j<col; j++)
         cout<<array[i][j]<<"\t|\t";
      cout<<endl;
   }
}

void printOptimalParens(int** kArray, int i, int j)
{
   if(i==j)
    {
       cout<<"A"<<i<<" ";
    }
    else
    {
       cout<<"(";
       printOptimalParens(kArray,i, kArray[i][j]);
       printOptimalParens(kArray, kArray[i][j] + 1, j);
       cout<<")";
    }
}

This program was compiled using gcc.

Links to Additional Information

Wikipedia-Dynamic
Programming

Wikipedia-Matrix
Chain Multiplication

McGraw-Hill
(check out the Pseudo code for matrix chain
multiplication)

http://www.ics.uci.edu/~dan/class/161/notes/6/Dynamic.html

Longest Common Subsequence in C++

Simple program that efficiently finds the longest common subsequence of two sequences, written in C++ using dynamic programming. Unforunately, there are probably many bugs, but for what I needed it for it seems to work.

I should probably add that the reason this is complicated is because it is efficient and uses dynamic programming (sacrificing storage space for computation time). It’s not a stupid brute force like how it could be done with a regex and a couple loops.

Here is the Header:

#include <iostream>
#include <string>
using namespace std;
 
#ifndef LCS_FUNCTIONS_H
#define LCS_FUNCTIONS_H
 
enum direction{west=1, northWest=2, north=3};
 
void LCS_Length(string X, string Y, int** c, direction** b)
{
	int m = X.length() + 1;
	int n = Y.length() + 1;
	for(int i = 1; i&lt;m; i++)
		c[i][0] = 0;
	for(int j = 0; j&lt;n; j++)
		c[0][j] = 0;
 
	for(int i = 1; i&lt;m; i++)
	{
		for(int j = 1; j= c[i][j-1])
			{
				c[i][j] = c[i-1][j];
				b[i][j] = north;
			}
			else
			{
				c[i][j] = c[i][j-1];
				b[i][j] = west;
			}
		}
	}
}
 
void print_LCS(direction** b, string X,  int i, int j)
{
	if(i == 0 || j == 0)
		return;
 
	if(b[i][j] == northWest)
	{
		print_LCS(b, X, i-1, j-1);
		cout&lt;&lt;X[i-1]&lt;&lt;&quot; &quot;;
	}
	else if (b[i][j] == north)
		print_LCS(b, X, i-1, j);
	else
		print_LCS(b, X, i, j-1);
}
 
#endif

Here is the main file

#include "LCS_Functions.h"

int main()
{
	cout<<"This program inputs two strings of characters."
		<<"nIt then finds the longest common substring that the two strings have in common.n"
		<<"A substring si any sequence of characters which increase in order"<<endl<<endl;

	string X, Y;

	cout<<"Please Enter String X without spaces: "<>X;

	//clear input stream
	cin.clear();
	cin.ignore(256, 'n');

	cout<<"Please Enter String Y without spaces: "<>Y;
	cin.clear();
	cin.ignore(256, 'n');

	//declare the arrays int** c (for the string length), direction** b (to keep track of elements in string)

	int** c;
	direction** b;

	c = new int *[X.length() + 1];
	b = new direction *[X.length() + 1];

	for(int i=0; i<X.length() + 1; i++)
	{
		c[i] = new int[Y.length() + 1];
		b[i] = new direction[Y.length() + 1];
	}

	LCS_Length(X, Y, c, b);

	cout<<endl<<"Printing c:"<<endl;

	for(int i = 0; i < X.length()+1; i++)
	{
		for(int j = 0; j<Y.length()+1; j++)
			cout<<c[i][j]<<" ";
		cout<<endl;
	}

	cout<<endl<<"Printing b"<<endl;
	for(int i = 0; i < X.length()+1; i++)
	{
		for(int j = 0; j<Y.length()+1; j++)
			cout<<b[i][j]<<" ";
		cout<<endl;
	}
	cout<<"nThe longest common substring is the following:"<<endl;

	print_LCS(b, X,  X.length(), Y.length());

	cout<<"nngoodbye"<<endl;
	return 0;
}