Matrix Multiplication Optimization in C++
October 20, 2007 3 Comments
This is an example of dynamic programming. This requires a basic understanding of linear algebra and includes a program written in C++.
Problem
When multiplying matrices together, the dimensions of the matrices to be multiplied must be compatible. The columns of the first matrix must be equal to the rows in the second matrix. For example, consider the following compatible chain of matrices. Let A_x denote a matrix and the letters inside the brackets represent the ROWS and COLUMNS of that matrix respectively.
A_1[a][b] * A_2[b][c] * A_3[c][d]… *A_size[y][z]
When multiplying two matrices together, the resultant matrix has the ROWS dimension of the first matrix, and the COLUMNS dimension of the second matrix. For example
A_1[a][b] *A_2[b][c] = A_result[a][c]
So extending this idea, it is the case that
A_1[a][b] * A_2[b][c] * A_3[c][d]… *A_n[y][z] = A_result[a][z]
Matrix multiplication is not commutative. This is intuitive – since the columns in A_n-1Â must equal the rows in A_n, you can’t just switch the order around.
A_1 * A_2 * A_3 is probably NOT equal to A_1 * A_3 * A_2
However, Matrix multiplication is associative. Associativity signifies that the order the multiplication is done can change.
((A_1 * A_2) * A_3) == (A_1 * ( A_2 * A_3))
When multiplying a chain of matrices together, one way is generally more efficient than others. The cost (number of operations that must be done) of multiplying two matrices in the chain above can be calculated as follows:
cost(A_n-1[a][b] * A_n[b][c]) == a * b * c
Example:
Say we have the following chain of matrices to multiply together.
A_1[3][5] * A_2[5][100] * A_3[100][2]
First let’s group the array as follows:
((A_1[3][5] * A_2[5][100] ) * A_3[100][2])
multiplying A_1 with A_2 will cost 3 * 5 * 100 or 1,500, and the resultant is a 3 by 100 matrix We are then left with the following multiplication to do.
A_res1[3][100] * A_3[100][2]
This will cost 3 * 100 * 2 which is 600.
So the total cost of multiplying the matrices like this is 1,500 + 600 which is 2,100
Now let’s group the array as follows and compare it with the previous result.
(A_1[3][5] * (A_2[5][100] * A_3[100][2]))
This will cost (5*100*2) which is 1,000 + (3*5*2) which is 15. So the total cost of this multiplication is 1,015.
This is less than half as expensive as the first option. And this is not a contrived example – this is typical. Given a chain of matrices and their corresponding dimensions, it is possible to find the optimal way to multiply these matrices together.
Solution
The obvious way to find the optimal order of multiplying a chain of matrices together would be to simply calculate the cost of each possible solution and then compare these costs (brute force). This would be easy enough if there are only small chains. However, it turns out that this would require around 2n computations where n is the number of matrices, which is not practical for large n.
The reason the cost is so high is that there are many computations that are re-done. An improved solution would be to create a table which keeps track of all the sub-optimal solutions, and uses these to calculate the final cost. Using a table, the cost can be reduced to n3, which is substantially better. This is the method which is used in the below program.
The below program inputs the dimensions of a number of arrays. For example, if there is the following chain of arrays:
A_1[3][5] * A_2[5][100] * A_3[100][2]
The dimensions can be input as:
5 x 100
The program will then calculate the best way to multiply the matrix in n3 time. In the example above, it would output,
(A_1 * (A_2 * A_3))
Also, the program will output the tables that are used to calculate this optimal cost. Because of the size of the screen, the output will only be neat for small chains of matrices. Here’s the source.
/************************************************************************************************ * Matrix multiplication optimizer * by Richard Lundeen - webstersProdigy.net * * This program optimizes the order in which a chain of matrices are multiplied together in * O(n^3). It prints out tables for documentation purposes - although these are not necessary * to understand the final grouping * **************************************************************************************************/ #include <iostream> using namespace std; void printdobArray(int** array, int row, int col); void printOptimalParens(int** kArray, int i, int j); int main() { int numMat; cout<<"**********************************************************************"<<endl <<"Welcome to the matrix optimization program. This program calculates"<<endl <<"the optimal way to multiply a series of matrices together"<<endl <<"*********************************************************************"<<endl; cout<<"Please enter the Number of Matricies: "; cin>>numMat; int *p; int psize = numMat + 1; p = new int[psize]; cout<<"------------------------------------------------------------------------"<<endl <<"Now please enter dimensions. \nFor example, if you were multiplying three matricies "<<endl <<"with the respective dimensions 3x1 1x1000 and 1000x13, you would enter '3', '1', '1000', '13'."<<endl <<"------------------------------------------------------------------------"<<endl<<endl; for(int i = 0; i < psize; i++) { cout<<"dimension "<<i<<" : "; cin>>p[i]; } //an array which holds the cost table int **cost; cost = new int*[psize]; // sets up an array of row pointers for(int i =0; i < psize; i++) { cost[i] = new int[psize]; // allocates each row of the grades array } //an array which holds the splits int ** kArray; kArray = new int*[psize]; for(int i=0; i<(psize); i++) { kArray[i]=new int[psize]; } for(int length = 1; length < numMat; length++) { for(int i = 1; i<numMat-length + 1; i++) { int j = i+ length; int tempCost; int k = i; int minCost = cost[i][k] + cost[k+1][j] + p[i-1]*p[k]*p[j]; int kay = i; for(; k<j; k++) { tempCost = cost[i][k] + cost[k+1][j] + p[i-1]*p[k]*p[j]; if(tempCost<minCost) { minCost = tempCost; kay = k; } } cost[i][j] = minCost; kArray[i][j] = kay; } } cout<<"\nTotal Cost Table\n************************************"<<endl; printdobArray(cost, psize, psize); cout<<"Corresponding K-Table\n***********************************"<<endl; printdobArray(kArray,psize,psize); cout<<endl<<endl; cout<<"This corresponds to the following multiplication chain"<<endl; printOptimalParens(kArray, 1, numMat); cout<<endl; return 0; } void printdobArray(int** array, int row, int col) { for(int i = 1; i<row; i++) { for(int j=1; j<col; j++) cout<<array[i][j]<<"\t|\t"; cout<<endl; } } void printOptimalParens(int** kArray, int i, int j) { if(i==j) { cout<<"A"<<i<<" "; } else { cout<<"("; printOptimalParens(kArray,i, kArray[i][j]); printOptimalParens(kArray, kArray[i][j] + 1, j); cout<<")"; } }
This program was compiled using gcc.
Links to Additional Information
Wikipedia-Dynamic
Programming
Wikipedia-Matrix
Chain Multiplication
McGraw-Hill
(check out the Pseudo code for matrix chain
multiplication)
http://www.ics.uci.edu/~dan/class/161/notes/6/Dynamic.html
I want to know why the program (matrixChainOptimization.cpp) output have long number and nagative number?
because he used int instead of long long or unsigned value.
I want a code that display all the possible grouping of the Matrix, give any matrix