# Matrix multiplication is faster than you expect (part I)

Suppose we’re running a particle physics or neurobiology simulation, or solving some machine learning or data mining problem, and we need to do some linear algebra. Specifically, we want to multiply two real $n \times n$ matrices $A$ and $B$. To compute the $(i,j)$-th entry of the product we go along the $i$th row of $A$, multiplying entrywise with the $j$th column of $B$ and summing the results, so that for each entry of $AB$ we perform $n$ real multiplications (and $n-1$ real additions, but this changes the operation count by at most a factor of two, so we’ll be slack and gloss over such details until we formalize things in the next few sections). Since there are $n^2$ entries in the product, this gives a total of $n^3$ multiplications to calculate the product, as one would intuitively expect.

As a slightly unrealistic example, let us generously assume that our matrices are small, of dimension $n \approx 10^5$ (which incidentally is just outside the largest matrix size allowed by GNU Octave on my computer). Assuming the ability to perform $10^{10}$ operations a second on a “standard” PC processor, one matrix multiplication would then need about $10^{15}$ operations and take on the order of $10^5$ seconds, or slightly over a day. In real-world applications such as high-performance computing or big data problems the dimension gets much bigger, so naive multiplication is still too slow, even given the extra processing power we gain by switching to supercomputers.
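To make the counting argument concrete, here is a minimal Python sketch of the naive row-by-column algorithm that also counts the multiplications it performs. The function names `naive_matmul` and `estimated_seconds` are made up for illustration, and the default of $10^{10}$ operations per second is just the rough figure assumed above, not a measured value.

```python
def naive_matmul(A, B):
    """Multiply two n x n matrices (lists of lists) the schoolbook way.

    Returns the product together with the number of scalar
    multiplications performed, which is n**3 for n x n inputs.
    """
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    mults = 0
    for i in range(n):          # row of A
        for j in range(n):      # column of B
            total = 0.0
            for k in range(n):  # walk along row i of A and column j of B
                total += A[i][k] * B[k][j]
                mults += 1
            C[i][j] = total
    return C, mults


def estimated_seconds(n, ops_per_second=1e10):
    """Rough running time of the naive algorithm, counting only the
    n**3 multiplications (hypothetical helper, assumed machine speed)."""
    return n ** 3 / ops_per_second


if __name__ == "__main__":
    C, mults = naive_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(C, mults)
    # Back-of-the-envelope estimate for n = 10**5, in hours.
    print(estimated_seconds(10 ** 5) / 3600)
```

The three nested loops, each running $n$ times, are exactly where the $n^3$ comes from; ignoring the additions only changes the estimate by a constant factor.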