Sunday, February 03, 2008, 10:17 AM
===================================
Using northwood_1.02

500x500 mm - normal algorithm                       1.191 secs.
500x500 mm - temporary variable in loop             1.497 secs.
500x500 mm - unrolled inner loop, factor of 4       1.183 secs.
500x500 mm - unrolled inner loop, factor of 8       1.190 secs.
500x500 mm - unrolled inner loop, factor of 16      1.199 secs.
500x500 mm - pointers used to access matrices       1.464 secs.
500x500 mm - pointers used, unrolled by 4           1.178 secs.
500x500 mm - transposed B matrix                    0.828 secs.
500x500 mm - interchanged inner loops               1.151 secs.
500x500 mm - blocking, step size of 20              1.326 secs.
500x500 mm - Robert's algorithm                     0.366 secs.
500x500 mm - T. Maeno's algorithm, subarray 20x20   0.392 secs.
500x500 mm - Generic Maeno, subarray 20x20          0.665 secs.
500x500 mm - D. Warner's algorithm, subarray 20x20  0.841 secs.
========================================================= =====
Total using no extensions and using no hackery     14.471 secs. ok

120x120 mm - normal algorithm                       0.343 secs.
120x120 mm - temporary variable in loop             0.715 secs.
120x120 mm - unrolled inner loop, factor of 4       0.415 secs.
120x120 mm - unrolled inner loop, factor of 8       0.364 secs.
120x120 mm - unrolled inner loop, factor of 16      0.437 secs.
120x120 mm - pointers used to access matrices       0.588 secs.
120x120 mm - pointers used, unrolled by 4           0.414 secs.
120x120 mm - transposed B matrix                    1.065 secs.
120x120 mm - interchanged inner loops               1.544 secs.
120x120 mm - blocking, step size of 20              1.764 secs.
120x120 mm - Robert's algorithm                     0.294 secs.
120x120 mm - T. Maeno's algorithm, subarray 20x20   0.490 secs.
120x120 mm - Generic Maeno, subarray 20x20          0.869 secs.
120x120 mm - D. Warner's algorithm, subarray 20x20  1.100 secs.
========================================================= =====
Total using no extensions and using no hackery     10.402 secs. ok

60x60 mm - normal algorithm                         0.525 secs.
60x60 mm - temporary variable in loop               0.853 secs.
60x60 mm - unrolled inner loop, factor of 4         0.537 secs.
60x60 mm - unrolled inner loop, factor of 8         0.632 secs.
60x60 mm - unrolled inner loop, factor of 16        0.742 secs.
60x60 mm - pointers used to access matrices         0.707 secs.
60x60 mm - pointers used, unrolled by 4             0.486 secs.
60x60 mm - transposed B matrix                      1.382 secs.
60x60 mm - interchanged inner loops                 1.972 secs.
60x60 mm - blocking, step size of 20                2.226 secs.
60x60 mm - Robert's algorithm                       0.571 secs.
60x60 mm - T. Maeno's algorithm, subarray 20x20     0.621 secs.
60x60 mm - Generic Maeno, subarray 20x20            1.108 secs.
60x60 mm - D. Warner's algorithm, subarray 20x20    1.385 secs.
========================================================= =====
Total using no extensions and using no hackery     13.747 secs. ok




Sunday, February 03, 2008, 10:14 PM
===================================
Using northwood_1.23

500x500 mm - normal algorithm                       1.193 secs.
500x500 mm - temporary variable in loop             1.626 secs.
500x500 mm - unrolled inner loop, factor of 4       1.271 secs.
500x500 mm - unrolled inner loop, factor of 8       1.245 secs.
500x500 mm - unrolled inner loop, factor of 16      1.227 secs.
500x500 mm - pointers used to access matrices       1.492 secs.
500x500 mm - pointers used, unrolled by 4           1.218 secs.
500x500 mm - transposed B matrix                    0.826 secs.
500x500 mm - interchanged inner loops               1.153 secs.
500x500 mm - blocking, step size of 20              1.324 secs.
500x500 mm - Robert's algorithm                     0.342 secs.
500x500 mm - T. Maeno's algorithm, subarray 20x20   0.397 secs.
500x500 mm - Generic Maeno, subarray 20x20          0.665 secs.
500x500 mm - D. Warner's algorithm, subarray 20x20  0.840 secs.
========================================================= =====
Total using no extensions and using no hackery     14.819 secs.

120x120 mm - normal algorithm                       0.340 secs.
120x120 mm - temporary variable in loop             0.709 secs.
120x120 mm - unrolled inner loop, factor of 4       0.418 secs.
120x120 mm - unrolled inner loop, factor of 8       0.367 secs.
120x120 mm - unrolled inner loop, factor of 16      0.438 secs.
120x120 mm - pointers used to access matrices       0.587 secs.
120x120 mm - pointers used, unrolled by 4           0.411 secs.
120x120 mm - transposed B matrix                    1.068 secs.
120x120 mm - interchanged inner loops               1.544 secs.
120x120 mm - blocking, step size of 20              1.762 secs.
120x120 mm - Robert's algorithm                     0.239 secs.
120x120 mm - T. Maeno's algorithm, subarray 20x20   0.490 secs.
120x120 mm - Generic Maeno, subarray 20x20          0.869 secs.
120x120 mm - D. Warner's algorithm, subarray 20x20  1.099 secs.
========================================================= =====
Total using no extensions and using no hackery     10.341 secs.

60x60 mm - normal algorithm                         0.539 secs.
60x60 mm - temporary variable in loop               0.849 secs.
60x60 mm - unrolled inner loop, factor of 4         0.533 secs.
60x60 mm - unrolled inner loop, factor of 8         0.628 secs.
60x60 mm - unrolled inner loop, factor of 16        0.733 secs.
60x60 mm - pointers used to access matrices         0.713 secs.
60x60 mm - pointers used, unrolled by 4             0.484 secs.
60x60 mm - transposed B matrix                      1.381 secs.
60x60 mm - interchanged inner loops                 1.968 secs.
60x60 mm - blocking, step size of 20                2.215 secs.
60x60 mm - Robert's algorithm                       0.482 secs.
60x60 mm - T. Maeno's algorithm, subarray 20x20     0.625 secs.
60x60 mm - Generic Maeno, subarray 20x20            1.110 secs.
60x60 mm - D. Warner's algorithm, subarray 20x20    1.387 secs.
========================================================= =====
Total using no extensions and using no hackery     13.647 secs.
