c - Strange performance issue with AMD's ACML BLAS/LAPACK library
I asked this question on the AMD developers forum a few days ago but haven't gotten an answer. Maybe someone here has some insight.
http://devgurus.amd.com/thread/167492
I am running ACML version 5.3.1 (libacml_mp, gfortran_fma4) on Opteron 6348 processors under Ubuntu 12.04.
What happens is that a call to dsyev (eigendecomposition) slows down dramatically (by a factor of 10+) if I first make a call to dpotrf (Cholesky decomposition). It makes no sense to me why this would happen. Maybe there is some kind of cache I need to clear, or something like that.
Here is a simple C program that reproduces the problem.
    #include <stdio.h>
    #include <stdlib.h>
    #include <acml.h>
    #include <time.h>

    int main(void) {
        double *x = malloc(1000000 * sizeof(double));
        double *y = malloc(1000000 * sizeof(double));
        double *eig0 = malloc(1000000 * sizeof(double));
        double *eig1 = malloc(1000000 * sizeof(double));
        double *eigw = malloc(1000 * sizeof(double));
        double *chol = malloc(1000000 * sizeof(double));
        clock_t t0, t1;
        int info;
        int i;

        // generate random matrix
        for (i = 0; i < 1000000; ++i) {
            x[i] = rand() / (double) RAND_MAX;
        }

        // compute y = x x^T so that y is symmetric positive definite
        dgemm('n', 't', 1000, 1000, 1000, 1, x, 1000, x, 1000, 0, y, 1000);

        // make copies of y for the Cholesky and eigen decompositions
        for (i = 0; i < 1000000; ++i) {
            chol[i] = y[i];
            eig0[i] = y[i];
            eig1[i] = y[i];
        }

        // first eigenvalue test
        t0 = clock();
        dsyev('v', 'u', 1000, eig0, 1000, eigw, &info);
        t1 = clock();
        // clock_t is not an int, so cast before printing
        printf("eigen decomposition time: %ld\n", (long) ((t1 - t0) / 1000));

        // cholesky
        dpotrf('u', 1000, chol, 1000, &info);

        // second eigenvalue test, after cholesky
        t0 = clock();
        dsyev('v', 'u', 1000, eig1, 1000, eigw, &info);
        t1 = clock();
        printf("eigen decomposition time: %ld\n", (long) ((t1 - t0) / 1000));

        return 0;
    }
Here is the output:
    eigen decomposition time: 8120
    eigen decomposition time: 95140
If I comment out the dpotrf line, it works fine:
    eigen decomposition time: 8150
    eigen decomposition time: 8210
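Since libacml_mp is the multithreaded build, one thing that may be worth ruling out is the timer itself: clock() reports processor time used by the whole process, which on Linux is summed over all threads, so it can differ from elapsed time. Below is a minimal sketch, not part of the program above, that records both measurements around a single call. The names wall_seconds, cpu_seconds and time_call are hypothetical helpers, and the sketch assumes a POSIX system that provides clock_gettime.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Hypothetical helper: elapsed wall-clock seconds from a monotonic clock. */
double wall_seconds(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);   /* assumption: CLOCK_MONOTONIC is available */
    (void) rc;         /* silence unused warning when NDEBUG is set */
    return (double) ts.tv_sec + (double) ts.tv_nsec * 1e-9;
}

/* Hypothetical helper: CPU seconds consumed so far by the process. */
double cpu_seconds(void)
{
    return (double) clock() / CLOCKS_PER_SEC;
}

/* Time one call to work(arg), storing elapsed wall-clock and CPU seconds. */
void time_call(void (*work)(void *), void *arg, double *wall, double *cpu)
{
    double w0 = wall_seconds();
    double c0 = cpu_seconds();
    work(arg);
    *cpu = cpu_seconds() - c0;
    *wall = wall_seconds() - w0;
}
```

To use it, each dsyev call would be wrapped in a small function and passed to time_call. If the second call showed a similar wall-clock time but a much larger CPU time, that would point at the measurement rather than at dsyev itself; if both grew, the slowdown would be real. On older glibc versions, clock_gettime may require linking with -lrt.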