An n-by-n matrix A is symmetric and positive definite (SPD) if
A = A^T and x^T*A*x > 0 for all x != 0
[A_{11} A_{12} A_{13}] [L_{11} 0 0 ] [L_{11} L_{21} L_{31}] [A_{21} A_{22} A_{23}] = [L_{21} L_{22} 0 ] * [ 0 L_{22} L_{32}] [A_{31} A_{32} A_{33}] [L_{31} L_{32} L_{33}] [ 0 0 L_{33}]Comparing both sides, we get
L_{11} = sqrt(A_{11}) L_{21} = A_{21} / L_{11} L_{22} = sqrt(A_{22} - L_{21}*L_{21}) L_{31} = A_{31} / L_{11} L_{32} = (A_{32} - L_{21}*L_{31}) / L_{22} L_{33} = sqrt(A_{33} - L_{31}*L_{31} - L_{32}*L_{32})In general, we have the following implementation generating L row by row.
function L = CholScalar(A) % L = CholScalar(A) % % Cholesky factorization of a symmetric and positive % definite matrix. Returns the lower triangular factor in L. % % A - symmetric positive definite matrix % L - lower triangular so that A = L*L' % % Scalar version % % Reference % Charles F. Van Loan, % Introduction to Scientific Computing, % Prentice Hall, NJ, 1997. P. 243 [n,n] = size(A); L = zeros(n,n); for i=1:n % compute L(i,1:i) for j=1:i s = A(j,i); for k=1:j-1 s = s - L(j,k)*L(i,k); end if j < i L(i,j) = s/L(j,j); else L(i,i) = sqrt(s); end end endThis implementation is scalar version and requires n^3/3 flops. Alternatively, we can derive a column version:
function G = CholSax(A) % G = CholSax(A) % % Cholesky factorization of a symmetric and positive % definite matrix. Returns the lower triangular factor in G. % % A - symmetric positive definite matrix % G - lower triangular so that A = G*G' % % Saxpy version, level-1 % % Reference % Charles F. Van Loan, % Introduction to Scientific Computing, % Prentice Hall, NJ, 1997. P. 244 [n,n] = size(A); G = zeros(n,n); s = zeros(n,1); for j=1:n s(j:n) = A(j:n,j); for k=1:j-1 s(j:n) = s(j:n) - G(j:n,k)*G(j,k); end G(j:n,j) = s(j:n)/sqrt(s(j)); endThe above implementation uses vector-vector operations. Further, we can derive matrix-vector (level-2) version:
function G = CholGax(A) % G = CholGax(A) % % Cholesky factorization of a symmetric and positive % definite matrix. Returns the lower triangular factor in G. % % A - symmetric positive definite matrix % G - lower triangular so that A = G*G' % % Gaxpy version, level-2 % % Reference % Charles F. Van Loan, % Introduction to Scientific Computing, % Prentice Hall, NJ, 1997. P. 244 [n,n] = size(A); G = zeros(n,n); s = zeros(n,1); for j=1:n if j==1 s(j:n) = A(j:n,j); else s(j:n) = A(j:n,j) - G(j:n,1:j-1)*G(j,1:j-1)'; end G(j:n,j) = s(j:n)/sqrt(s(j)); endThe following table shows the performce of the three implementations, where the matrix size n=32.
----------------------------------- Algorithms Time Flops ----------------------------------- CholScalar 0.619 11986 CholSax 0.378 11522 CholGax 0.083 12017 -----------------------------------For this assignment, you are to write three C functions to implement the three versions (scalar, vector-vector, matrix-vector) of Cholesky factorization. Since we don't want to pay for performance, you are not allowed to use the high performance library. However, you can use free software, such as BLAS in netlib. After testing, compare the performance of the three versions. The subroutines in BLAS are in Fortran. To call Fortran subroutines in C, please check our Calling Fortran in C in Resources section.
To run the competition, get your performance measurements from one of our Ultra 10 machines in GS324. To make the results independent of the matrix size, you report the floating-point operations per second, that is
n^3 / 3 --------- secondswhere n is the matrix size.
#include <stdio.h> #include <time.h> ... clock_t before; double elapsed; before = clock(); long_running_function(); elapsed = clock() - before; printf("function used %.3f seconds\n", elapsed/CLOCKS_PER_SEC);Alternatively, you can use system call, getrusage. The following is the template for the test program and method for measuring performance using getrusage.
#include <stdio.h> #include <sys/resource.h> loop(i) int i ; { int j; long isec, iusec, esec, eusec; double iseconds, eseconds; struct rusage rustart, ruend; /* ** Get start time */ getrusage(RUSAGE_SELF, &rustart); /* ** body of the routine. Optimization will kill this so no -O please */ /* ** Get ending time */ getrusage(RUSAGE_SELF, &ruend); /* convert the start time to seconds */ isec = rustart.ru_utime.tv_sec; /* seconds */ iusec = rustart.ru_utime.tv_usec; /* microseconds */ iseconds = (double) (isec + ((float)iusec/1000000)); /* convert the ending time to seconds */ esec = ruend.ru_utime.tv_sec; /* seconds */ eusec = ruend.ru_utime.tv_usec; /* microseconds */ eseconds = (double) (esec + ((float)eusec/1000000)); /* ** Print out the actual number of seconds used */ printf("%f seconds were used\n",eseconds-iseconds); }