An n-by-n matrix A is symmetric and positive definite (SPD) if
A = A^T and x^T*A*x > 0 for all x != 0
[A_{11} A_{12} A_{13}] [L_{11} 0 0 ] [L_{11} L_{21} L_{31}]
[A_{21} A_{22} A_{23}] = [L_{21} L_{22} 0 ] * [ 0 L_{22} L_{32}]
[A_{31} A_{32} A_{33}] [L_{31} L_{32} L_{33}] [ 0 0 L_{33}]
Comparing both sides, we get
L_{11} = sqrt(A_{11})
L_{21} = A_{21} / L_{11}
L_{22} = sqrt(A_{22} - L_{21}*L_{21})
L_{31} = A_{31} / L_{11}
L_{32} = (A_{32} - L_{21}*L_{31}) / L_{22}
L_{33} = sqrt(A_{33} - L_{31}*L_{31} - L_{32}*L_{32})
In general, we have the following implementation generating L row by row.
function L = CholScalar(A)
% L = CholScalar(A)
%
% Cholesky factorization of a symmetric and positive
% definite matrix. Returns the lower triangular factor in L.
%
% A - symmetric positive definite matrix
% L - lower triangular so that A = L*L'
%
% Scalar version
%
% Reference
% Charles F. Van Loan,
% Introduction to Scientific Computing,
% Prentice Hall, NJ, 1997. P. 243
[n,n] = size(A);
L = zeros(n,n);
for i=1:n
% compute L(i,1:i)
for j=1:i
s = A(j,i);
for k=1:j-1
s = s - L(j,k)*L(i,k);
end
if j < i
L(i,j) = s/L(j,j);
else
L(i,i) = sqrt(s);
end
end
end
This implementation is scalar version and requires n^3/3 flops.
Alternatively, we can derive a column version:
function G = CholSax(A)
% G = CholSax(A)
%
% Cholesky factorization of a symmetric and positive
% definite matrix. Returns the lower triangular factor in G.
%
% A - symmetric positive definite matrix
% G - lower triangular so that A = G*G'
%
% Saxpy version, level-1
%
% Reference
% Charles F. Van Loan,
% Introduction to Scientific Computing,
% Prentice Hall, NJ, 1997. P. 244
[n,n] = size(A);
G = zeros(n,n);
s = zeros(n,1);
for j=1:n
s(j:n) = A(j:n,j);
for k=1:j-1
s(j:n) = s(j:n) - G(j:n,k)*G(j,k);
end
G(j:n,j) = s(j:n)/sqrt(s(j));
end
The above implementation uses vector-vector operations. Further,
we can derive matrix-vector (level-2) version:
function G = CholGax(A)
% G = CholGax(A)
%
% Cholesky factorization of a symmetric and positive
% definite matrix. Returns the lower triangular factor in G.
%
% A - symmetric positive definite matrix
% G - lower triangular so that A = G*G'
%
% Gaxpy version, level-2
%
% Reference
% Charles F. Van Loan,
% Introduction to Scientific Computing,
% Prentice Hall, NJ, 1997. P. 244
[n,n] = size(A);
G = zeros(n,n);
s = zeros(n,1);
for j=1:n
if j==1
s(j:n) = A(j:n,j);
else
s(j:n) = A(j:n,j) - G(j:n,1:j-1)*G(j,1:j-1)';
end
G(j:n,j) = s(j:n)/sqrt(s(j));
end
The following table shows the performce of the three implementations,
where the matrix size n=32.
-----------------------------------
Algorithms Time Flops
-----------------------------------
CholScalar 0.619 11986
CholSax 0.378 11522
CholGax 0.083 12017
-----------------------------------
For this assignment, you are to write three C functions to implement
the three versions (scalar, vector-vector, matrix-vector) of Cholesky
factorization. Since we don't want to pay for performance, you are
not allowed to use the high performance library. However, you can
use free software, such as BLAS in netlib. After testing, compare
the performance of the three versions. The subroutines in BLAS are
in Fortran. To call Fortran subroutines in C, please check our
Calling Fortran in C in Resources section.To run the competition, get your performance measurements from one of our Ultra 10 machines in GS324. To make the results independent of the matrix size, you report the floating-point operations per second, that is
n^3 / 3
---------
seconds
where n is the matrix size.
#include <stdio.h>
#include <time.h>
...
clock_t before;
double elapsed;
before = clock();
long_running_function();
elapsed = clock() - before;
printf("function used %.3f seconds\n",
elapsed/CLOCKS_PER_SEC);
Alternatively, you can use system call, getrusage.
The following is the template for the test program and method for
measuring performance using getrusage.
#include <stdio.h>
#include <sys/resource.h>
loop(i)
int i ;
{
int j;
long isec, iusec, esec, eusec;
double iseconds, eseconds;
struct rusage rustart, ruend;
/*
** Get start time
*/
getrusage(RUSAGE_SELF, &rustart);
/*
** body of the routine. Optimization will kill this so no -O please
*/
/*
** Get ending time
*/
getrusage(RUSAGE_SELF, &ruend);
/* convert the start time to seconds */
isec = rustart.ru_utime.tv_sec; /* seconds */
iusec = rustart.ru_utime.tv_usec; /* microseconds */
iseconds = (double) (isec + ((float)iusec/1000000));
/* convert the ending time to seconds */
esec = ruend.ru_utime.tv_sec; /* seconds */
eusec = ruend.ru_utime.tv_usec; /* microseconds */
eseconds = (double) (esec + ((float)eusec/1000000));
/*
** Print out the actual number of seconds used
*/
printf("%f seconds were used\n",eseconds-iseconds);
}