Assignment 1: Due Thur., October 12, 5 pm

In scientific and engineering applications, we often have symmetric and positive definite system. Cholesky factorization is the method of choice for solving symmetric and positive definite systems.

An n-by-n matrix A is symmetric and positive definite (SPD) if

         A = A^T    and x^T*A*x > 0 for all x != 0

Properties of a symmetric and positive definite matrix

If A is SPD, then

A = A^T and all its eigenvalues are positive;

All diagonal entries A_{ii} are positive and

            A_{ii} + A_{jj}
            --------------- >= | A_{ij} |
                   2

In other words, the largest entry is on the diagonal;

Any pricipal submatrix of A is SPD;

There is a unique lower triangular nonsingular L, with positive diagonal entries, such that A = LL^T (Cholesky factorization).

What kind of matrices are symmetric and positive definite

If A is an n-by-n matrix and

                           n
         | A_{ii} | >     SUM     | A_{ij} |    i = 1:n
                      {j=1, j!=i}

then A is strictly diagonal deminant. If A is also symmetric with positive diagonal entries, then it is SPD;

If an n-by-n X has full column rank (independent columns) and A = X^T*X, the A is SPD

Cholesky factorization

Consider a 3-by-3 case:

     [A_{11} A_{12} A_{13}]   [L_{11}   0      0   ]   [L_{11} L_{21} L_{31}]
     [A_{21} A_{22} A_{23}] = [L_{21} L_{22}   0   ] * [  0    L_{22} L_{32}]
     [A_{31} A_{32} A_{33}]   [L_{31} L_{32} L_{33}]   [  0      0    L_{33}]

Comparing both sides, we get

        L_{11} = sqrt(A_{11})
        L_{21} = A_{21} / L_{11}
        L_{22} = sqrt(A_{22} - L_{21}*L_{21})
        L_{31} = A_{31} / L_{11}
        L_{32} = (A_{32} - L_{21}*L_{31}) / L_{22}
        L_{33} = sqrt(A_{33} - L_{31}*L_{31} - L_{32}*L_{32})

In general, we have the following implementation generating L row by row.

    function L = CholScalar(A)
    % L = CholScalar(A)
    %
    % Cholesky factorization of a symmetric and positive
    % definite matrix. Returns the lower triangular factor in L.
    %
    %       A - symmetric positive definite matrix
    %       L - lower triangular so that A = L*L'
    %
    % Scalar version
    %
    % Reference
    % Charles F. Van Loan,
    % Introduction to Scientific Computing,
    % Prentice Hall, NJ, 1997. P. 243

    [n,n] = size(A);
    L = zeros(n,n);
    for i=1:n
        % compute L(i,1:i)
        for j=1:i
            s = A(j,i);
            for k=1:j-1
                s = s - L(j,k)*L(i,k);
            end
            if j < i
                L(i,j) = s/L(j,j);
            else
                L(i,i) = sqrt(s);
            end
        end
    end

This implementation is scalar version and requires n^3/3 flops. Alternatively, we can derive a column version:

    function G = CholSax(A)
    % G = CholSax(A)
    %
    % Cholesky factorization of a symmetric and positive
    % definite matrix. Returns the lower triangular factor in G.
    %
    %       A - symmetric positive definite matrix
    %       G - lower triangular so that A = G*G'
    %
    % Saxpy version, level-1
    %
    % Reference
    % Charles F. Van Loan,
    % Introduction to Scientific Computing,
    % Prentice Hall, NJ, 1997. P. 244

    [n,n] = size(A);
    G = zeros(n,n);
    s = zeros(n,1);
    for j=1:n
        s(j:n) = A(j:n,j);
        for k=1:j-1
            s(j:n) = s(j:n) - G(j:n,k)*G(j,k);
        end
        G(j:n,j) = s(j:n)/sqrt(s(j));
    end

The above implementation uses vector-vector operations. Further, we can derive matrix-vector (level-2) version:

    function G = CholGax(A)
    % G = CholGax(A)
    %
    % Cholesky factorization of a symmetric and positive
    % definite matrix. Returns the lower triangular factor in G.
    %
    %       A - symmetric positive definite matrix
    %       G - lower triangular so that A = G*G'
    %
    % Gaxpy version, level-2
    %
    % Reference
    % Charles F. Van Loan,
    % Introduction to Scientific Computing,
    % Prentice Hall, NJ, 1997. P. 244

    [n,n] = size(A);
    G = zeros(n,n);
    s = zeros(n,1);
    for j=1:n
        if j==1
            s(j:n) = A(j:n,j);
        else
            s(j:n) = A(j:n,j) - G(j:n,1:j-1)*G(j,1:j-1)';
        end
        G(j:n,j) = s(j:n)/sqrt(s(j));
    end

The following table shows the performce of the three implementations, where the matrix size n=32.

       -----------------------------------
        Algorithms       Time       Flops
       -----------------------------------
        CholScalar      0.619       11986
        CholSax         0.378       11522
        CholGax         0.083       12017
       -----------------------------------

For this assignment, you are to write three C functions to implement the three versions (scalar, vector-vector, matrix-vector) of Cholesky factorization. Since we don't want to pay for performance, you are not allowed to use the high performance library. However, you can use free software, such as BLAS in netlib. After testing, compare the performance of the three versions. The subroutines in BLAS are in Fortran. To call Fortran subroutines in C, please check our Calling Fortran in C in Resources section.

To run the competition, get your performance measurements from one of our Ultra 10 machines in GS324. To make the results independent of the matrix size, you report the floating-point operations per second, that is

                   n^3 / 3
                  ---------
                   seconds

where n is the matrix size.

Timing

C and C++ provide a standard routine, clock, that reports the CPU time. Here is a timing scaffold using clock.

#include <stdio.h>
#include <time.h>

    ...
    clock_t before;
    double elapsed;

    before = clock();
    long_running_function();
    elapsed = clock() - before;
    printf("function used %.3f seconds\n",
            elapsed/CLOCKS_PER_SEC);

Alternatively, you can use system call, getrusage. The following is the template for the test program and method for measuring performance using getrusage.

#include <stdio.h>
#include <sys/resource.h>

loop(i)
int i ;
{
    int j;
    long isec, iusec, esec, eusec;
    double iseconds, eseconds;
    struct rusage rustart, ruend;
/*
**  Get start time
*/
    getrusage(RUSAGE_SELF, &rustart);
/*
**  body of the routine. Optimization will kill this so no -O please
*/

/*
**  Get ending time
*/
    getrusage(RUSAGE_SELF, &ruend);
/*  convert the start time to seconds */
    isec = rustart.ru_utime.tv_sec;     /* seconds */
    iusec = rustart.ru_utime.tv_usec;   /* microseconds */
    iseconds = (double) (isec + ((float)iusec/1000000));
/*  convert the ending time to seconds */
    esec = ruend.ru_utime.tv_sec;       /* seconds */
    eusec = ruend.ru_utime.tv_usec;     /* microseconds */
    eseconds = (double) (esec + ((float)eusec/1000000));
/*
**  Print out the actual number of seconds used
*/
    printf("%f seconds were used\n",eseconds-iseconds);
}