An overview of performance
Algorithm Type Serial time PRAM time Storage # of Proc. -------------------------------------------------------------------- Dense LU D N^3 N N^2 N^2 Band LU D N^2 N N^(3/2) N Jacobi I N^2 N N N Sparse LU D N^(3/2) sqrt(N) N*logN N CG I N^(3/2) sqrt(N)logN N N SOR I N^(3/2) sqrt(N) N N FFT D N*logN logN N N Multigrid I N (logN)^2 N N -------------------------------------------------------------------- lower bound N logN NIssues in iterative methods
The 2D Poisson equation
d^2 u(x,y) d^2 u(x,y) ------------ + ------------ = f(x,y) dx^2 dy^2Discrete form
u(i-1,j)-2u(i,j)+u(i+1,j) u(i,j-1)-2u(i,j)+u(i,j+1) --------------------------- + --------------------------- = f(i,j) h^2 h^2Boundary condition:
u(0,y) = u(1,y) = u(x,0) = u(x,1) = 0For example, n=3, rwo ordering
[-4 1 0 1 0 0 0 0 0 ] [ 1 -4 1 0 1 0 0 0 0 ] [ 0 1 -4 0 0 1 0 0 0 ] [ 1 0 0 -4 1 0 1 0 0 ] [ 0 1 0 1 -4 1 0 1 0 ] [ 0 0 1 0 1 -4 0 0 1 ] [ 0 0 0 1 0 0 -4 1 0 ] [ 0 0 0 0 1 0 1 -4 1 ] [ 0 0 0 0 0 1 0 1 -4 ]
Jacobi Method (including boundary values)
u(i,j,m+1) = (1/4)*(u(i-1,j,m)+u(i+1,j,m)+u(i,j-1,m)+u(i,j+1,m)+b(i,j))Initial guess: any
Convergence: m ~ ((n+1)/pi)^2
Termination criteria: assuming discretization error 10^(-6) (20 bits), 20*((n+1)/pi)^2
Performance
Serial time: O(n)*O(N) = O(N^2)
PRAM: O(N)*O(1) = O(N), N processors
Parallel: 2D block, N/p grid points per processor, computation O(N/p), communication O(Ts + 4n/sqrt(p)*Tw) each step
Isoefficiency: N = O(p) or n = O(sqrt(p)) (n^2 = N)
SOR
Improvements on Jacobi. It is based on Guass-Seidel method. The basic idea is to use the latest results. It is easy on serial but hard on parallel. Usually, red-black (chess board) ordering is used in parallel.
The idea of SOR is
u(i,j,m+1) = u(i,j,m) + correctionThe correction is a scaled error of previous method.
Initial guess: any
Convergence: m ~ (n+1)/(2*pi)
Termination criteria: assuming discretization error 10^(-6) (20 bits), 20*(n+1)/(2*pi)
Performance
Serial time: O(N^(3/2))
Parallel: computation O(N/p), communication O(Ts = 4n/sqrt(p)*Tw) each step
Isoefficiency: N = O(p) or n = O(sqrt(p))
Fast Fourier Transform (FFT)
Define
[ 2 -1 ] [-1 2 -1 0 ] T = [ ..... ] U = [u(i,j)] [ 0 -1 2 -1] [ -1 2]then the (i,j)-entry of TU
TU(i,j) = -(u(i-1,j)-2u(i,j)+u(i+1,j))and the (i,j)-entry of UT
UT(i,j) = -(u(i,j-1)-2(i,j)+u(i,j+1))Thus the discrete Poisson equation becomes
-(1/h^2)*(TU+UT) = F or TU+UT = B
How can FFT help? Since T has very special form, we can easily compute its eigenvalue decomposition
T = Q*D*Q^(-1)where D=diag(d(i)). In fact, we can find a Q such that Q=Q^(-1). Thus
D*(Q*U*Q) + (Q*U*Q)*D = Q*B*Q
Algorithm 0. Find Q and D: T=Q*D*Q; 1. Compute C=Q*B*Q; % O(n^2*log(n)) using FFT 2. For each i,k V(j,k)=C(j,k)/(d(j)+d(k)); % 2n^2=2N, embarrassingly parallel 3. Compute U=Q*V*Q. % O(n^2*log(n)) using FFTHow to compute the eigenvalues and eigenvectors of T? We can verify that d(i) and the jth column of Q are respectively
d(i)=2(1-cos(j*pi/(n+1))) q(j)=sqrt(2/(n+1))*[ sin(j*pi/(n+1))] [sin(2j*pi/(n+1))] [ ... ] [sin(nj*pi/(n+1))]