Dynamic programming (DP) is a technique for solving discrete optimization problems. It solves the whole problem by solving subproblems: it starts at the lowest level with simple subproblems, then moves up level by level until the whole problem is solved. The solution to a subproblem is expressed as a function of the solutions to one or more subproblems at the preceding levels.
Example. The shortest-path problem.
Notations
nodes: 0, 1, ..., n-1
cost: c(i,j), the cost of the edge from node i to node j, i < j
f(x): cost of the shortest path from node 0 to node x
Thus f(n-1) is the solution to the whole problem.
Formulation
    f(x) = 0                                 if x = 0
    f(x) = min_{0<=j<x} { f(j) + c(j,x) }    if 1 <= x <= n-1
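As a concrete illustration, here is a minimal sequential sketch of this recurrence (the function name shortest_path_cost and the small edge-cost table are illustrative, not part of the notes):

import math

def shortest_path_cost(n, c):
    """Compute f(x) = cost of the shortest path from node 0 to node x,
    using f(0) = 0 and f(x) = min_{0<=j<x} (f(j) + c(j,x)).
    c maps an edge (j, x), j < x, to its cost; missing edges cost infinity."""
    f = [math.inf] * n
    f[0] = 0                          # base case
    for x in range(1, n):
        f[x] = min(f[j] + c.get((j, x), math.inf) for j in range(x))
    return f[n - 1]                   # f(n-1) solves the whole problem

# Example: 4 nodes, edges only from lower- to higher-numbered nodes.
costs = {(0, 1): 3, (1, 2): 4, (0, 2): 5, (2, 3): 1}
print(shortest_path_cost(4, costs))   # -> 6, via 0 -> 2 -> 3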
Monadic: Functional equation that contains a single recursive term.
Polyadic: Functional equation that contains multiple recursive terms.
Serial: Subproblems at all levels depend only on the results at the
immediately preceding level.
Nonserial: Subproblems may depend on results from levels other than
the immediately preceding one.
Example of serial monadic DP formulation. A shortest-path
problem where the nodes can be organized into levels.
Notations
v_l(i): node i at level l
R: the terminal node
r: number of levels; the nodes at level r-1 connect directly to R
n: number of nodes per level
c_l(i,j): cost of the edge from v_l(i) to v_{l+1}(j)
C_l(i): minimum cost from v_l(i) to R
C_l: vector (C_l(0), C_l(1), ..., C_l(n-1))'
Thus the solution is C_0 = (C_0(0)), the cost from the single start
node at level 0 to R.
Formulation
    C_l(i) = min_j { c_l(i,j) + C_{l+1}(j) },   0 <= l <= r-2
Clearly, at the last level
    C_{r-1} = (c_{r-1}(0,R), c_{r-1}(1,R), ..., c_{r-1}(n-1,R))'
In general, the equation can be written in matrix form as
    C_l = M_{l,l+1} x C_{l+1}
where
                 c_l(0,0)     c_l(0,1)     ...  c_l(0,n-1)
                 c_l(1,0)     c_l(1,1)     ...  c_l(1,n-1)
    M_{l,l+1} =     .            .                  .
                    .            .                  .
                 c_l(n-1,0)   c_l(n-1,1)   ...  c_l(n-1,n-1)
In the product M_{l,l+1} x C_{l+1}, multiplication is replaced by +
and addition is replaced by min, so parallel matrix-vector
multiplication algorithms can be used.
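A small sketch of this (min, +) matrix-vector step, assuming the level-l edge costs are stored as dense n-by-n lists (the function names and the toy example are illustrative):

import math

def min_plus_matvec(M, v):
    """(min, +) matrix-vector product: result[i] = min_j (M[i][j] + v[j]).
    Multiplication is replaced by + and addition by min."""
    return [min(M[i][j] + v[j] for j in range(len(v))) for i in range(len(M))]

def multistage_shortest_path(levels, last_level_to_R):
    """levels[l][i][j] = c_l(i,j); last_level_to_R[i] = c_{r-1}(i,R).
    Applies C_l = M_{l,l+1} x C_{l+1} from level r-2 back to level 0."""
    C = list(last_level_to_R)          # C_{r-1}
    for M in reversed(levels):         # l = r-2, ..., 0
        C = min_plus_matvec(M, C)
    return C[0]                        # cost from the single start node

# Tiny example: 2 nodes per level, levels 0, 1, 2, then the terminal R.
levels = [
    [[1, 4], [math.inf, math.inf]],    # c_0(i,j); only node 0 exists at level 0
    [[2, 7], [3, 1]],                  # c_1(i,j)
]
print(multistage_shortest_path(levels, [5, 2]))   # -> min(1+7, 4+3) = 7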
Example of Nonserial monadic DP formulation.
The longest-common-subsequence problem.
Notations
F[i,j]: the length of the longest common subsequence of the first
        i elements of A = (a_1, ..., a_n) and the first j elements
        of B = (b_1, ..., b_m)
Thus the solution is F[n,m].
Formulation
    F[i,j] = 0                            if i = 0 or j = 0
    F[i,j] = F[i-1,j-1] + 1               if i,j > 0 and a_i = b_j
    F[i,j] = max{F[i,j-1], F[i-1,j]}      if i,j > 0 and a_i != b_j
PRAM time: O(n), assuming m = n, using n processors; 2n-1 steps, one
per anti-diagonal of F.
Sequential Implementation
Matrix F:
F[0,0] F[0,1] F[0,2] ... F[0,m]
F[1,0] F[1,1] F[1,2] ... F[1,m]
F[2,0] F[2,1] F[2,2] ... F[2,m]
. . . .
. . . .
. . . .
F[n,0] F[n,1] F[n,2] ... F[n,m]
Algorithm
Initial conditions: F[0,:], F[:,0];
for k=1 to 2*n-1
for each entry F[i,j] on the anti-diagonal (i+j=k)
use F[i-1,j] (north), F[i,j-1] (west),
and F[i-1,j-1] (north-west) to compute F[i,j]
end
end
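A sequential sketch of this anti-diagonal sweep, assuming A and B are strings (the diagonal order mirrors the parallel schedule; a plain row-by-row fill would produce the same F):

def lcs_length(A, B):
    """Fill F[i][j] = length of the LCS of A[:i] and B[:j], sweeping the
    anti-diagonals i + j = k so that the north, west, and north-west
    neighbours of every entry are already known."""
    n, m = len(A), len(B)
    F = [[0] * (m + 1) for _ in range(n + 1)]   # F[0,:] = F[:,0] = 0
    for k in range(2, n + m + 1):               # anti-diagonal index
        for i in range(max(1, k - m), min(n, k - 1) + 1):
            j = k - i
            if A[i - 1] == B[j - 1]:
                F[i][j] = F[i - 1][j - 1] + 1
            else:
                F[i][j] = max(F[i][j - 1], F[i - 1][j])
    return F[n][m]

print(lcs_length("ABCBDAB", "BDCABA"))   # -> 4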
Serial time: O(mn)

Column Layout
Use n processors, each has a column of F.
    total time = sum_{k=1}^{2*n-1} (alpha + beta   ------ communication
                                    + 1)           ------ computation
               = (2*n-1)*(alpha + beta + 1)

    Efficiency = n^2 / (n*(2*n-1)*(alpha + beta + 1)) <= 0.5
This upper bound (0.5) on the efficiency is due to poor load balance:
only n^2 entry updates are useful work, while the n processors are
kept busy for 2*n-1 steps (early and late anti-diagonals contain far
fewer than n entries, so most processors are idle in those steps).

Example of Serial Polyadic DP Problem
Floyd's All-Pairs Shortest-Paths Algorithm
Notations
    G(V,E): the graph
    c(i,j): weight of the edge from i to j
    d(i,j): cost of the shortest path between i and j
    d_k(i,j): cost of the shortest path between i and j using only
              nodes v_0, v_1, ..., v_{k-1} as intermediate nodes
    Solution: d_n(i,j)
Formulation
    d_k(i,j) = c(i,j)                                             if k = 0
    d_k(i,j) = min{ d_{k-1}(i,j), d_{k-1}(i,k) + d_{k-1}(k,j) }   if 0 < k <= n

Algorithm
for k=1 to n
for 0 <= i, j <= n-1
d_k(i,j) = min{d_{k-1}(i,j), (d_{k-1}(i,k) + d_{k-1}(k,j))}
end
end
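A direct sequential sketch of this loop nest, written in the usual in-place form of Floyd's algorithm (INF marks a missing edge; the example graph is illustrative):

INF = float('inf')

def floyd_all_pairs(c):
    """All-pairs shortest paths: each entry is repeatedly relaxed by
    min(d[i][j], d[i][k] + d[k][j]) as intermediate node k is admitted."""
    n = len(c)
    d = [row[:] for row in c]             # d_0 = c
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d

# 4-node example; c[i][i] = 0, INF where there is no edge.
c = [[0, 2, INF, 4],
     [2, 0, 3, INF],
     [INF, 3, 0, 1],
     [4, INF, 1, 0]]
print(floyd_all_pairs(c)[0][2])   # -> 5, via 0 -> 1 -> 2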
Serial time:
sum_{k=1}^n sum_{i=0}^{n-1} sum_{j=0}^{n-1} (2) ------ min and +
= O(n^3)
PRAM time: O(n), using n^2 processors

Row-Column Block Layout
Use p processors, each has an (n/sqrt(p))-by-(n/sqrt(p)) block
for k=1 to n
broadcast d_{k-1}(:,k) in rows;
broadcast d_{k-1}(k,:) in columns;
compute d_k(i,j) (local)
end.
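A minimal sketch of the local work in one iteration, assuming each process already holds a b-by-b block of d and has just received its segments of d_{k-1}(:,k) and d_{k-1}(k,:) from the two broadcasts (numpy is used only for brevity; no actual message passing is shown):

import numpy as np

def local_update(d_block, col_seg, row_seg):
    """One iteration on a b-by-b local block:
    d_k(i,j) = min( d_{k-1}(i,j), d_{k-1}(i,k) + d_{k-1}(k,j) ).
    col_seg is this block's share of d_{k-1}(:,k) (from the row broadcast);
    row_seg is this block's share of d_{k-1}(k,:) (from the column broadcast)."""
    return np.minimum(d_block, col_seg[:, None] + row_seg[None, :])

# Example with a 2-by-2 local block.
block = np.array([[4.0, 7.0], [np.inf, 3.0]])
col   = np.array([2.0, 5.0])     # share of d_{k-1}(:, k)
row   = np.array([1.0, 6.0])     # share of d_{k-1}(k, :)
print(local_update(block, col, row))   # -> [[3. 7.] [6. 3.]]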
Assuming a ring-based broadcast, in iteration k=1 it takes sqrt(p)
steps to send p(0,j)'s share of d_0(0,:) to d_0(s-1,:), where
s = sqrt(p). The time is
    s*(alpha + (n/s)*beta),   s = sqrt(p)
In the following iterations the communication can be pipelined: a
message arrives every alpha + (n/s)*beta time units, while the
per-iteration computation time is 2*(n^2/p). When n is large the
computation takes longer than the communication, so the communication
is overlapped by (hidden behind) the computation.
In the last iteration, it takes sqrt(p) steps to send p(s-1,j)'s
share of d_{n-1}(n-1,:) to d_{n-1}(0,:). As in the first iteration,
the time is
    s*(alpha + (n/s)*beta),   s = sqrt(p)
Total time
2*s*(alpha + (n/s)*beta) ------------------- communication
+ sum_{k=1}^{n-1} (2*n^2/p) ------------------ computation
= O(s)*alpha + O(n)*beta + O(n^3/p)
Efficiency
    E = T_serial / (p * T_parallel)
      ~ 2*n^3 / ( p*(2*s*alpha + 2*n*beta + 2*n^3/p) )
      = 1 / ( 1 + (p^{1.5}/n^3)*alpha + (p/n^2)*beta )
Remarks