Performance
Scalability
As we have seen, efficiency usually increases as the problem size
increases, and decreases as the number of processors increases.
Can we keep the efficiency fixed while the problem size and the
number of processors increase simultaneously? This question leads
to the notion of scalability.
Recall that a program is cost-optimal if p*T(p,n)/T(1,n)
is a constant. So a scalable parallel system can always
be made cost-optimal if the number of processors and the
problem size are chosen appropriately.
Isoefficiency function
Normally, the efficiency is less than 1. We define the overhead:
T_o = p*T(p,n) - w > 0
where w = T(1,n) is the serial work. Typical sources of overhead
are interprocessor communication, idling due to load imbalance,
and excess (redundant) computation.
From the definition of overhead, we get the parallel time
T(p,n) = (T_o + w)/p
and the efficiency
E = w/(p*T(p,n)) = 1/(1 + T_o/w)
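As a quick sanity check, here is a minimal Python sketch confirming that the two forms of E agree; the function names and the numbers (w, p, T) are hypothetical, not from the text:

```python
# Verify that E = w/(p*T(p,n)) equals 1/(1 + T_o/w), where T_o = p*T(p,n) - w.

def efficiency(w, p, T_par):
    """Efficiency E = serial work / total parallel cost."""
    return w / (p * T_par)

def efficiency_from_overhead(w, T_o):
    """Equivalent form E = 1/(1 + T_o/w)."""
    return 1.0 / (1.0 + T_o / w)

# Hypothetical numbers: w = 1000 units of serial work, p = 8 processors,
# parallel time T = 150, so the overhead is T_o = p*T - w = 200.
w, p, T_par = 1000.0, 8, 150.0
T_o = p * T_par - w
assert abs(efficiency(w, p, T_par) - efficiency_from_overhead(w, T_o)) < 1e-12
```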
The above equation shows that if the overhead T_o grows
faster than the problem size w, then the efficiency E
decreases. This is undesirable. To maintain the
efficiency, we must keep
w/T_o = const
In general, T_o is a function of w and p, since T(p,n)
is a function of n (and thus of w) and p, and T_o = p*T(p,n) - w.
Example
Consider the problem of adding n numbers on a p-processor
hypercube. Adding n/p numbers locally requires n/p - 1
operations (floating-point additions). Adding the p partial
sums in parallel on a hypercube takes log(p) addition steps
and log(p) communication steps, i.e., log(p)*(1 + T_comm) time.
Thus the total time is
T(p,n) ~ n/p + log(p)*(1 + T_comm)
Obviously, the serial time is w = n-1 ~ n. So, the overhead is
T_o = p*T(p,n) - w = p*(1 + T_comm)*log(p)
and the isoefficiency function is given by
w/(p*(1 + T_comm)*log(p)) = const
or, absorbing the constant factor (1 + T_comm),
w/(p*log(p)) = const
In this case, w ~ n, so the above equation means that if
p increases to p', then to maintain the efficiency, the problem
size must increase from n to n*(p'*log(p'))/(p*log(p)).
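This scaling rule can be sketched in Python; the function name and the example numbers are my own, assuming base-2 logarithms as in the hypercube reduction:

```python
import math

# For adding n numbers on a hypercube, keeping E fixed requires w = n to
# grow like p*log2(p). Scale an initial (p, n) pair to a new count p2.

def scaled_problem_size(n, p, p2):
    """Problem size needed at p2 processors to match the efficiency at p."""
    return n * (p2 * math.log2(p2)) / (p * math.log2(p))

# Going from 4 to 16 processors, n must grow by (16*4)/(4*2) = 8.
print(scaled_problem_size(1024, 4, 16))  # 8192.0
```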
When T_o has multiple terms, we use the term that forces w
to grow at the highest rate. Suppose T_o = p^(3/2) + p^(3/4)*w^(3/4).
Then to keep w/T_o = const, we need
1/(p^(3/2)/w + p^(3/4)/w^(1/4)) = const
If w grows at the rate of p^3 (w = O(p^3)),
then p^(3/4)/w^(1/4) = const and
1/(p^(3/2)/w + p^(3/4)/w^(1/4)) = 1/(O(p^(-3/2)) + O(1))
tends to a constant as p becomes very large. The parallel system
is asymptotically isoefficient.
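A small Python sketch of this asymptotic argument (the values of p are chosen only for illustration): with w = p^3, the ratio T_o/w approaches a constant as p grows.

```python
# With T_o = p^(3/2) + p^(3/4)*w^(3/4) and w = p^3, the term
# p^(3/4)*w^(3/4)/w = p^(3/4)/w^(1/4) equals exactly 1, while
# p^(3/2)/w = p^(-3/2) vanishes, so T_o/w tends to 1.

def overhead_ratio(p):
    w = float(p) ** 3
    T_o = p ** 1.5 + p ** 0.75 * w ** 0.75
    return T_o / w

for p in (16, 256, 4096):
    print(p, overhead_ratio(p))   # ratio decreases toward 1
```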
Note that the isoefficiency equation shows that to maintain
the efficiency (isoefficiency), the problem size must grow at
least at the rate of O(p). Writing c = w/T_o, it follows
that c*p*T(p,n) = (1 + c)*w, or
w = (c/(1 + c))*p*T(p,n)
Thus, w is at least proportional to p.
Sharks and fish problem
This is a collection of parallel programming problems.
Sharks and fish simulate moving particles in a 2-D
space following some physical rules. Details and
some working implementations of these problems
can be found in Demmel's CS267 course notes.
The first sharks and fish problem is embarrassingly parallel.
In the second sharks and fish problem, every fish needs the
positions of all the fish to calculate the gravitational force.
A sequential solution
pos(1:n): positions, each pos(i) is a 2-D array or a complex number
mass(1:n): masses
dir(1:n): direction of forces
force(1:n): forces
vel(1:n): velocities
accel(1:n): accelerations
dt: time step
tfinal: total time
formulas:
f = m1*m2/(r*r) (ignore the constant)
v = a*t
d = v*t + 0.5*a*t*t
t = 0.0;
while t < tfinal
  for i = 1:n
    force(i) = 0.0;
    for j = 1:n
      if j ~= i
        dir(j) = (pos(j) - pos(i))/||pos(j) - pos(i)||;
        force(i) = force(i) + mass(j)*mass(i)*dir(j)/||pos(j) - pos(i)||^2;
      end
    end
    accel(i) = force(i)/mass(i);
    pos(i) = pos(i) + dt*(vel(i) + 0.5*accel(i)*dt);
    vel(i) = vel(i) + dt*accel(i);
  end
  t = t + dt;
end
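One step of the pseudocode above can be sketched as runnable Python, using complex numbers for the 2-D positions as the text suggests. The names mirror the pseudocode; this is an illustration, not Demmel's CS267 code:

```python
# One time step of the all-pairs gravity loop. pos and vel are lists of
# complex numbers (2-D points), mass is a list of floats.

def step(pos, vel, mass, dt):
    """Advance all fish by one time step (positions updated in place,
    matching the sequential pseudocode)."""
    n = len(pos)
    for i in range(n):
        force = 0.0 + 0.0j
        for j in range(n):
            if j != i:
                r = pos[j] - pos[i]
                d = abs(r)              # ||pos(j) - pos(i)||
                direction = r / d       # unit vector toward fish j
                force += mass[i] * mass[j] * direction / d**2
        accel = force / mass[i]
        pos[i] += dt * (vel[i] + 0.5 * accel * dt)
        vel[i] += dt * accel
    return pos, vel
```

With two unit-mass fish at 0 and 1, one step with a small dt moves them toward each other, as expected of an attractive force.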
Since total_fish is very large and total_fish >> p (otherwise,
we would not use parallel computing), and t_s >> t_w, the third
solution is better than the second. As we said earlier, we
prefer sending fewer, larger messages.