Introduction

References

  • PCA, 1.1
  • Demmel, Lecture 1
  • DBPP, 1.1, 1.2
  • Why Parallel Computing?

    Trends

  • Performance: The performance of highly integrated, single-chip CMOS microprocessors is steadily increasing. It is advantageous to use small, inexpensive, low-power processors as the building blocks for computer systems with many processors.
  • Applications: In scientific and engineering computing: weather prediction, oil reservoir modeling, chemical dynamics, structural biology. In science, we usually start with a theory, set up a model, and then do experiments. In engineering, we usually start with a design and build a prototype. Both are now often replaced by numerical simulation, since real applications can be too complicated to model analytically and lab prototypes can be expensive to build. Simulations are computationally intensive (a speed requirement). See, for example, the grand challenges.

    In commerce, the typical workload is online transaction processing (OLTP). Performance is measured in transactions per minute (tpm). It is data intensive (a memory requirement).

    Parallel computing can meet both requirements (speed and memory) by distributing the data and the computation among computers.

  • Technology: More transistors fit on a chip, which makes it possible to use many transistors at once (parallelism).
  • Architecture: The clock cycle is approaching its limit. For example, in current technology, 500 MHz means 2e-9 seconds per cycle; during that cycle, light travels only 0.6 m (2 feet), and electronic signals travel no faster than light. To get better speed, parallel computing is necessary: bit-level parallelism, instruction-level parallelism (pipelining and superscalar execution), and thread-level parallelism. In pipelining, multiple instructions are executed in the same cycle (at different stages). A superscalar processor can fetch multiple instructions simultaneously. In multiprocessor computers, multiple processes can run simultaneously. Multiprocessor systems dominate the server and enterprise (or mainframe) markets.
  • Network: A typical scenario is a network of heterogeneous computers (PCs, workstations, supercomputers). Parallel computing can make use of these networked computers, and high-speed networks make this feasible.
  • Take a look at performance benchmark information.

    Examples

  • Toy Story (1995): The first full-length, computer-animated motion picture was produced on a parallel computer system composed of hundreds of Sun workstations.
  • A simplified climate model: Climate is a function of four arguments: longitude, latitude, elevation, and time. It is described by six floating-point numbers: temperature, pressure, humidity, and wind velocity (a 3-D vector). A weather prediction algorithm computes the climate at position (i,j,k) at time step n+1 from the climate at time step n by solving differential equations. If the grid cells are 1 km-by-1 km with 10 layers, there are about 5e9 cells. At 100 floating-point operations per cell, one time step takes about 5e11 floating-point operations. At six words (24 bytes) per cell, the total storage is about 1e11 bytes (100 GB). To keep up with real time (predict the weather 1 minute ahead within 1 minute of computation, one step per simulated minute), the speed requirement is 5e11/60 ≈ 8 Gflops. To predict 7 days ahead within 24 hours, the simulation must run 7 times faster than real time, so the requirement is 8x7 = 56 Gflops. For comparison, a 1 GHz processor doing one floating-point operation per cycle has a peak speed of 1 Gflops. (The arithmetic for this and the next example is worked out in the code sketch after these examples.)

    To realize 56 Gflops and 100 GB of memory on a single processor, signals must cross the chip within one operation time (1/56e9 s, in which light travels about 5.4 mm), so the processor must be smaller than 5.4 mm-by-5.4 mm, and each bit of the on-chip memory can take no more than about 50 Angstroms-by-50 Angstroms.

  • Wireless communication: Consider a grid of 1 foot-by-1 foot-by-1 foot cells, an elevation of 100 feet (~30 m, i.e., 100 layers), and an update every 0.1 second. If each cell takes 100 floating-point operations per update, then a computer running at 64 Gflops can cover an area of 800-by-800 feet (about 240-by-240 m).
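
    A minimal C sketch of the back-of-the-envelope arithmetic in these two examples (the cell counts, 4-byte words, and per-cell flop counts are the assumed values from the text; the program is purely illustrative):

      #include <math.h>
      #include <stdio.h>

      int main(void) {
          /* Climate model: 1 km x 1 km cells, 10 layers over the Earth's surface. */
          double cells       = 5e9;                      /* cells in the grid               */
          double flops_step  = cells * 100.0;            /* 100 flops per cell -> ~5e11     */
          double bytes_total = cells * 6.0 * 4.0;        /* 6 four-byte words  -> ~1.2e11 B */
          double gflops_rt   = flops_step / 60.0 / 1e9;  /* one step (= 1 simulated minute) */
                                                         /* per wall-clock minute           */
          double gflops_7d   = gflops_rt * 7.0;          /* 7 days ahead in 24 hours; the   */
                                                         /* notes round 8.3 -> 8, giving 56 */
          printf("climate: %.1e flops/step, %.0f GB, %.1f and %.1f Gflops\n",
                 flops_step, bytes_total / 1e9, gflops_rt, gflops_7d);

          /* Wireless: 1-ft cells, 100 layers, 100 flops per cell every 0.1 second. */
          double cells_per_update = 64e9 * 0.1 / 100.0;      /* 64 Gflops budget         */
          double side_ft = sqrt(cells_per_update / 100.0);   /* square area, 100 layers  */
          printf("wireless: about %.0f x %.0f feet\n", side_ft, side_ft);
          return 0;
      }
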
  • Three Major Issues

    Parallel computing is not only about finding concurrency, but also about dealing with communication and synchronization.
  • Parallelism (parallelizing code): techniques for designing parallel programs (data parallelism, structure parallelism); performance analysis (efficiency, scalability); software engineering (reuse, portability, testing, verification).
  • Communication: model, interconnection networks, routing mechanism.
  • Synchronization: mutual exclusion, barrier, blocking and nonblocking send and receive (see the sketch below).
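
    For a concrete picture of mutual exclusion, here is a minimal sketch using POSIX threads (the shared counter and thread count are illustrative choices, not from the notes): one lock ensures that only one thread updates the shared counter at a time.

      #include <pthread.h>
      #include <stdio.h>

      #define NTHREADS 4
      #define NITER    100000

      static long counter = 0;                           /* shared data                 */
      static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

      static void *work(void *arg) {
          (void)arg;
          for (int i = 0; i < NITER; i++) {
              pthread_mutex_lock(&lock);                 /* mutual exclusion: only one  */
              counter++;                                 /* thread is in this critical  */
              pthread_mutex_unlock(&lock);               /* section at a time           */
          }
          return NULL;
      }

      int main(void) {
          pthread_t threads[NTHREADS];
          for (int i = 0; i < NTHREADS; i++)
              pthread_create(&threads[i], NULL, work, NULL);
          for (int i = 0; i < NTHREADS; i++)
              pthread_join(threads[i], NULL);            /* wait for all threads        */
          printf("counter = %ld (expected %d)\n", counter, NTHREADS * NITER);
          return 0;
      }

    Without the lock, increments from different threads could interleave and the final count would be unpredictable.
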
  • Why Should We Study Parallel Computing?

    Multiprocessor computers are common today; parallel computing allows us to execute instructions on multiple processors at once. Computers are networked; distributed computing allows us to run a program on a cluster of computers in parallel. Parallel programming is significantly different from serial programming. Usually we build the hardware first and then develop software to fit it, so in the early stages software is specific to the hardware. There is still no standard parallel programming language. MPI (the Message Passing Interface), a message-passing library specification, is the de facto standard adopted by industry. It can run on many platforms; a minimal example is sketched below.
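
    For a taste of MPI, here is a minimal sketch (assuming an MPI implementation such as MPICH or Open MPI; compile with mpicc and run with at least two processes, e.g. mpirun -np 2) that uses the blocking send/receive and barrier operations mentioned under synchronization above:

      #include <mpi.h>
      #include <stdio.h>

      int main(int argc, char **argv) {
          MPI_Init(&argc, &argv);                 /* start the MPI runtime           */

          int rank, size;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id               */
          MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes       */

          double value = 0.0;
          if (rank == 0 && size > 1) {
              value = 3.14;
              /* Blocking send: returns once the buffer can safely be reused. */
              MPI_Send(&value, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
          } else if (rank == 1) {
              /* Blocking receive: waits until the message has arrived. */
              MPI_Recv(&value, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
          }

          MPI_Barrier(MPI_COMM_WORLD);            /* no process passes this point    */
                                                  /* until all have reached it       */
          printf("rank %d of %d: value = %f\n", rank, size, value);

          MPI_Finalize();
          return 0;
      }

    The same source can run on a single multiprocessor or across a network of workstations.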