Technology is driving parallel database machines. The ideal machine would be a single infinitely fast processor
with an infinite memory with infinite bandwidth – and it would be infinitely cheap (free). Given such a
machine, the need for parallelism would be much reduced. Unfortunately, technology is not delivering such
machines – but it is coming close. Technology is promising to deliver fast one-chip processors, fast and large
disks, and large ram memories. And each of these devices will be very cheap by today’s standards, costing
only hundreds of dollars each. So the challenge is to build an infinitely fast processor out of infinitely many
processors offinite speed, and to build an infinitely large memory with infinite memory bandwidth from
infinitely many storage units offinite speed. This sounds trivial mathematically; but in practice, when a new
processor is added to most computer designs, it slows every other computer down just a litde bit. If this slow
down is 1%, then when adding a processor to a 100 processor complex, the added power is cancelled by the
slow down of the other processors. Such a system has a maximum speedup of 25 (at 50 processors) and the
100 processor complex has the effective power of a single processor system.
The ideal parallel system demonstrates two key properties: (1) linear speed up: Twice as much hardware can
perform the task in half the elapsed time, and (2) linear scaleup: Twice as much hardware can perform
twice as large a task in the same elapsed time (see Figure l.a).
Figure 18. The difference between a speedup design in which a one-minute job is done in 15-seconds,
and a scaleup design in which a ten-times bigger job is done in the same time by a ten-times bigger
The barriers to these goals are the triple threats of: (1) startup: The time needed to start a parallel
operation. If thousands of processes must be started, this can easily dominate the actual computation
time. (2) interference: The slowdown each new process imposes on all others. (3) skew: The service
time of a job is often the service time of the slowest step of the job. As the number of parallel
steps increase the average sized of each step decreases, but the variance can well exceed the mean.
When the variance dominates the mean, further parallelism will not help.
This section describes several basic techniques widely used in the design of shared-nothing parallel
database machines to achieve linear speedup and scaleup on relational operators. There are a number of
reasons for the success of shared-nothing systems. The main advantage of shared nothing multiprocessors
is that they can be scaled up to hundreds and probably thousands of processors which do not interfere
with one another. Teradata and Tandem, for example, have shipped systems with more than 200
processors, and Intel is implementing a 2000 node Hypercube. On the other hand, the largest shared
memory multiprocessors currently available are limited to about 32 processors.
Figure 1b. The standard speedup curves. The left curve is the ideal. The middle graph shows no
speedup as harclware is added. The right curve shows the three threats to parallelism. Initial startup costs
may dominate at first. As the number of processes increase, interference can increase. Ultimately, the job
is divided so finely, that the variance in service times (skew) causes a slowdown.
Interference is a major problem for shared-memory multiprocessors for database applications. As
processors become faster, each processor is given a large private cache in order to avoid interference
on shared memory resources. Recent results measuring a multiprocessor running an online transaction
workload show that the cache loading/flushing overhead of transaction processing applications
considerably degrades processor performance. We do not believe high-performance shared-memory
machines will scale beyond a tens of processors when running database applications. When high degrees
of parallelism are introduced, shared resources become a bottleneck. Consequently the software
architecture must use fine granularity concurrency control, and must even partition resources to
avoid process and processor interference. This observation is true for both shared-nothing and shared
memory designs. This partitioning on a shared-memory system creates many of the problems faced by a
shared nothing machine.