Before reading about the solution, a fair question the reader may ask is: “What
is the problem? Is that problem important? and to whom?” Answering
these questions requires looking at a global picture of our computerized society.
Today, in a competitive world, enterprises of all kinds use and depend on timely
available, up-to-date information. Information volumes are growing 25-35% per
year and the traditional transaction rate has been forecast to grow by a factor
of 10 over the next five years-twice the current trend in mainframe growth.
In addition, there is already an increasing number of transactions arising
from computer systems in business-to-business interworking and by intelligent
terminals in the home, office or factory.
The profile of the transaction load is also changing as decision-support queries,
typically complex, are added to the existing simpler, largely clerical workloads.
Thus, complex queries such as those macro-generated by decision support systems
or system-generated as in production control will increase to demand significant
throughput with acceptable response times. In addition, very complex queries on
very large databases, generated by skilled staff workers or expert systems, may
hurt throughput while demanding good response times.
From a database point of view, the problem is to come up with database
servers that support all these types of queries efficiently on possibly very large
on-line databases. However, the impressive silicon technology improvements
alone cannot keep pace with these increasing requirements. Microprocessor
performance is now increasing 50% per year, and memory chips are increasing
in capacity by a factor of 16 every six years. RISC processors today can deliver
between 50 and 100 MIPS (the new 64 bit DEC Alpha processor is predicted to
deliver 200 M!PS at cruise speed!) at a much lower price/MIPS than mainframe
processors. This is in contrast to much slower progress in disk technology which
has been improving by a factor of 2 in response time and throughput over the
last 10 years. With such progress, the I/O bottleneck worsens with time.
The solution is therefore to use large-scale parallelism to magnify the raw power
of individual components by integrating these in a complete system along with the
appropriate parallel database software. Using standard hardware components is
essential to exploit the continuing technology improvements with minimal delay.
Then, the database software can exploit the three forms of parallelism inherent
in data-intensive application workloads. Interquery parallelism enables the parallel
execution of multiple queries generated by concurrent transactions. Intraquery
parallelism makes the parallel execution of multiple, independent operations (e.g.,
select operations) possible within the same query. Both interquery and intraquery
parallelism can be obtained by using data partitioning. Finally, with intraoperation
parallelism, the same operation can be executed as many suboperations using
function partitioning in addition to data partitioning. The set-oriented mode of
database languages (e.g., SQL) provides many opportunities for intraoperation
parallelism. For example, the performance of the join operation can be increased
significantly by parallelism.