Search Space Optimization

The optimizer considers each algebraic operation independently since the query decomposition step has
already taken global reorganization decisions. Optimization of all operations but the n-way Select
Project Join (SPJ) is quit straightforward. It consists of choosing the algorithm and the home of
operation. The crucial issue in terms of the search strategy is the join ordering problem, which
is NP-hard on the number of relations.

The search space, or solution space, is the set of all QEPs that compute the same result. A point
in the solution space is one particular plan, i. e. solution for the problem. A solution is
described by the query tree for executing the join expression. Every point of the search space
has a cost associated with it; a cost function maps query trees to their respective costs. The
query tree itself is a binary tree that consists of base relations as its leaves and joins
operations as its inner nodes; edges denote the flow of data that takes place from the leaves
of the tree to the root. As the join operation is commutative and associative, the number of
possible query trees increases quickly with increasing the number of relations involved in the
query. Thus, we reduce the goal of the query optimizer to find the best join order, together
with the best join method for each join. For a complex query, involving many relations and
many operators, the number of equivalent operator trees can be very high. Investigating a large
search space may make optimization time prohibitive, sometime much more expensive than the
execution time. Therefore, query optimizers typically restrict the size of the search space
they consider. The first restriction is to use heuristics. Another important restriction is the
shape of the join tree. The following sections discuss the characteristics of the most interesting
kinds of search spaces depending on the shape of the join tree they include.

Left-Deep and Right-Deep Trees

The search space restricted to only left-deep trees consists of all query trees where the inner
relation of each join is a base relation as in Figure 2-a. For a fixed number of base relations,
left-deep tree does not leave any degree of freedom concerning the shape of the tree, but there
are n! ways to allocate n base relations to the tree leaves.

Figure 2. Different query trees

In most sequential optimizers like the optimizer of IBM System R, the search space is restricted
to left-deep trees. The reason for this heuristic restriction is that the space including all trees
is much larger, while an optimal or nearly optimal plan can often be found in the much smaller
space of left-deep trees. When join algorithms are not symmetric, which is the case for hash-join,
it is useful to distinguish between left-deep and right-deep trees, see Figure 2-b. Hash-based join
algorithm have been shown to be the most efficient for equi-joins and particularly interesting in a
parallel environment because they naturally induce intra-operation parallelism. Hash-join usually
consists of two consecutive phases: Build and probe. In the build phase, a hash table is constructed
on the join attribute of the smallest relation. In the probe phase, the largest relation is
sequentially scanned and the hash table is consulted for each of its tuples.

Advertisements

, , , , , , , , , , , , , ,

  1. Leave a comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: