The optimizer considers each algebraic operation independently since the query decomposition step has

already taken global reorganization decisions. Optimization of all operations but the n-way Select

Project Join (SPJ) is quit straightforward. It consists of choosing the algorithm and the home of

operation. The crucial issue in terms of the search strategy is the join ordering problem, which

is NP-hard on the number of relations.

The search space, or solution space, is the set of all QEPs that compute the same result. A point

in the solution space is one particular plan, i. e. solution for the problem. A solution is

described by the query tree for executing the join expression. Every point of the search space

has a cost associated with it; a cost function maps query trees to their respective costs. The

query tree itself is a binary tree that consists of base relations as its leaves and joins

operations as its inner nodes; edges denote the flow of data that takes place from the leaves

of the tree to the root. As the join operation is commutative and associative, the number of

possible query trees increases quickly with increasing the number of relations involved in the

query. Thus, we reduce the goal of the query optimizer to find the best join order, together

with the best join method for each join. For a complex query, involving many relations and

many operators, the number of equivalent operator trees can be very high. Investigating a large

search space may make optimization time prohibitive, sometime much more expensive than the

execution time. Therefore, query optimizers typically restrict the size of the search space

they consider. The first restriction is to use heuristics. Another important restriction is the

shape of the join tree. The following sections discuss the characteristics of the most interesting

kinds of search spaces depending on the shape of the join tree they include.

**Left-Deep and Right-Deep Trees**

The search space restricted to only left-deep trees consists of all query trees where the inner

relation of each join is a base relation as in Figure 2-a. For a fixed number of base relations,

left-deep tree does not leave any degree of freedom concerning the shape of the tree, but there

are n! ways to allocate n base relations to the tree leaves.

Figure 2. Different query trees

In most sequential optimizers like the optimizer of IBM System R, the search space is restricted

to left-deep trees. The reason for this heuristic restriction is that the space including all trees

is much larger, while an optimal or nearly optimal plan can often be found in the much smaller

space of left-deep trees. When join algorithms are not symmetric, which is the case for hash-join,

it is useful to distinguish between left-deep and right-deep trees, see Figure 2-b. Hash-based join

algorithm have been shown to be the most efficient for equi-joins and particularly interesting in a

parallel environment because they naturally induce intra-operation parallelism. Hash-join usually

consists of two consecutive phases: Build and probe. In the build phase, a hash table is constructed

on the join attribute of the smallest relation. In the probe phase, the largest relation is

sequentially scanned and the hash table is consulted for each of its tuples.