Even with an ideal virtual table interface, traditional execution of queries involving WebCount or
WebPages would be extremely slow due to many high-latency calls to one or more Web search engines.
The optimizations that can reduce the number of external calls, and caching techniques are
important for avoiding repeated external calls. But these approaches can only go so far—even after
extensive optimization, a query involving WebCount or WebPages must issue some number of search engine calls.
In many situations, the high latency of the search engine will dominate the entire execution time of the
WSQ query. Any traditional non-parallel query plan involving WebCount or WebPages will be forced to
issue Web searches sequentially, each of which could take one or more seconds, and the query processor
is idle during each request. Since Web search engines are built to support many concurrent requests, a
traditional query processor is making poor use of available resources.
Thus, we want to find a way to issue as many concurrent Web searches as possible during query
processing. While a parallel query processor (such as Oracle, Informix, Gamma, or Volcano)
is a logical option to evaluate, it is also a heavyweight approach for our problem. For
example, suppose a query requires 50 independent Web searches (for 50 U.S. states, say).
To perform all 50 searches concurrently, a parallel query processor must not only dynamically
partition the problem in the correct way, it must then launch 50 query threads or processes.
Supporting concurrent Web searches during query processing is a problem of restricted scope
that does not require a full parallel DBMS.
In the remainder of this section we describe asynchronous iteration, a new query processing
technique that can be integrated easily into a traditional non-parallel query processor to
achieve a high number of concurrent Web searches with low overhead. Asynchronous iteration is
in fact a general query processing technique that can be used to handle a high number of
concurrent calls to any external sources. (In future work, we plan to compare asynchronous
iteration against the performance of a parallel query processor over a range of queries
involving many calls to external sources.) As described in the following subsections,
asynchronous iteration also opens up interesting new query optimization problems.