Integrated Query and Search of Databases, XML, and the Web

The amount of information available on-line is proliferating at a tremendous rate. At one extreme, traditional database systems are managing large amounts of structured, well-understood data that can be queried via declarative languages such as SQL. At the other extreme, millions of unstructured Web pages are being collected and indexed by search engines for keyword-based search. Recently, XML (the eXtensible Markup Language) has emerged as a simple, practical way to model and exchange semi-structured data across the Internet, without the rigid constraints of traditional database systems.

The following describes work towards unifying and integrating query techniques for traditional databases, search engines, and XML. First, we describe our contributions to the Lore DBMS for managing semi-structured data, focusing on ways to enhance system usability for effective querying and searching. Next, we discuss algorithms and indexing techniques that enable effective keyword-based search over traditional and semi-structured databases. We then describe how we have migrated and enhanced our research on semi-structured data to support the subtle but important nuances of XML. Finally, we describe a new platform that enables efficient combined querying over structured traditional databases and existing Web search engines.

Keyword Search Over Semi-structured and Structured Databases

Keyword-based search is very useful for unstructured documents, and is often the only way to query such data. Keyword search can also be very useful over more structured data, since it is inherently simple for users to master and is often sufficient for the task at hand. However, some IR concepts and algorithms must be reconsidered in a database setting. In particular, proximity search benefits from a new approach in a database setting. Traditionally, proximity search in IR systems is implemented using the “near” operator. If we search our document collection for “Harrison Ford” near “Carrie Fisher”, we are looking for documents where those two names appear “close” to each other, where closeness is measured by textual proximity. In this sense, proximity search is a relatively simple, “intra-object” operation: we measure proximity along a single dimension (text) in each document.
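To make the textual notion of “near” concrete, here is a minimal Python sketch. It is not taken from any particular IR system: the function names, the word-level tokenizer, and the `max_distance` threshold are all assumptions chosen for illustration, and distance is measured simply as the gap between the starting word positions of the two phrases.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"\w+", text.lower())


def phrase_positions(tokens, phrase):
    """Return every token index at which the phrase starts."""
    words = tokenize(phrase)
    n = len(words)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]


def near(text, phrase_a, phrase_b, max_distance=10):
    """True if some occurrence of phrase_a starts within max_distance
    tokens of some occurrence of phrase_b (hypothetical 'near' semantics)."""
    tokens = tokenize(text)
    starts_a = phrase_positions(tokens, phrase_a)
    starts_b = phrase_positions(tokens, phrase_b)
    return any(abs(a - b) <= max_distance for a in starts_a for b in starts_b)


if __name__ == "__main__":
    doc = "Harrison Ford and Carrie Fisher star in Star Wars."
    print(near(doc, "Harrison Ford", "Carrie Fisher", max_distance=5))
```

Because the measurement runs along a single sequence of tokens, the check needs nothing beyond the text of one document, which is what makes it an intra-object operation.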

Now, suppose that we have fully migrated our movie document collection to XML. Each movie might begin with a MOVIE tag, followed by nested tags for that movie’s actors, producers, etc. In this setting, we want to account for “structural proximity” in the database, while textual proximity may not be relevant. For example, if Harrison Ford and Carrie Fisher both star in the same movie, then they will both be subelements of a specific MOVIE element. In the textual representation, however, there may be many other actors lexically in between these actors. Similarly, we may find that the last actor listed for some movie X is textually close to the first actor listed for an adjacent movie Y, but this doesn’t mean that the two actors are related in any way. Thus, we need to extend the notion of proximity search to handle the structure inherent in a semi-structured database.
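One way to picture structural proximity is to treat the XML document as a graph of elements and count edges between them. The sketch below is only an illustration under assumptions, not the algorithm used in this work: the tag names other than MOVIE, the sample document, and the choice of unweighted shortest-path length as the distance measure are all invented here.

```python
import xml.etree.ElementTree as ET
from collections import deque

# Toy document made up for this example; only the MOVIE tag comes from the text.
SAMPLE_XML = """
<MOVIES>
  <MOVIE>
    <TITLE>Star Wars</TITLE>
    <ACTOR>Harrison Ford</ACTOR>
    <ACTOR>Mark Hamill</ACTOR>
    <ACTOR>Carrie Fisher</ACTOR>
  </MOVIE>
  <MOVIE>
    <TITLE>Blade Runner</TITLE>
    <ACTOR>Rutger Hauer</ACTOR>
  </MOVIE>
</MOVIES>
"""


def build_graph(root):
    """Undirected adjacency lists over elements, one edge per parent/child pair."""
    adjacency = {root: []}
    for parent in root.iter():
        for child in parent:
            adjacency.setdefault(parent, []).append(child)
            adjacency.setdefault(child, []).append(parent)
    return adjacency


def find_by_text(root, text):
    """Return the first element whose stripped text equals `text`, else None."""
    for element in root.iter():
        if element.text and element.text.strip() == text:
            return element
    return None


def distance(adjacency, start, goal):
    """Breadth-first search: number of edges on a shortest path, or None."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node is goal:
            return dist
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE_XML)
    graph = build_graph(root)
    ford = find_by_text(root, "Harrison Ford")
    fisher = find_by_text(root, "Carrie Fisher")
    hauer = find_by_text(root, "Rutger Hauer")
    print(distance(graph, ford, fisher))   # co-stars share a MOVIE parent
    print(distance(graph, fisher, hauer))  # textually adjacent, different movies
```

In this toy document the two co-stars are two edges apart even though another actor sits between them in the text, while the last actor of one movie and the first actor of the next are four edges apart despite being textual neighbours.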

The same algorithms and techniques for performing proximity search over a graph-structured (semi-structured) database are applicable to a traditional relational or object-oriented database as well. We can (logically) translate a relational database into a graph based on the schema and on primary/foreign key relationships. We can then use our proximity search techniques to measure the distance between database elements based on the graph representation. Viewing an object-oriented database as a graph is of course even simpler. By combining proximity search with traditional indexing techniques for identifying tables or attribute values that contain given keywords, we can provide keyword-based search (and browsing) for traditional databases.
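The relational-to-graph translation can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the three-table schema, the in-memory row format, the one-node-per-tuple encoding, and the simple word-to-tuple keyword index are hypothetical and are not the data structures of the system described above.

```python
from collections import defaultdict, deque

# Hypothetical schema: table -> (primary-key column, {foreign-key column: referenced table})
SCHEMA = {
    "movie": ("id", {}),
    "actor": ("id", {}),
    "cast_member": ("id", {"movie_id": "movie", "actor_id": "actor"}),
}

# Toy rows invented for this example.
ROWS = {
    "movie": [{"id": 1, "title": "Star Wars"}, {"id": 2, "title": "Blade Runner"}],
    "actor": [
        {"id": 1, "name": "Harrison Ford"},
        {"id": 2, "name": "Carrie Fisher"},
        {"id": 3, "name": "Rutger Hauer"},
    ],
    "cast_member": [
        {"id": 1, "movie_id": 1, "actor_id": 1},
        {"id": 2, "movie_id": 1, "actor_id": 2},
        {"id": 3, "movie_id": 2, "actor_id": 1},
        {"id": 4, "movie_id": 2, "actor_id": 3},
    ],
}


def build_graph(schema, rows):
    """One node per tuple, one undirected edge per foreign-key reference."""
    graph = defaultdict(set)
    for table, (pk, fks) in schema.items():
        for row in rows[table]:
            node = (table, row[pk])
            graph[node]  # touch the key so unreferenced tuples still appear
            for fk_column, ref_table in fks.items():
                target = (ref_table, row[fk_column])
                graph[node].add(target)
                graph[target].add(node)
    return graph


def keyword_index(schema, rows):
    """Map each lowercase word in a string attribute to the tuples containing it."""
    index = defaultdict(set)
    for table, (pk, _) in schema.items():
        for row in rows[table]:
            for value in row.values():
                if isinstance(value, str):
                    for word in value.lower().split():
                        index[word].add((table, row[pk]))
    return index


def distance(graph, start, goal):
    """Breadth-first search: number of edges on a shortest path, or None."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


if __name__ == "__main__":
    graph = build_graph(SCHEMA, ROWS)
    index = keyword_index(SCHEMA, ROWS)
    (ford,) = index["ford"]
    (fisher,) = index["fisher"]
    print(ford, fisher, distance(graph, ford, fisher))
```

The keyword index locates the tuples that mention each search term, and the graph distance between those tuples then plays the role that textual distance plays in a document collection.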
