Improved B-trees To Reduce Maintenance Cost And Fragmentation

Database warehouses support processing very large amounts of data. The workload is primarily composed
of SELECT queries; updates are less frequent and typically come in batches. And yet the INSERT operation
can be expensive and have a significant impact on SELECT query performance. Inserted data has to be
placed in its respective location based on presence of indexing structures; excepting the index based on
the arrival time stamp, the updates are likely to manifest all over the data range. If the modification
do not exhibit an access locality, each one will touch one or more different disk pages causing a high
performance penalty (irrespective of any batching). The relevant page has to be read, modified and
eventually written back to disk once the memory buffer fills up. In this paper we propose a Fractured
Index mechanism to alleviate the penalties associated with the update costs while still being able to
use B-Tree indexes. By creating a mini-Index for each batch of data, we ensure insert locality (all data
within the mini-Index is freshly inserted) and keep the indexes read-only (achieving better fragmentation
of these indexes and providing compression opportunities). The idea of batching is frequently used in
databases; storing insert batches in the B-Tree structures is a logical extension of the same approach.
Although earlier work has presented a similar delayed update approach, we believe that our idea is much
simpler to implement.This minimalistic implementation achieves two orders of magnitude improvement of
INSERTs and one order of magnitude improvement of SELECT queries.

B-tree provides a fast read access to data by sorting the data by indexed attribute (key) values and
hierarchically organizing the data into intermediate and leaf nodes. B-tree splits and merges nodes upon
insertions and deletions to keep the tree balanced. Most database products provide two types of B-tree
indexes, primary (clustered) and secondary (non-clustered). A primary B-tree index stores tuple data in
its leaf nodes while a secondary B-tree index stores only pointers to the primary index (or heap if no
primary index exists). A primary B-tree index provides a faster read access than a secondary B-tree index,
especially for Online Analytical Processing (OLAP) style queries because an access path using a secondary
Btree index needs a random disk seek for each tuple to be fetched while OLAP queries are often not selective.
A primary B-tree index requires only one seek to locate the needed tuples and then performs a sequential scan.
This is substantially more efficient because one random disk seek takes several milliseconds while a
sequential disk access takes only microseconds to read a tuple.

Although reads on B-tree in its initial state are efficient, writes on B-tree create two problems. First, each
write operation requires random disk seeks to access and overwrite data in B-tree. Second, splitting and
merging nodes can cause severe fragmentation of B-tree structure on disk. This can significantly degrade the
read performance on B-tree because each fragment requires a jump (seek) even for sequential read on a primary
index. These problems become more and more significant as the random disk seek performance has not been
increasing over a few decades while disk capacity and sequential disk access performance are rapidly
increasing. we propose a new data structure, Fractured Index, which overcomes the problems of B-tree by using
only sequential accesses on disk. Fractured Index is a few orders of magnitude faster than B-tree to maintain
and also an order of magnitude faster to query after heavy write operations because Fractured Index causes no
fragmentation on disk. It is especially beneficial for OLAP type queries which are less selective. It mainly
focus on the context of OLAP, but the same technique could be used to Online Transaction Processing (OLTP)
setting to achieve the better maintenance performance with a comparatively more performance penalties on read


The key idea of Fractured Index is to sequentially output small B-trees to reduce the overhead of inserts
into tables that are organized as primary and secondary B-trees. Inserting new tuples to random places of the
B-tree is a costly operation which cause random disk seeks and fragmentation due to splits and merges of
B-tree nodes. DBMS uses a buffer pool to alleviate this problem by keeping dirty (modified) pages in memory
and later writing them back to disk, but it is impossible for the buffer pool to contain all of the necessary
leaf pages when millions of new tuples are inserted to the main table. The leaf pages are continuously being
swapped between the disk and buffer pool as inserts continue to come in unless new tuples are coming in the
order of index keys, which does not occur in general case. This results in a large number of disk seeks and
fragmented B-trees.

The Fractured Index mitigates these problems by sequentially recording data changes (insertions and deletions)
into small B-trees separated from the main table. In order to sequentially write small B-trees (mini-index),
the Fractured Index buffers insertions and deletions into the database in Insert Buffer and dumps the data
changes to a new mini-index when the current insert buffer becomes full. When reading the database, the query
processor reads from each mini-index in addition to the main table.


, , , , , , , ,

  1. Leave a comment

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: