|
CubeTree Storage Organization
In this work, we introduced the Cubetree as a storage
abstraction of the Data Cube and argued for a compact representation
for it using packed R-trees. We developed algorithms for allocating
each cube query to the projected Cubetree that gives good clustering
and for selecting good sort order within each Cubetree. We then used
Cubetree as an alternative storage organization for ROLAP aggregate
views and presented an efficient algorithm for mapping an arbitrary
set of views to a collection of Cubetrees that achieve excellent
performance.
The Cubetree organization combines both storage and indexing in a
single data structure and still within the relational paradigm. We
have shown that when compared to a conventional (relational) storage
organization of materialized OLAP views, Cubetrees offer at least a
2-1 storage reduction, a 10-1 better OLAP query performance, and a
100-1 faster updates. We
compared
the two alternative approaches with data generated from the TPC-D
benchmark and stored in the Informix Universal Server (IUS). The
straight forward implementation materializes the Relational-OLAP
(ROLAP) views using IUS tables which are then indexed with B-trees.
The Cubetree implementation materializes the same ROLAP views using a
Cubetree Datablade developed for IUS. This Datablade implements the
Cubetrees, the mapping algorithm and the supporting routines on the
Informix Universal Server. It defines a
Perhaps the most critical issue in data warehouse environments is the
time to generate and/or refresh its derived data from the raw data.
The mere size of them does not permit frequent re-computation.
Creating a new cube every time an update increment is obtained is not
only wasteful but, it may require a window that leaves no time for
OLAP. This is especially critical because the off-line window for
computing the cube and its indexes has shrank due to the international
operations of the organizations. In this work we argued that
maintenance of the cube should be considered as a bulk incremental
update operation and that record-at-a-time updates or full
re-computation are not viable solutions. We have reduced the
incremental refresh problem to sorting the update increment and
merge-packing the Cubetrees. Rewriting the Cubetrees into fresh
storage permits the use of the old Cubetrees during maintenance and,
thus, eliminates query down time. Then, based on the rewrite cost, we
investigated how to obtain good refresh schedules for amortizing
maintenance cost.
CubeTrees Papers
CubeTrees People |
|
|