Dwarf is a patented (US Patent 7,133,876), highly compressed structure for computing, storing, and querying data cubes, with size reductions reaching 1:1,000,000 depending on the data distribution. The method is based on finding prefix and suffix redundancies in high-dimensional data. Prefix redundancies occur in dense areas of the cube, and some existing techniques have exploited them. However, we discovered that suffix redundancy is far more prevalent in sparse areas of multi-dimensional space. Eliminating the two together fuses the exponential sizes of high-dimensional cubes into a dramatically condensed LOSSLESS store.
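To make the two kinds of redundancy concrete, here is a minimal toy sketch in Python. It is not the patented Dwarf algorithm: it materializes a tiny cube first and only afterwards counts what sharing would save, whereas Dwarf finds the redundancies before computing the cube. The fact table `FACTS`, the `ALL` marker, and every function name are assumptions made up for this illustration.

```python
from itertools import product

ALL = "*"  # hypothetical marker for a dimension rolled up to "all values"

# Hypothetical sparse fact table: (store, customer, product, sales)
FACTS = [
    ("S1", "C2", "P2", 70),
    ("S1", "C3", "P1", 40),
    ("S2", "C1", "P1", 90),
    ("S2", "C1", "P2", 50),
]

def full_cube(facts):
    """Materialize every group-by cell as {coordinates: summed measure}."""
    cube = {}
    for *dims, measure in facts:
        # Each fact feeds 2^d cells: every dimension is either kept or rolled up.
        for mask in product((False, True), repeat=len(dims)):
            key = tuple(ALL if rolled else v for rolled, v in zip(mask, dims))
            cube[key] = cube.get(key, 0) + measure
    return cube

def build_trie(cube):
    """Nest the cells by dimension so that a shared prefix is stored once."""
    root = {}
    for key, value in cube.items():
        node = root
        for v in key[:-1]:
            node = node.setdefault(v, {})
        node[key[-1]] = value
    return root

def canonical(node):
    """Hashable signature of a sub-trie; equal signatures mean equal content."""
    if not isinstance(node, dict):
        return node
    return tuple(sorted((k, canonical(child)) for k, child in node.items()))

def count_entries(node):
    """Entries stored when only prefixes are shared (a plain trie)."""
    if not isinstance(node, dict):
        return 0
    return sum(1 + count_entries(child) for child in node.values())

def count_coalesced(node, seen):
    """Entries stored when identical sub-tries (suffixes) are also kept once."""
    if not isinstance(node, dict):
        return 0
    signature = canonical(node)
    if signature in seen:
        return 0  # already stored elsewhere; the parent only keeps a pointer
    seen.add(signature)
    return sum(1 + count_coalesced(child, seen) for child in node.values())

cube = full_cube(FACTS)
trie = build_trie(cube)
flat_entries = len(cube) * 3          # every cell repeats all three coordinates
prefix_entries = count_entries(trie)
coalesced_entries = count_coalesced(trie, set())
print(flat_entries, prefix_entries, coalesced_entries)
```

On this four-row table the counts are 69 entries for the flat cube, 35 once prefixes are shared, and 25 once identical suffixes are stored a single time. The suffix saving comes from sparsity: customer C1 buys only at store S2, so the sub-cube under (S2, C1) is identical to the ones under (S2, ALL) and (ALL, C1).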
With the Dwarf technology, we created the first lossless full PetaCube, in a Dwarf store of 2.1 GB with a construction time of 80 minutes. The PetaCube is built on a 25-dimensional fact table whose full cube would be a petabyte in size if stored in binary (all 2^25 possible un-indexed views/summary tables, with two aggregate values). This is 1000-fold bigger than Microsoft's TeraCube of the future. We also surpassed the fastest OLAP Council APB-1 benchmark result at density 5, published by Oracle: the Dwarf cube takes 20 minutes to create and occupies 3 GB, compared to Oracle's 4.5 hours and 30+ GB. We further pushed the APB-1 benchmark to its maximum possible density, 40, in just 7 hours of compute time and about 10 GB of storage. To the best of our knowledge, no one else has even tried this. This enormous storage reduction comes with NO loss of information and provides a fully indexed cube that includes the original fact table.
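The figures above can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) and treats Oracle's "30+ GB" as exactly 30 GB, so the size ratio it reports for APB-1 is a lower bound; the variable names are only for illustration.

```python
# Number of views (group-bys) in a full cube over 25 dimensions:
# each dimension is either grouped on or aggregated away.
views = 2 ** 25

# PetaCube: approximate compression ratio, assuming decimal units.
petabyte = 10 ** 15
dwarf_store = 2.1 * 10 ** 9
compression_ratio = petabyte / dwarf_store

# APB-1 density 5: Dwarf (20 minutes, 3 GB) versus Oracle (4.5 hours, 30+ GB).
build_speedup = (4.5 * 60) / 20
size_ratio_at_least = 30 / 3

print(f"views in the full cube:   {views:,}")
print(f"PetaCube compression:     about {compression_ratio:,.0f} to 1")
print(f"APB-1 build-time speedup: {build_speedup:.1f}x")
print(f"APB-1 size reduction:     at least {size_ratio_at_least:.0f}x")
```

Under these assumptions the full cube has 33,554,432 views, the PetaCube is compressed by roughly 476,000 to 1, and the APB-1 density 5 cube is built 13.5 times faster at no more than a tenth of the size.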
The most important aspect of the patented Dwarf technology is that its data fusion happens up front: prefix and suffix redundancies are discovered and eliminated BEFORE the cube is computed, which explains the dramatic reduction in compute time. A complete version of the Dwarf Cube software, with full support for hierarchies, is available to interested parties under an NDA and a 90-day evaluation agreement.
The Dwarf Project Publications: