From meesh@cs.umd.edu Mon Feb 28 01:39:34 2005 Date: Mon, 6 Oct 2003 01:51:09 -0400 From: Michelle Hugue To: Saila Scod Cc: Dariush Samari , Eric Liu Newsgroups: csd.cmsc420 Subject: Part 1: B+ tree clarified (was Re: confused about B+ trees) Dariush writes: >I'm looking at figure 10.16 and it shows an internal node with only two keys but three pointers. I thought that maybe what Shaffer means is that there is a hidden key, which they don't show, in the leftmost part of that node, which is less than or equal to any key value in the node's leftmost subtree. Bottom line, I'm confused about this 'order' business, how to determine >the maximum number of elements a leaf can have, and how they connect. an answer: indeed, you are correct in your reading of shaffer's text. and, you can implement yours that way if you want. however (and, no offense implied), you needed to read the whole section of the chapter--not just look at the picture and read the pseudo code--to understand why the number of KEYSs (corresponding to actual data records) in the leaves may be different than the number of GUIDE values in internal nodes. however, before we can do that, we need to get some nomenclature regarding IMPLEMENTATION a B+ tree of order m. 1) internal nodes and external nodes are devised to take one page of main memory. that is, the order, m, of the B+ tree is selected so that the amount of space required by all the stuff in the node is as close to a page as possible, without exceeding a page. (to find out what a page is, read some of the page-specific email available, or go visit google using "virtual memory page faults" or " memory page faults" 2) internal nodes contain space for "pointers" or "references" or LINKS to m child nodes, which can be either internal or external nodes. internal nodes also contain space for the values, called GUIDE values, that are used to determine which of the m child subtrees of the node will be accessed next. So, space for m-1 of these guides is needed. 3) external nodes contain space for the actual KEY values and associated data record or data record pointer. the question is, how much? that is determined by the size of the KEY and the size of the data record or data record pointer So, the only specification is that the leaf must be half full. I typically say that we assume that each leaf can contain, say, p KEYs, where p need not have any relation to m, the order of the B+ tree 4) To make B+ tree maintenance easier, nodes on each level are typically connected as a doubly linked lists. This has the added advantage of supporting range queries. Check out Part 2: the B+ tree information rules in a later posting. (That's what i did in lecture on wednesday) m