B-tree

  • Memory hierarchical system: Registers, L1 cache, L2 cache, Main memory(RAM), Hard Disk, Network(cloud)

  • the CPU determines which data to put in which sections of memory largely based on spatial locality. In other words, if we are accessing data that is close to other data, the CPU will also load the close data into an L1 cache because it predicts that we might need to access the close data in the near future.

  • accessing other data that is inside the node structure is expected to be faster than simply traversing to another node.

  • minimize the amount of node traversals we have to do to find data (i.e., make the trees as short as possible) and store as much data as possible close together (i.e., have each node store multiple keys).

  • The data structure that does the above is called a B-Tree. A B-Tree is a self-balancing tree data structure that generally allows for a node to have more than two children (to keep the tree wide and therefore from growing in height) and keeps multiple inserted keys in one node.

  • "order b," where b is the minimum number of children any node is allowed to have and 2b is the maximum number of children any node is allowed to have.

  • Formally, we define a B-Tree of order b to be a tree that satisfies these properties:

    1. Every node has at most 2b children.
    2. Every internal node (except root) has at least b children.
    3. The root has at least two children if it is not a leaf node.
    4. An internal node with k children contains k–1 keys.
    5. All leaves appear in the same level. (We will later see how this is enforced during the insert operation)
  • What is the worst-case time complexity for a B-Tree insert operation?

    O(blog_b(n))
    
  • Worst-case scenario, during insertion, we need to re-sort the keys within each node to make sure the ordering property is kept; this is a O(b) operation. We also need to keep in mind that in the worst case, the inserting node overflows and we therefore need to traverse all the way back up the tree to fix the overflowing nodes and re-assign pointers; this is a O(log_b(n)) operation.

  • Exercise break: correct options

    1. A B-Tree will always have all of its leaves on the same level. There are absolutely no exceptions.
    2. In theory, a B-Tree has the same worst-case time complexity as a regular self-balancing tree for the find operation.
REFERENCE: Stepik 3.8

results matching ""

    No results matching ""