So I recall talking about BSTs once, and ending it on a note of concern. The problem with a BST was that, despite its almighty rule, it was still possible to create something akin to a linked list. That completely defeated the whole point of using a BST, which is that a BST can potentially cut the time needed to search for a particular node down to O(lg n) (and yeah, the lg is not a mistake; it's just the logarithm to base 2).
So how do we ensure our dear friend the Binary Search Tree retains its O(lg n) search time? Enter the concept of balancing and the world of balanced binary search trees. In this episode, we're gonna take a look at the basics of AVL Trees.
Prerequisites
Ah, the prerequisites. Well, for today's post, all you're gonna need is BSTs, which you can read about right here. Other than that, you may need to hold on to your wits, because this is not exactly a very simple topic.
The Beginning
In the beginning, there were Binary Search Trees. And they were good. But two dudes by the names of Adelson-Velsky and Landis realized that they were not good enough! So they set out to create a new data structure, inventively called AVL Trees after their own surnames (and confused some people: why would you do this? Most people don't even realize AVL is just the inventors' initials. At least you could have named it after its purpose or something). And then began a revolution of more streamlined and effective algorithms, blah blah blah.
So anyway, cutting to the chase, what necessitated the AVL data structure was outlined in, like, the first paragraph of this post. What is more important is HOW it works. The AVL data structure does what we want it to do: it reduces searching to a worst-case scenario of O(lg n). What this means is that the BST data structure, which potentially allowed a worst-case O(n) scenario, has been drastically reduced in complexity (hehe, time-wise, m'boy! Programmers always have the worst of it when reducing so-called "complexity"). Basically, we get faster searches, guaranteed.
What about insertion and deletion? The incredible part about AVL Trees is that even these take O(lg n) time. Sure, that may not sound as good as arrays or linked lists (those offer constant-time insertion, at least at one end, but are meh at searching), but this complexity makes AVL Trees pretty frickin' awesome in terms of data structures.
How it works
Okay, so the whole point of this post is to get you acquainted with AVL Trees. I'm not going to cover the mechanics of insertion and deletion here, because those concepts are quite long and hard to wrap your brain around, and have the potential to make this post a gazillion miles long (notwithstanding my penchant for writing, like, a lot). However, I will cover the basics of what an AVL Tree looks like and how it works.
Before we begin, I want to make one thing clear. I am using a Height Convention of 1. What this means is that a single leaf node possesses a height of 1 in my book. Now I know a lot of you have been taught otherwise, with leaf heights of 0 and whatnot, but forget about all that and follow this convention when reading this post. It will make further discussion a lot more painless.
For those of you who don't know what the height of a BST is, first of all, shame on you. I know I did not cover that in my post on BSTs, but I never said you have to stop there. Google stuff and learn! It's good for you. Secondly, the definition is basically this:
Height(node x) = number of nodes on the longest path going down from node x, including x itself
I am using the "including" convention here. A lot of people use the "excluding" convention (which I believe is the prevailing convention) but I don't like that. According to that, the height of a single leaf node is 0, which is plainly absurd. So no, I am not going down that path.
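To make the "including" convention concrete, here's a quick sketch in Python. The `Node` class and the `height` helper are hypothetical names of mine, not anything standard; a lone leaf comes out with height 1 and an empty subtree with height 0, exactly as described above.

```python
class Node:
    """A bare-bones binary tree node; purely illustrative."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def height(node):
    """Height under the 'including' convention: a single leaf has
    height 1, and an empty subtree (None) has height 0."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))

leaf = Node(5)
print(height(leaf))           # 1 -- a lone leaf
print(height(Node(6, leaf)))  # 2 -- a root with one child below it
```

Note that under the "excluding" convention, the same code would simply return `-1` for `None` instead of `0`; that one constant is the entire difference between the two conventions.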
Anyway, AVL Trees are BSTs (which means the BST Rule applies), with one more rule:
For every node in an AVL Tree, the difference between the heights of its left and right subtrees is either 1, 0 or -1.
Confused? Here's what it means. Consider the node A
and its subtrees B
and C
below.
In the above, remember that B
and C
are the topmost nodes of those subtrees (that's the subtree notation). Now in an AVL tree, height(B) - height(C) = 0 or 1 or -1
, and this is true for all nodes A
in the AVL tree.
In other words, no matter which node you take, its two subtrees either have equal height, or one is taller than the other by exactly one level. Put differently, every node is almost completely balanced.
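The rule translates almost directly into code. Here's a rough sketch of a checker, using a toy `Node` class and `height` helper of my own naming (the leaf-height-1 convention from earlier applies):

```python
class Node:
    """A bare-bones binary tree node; purely illustrative."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def height(node):
    """Leaf has height 1; empty subtree (None) has height 0."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))

def is_avl(node):
    """True if every node's subtree heights differ by at most 1."""
    if node is None:
        return True
    balance = height(node.left) - height(node.right)
    return balance in (-1, 0, 1) and is_avl(node.left) and is_avl(node.right)

# A balanced tree vs. a linked-listey chain of the same keys.
balanced = Node(6, Node(5), Node(7))
chain = Node(5, None, Node(6, None, Node(7)))
print(is_avl(balanced))  # True
print(is_avl(chain))     # False -- root's subtrees differ in height by 2
```

The crucial phrase is "for all nodes": the check recurses, so a single out-of-balance node anywhere in the tree disqualifies the whole thing.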
This rule above makes AVL Trees strictly height-balanced. For example, this is an AVL Tree:
While this is NOT an AVL Tree (spot why!):
This is also an AVL Tree:
While this isn't:
From the above examples, it should be plainly clear that an AVL Tree enforces a sort of "strict" balancing and does not really allow any linked-listey formations (look at the fourth image, immediately above. The thing looks like two linked lists joined at the top). This limits the height of the tree to reasonable proportions depending on the number of nodes. This in turn results in faster searching as we have to traverse fewer levels to reach (or not) the node we want.
Now, while working with AVL Trees, you may wonder how we enforce this height balance. The method is quite simple: we change the root of the offending subtree (via an operation called a rotation) whenever the tree goes out of balance. For a simple example, consider a tree containing three elements: 5, 6 and 7. These can be put into a binary search tree in three ways:
We see that all three ways are valid BSTs, with smaller elements to the left of larger ones. However, out of the three, only the middle one is a valid, well-balanced AVL Tree. Therefore, as a rule of thumb, whenever balancing is concerned, we try to make the middle value (in sorted order) the root of the subtree.
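You can verify that claim mechanically. This sketch (again with a hypothetical `Node` class and the leaf-height-1 convention) builds all three BST shapes for 5, 6 and 7 and checks the balance at each root:

```python
class Node:
    """A bare-bones binary tree node; purely illustrative."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def height(node):
    """Leaf has height 1; empty subtree (None) has height 0."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))

# The three valid BST shapes for the keys 5, 6 and 7:
right_chain = Node(5, None, Node(6, None, Node(7)))  # leans all the way right
middle_root = Node(6, Node(5), Node(7))              # middle value at the root
left_chain  = Node(7, Node(6, Node(5)))              # leans all the way left

for tree in (right_chain, middle_root, left_chain):
    balance = height(tree.left) - height(tree.right)
    print(tree.key, balance, balance in (-1, 0, 1))
# Only the tree rooted at 6 has a balance factor in {-1, 0, 1}.
```

The two chains have root balance factors of -2 and 2 respectively, which is exactly what a rebalancing operation must repair.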
How that is done is beyond the scope of this post. For now, however, know that this is what we must ensure when dealing with AVL Trees. Just like normal BSTs, an AVL Tree must satisfy the AVL Property at all stages (along with the BST Property itself... AVL Trees are subsets of BSTs).
Worst Case Scenarios
Even in an AVL Tree, there IS a worst case of O(lg n). We will see that this case appears in one particular form of AVL Tree: one where no internal node has a balance factor of 0. In other words, the tree is as staggered as it can get: every node (apart from the leaves) has a balance factor of either 1 or -1. This sort of tree is called a Fibonacci Tree and looks sorta like this:
I did not make the above image. This is from Wikipedia. Either way, that shouldn't matter.
In case you haven't realized it, each node in the above image is labelled with its height, not its balance factor. The balance factor of each node above (except the leaf nodes) is 1.
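The Fibonacci Tree also hints at where the name comes from. Under this post's leaf-height-1 convention, the fewest nodes an AVL tree of height h can have satisfies N(h) = N(h-1) + N(h-2) + 1 (a root plus one worst-case subtree of each allowed height), which is a Fibonacci-style recurrence. A small sketch, with `min_nodes` being my own made-up name:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def min_nodes(h):
    """Fewest nodes an AVL tree of height h can have (leaf = height 1)."""
    if h == 0:
        return 0  # empty tree
    if h == 1:
        return 1  # a single leaf
    # A root, plus one subtree of height h-1 and one of height h-2:
    # the most lopsided shape the AVL rule allows at every node.
    return 1 + min_nodes(h - 1) + min_nodes(h - 2)

for h in range(1, 8):
    print(h, min_nodes(h))
# 1 1, 2 2, 3 4, 4 7, 5 12, 6 20, 7 33 -- node counts grow
# exponentially with height, so height stays O(lg n) even here.
```

Because even the sparsest legal tree packs exponentially many nodes per level of height, the height of any AVL tree on n nodes is bounded by a constant times lg n.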
These trees exhibit the worst-case O(lg n) search time. Why that is so will be explained when I cover insertion on this blog. Until then, know that AVL Trees, in general, are best suited for purposes where you have a (mostly) static tree on which you do loads of searches. For trees where you modify stuff more often, you use the big brother of all binary trees, the Almighty Red-Black Tree, which I am almost scared to write about because there is so much to write there.
But until then, I have two more posts to go regarding AVL Trees themselves, so I'm gonna get right to that. Meanwhile you guys can go Google it or something. Stay alive, stay alert, keep learning.
PS: That was totally lame.
PPS: I know I have a lot more posts to catch up on (4 more in my reckoning), but I've been having internet troubles for a day or so, so I'm really sorry about that. Plus, my original resolution requires me to learn a new concept everyday, which I am doing. These concepts I am blogging currently are stuff I have learnt back during my semester. The newer concepts are going into the queue and will be here later. 'Til then, ciao!