Dotnet Thread: Back to Basics: LinkedLists

I tend to subscribe to the belief that programmers with some C background are typically better off than those without. This is largely because C is far less abstract from the underlying O/S and hardware than languages like C# or Java - specifically the memory model. This kind of knowledge is just handy to have - even for day to day programming.

I've covered fundamental memory topics before, including chapter 7 of the Foundation series (free ebook) as well as with the arrays are just fancy pointers post. It's time to put some of that information to quasi-practical use.

One of the first real programming classes I had dealt with data structures. Technically all the primitive type you deal with are simple data structures - a System.Int32 is a data structure capable of holding a 4 byte value representing an integer. What I want to talk about though are slightly more complex data structures - Arrays, Linked Lists, HashTables and Binary Trees (maybe). I think it's great that .NET and Java come loaded with a bunch of data structures. While you can probably get away without ever knowing how they really work, there might be instances where it'd be useful to know how things work under the hood - besides, I think this is just really fun stuff to mess around with.

Arrays

We've already covered arrays in depth. In case you forgot, the most important point to know is that arrays are cut up from a continuous block of memory. It's this fact that lets them be indexed speedily using simple pointer arithmetic. If you want to access the 4th array element (index 3), you simply need take your pointer, which is pointing at the start of the array, and increment its memory location by 3 times the size of the type in the array. This explains why an array's dimension needs to be defined as soon as you declare it. It also explains why you can't simply expand the size of an array - there's a good chance that the memory directly next to the array is being taken by something else. The only way to grow an array is to reallocate a larger chunk of continuous memory and copy the old array into the new one. This is exactly the strategy employed by many .NET classes -including the StringBuilder and ArrayList (if you're interested, open up Reflector, and look at the Capacity property on the ArrayList class).

Linked Lists

The fact that .NET's ArrayLists take care of this automatically (and rather efficiently) is pretty impressive. In school though, for reasons I'll guess at later, we didn't build our own ArrayList, but rather built well-known LinkedLists. As of .NET 2.0 there's actually a built-in generic LinkedList (Java has had one for a while as far as I know). In most situations you'll want to use an ArrayList, but LinkedList do serve a unique purpose - and again, I just think it's good stuff to know as it illustrates how memory works. So, let's take a look at how to build a LinkedList in C#. We'll keep it basic, and won't worry about implementing typical collection interfaces (IEnumerable). There's already a working LinkedList, we just want to explore the data structure to get an even greater understanding of programming.

What Are Linked List?

If an array can be defined as a data structure based on a continuous chunk of memory (which is synonymous with fixed-size), then a LinkedList should be defined as a data structure based on dispersed chunks of memory. As such, LinkedLists can grow to unlimited sizes (only limited by available memory). The fundamental aspect of a LinkedList is that each element within the list points to the next element in the list. This is necessary since items aren't placed one after the other and simple pointer arithmetic's won't work - you can't just do pointer + sizeof(int) to figure out where the next element is - it could be anywhere!

How do we do it?

Building a linked list is pretty simple. We'll need a generic LinkedList class which will expose our Add and Remove method (and Find and whatever else we want), as well as a LinkedListItem class, which will wrap the added item and include a reference to the next item.

Here's a memory representation of what it'll look like:

Notice that our containing LinkedList class has a reference to our first item within the linked list. This is necessary otherwise we'd have no point of access to any of the items.

LinkedList v1

For the first version we'll focus on getting the core components - it won't be able to do anything, but it will set the groundwork for v2.

public class LinkedList<T>  {     private LinkedListItem _first;       public LinkedList(){}     public LinkedList(T firstItem)     {        _first = new LinkedListItem(firstItem);     }       private class LinkedListItem     {        public readonly T Item;        public LinkedListItem Next;          public LinkedListItem(T item)        {           Item = item;        }     }  }

All we can do with this code is instantiated a new version of the LinkedList class and pass in the first element - we can't add more items, delete items or retrieve them. However, it does illustrate the purpose of the LinkedListItem class as a wrapper.

LinkedList v2

For this version of our list we'll add the ability to add new items to the list by adding two methods. We need an Add method to actually add a new element to our list, but we also need a method to find the last element of our list.

public void Add(T itemToAdd)  {     if (_first == null)     {        _first = new LinkedListItem(itemToAdd);        return;     }     LinkedListItem last = FindLastItem();              last.Next = new LinkedListItem(itemToAdd);  }    private LinkedListItem FindLastItem()  {     if (_first == null)     {        return null;     }     LinkedListItem item = _first;     while (item.Next != null)     {        item = item.Next;     }     return item;  }

There's a very common pattern with LinkedList - walking through the elements. To find the last entry - the one where Next is null, we need to go through each item. We can't just index a spot and jump to it.

LinkedList v3

You wouldn't know it from the above code, but one of the things LinkedList excel at is speedily inserting records. In fact, high insert/retrieval ration is probably the only good reason to use them on a modern system. They way we've coded it now, inserts are actually a pretty slow linear operation - the longer the list gets, the longer it takes to find the last element. However, with the simple addition of a reference to the Last item we can make insertion a constant and very fast operation. Here's a visual representation of the memory for our improved LinkedList:

Here's the code (LinkedListItem hasn't changed, so it's been left out):

public class LinkedList<T>  {     private LinkedListItem _first;     private LinkedListItem _last;       public LinkedList(){}     public LinkedList(T firstItem)     {      _first = new LinkedListItem(firstItem);      _last = _first;     }     public void Add(T itemToAdd)     {        if (_first == null)        {           _first = new LinkedListItem(itemToAdd);           _last = _first;           return;        }                 _last.Next = new LinkedListItem(itemToAdd);        _last = _last.Next;     }     private class LinkedListItem{...}  }

Why LinkedLists?

We'll take a break from making improvements to our code to discuss what the appeal of LinkedLists are. Obviously, a main advantage is that they aren't fixed in size, like arrays. Another advantage is that thanks to our last code change, insertion is a constant and speedy operation. Yet another advantage is that an item can be added to the middle of the linked list with minimal effort - we don't need to shift our entire list, simply reroute our next pointer. Finally, since LinkedLists don't require continuous memory, they can be used with highly fragmented memory space.

Insertion into an ArrayList is also fast, but not constant. Once the ArrayList fills, it must grow. Additionally, when it does grow, a continuous group of memory must be found to meet the new size. Of course, LinkedList do have one major drawback: random access is brutally slow. We have to walk through the list to find the desired item.

LinkedList v4

Deleting from a LinkedList is a little tricky (at least the way we've set it up). We don't just remove the item, we also have to reroute our list. The item that pointed to our removed item must now point to the removed item's next. If 1 --> 2 --> 3 and we remove 2, then 1-->3. This is pretty tricky to cleanly implement as-is. One way we can improve the maintainability of our LinkedList is to change each item so that it not only tracks the Next item in the list, but also the Previous one (this is called a doubly-linked list).

public class LinkedList<T>  {     private LinkedListItem _first;     private LinkedListItem _last;       public LinkedList(){}     public LinkedList(T firstItem)     {        _first = new LinkedListItem(firstItem);        _last = _first;     }     public void Add(T itemToAdd)     {        if (_first == null)        {           _first = new LinkedListItem(itemToAdd);           _last = _first;           return;        }                 _last.Next = new LinkedListItem(itemToAdd);        _last.Next.Previous = _last;        _last = _last.Next;     }       public void Remove(T itemToRemove)     {        LinkedListItem found = Find(itemToRemove);        if (found == null)        {           return;        }        //this is the _first element        if (found.Previous == null)        {           _first = found.Next;        }        else        {           //reroute the list           found.Previous.Next = found.Next;        }        //this is the last element        if (found.Next == null)        {           _last = found.Previous;        }                      }     private LinkedListItem Find(T itemToFind)     {        if (_first == null)        {           return null;        }        for (LinkedListItem current = _first; current != null; current = current.Next)        {           if (current.Item.Equals(itemToFind))           {              return current;           }        }        return null;     }       private class LinkedListItem     {        public readonly T Item;        public LinkedListItem Next;        public LinkedListItem Previous;          public LinkedListItem(T item)        {           Item = item;        }     }  }

Fin

Today, it's safe to consider LinkedLists as highly specialized data structures. However, back when memory was a real constraint, they were a core part of any program. Even the choice of creating a doubly-linked list could be difficult as every item within the list consumed an extra reference (8bits on an 8bit platform, 16bits on a 16bit platform and so on). Even if LinkedList aren't as useful today as they once were, they're still great for illustrating how our code interacts with system memory. They also show us how arrays and ArrayLists really work, by providing an interesting contrast. Thanks Mr. Woollard for teaching me all about data structures!

Source Click Here.

Dotnet Thread

Monday, August 4, 2008

Back to Basics: LinkedLists

Arrays

Linked Lists

What Are Linked List?

How do we do it?

LinkedList v1

LinkedList v2

LinkedList v3

Why LinkedLists?

LinkedList v4

Fin

No comments:

Post a Comment

Originals Enjoy

Dotnet Thread

Monday, August 4, 2008

Back to Basics: LinkedLists

Arrays

Linked Lists

What Are Linked List?

How do we do it?

LinkedList v1

LinkedList v2

LinkedList v3

Why LinkedLists?

LinkedList v4

Fin

No comments:

Post a Comment

Originals Enjoy

Archives