A New Internet Library: Add Your Website/Blog or Suggest A Website/Blog to our Free Web Directory http://anil.myfunda.net.

Its very simple, free and SEO Friendly.
Submit Now....

Sunday, August 3, 2008

Recursing into Recursion - Memoization

Lately, I've been heads down on a lot of concurrency items which will hopefully come out soon.  In the mean time, I want to get back to the basics one last time with recursion.  As I posted earlier, I've been talking about recursion in the past couple of posts, and this one will be one last on the topic.  I'll do my best to post the samples in both C# and F#.  As always, you can find my C# samples on MSDN Code Gallery at the Functional C# Samples.

Let's catch up to where we are today:
Memoization

When doing heavy computations through recursion, memoization becomes a pretty important topic.  The idea behind memoization is to speed up your programs by avoiding repetitive calculations for previously processed function inputs.  Let's go ahead and try a simple example using the ever so trite Fibonacci sequence as a quick example.

F#
let rec fibonacci n =
  if n <= 2 then 1 
  else fibonacci(n-1) + fibonacci(n-2)

C#
static int Fibonacci(int n)
{
    return n <= 2 ? 1 : Fibonacci(n - 2) + Fibonacci(n - 1);
}

But, if you call these above functions time and time again with nearly the same input, they have to go through the same computation again and again, which isn't necessary.  So, instead, let's try through the technique of memoization to speed up this computation.

F#
let fibonacci = 
  let t = new Dictionary<_,_>()
  let rec fibCached n = 
    if t.ContainsKey(n) then t.[n]
    elif n <= 2 then 1
    else
      let res = fibCached(n - 2) + fibCached(n - 1)
      t.Add(n, res)
      res
  fun n -> fibCached n

C#

 

public static Func<int, int> Fibonacci()
{
    var t = new Dictionary<int, int>();
    Func<int, int> fibCached = null;
    fibCached = n =>
    {
        if (t.ContainsKey(n)) return tNo;
        if (n <= 2) return 1;
        var result = fibCached(n - 2) + fibCached(n - 1);
        t.Add(n, result);
        return result;
    };

    return n => fibCached(n);
}

As you can see from the above code, we're using a Dictionary<TKey, TValue> as our storage for already computed values.  Then we go ahead and compute the values if we have not seen them before.  We'll see a pretty impressive speed boost when calling with repetitive values.  But, that's not a generic solution, and I'd rather have the ability to take a generic function and pass any function to this and compute.  So, let's go ahead and make this generic.

F#
let memoize (f : 'a -> 'b) =
  let t = new System.Collections.Generic.Dictionary<'a, 'b>()
  fun n ->
    if t.ContainsKey(n) then t.[n]
    else
      let res = f n
      t.Add(n, res)
      res
  
let rec fibonacci= 
  memoize(fun n -> if n <= 2 then 1 else fibonacci(n - 1) + fibonacci(n - 2))

C#
public static Func<T, TResult> Memoize<T, TResult>(this Func<T, TResult> func)
{
    var t = new Dictionary<T, TResult>();
    return n =>
    {
        if (t.ContainsKey(n)) return tNo;
        
        var result = func(n);
        t.Add(n, result);
        return result;
    };
}

static int Fibonacci(int n)
{
    return n <= 2 ? 1 : Fibonacci(n - 2) + Fibonacci(n - 1);
}

Func<int, int> fibonacci = Fibonacci;
var fibonacciMemoized = fibonacci.Memoize();

As you can see, I was able to create a generic memoize function that takes a function as its input, checks whether the item is in the cache and if not, adds it to the cache.  Using the C# version, we created an extension method for our given Func<TArg, TResult>.  This allows us to memoize the function correctly.  Alternatively, I could have written the memoized function this way:

C#
static Func<int, int> Fibonacci()
{
    return Extensions.Memoize((int n) => n <= 2 ? 1 : Fibonacci(n - 2) + Fibonacci(n - 1));
}

var fibResult = Fibonacci(30);

To me, this reads a bit more of functional programming and the true power of it.  I basically took the above methods and combined them so that my Fibonacci function now returns the function instead taking an integer and returning an integer.

Beware of the Bad Memoization

In order to get memoization to work properly, you must remember not to include the value to the function when declaring the function.  This will cause a fresh memoization table to be created each and every time.  Such functions such as these should be avoided as below:

F#
let rec fibonacciIncorrect n =
  memoize (fun n -> if n <= 2 then 1 else fibonacciIncorrect(n - 2) + fibonacciIncorrect(n - 1)) n

C#
static int FibonacciIncorrect(int n)
{
    return Extensions.Memoize((int x) => x <= 2 ? 1 : FibonacciIncorrect(n - 2) + FibonacciIncorrect(n - 1))(n);
}

Generic Memoization Service

Due to the subtlety of problems mentioned above, it may be better to return an object instead of a function.  These can be pretty simple objects with no more functionality than to return values and discard the given collection.  Let's go ahead and do this sample in F# first.  What I'm going to do is define the holder type for the calculated value:

type ITable<'a, 'b> =
  abstract Item : 'a -> 'b with get
  abstract Discard : unit -> unit

This defines for me an interface with a Discard method and an Item method which accepts 'a and returns 'b via a get accessor.  Pretty simple holder for my data.  Now, let's define the memoize function which does the work:

let memoize f =
  let outerTable = new Dictionary<_,_>()
  { new ITable<_,_> with
      member t.Item
        with get(n)
          if outerTable.ContainsKey(n) then
            outerTable.[n]
          else
            let res = f n
            outerTable.Add(n, res)
            res
      member t.Discard() =
        outerTable.Clear()
  }

What you can see from the above is that I'm creating an inner class of type ITable which I defined above.  Then I implement the given methods and properties I defined on the interface for Item and Discard().  During the Item property, I'm checking whether I have the given item.  If I do, I return it from the collection, else I execute the given function and store the result.  During my discard, I give myself the ability to reset the cached values.  Now, let's implement the Fibonacci sequence function once again with our new function that we defined above.

let rec fibonacci =
  memoize 
    (fun n ->
      if n <= 2 then 1 else fibonacci.[n - 1] + fibonacci.[n - 2])

Now with this function, I can execute the given methods directly on the class that was returned from the memoize function.  As you can see, I am getting values from the fibonacci object and calling the function once again on itself.  I can then calculate the given numbers while using this code below:

let result = fibonacci.[10]
printfn "%i" result

That's pretty slick, isn't it?  Now, let's try in C#.  It's not going to be as elegant nor to the point.  But, let's define our Table class first.

public class Table<T, U>
{
    private readonly Dictionary<T, U> dictionary;
    private readonly Func<T, U> func;

    public Table(Func<T, U> func, Dictionary<T, U> dictionary)
    {
        this.dictionary = dictionary;
        this.func = func;
    }

    public U this[T n]
    {
        get
        {
            if (dictionary.ContainsKey(n))
                return dictionaryNo;

            var result = func(n);
            dictionary.Add(n, result);
            return result;
        }
    }

    public void Discard()
    {
        dictionary.Clear();
    }
}

Now that this is defined, then the rest of our methods should come nicely as noted here.

static Table<T, U> Memoize<T, U>(Func<T, U> func)
{
    var table = new Dictionary<T, U>();
    return new Table<T, U>(func, table);
}

static Table<int, int> Fibonacci
{
    get
    {
        return Memoize((int n) =>
        {
           if (n <= 2) return 1;
           return Fibonacci[n - 2] + Fibonacci[n - 1];
        });
    }
}

var fibs = Fibonacci[30];

Not as elegant in the F# version because C# won't let me create an anonymous inner class as F# does.  If you look around the F# codebase, you'll find them using this plenty to define such things that return IEnumerable<T> for Map and so on.

Some Notes on Memoization

Don't forget that proper memoization requires the functions that you are writing to be pure.  What that means is that given the same input, the same output will always be calculated and returned.  Also note that none of the code I wrote was particularly thread safe, so I would definitely recommend if you were going to worry about concurrency here with our shared mutable state, that you would use the appropriate locks and so on.

Wrapping It Up

Now I was able to show you how to speed up some of those recursive algorithms using memoization.  This allows for a bit better efficiency when doing large and pure calculations.  Next time, I'm going to start going back to my concurrency topics as I have a lot yet untouched.  With my recent presentations on the matter, there is plenty to talk about.



Source Click Here.

No comments:

Post a Comment

Post your comments here:

Originals Enjoy