Monday, May 13, 2013

Introduction to RIFL

I want to introduce a new pattern for resource management that I call RIFL (pronounced "rifle"), short for "Resource Is Function-Local".  RIFL has some advantages (besides the better name) over another common pattern, RAII.  One big advantage of RIFL is that it can be used in languages like Go, which do not have destructors.  I'll compare RIFL and RAII in detail in an upcoming post, but first I want to introduce RIFL and provide some examples.

"Resource" is a word which here means "something which, when you're done using it, it's best to tell someone that you're done".  That is, when you're done with memory, you should free (or delete) it; when you're done with a file (or network connection), you should close it; when you're done with a transaction, you should commit it (or roll it back); when you're done with a lock being locked, you should unlock it.  And a big "et cetera", because this sort of thing is pretty common.  Also, this is slightly broader than the more typical definition of resource; for instance, while a lock itself is often considered a resource, here I'm considering the state of it being locked to be a resource (i.e., "holding the lock", rather than just the existence of the lock).

When you're done reading the book, you should put it back on the shelf.

The life-cycle of a resource is: Obtain, Use (and Use some more), Release.  People often say "allocate" or "open" instead of "obtain", and "free" or "deallocate" or "close" instead of Release; and of course, there are a thousand variants of "use" depending on the resource: "read", "write", "update", etc.  There can be several "use"s for a single resource.  Files typically have "read", "write", and sometimes "seek".  With a database transaction, you have at least "select", "insert", "update", and "delete"; and perhaps "merge", "bulk insert", "create", and "drop".  But when talking about resources in general, I'll just stick with "Use".

The challenging part of resource management is making sure you never forget to put the book back on the shelf, even if you leave the library because of a fire alarm.  I mean, the hard part is being sure that you release the resource even with complicated control flow, including an exception being thrown.  It can also be tricky to avoid using the resource after it has been released, and especially to make sure it is released only once.

One resource that people don't usually think about managing is stack frames (i.e., local variables in a function).  You could say "When you're done with the stack frame, return from the function."  But you don't need to say that; it's pretty natural.  And RIFL takes advantage of this by turning managing other resources (which is hard) into managing the stack (which is easy).

My examples in this post will be mainly in a C++-like language that has a "finally" clause for the "try" statement, which is not standard C++ but is easy to understand.

A typical raw (unmanaged) resource then would look like this:

class RawResource {
public:
    static RawResource Obtain();
    void Use();
    void Release();
};

Often Obtain() and Use() take parameters, but that's not important for getting the basic idea.  As I mentioned before, there are often actually many Use() methods.  Occasionally, as with locks, there may be no Use() methods at all.

Also note that Obtain() is a static method returning an object of the class.  Essentially that's a constructor; so, Obtain() can just be the constructor for the class.
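To make the interface concrete, here is a minimal sketch of a raw resource in real C++ whose constructor acts as Obtain().  The class name RawBuffer and its methods are hypothetical; it simply wraps malloc and free, and like any raw resource it has no destructor, so forgetting Release() leaks the buffer.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical raw resource: a heap buffer. The constructor plays the role of
// Obtain(), and nothing frees the memory unless the caller calls Release().
class RawBuffer {
public:
    explicit RawBuffer(std::size_t size)  // "Obtain"
        : data_(static_cast<char *>(std::malloc(size))), size_(size) {}

    // One "Use": copy a string into the buffer if it fits.
    // Returns false if the buffer is gone (released or never allocated) or too small.
    bool Use(const char *text) {
        std::size_t needed = std::strlen(text) + 1;  // include the terminator
        if (data_ == nullptr || needed > size_) return false;
        std::memcpy(data_, text, needed);
        return true;
    }

    // "Release": the caller has to remember to do this.
    void Release() {
        std::free(data_);
        data_ = nullptr;
    }

private:
    char *data_;
    std::size_t size_;
};
```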

This is how a raw resource is supposed to be used:

void ExampleLeakyRawUsage() {
    RawResource resource = RawResource::Obtain();
    resource.Use();
    DoSomethingElse();
    resource.Use();
    resource.Release();
}

That's pretty straightforward, but if Use() or DoSomethingElse() can throw, the Release() will never get called.  It's also possible to have a dozen places in your code where you use the resource, and you could easily miss a Release in one of them; or just miss it along one code path if there is complicated code in the middle.  This leaks the resource, which is bad.
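Here is a small real-C++ sketch of that leak.  The counting RawResource and the always-throwing DoSomethingElse() are hypothetical stand-ins; the counter shows one resource still outstanding after the exception escapes.

```cpp
#include <stdexcept>

// Hypothetical raw resource that counts how many instances are currently
// obtained but not yet released.
class RawResource {
public:
    static int outstanding;
    static RawResource Obtain() { ++outstanding; return RawResource(); }
    void Use() {}
    void Release() { --outstanding; }
};
int RawResource::outstanding = 0;

// Hypothetical stand-in for any call that can throw.
void DoSomethingElse() { throw std::runtime_error("something went wrong"); }

void ExampleLeakyRawUsage() {
    RawResource resource = RawResource::Obtain();
    resource.Use();
    DoSomethingElse();  // throws, so nothing below this line runs
    resource.Use();
    resource.Release();
}
```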

If you want to use the resource correctly, you have to do something like this:

void ExampleRawUsage() {
    RawResource resource = RawResource::Obtain();
    try {
        resource.Use();
        DoSomethingElse();
        resource.Use();
    } finally { // not C++
        resource.Release(); // guaranteed
    }
}
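Since the finally clause above isn't standard C++, here is a rough sketch of how the same guarantee might be spelled in real C++: catch everything, release, rethrow, and release again on the normal path.  The counting RawResource and the failNext flag are hypothetical, there only to make both paths testable.

```cpp
#include <stdexcept>

// Hypothetical raw resource that counts obtained-but-unreleased instances.
class RawResource {
public:
    static int outstanding;
    static RawResource Obtain() { ++outstanding; return RawResource(); }
    void Use() {}
    void Release() { --outstanding; }
};
int RawResource::outstanding = 0;

// Hypothetical stand-in that throws only when asked to.
bool failNext = false;
void DoSomethingElse() {
    if (failNext) throw std::runtime_error("something went wrong");
}

// Standard C++ has no finally, so Release() appears twice: once on the
// exceptional path (followed by a rethrow) and once on the normal path.
void ExampleRawUsage() {
    RawResource resource = RawResource::Obtain();
    try {
        resource.Use();
        DoSomethingElse();
        resource.Use();
    } catch (...) {
        resource.Release();  // exceptional path
        throw;               // let the caller see the original exception
    }
    resource.Release();      // normal path
}
```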

That's a lot of boilerplate every time you want to use the resource.  Here's a very similar function:

void AnotherExampleRawUsage() {
    RawResource resource = RawResource::Obtain();
    try {
        DoAnotherThing();
        resource.Use();
        DoSomethingElseEntirely();
        resource.Use();
    } finally { // not C++
        resource.Release();
    }
}

Not really very different.  Followers of the DRY (Don't Repeat Yourself) principle immediately look for a way to abstract out the duplication.  And it's pretty straightforward, if you have a background in functional programming.  The technique takes a little getting used to for a dysfunctional programmer, but it shouldn't: if you have almost-duplicate code, you turn it into a function, with the differences as parameters.  This is the same thing.

Everything is the same except for the body of the try block.  So make that a parameter to a function, and use that function to eliminate the redundancy.  When I say "make the body of the try block a parameter", I mean a parameter which is a function:

void AlmostARiflFunction(void useResource(RawResource &resource)) {
    RawResource resource = RawResource::Obtain();
    try {
        useResource(resource);
    } finally { // not C++
        resource.Release();
    }
}

Here "useResource" is a function which is passed to "AlmostARiflFunction".  And, in turn, AlmostARiflFunction passes the resource to "useResource", which then, well, uses the resource.

The usage looks like this, with anonymous functions (which in real life I usually call lambdas or lambda functions) being passed in.

void ExampleAlmostRiflUsage() {
    AlmostARiflFunction([](RawResource &resource) {
        resource.Use();
        DoSomethingElse();
        resource.Use();
    });
}

void AnotherExampleAlmostRiflUsage() {
    AlmostARiflFunction([](RawResource &resource) {
        DoAnotherThing();
        resource.Use();
        DoSomethingElseEntirely();
        resource.Use();
    });
}

The repetition is vastly reduced.  If you're not familiar with it, the function arguments demonstrate the syntax for anonymous functions in C++11.  C++ always has elegant syntax; the no-op anonymous function is "[](){}", but you can abbreviate that to "[]{}" for even better readability.  Does sarcasm come across in a blog?

RIFL usability depends on anonymous function usability.  C++ isn't great, but it's better than C, which doesn't have anonymous functions at all, or MATLAB, which doesn't support multi-statement anonymous functions.
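One C++ wrinkle worth knowing: a parameter declared as "void useResource(RawResource &resource)" is really a function pointer, and only lambdas with no captures convert to function pointers.  A sketch of a variant that takes std::function instead (a template parameter would also work), so the lambda can capture a local variable; the counting RawResource and CountUses() are hypothetical.

```cpp
#include <functional>

// Hypothetical raw resource that counts obtained-but-unreleased instances.
class RawResource {
public:
    static int outstanding;
    static RawResource Obtain() { ++outstanding; return RawResource(); }
    void Use() {}
    void Release() { --outstanding; }
};
int RawResource::outstanding = 0;

// Variant of AlmostARiflFunction taking std::function rather than a plain
// function pointer, so callers may pass lambdas that capture locals.
void AlmostARiflFunction(const std::function<void(RawResource &)> &useResource) {
    RawResource resource = RawResource::Obtain();
    try {
        useResource(resource);
    } catch (...) {
        resource.Release();  // exceptional path
        throw;
    }
    resource.Release();      // normal path
}

// Counts uses in a captured local; a function-pointer parameter would
// reject this lambda because of the [&uses] capture.
int CountUses() {
    int uses = 0;
    AlmostARiflFunction([&uses](RawResource &resource) {
        resource.Use();
        ++uses;
        resource.Use();
        ++uses;
    });
    return uses;
}
```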

Higher-order functions (i.e., functions taking functions as parameters) are the bread of functional programming (the butter is immutability), so I'm sure RIFL is already widely used in functional programming, and probably better named.  But let me elaborate on how best to implement it.

The good thing about AlmostARiflFunction is that, as long as you don't obtain a RawResource on your own, and only use AlmostARiflFunction to get access to one, you cannot fail to release it; or rather AlmostARiflFunction guarantees that it will release it for you, which is even better.
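A sketch of AlmostARiflFunction in standard C++, using catch (...) plus rethrow in place of finally, with a hypothetical counting RawResource to check that the resource is released even when the anonymous function throws.

```cpp
// Hypothetical raw resource that counts obtained-but-unreleased instances.
class RawResource {
public:
    static int outstanding;
    static RawResource Obtain() { ++outstanding; return RawResource(); }
    void Use() {}
    void Release() { --outstanding; }
};
int RawResource::outstanding = 0;

// Standard-C++ version of AlmostARiflFunction: the parameter is a function
// pointer, so it accepts lambdas without captures.
void AlmostARiflFunction(void useResource(RawResource &resource)) {
    RawResource resource = RawResource::Obtain();
    try {
        useResource(resource);
    } catch (...) {
        resource.Release();  // release on the exceptional path
        throw;               // then propagate the exception
    }
    resource.Release();      // release on the normal path
}
```

A throwing callback still leaves nothing outstanding, because the release happens before the exception propagates out of AlmostARiflFunction.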

The bad thing is that nothing stops you from releasing it yourself, which can cause trouble, or from continuing to use it after you have released it.  The obvious fix is to hide the Release method.
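A small sketch of that trouble, assuming a hypothetical RawResource that counts Release() calls: the callback releases the resource itself, it compiles without complaint, and AlmostARiflFunction then releases it a second time.

```cpp
// Hypothetical raw resource that records how many times Release() runs.
class RawResource {
public:
    static int releases;
    static RawResource Obtain() { return RawResource(); }
    void Use() {}
    void Release() { ++releases; }
};
int RawResource::releases = 0;

void AlmostARiflFunction(void useResource(RawResource &resource)) {
    RawResource resource = RawResource::Obtain();
    try {
        useResource(resource);
    } catch (...) {
        resource.Release();
        throw;
    }
    resource.Release();
}

// Nothing in the types prevents the callback from releasing the resource,
// after which AlmostARiflFunction releases it again.
void ExampleDoubleRelease() {
    AlmostARiflFunction([](RawResource &resource) {
        resource.Use();
        resource.Release();  // compiles fine, but it is premature
    });
}
```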

class RiflResource {
public:
    static void WithResource(void useResource(RiflResource &resource)) {
        RawResource resource = RawResource::Obtain();
        try {
            RiflResource riflResource(resource);
            useResource(riflResource);
        } finally { // not C++
            resource.Release();
        }   
    }
    void Use() { raw_.Use(); }
private:
    RiflResource(RawResource &raw) : raw_(raw) {}
    RawResource &raw_;
};

We simply provide a wrapper class that provides Use() but does not provide Release() or Obtain().  And we can make the RIFL function a static method of the class, to tie everything together.  You can think of the RIFL function as a wrapper for both Obtain() and Release().

void ExampleRiflUsage() {
    RiflResource::WithResource([](RiflResource &resource) {
        resource.Use();
        DoSomethingElse();
        resource.Use();
    });
}

Now we have guaranteed that Release() is called exactly once, and there is no way to use the resource after it is released.
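Here is a rough standard-C++ rendering of RiflResource, again with catch (...) and rethrow in place of finally and a hypothetical RawResource that counts releases.  The deleted copy operations are an addition not in the post; they stop a callback from copying the wrapper out, though it could still keep a pointer to it.

```cpp
// Hypothetical raw resource that records how many times Release() runs.
class RawResource {
public:
    static int releases;
    static RawResource Obtain() { return RawResource(); }
    void Use() {}
    void Release() { ++releases; }
};
int RawResource::releases = 0;

class RiflResource {
public:
    // The only way to get a RiflResource: obtain, lend it to useResource, release.
    static void WithResource(void useResource(RiflResource &resource)) {
        RawResource raw = RawResource::Obtain();
        try {
            RiflResource riflResource(raw);
            useResource(riflResource);
        } catch (...) {
            raw.Release();  // exceptional path
            throw;
        }
        raw.Release();      // normal path
    }

    void Use() { raw_.Use(); }  // forwards to the raw resource; no Release()

    // Not in the original: disallow copying the wrapper out of the callback.
    RiflResource(const RiflResource &) = delete;
    RiflResource &operator=(const RiflResource &) = delete;

private:
    // Private, so only WithResource can construct one.
    explicit RiflResource(RawResource &raw) : raw_(raw) {}
    RawResource &raw_;
};
```

Calling resource.Release() inside the lambda would not compile, since RiflResource has no such method.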

Also, note that the using code now has no references to the raw resource.  We presumably can't remove the raw resource from the language, but as long as we only use it through the RIFL class, we don't have to worry about releasing it (or even explicitly obtaining it).

In the same way that you get a stack frame by calling a function, and free it up by returning (often implicitly by falling off the end of the function), we obtain the resource by calling the RIFL function, and it is released when our anonymous function ends.
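To illustrate the stack-frame analogy, a small sketch that nests two RIFL-style calls.  WithNamed, events, and Nested are hypothetical names; the log shows that resources come back in the reverse order they were obtained, just as stack frames are popped.

```cpp
#include <string>
#include <vector>

// Hypothetical event log, so the order of obtains and releases is visible.
std::vector<std::string> events;

// Hypothetical minimal RIFL-style function for a named resource.
void WithNamed(const std::string &name, void body()) {
    events.push_back("obtain " + name);
    try {
        body();
    } catch (...) {
        events.push_back("release " + name);  // exceptional path
        throw;
    }
    events.push_back("release " + name);      // normal path
}

void Nested() {
    WithNamed("outer", [] {
        WithNamed("inner", [] {
            events.push_back("use both");
        });
    });
}
```

After Nested() runs, the log reads: obtain outer, obtain inner, use both, release inner, release outer.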

In my next post, I'll give some examples of RIFL in Go and Perl.  And I promise to get to real C++ and RAII soon.
