Careful Code: RIFL

Monday, May 20, 2013

RIFL vs. using

In the last several posts, I've introduced a resource management pattern I call RIFL, or Resource Is Function-Local. I've given examples in Go, Perl, and C++, and compared it to RAII. Here, I'll give examples in C#, and discuss how it relates to the IDisposable pattern in C#.

In C#, a resource that needs management signals that to the coder by implementing the interface IDisposable. That contains a single function, void Dispose(), which is about as simple as interfaces get. Implementing IDisposable means that you should ensure that Dispose() gets called when you're done with the resource. And the language provides the "using" statement, which is some lovely syntax making calling Dispose() easy. A typical usage looks like this:

using (var resource = new Resource()) {
    resource.Use();
} // implicitly calls resource.Dispose()

This is roughly equivalent to

var resource = new Resource();
try {
    resource.Use();
} finally {
    resource.Dispose();
}

The differences are mostly too subtle to affect our discussion here.

Often, as with RAII, you want some resource to have the same lifetime as an instance of a class. In that case, you construct the IDisposable resource in the constructor for the new class, and you have that class also implement IDisposable and call Dispose() on the resource in its own Dispose() method. Actually, there are subtle, confusing and dangerous interactions between Dispose(), the actual destructor (well, finalizer), the garbage collector and OS resources, which make correctly implementing Dispose() rather tricky. Fortunately, for our purposes, all of these issues are too subtle to affect our discussion much.

I'm not comfortable really talking much about Java, as I'm much more familiar with C#, but recent versions of Java have an AutoCloseable interface and an extension to the "try" syntax (try-with-resources) which are similar to IDisposable and "using".

C# has similar flexibility to C++ because IDisposable is so similar to RAII. Here are a few examples of using resources and non-resources in C#:

public void ExampleUsingResource() {
    using (var resource = new Resource()) {
        resource.Use();
    }
}

public Resource ExampleUseOfEscapingResource() {
    var resource = new Resource();
    resource.Use();
    return resource;
}

public void ExampleUseOfLeakingResource() {
    var resource = new Resource();
    resource.Use();
} // failed to release resource properly by calling Dispose()

public void ExampleUseOfNonResource() {
    var nonResource = new NonResource();
    nonResource.Use();
} // no need to release nonResource, since it does not implement IDisposable

So this is even trickier than in C++, since leaking a resource not only looks like a cross between "using" a resource and deliberately allowing a resource to escape the current scope; but it looks identical to using a non-resource. And here's where it gets really tricky: the difference between a non-resource and a resource is that a resource implements IDisposable. And how do you know that when you're coding? Well, if you're using Visual Studio with IntelliSense (or something roughly equivalent), usually you would try typing "resource.Disp" and seeing if the system wants to autocomplete it to Dispose. This almost works, except for three issues.

First, and pretty minor, is that it's possible to implement Dispose without implementing IDisposable, and then you can't use "using". But you find that out as soon as you try to use a "using" statement, so that's mostly ok.

Second, and pretty tricky, is that some classes explicitly implement IDisposable.Dispose, so that the Dispose method isn't available unless you cast to (IDisposable). That means it won't autocomplete so you can be fooled into not using "using".

Third, sometimes it is apparently safe not to call Dispose on certain IDisposable objects, such as Task. This turns out to be pretty convenient, since in many of the ways you are encouraged to use Tasks in recent C# versions, it's very hard to find the right way to Dispose them. But this means that sometimes the Leaking code doesn't really leak, which is pretty confusing.

Oh, and did I mention that sometimes Dispose throws?

What it adds up to is that the IDisposable pattern seems fine, but ends up being really painful to use safely. Fortunately, C# has pretty good lambda support, so you can easily implement and use RIFL.

public class RiflResource {
    public static void WithResource(Action<RiflResource> useFunc) {
        using (var raw = RawResource.Obtain()) { // often: new RawResource()
            var rifl = new RiflResource(raw);
            useFunc(rifl);
        } // implicitly Dispose (Release)
    }

    public void Use() { raw_.Use(); }
    private readonly RawResource raw_;
    private RiflResource(RawResource raw) { raw_ = raw; }
}

And it's easy to use

public void ExampleRiflUsage() {
    RiflResource.WithResource(resource => {
        resource.Use();
    });
}

In fact, of the languages I've used in examples, C# has the best syntax for anonymous functions, which makes this the cleanest usage. And I intentionally wrote my example so that it's obvious how to add multiple calls to Use; this simple case has an even briefer alternate syntax:

public void ExampleRiflUsage() {
    RiflResource.WithResource(resource => resource.Use());
}

or even

public void ExampleRiflUsage() {
    RiflResource.WithResource(Use);
}

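but that only works if you want to call Use() alone once with no arguments, and if "Use" there names a method in scope that takes a RiflResource (a static helper, say); C# will not convert the instance method RiflResource.Use into an Action<RiflResource> by itself.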

I am really happy to wrap up my posts on RIFL. So far, I've only blogged on 2 topics, and each of those turned into multi-post discussions, where each post was too long. I hope to find a reasonable-length topic for the next post.

Friday, May 17, 2013

RIFL vs. RAII

In previous posts, I introduced a resource-management pattern I call RIFL (pronounced "rifle"), or Resource Is Function-Local. Here I will talk about the standard C++ way of managing resources, which is called RAII, or Resource Acquisition Is Initialization. RAII is usually pronounced "are eh aye aye", but I prefer "rah eeeeeeeeeee".

In RAII, you create a class in which the resource is obtained in the constructor and released in the destructor. Thus the resource necessarily has exactly the same lifetime as the instance of the class, so the instance is a proxy for the resource. And, if you allocate that instance on the stack, it is released when the stack frame is destroyed. Actually, in C++, it is released when control exits the lexical scope, but that's not too far different.

class RawResource {
public:
    static RawResource Obtain();
    void Use();
    void Release();
};

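class RaiiResource {
public:
    RaiiResource() : raw_(RawResource::Obtain()) {}
    ~RaiiResource() { raw_.Release(); }
    RaiiResource(const RaiiResource &) = delete; // a copy would release twice
    void Use() { raw_.Use(); }
private:
    RawResource raw_; // held by value; a reference to Obtain()'s temporary would dangle
};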

This is a bit simpler than a RiflResource. Using the resource is also easy:

void ExampleRaiiUsage() {
    RaiiResource resource;
    resource.Use();
}

This is very similar to using a RiflResource, but again a bit simpler. As with RIFL, there is no reference to the raw resource, and no likelihood of releasing the resource twice.
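To make the lifetime rule concrete, here is a minimal sketch, not the classes above: CountingRaii and the "released" counter are names made up for illustration, and the "resource" is nothing more than that counter. It shows the release happening when control leaves the inner block, before the function itself returns.

```cpp
#include <cassert>

static int released = 0; // hypothetical counter, only here to observe the release

class CountingRaii {
public:
    CountingRaii() {}                          // a real class would Obtain here
    ~CountingRaii() { ++released; }            // stands in for Release
    CountingRaii(const CountingRaii &) = delete;
    CountingRaii &operator=(const CountingRaii &) = delete;
    void Use() {}
};

// Returns how many releases happened before the function reached its return.
int ReleasesBeforeReturn() {
    int before = released;
    {
        CountingRaii resource;
        resource.Use();
    } // destructor runs here, at the end of the lexical scope
    int afterInnerScope = released - before;
    assert(afterInnerScope == 1); // already released, and the function has not returned yet
    return afterInnerScope;
}
```

Calling ReleasesBeforeReturn() from main should give 1 every time, since each call creates and destroys exactly one object.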

The next example usage is both a strength and a weakness of RAII. If the RAII object is not allocated on the stack, the lifetime of the resource is still the lifetime of the object, whatever it may be. The strength is that if you need a more flexible lifetime, you can create one:

RaiiResource *RaiiEscape() {
    RaiiResource *resource = new RaiiResource();
    resource->Use();
    return resource;
}

Before I get into the technical weaknesses of RAII, let me just warn you that confusing it with the RIAA may get you sued.

There's a weakness of RAII which is almost identical to the strength; you can accidentally fail to release the resource:

void RaiiLeak() {
    RaiiResource *resource = new RaiiResource();
    resource->Use();
}

This is a memory leak of the resource object, and therefore a resource leak as well. The biggest problem with this weakness is that the code looks like a cross between the other two valid usages. In this simple example, of course, it is easy to see the leak; but it takes a lot of discipline to avoid the leaks in large programs. Of course, C++ offers tools (such as std::shared_ptr) to help manage memory (and therefore, with RAII, other resources).
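As a rough illustration of those tools, here is a sketch that uses std::unique_ptr (a sibling of std::shared_ptr) for both the escaping case and the formerly leaking case. The RaiiResource below is a stand-in that only counts live instances; the counter and the function names RaiiEscapeOwned and LiveAfterNoLongerLeaking are assumptions made for the example.

```cpp
#include <cassert>
#include <memory>

static int liveResources = 0; // hypothetical counter, only for the demonstration

class RaiiResource {
public:
    RaiiResource() { ++liveResources; }   // stands in for Obtain
    ~RaiiResource() { --liveResources; }  // stands in for Release
    RaiiResource(const RaiiResource &) = delete;
    RaiiResource &operator=(const RaiiResource &) = delete;
    void Use() {}
};

// Like RaiiEscape, but the return type says who must release.
std::unique_ptr<RaiiResource> RaiiEscapeOwned() {
    std::unique_ptr<RaiiResource> resource(new RaiiResource());
    resource->Use();
    return resource; // ownership moves to the caller
}

// Like RaiiLeak, except that it no longer leaks.
int LiveAfterNoLongerLeaking() {
    int before = liveResources;
    {
        std::unique_ptr<RaiiResource> resource(new RaiiResource());
        resource->Use();
        assert(liveResources == before + 1); // held while the pointer is in scope
    } // unique_ptr deletes the object, whose destructor releases the resource
    return liveResources - before;
}
```

The leak is gone, but only because the author remembered to use the smart pointer; nothing forces that choice, which is the discipline problem described above.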

If you recall, I only showed fake C++ code (using a "finally" clause) to implement RIFL in C++. The actual way to implement RIFL in C++ is on top of RAII.

class RiflResource {
public:
    static void WithResource(void useResource(RiflResource &resource)) {
        RaiiResource resource; // Obtain
        RiflResource riflResource(resource); // wrap
        useResource(riflResource); // use
    } // implicit release when RAII resource exits scope
    void Use() { raii_.Use(); }
private:
    RiflResource(RaiiResource &raii) : raii_(raii) {}
    RaiiResource &raii_;
};

RAII turns managing any resource into managing memory. Managing memory is hard in the general case, but easy when the memory is on the stack. RIFL turns managing any resource into managing the stack, which is always easy, but more limiting.
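Here is a compilable sketch of RIFL layered on RAII, under a few assumptions: the resource is faked with two counters, WithResource is a template (so capturing lambdas work, unlike with the plain function parameter above), and ReleasedDespiteThrow is a made-up test helper. It checks that the release happens exactly once even when the passed-in function throws.

```cpp
#include <cassert>
#include <stdexcept>

static int obtained = 0; // hypothetical counters standing in for a real resource
static int released = 0;

class RaiiResource {
public:
    RaiiResource() { ++obtained; }
    ~RaiiResource() { ++released; }
    RaiiResource(const RaiiResource &) = delete;
    RaiiResource &operator=(const RaiiResource &) = delete;
    void Use() {}
};

class RiflResource {
public:
    template <typename UseFunc>
    static void WithResource(UseFunc useResource) {
        RaiiResource raii;        // obtain
        RiflResource rifl(raii);  // wrap
        useResource(rifl);        // use
    } // released here, on normal return or during stack unwinding
    void Use() { raii_.Use(); }
    RiflResource(const RiflResource &) = delete; // the wrapper must not be copied out
    RiflResource &operator=(const RiflResource &) = delete;
private:
    explicit RiflResource(RaiiResource &raii) : raii_(raii) {}
    RaiiResource &raii_;
};

// Returns true if the resource was released exactly once although the callback threw.
bool ReleasedDespiteThrow() {
    int before = released;
    bool caught = false;
    try {
        RiflResource::WithResource([](RiflResource &resource) {
            resource.Use();
            throw std::runtime_error("Use failed");
        });
    } catch (const std::runtime_error &) {
        caught = true; // the exception still reaches the caller, after the cleanup
    }
    assert(caught);
    return released - before == 1;
}
```

Deleting the wrapper's copy constructor is a design choice of this sketch: it keeps the passed-in function from copying the wrapper somewhere that outlives the call.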

Holding a lock (not the lock itself) is an unusual resource, because there are no Use() methods, just Obtain (Lock) and Release (Unlock), in the usual implementation. C++ has an RAII wrapper for locking, which gets used like this:

void AccessWhileRaiiLocked() {
    std::lock_guard<std::mutex> myLock(myMutex);
    x.Use(); // use some class data safe in the knowledge that it is locked.
    y.Use();
} // implicit unlock when lock goes out of scope

This is roughly equivalent to this:

void AccessWhileUnsafelyLowLevelLocked() {
    myMutex.lock();
    x.Use();
    y.Use();
    myMutex.unlock(); // not executed if either Use() throws
}

And to this RIFL example:

void AccessWhileRiflLocked() {
    nonstd::lock_rifl<std::mutex>::HoldingLock(myMutex, [&]() {
        x.Use();
        y.Use();
    });
}

Ignoring the fact that nothing keeps you from using "x" or "y" outside any of the locking mechanisms, I would argue that RIFL has an advantage over RAII in this case. It's not that it's simpler; it's slightly more verbose. But the RAII example looks like it simply has an unused variable. It's not at all obvious from the code that the x.Use is somehow nested within the lock; or that it's at all related. Or even, that the lock is needed.
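The nonstd::lock_rifl helper in the RIFL example is not part of any real library. One way it might be written, as a sketch only, is a thin template over std::lock_guard. RecordingMutex and LockHeldOnlyInsideCallback are likewise made up here, purely to observe when the lock is held without involving a second thread.

```cpp
#include <cassert>
#include <mutex>

namespace nonstd {
// Hypothetical helper: a RIFL wrapper around "holding a lock".
template <typename Mutex>
struct lock_rifl {
    template <typename Func>
    static void HoldingLock(Mutex &mutex, Func func) {
        std::lock_guard<Mutex> guard(mutex); // lock
        func();                              // caller's code runs while locked
    } // unlock, even if func throws
};
} // namespace nonstd

// A toy lockable type that records its state, used only to watch the helper work.
struct RecordingMutex {
    bool locked = false;
    void lock() { assert(!locked); locked = true; }
    void unlock() { assert(locked); locked = false; }
};

// Returns true if the lock was held during the callback and released afterwards.
bool LockHeldOnlyInsideCallback() {
    RecordingMutex mutex;
    bool heldInside = false;
    nonstd::lock_rifl<RecordingMutex>::HoldingLock(mutex, [&] {
        heldInside = mutex.locked;
    });
    return heldInside && !mutex.locked;
}
```

With a real std::mutex the call looks the same as in AccessWhileRiflLocked; the nesting of the callback inside HoldingLock is what makes the locked region visible in the source.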

Better would be to use RIFL to actually restrict access to the variables:

class RiflControlled {
public:
    RiflControlled(usable const &x0, usable const &y0) : x_(x0), y_(y0) {} // public, so callers can create one
    void Sync(void useFunc(usable &x, usable &y)) {
        std::lock_guard<std::mutex> raiiLock(mutex_);
        useFunc(x_, y_);
    } // implicit unlock
private:
    usable x_;
    usable y_;
    std::mutex mutex_;
};

void AccessWhileRiflControlled(RiflControlled &rifl) {
    rifl.Sync([](usable &x, usable &y) {
        x.Use();
        y.Use();
    });
}

With RiflControlled, there is simply no way to access "x" or "y" without holding the lock. You've guaranteed that the locking is correct. Well, that's a bit overstated, but you really have to deliberately undermine it; it's not going to happen by accident. Note that in this case, the RIFL function is a (non-static) method on the object, unlike all the previous RIFL examples. This again suggests that RIFL is a flexible approach to resource management.

With RAII, you can't limit the access to the variables to the scope where the lock is held. Challenge: prove me wrong.

RAII is necessary in C++ because of the lack of a "finally" clause on a try; there's really no good way around it. However, it is also a relatively low-level mechanism, which can be abused. RIFL in C++ can be used as a wrapper around RAII, providing more rigid control over resources (which is good); but less flexible control over resources (which is bad). But RAII is a step up from the raw resource; one might say that RAII is semi-automatic, while RIFL is fully-automatic.

In the next post, I'll compare RIFL to another resource management pattern similar to RAII, which is IDisposable in C#, and talk about implementing RIFL in C#.

Wednesday, May 15, 2013

RIFL Examples

In the last post, I introduced a pattern I call RIFL, or "Resource Is Function-Local". The goal is to control access to a resource in such a way that there is a guarantee that it is released exactly once. The idea is that the resource never exists in the wild. It is created within a function, and can only be used when passed into another function which is a parameter of the first. That's a little confusing, but let me jump into some examples which should make it clearer.

First, here is a RIFL wrapper around writing files in Go. Note that a function declaration in Go is something like "funcName(argName argType) returnType".

package rifl
import "os"

type File struct{ file *os.File }

func WithFile(fileName string, useFile func(file File) error) error {
    file, err := os.Create(fileName) // create or truncate, open for writing
    if err != nil {
        return err
    }
    defer file.Close() // see note below
    return useFile(File{file})
}

func (file File) WriteString(s string) (int, error) {
    return file.file.WriteString(s)
}

I think this example is fairly readable for those who don't know Go. The only thing that really needs explaining is the "defer" statement. "defer" basically means: put the rest of the function in a "try" block, and put this in the "finally" clause. That is, it contains a function call which is not executed until the rest of the function is completed; but is guaranteed to execute, even if there is an exception (which Go calls a "panic").

And this is how the rifl package might be used, with poor error handling:

package main
import "rifl"

func main() {
    rifl.WithFile("out.txt", func(file rifl.File) error {
        file.WriteString("Hello\n")
        file.WriteString("Goodbye\n")
        return nil
    })
}

Go is a really good language in many ways, but if you're used to C++'s destructors, you'll miss them mightily. RIFL can help you feel a bit better.

Note that in the main package, there are no references to os.File, only to rifl.File; we're not using the raw resource, just the RIFL wrapper. I only implemented WriteString, but I could have implemented the entire interface, or (in principle) improved it. Or made it worse. Also, unlike my abstract example in the previous post, the Obtain function (os.Create) and the Use function (WriteString) both take arguments and return multiple values. So it's a bit noisier.

I deliberately made the useFile argument the last argument to WithFile, because it tends to run on over many lines.

The most important point, of course, is that, in the WithFile function, we guarantee that the file will be closed, by using "defer". It doesn't matter how many goats are sacrificed while running "useFile"; the file will get closed.

Also, the os.File object "file" is hidden from users. In Go, identifiers with an initial uppercase letter (File, WithFile, WriteString) are exported; others (file) are not. This means the caller has no way to call Close(). But that's fine, since WithFile handles that.

Now for something completely different. Here's a RIFL wrapper for a database transaction in Perl, annotated (comments after # signs) so that those who don't know Perl can (I hope) follow it:

package XAct; # class
use Moose;

has '_dbh' => (is => 'ro'); # private data

sub Atomically { # RIFL function
    my ($dbh, $func) = @_;
    my @results;
    $dbh->begin_work; # start transaction
    eval { # try
        my $wrap = XAct->new(_dbh => $dbh);
        @results = $func->($wrap); # in transaction
        $dbh->commit;
    };
    if (my $err = $@) { # catch; copy $@ before rollback can overwrite it
        $dbh->rollback;
        die $err; # rethrow
    }
    return @results;
}

sub prepare { # pass-through Use
    my ($self, @args) = @_;
    return $self->_dbh->prepare(@args);
}

1; # end package

The RIFL function Atomically guarantees that the transaction is in effect when the passed-in function, $func is called; and that it is cleaned up afterwards. A transaction, unlike a typical resource, can be Released either by a Commit or by a Rollback. Atomically guarantees that if $func does not throw, the transaction is committed; and if it does throw, the transaction is rolled back. So, this is an additional complexity easily handled by RIFL.

As before, the transaction object (which is really the database handle) is wrapped to prevent direct access to the commit or rollback.

Note that the database handle itself is a resource, which could also be managed by a RIFL function, but that is not included in this example.

Here is an example function using Atomically:

sub InsertSomeNames {
    my ($dbh, %map) = @_;
    my $sql = 'INSERT INTO SomeNames(Id, Name) VALUES(?,?)';
    XAct::Atomically($dbh, sub {
        my ($xact) = @_; # get resource
        my $sth = $xact->prepare($sql); # use directly
        while (my ($id, $name) = each(%map)) {
            $sth->bind_param(1, $id);
            $sth->bind_param(2, $name);
            $sth->execute; # use indirectly
        }
    });
}

Here the transaction is being passed to the inner function as $xact, which is being used to insert many values into the database. If any of those inserts fail, all of them will be rolled back; they will only be committed if they all succeed.
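For comparison with the other posts' C++ examples, here is a sketch of the same commit-or-rollback shape in C++. Everything in it is an assumption made for illustration: FakeDb is a stand-in that only logs the calls it receives (it is not a real database API), and LastActionWhen is a made-up test helper.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a database handle: it only logs what was asked of it.
struct FakeDb {
    std::vector<std::string> log;
    void Begin() { log.push_back("begin"); }
    void Execute(const std::string &sql) { log.push_back(sql); }
    void Commit() { log.push_back("commit"); }
    void Rollback() { log.push_back("rollback"); }
};

class XAct {
public:
    template <typename Func>
    static void Atomically(FakeDb &db, Func func) {
        db.Begin();
        try {
            XAct xact(db);
            func(xact);     // in transaction
            db.Commit();    // reached only if func did not throw
        } catch (...) {
            db.Rollback();  // release by rollback on any failure
            throw;          // rethrow the original exception
        }
    }
    void Execute(const std::string &sql) { db_.Execute(sql); } // pass-through Use
    XAct(const XAct &) = delete;
    XAct &operator=(const XAct &) = delete;
private:
    explicit XAct(FakeDb &db) : db_(db) {}
    FakeDb &db_;
};

// Runs two inserts, optionally failing between them; returns the last logged action.
std::string LastActionWhen(bool fail) {
    FakeDb db;
    try {
        XAct::Atomically(db, [fail](XAct &xact) {
            xact.Execute("INSERT 1");
            if (fail) throw std::runtime_error("insert failed");
            xact.Execute("INSERT 2");
        });
    } catch (const std::runtime_error &) {
        // rethrown by Atomically after the rollback
    }
    assert(db.log.front() == "begin");
    return db.log.back();
}
```

As in the Perl version, the commit sits inside the try block, so a failing commit also ends in a rollback.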

The point of this post is that the RIFL pattern is pretty easy to implement and pretty easy to use in many languages. It relies only on good anonymous functions (lambda functions), used when calling the RIFL function. Of course, you also have to be able to reasonably manage the resource so that it is guaranteed to be freed at the end of the RIFL function; but it seems like that's a fundamental prerequisite to being able to manage resources in any way in the language.

In the next post, I'll compare RIFL with RAII and show some C++ examples.

Monday, May 13, 2013

Introduction to RIFL

I want to introduce a new pattern for resource management, which I call RIFL (pronounced "rifle"), which stands for "Resource Is Function-Local". RIFL has some advantages (besides the better name) over another common pattern, RAII. One big advantage of RIFL is that it can be used in languages like Go which do not have destructors. I'll compare RIFL and RAII in detail in an upcoming post, but first I want to introduce RIFL, and provide some examples.

"Resource" is a word which here means "something which, when you're done using it, it's best to tell someone that you're done". That is, when you're done with memory, you should free (or delete) it; when you're done with a file (or network connection), you should close it; when you're done with a transaction you should commit it (or roll it back); when you're done with a lock being locked, you should unlock it. And a big "et cetera", because this sort of thing is pretty common. Also, this is slightly broader than the more typical definition of resource; for instance, while a lock itself is often considered a resource, here I'm considering the state of it being locked to be a resource (i.e., "holding the lock", rather than just the existence of the lock).

When you're done reading the book, you should put it back on the shelf.

The life-cycle of a resource is: Obtain, Use (and Use some more), Release. People often say "allocate" or "open" instead of "obtain", and "free" or "deallocate" or "close" instead of Release; and of course, there are a thousand variants of "use" depending on the resource: "read", "write", "update", etc. There can be several "use"s for a single resource. Files typically have "read", "write", and sometimes "seek". With a database transaction, you have at least "select", "insert", "update", and "delete"; and perhaps "merge", "bulk insert", "create", and "drop". But when talking about resources in general, I'll just stick with "Use".

The challenging part of resource management is making sure you never forget to put the book back on the shelf, even if you leave the library because of a fire alarm. I mean, the hard part is being sure that you release the resource even with complicated control flow, including an exception being thrown. Occasionally it also gets tricky not using the resource after it is released, and especially only releasing the resource once.

One resource that people don't usually think about managing is stack frames (i.e., local variables in a function). You could say "When you're done with the stack frame, return from the function." But you don't need to say that; it's pretty natural. And RIFL takes advantage of this by turning managing other resources (which is hard) into managing the stack (which is easy).

My examples in this post will be mainly in a C++-like language that has a "finally" clause for the "try" statement. Which is not standard C++, but is easy to understand.

A typical raw (unmanaged) resource then would look like this:

class RawResource {
public:
    static RawResource Obtain();
    void Use();
    void Release();
};

Often Obtain() and Use() take parameters, but that's not important for getting the basic idea. As I mentioned before, there are often actually many Use() methods. Occasionally, as with locks, there may be no Use() methods at all.

Also note that Obtain() is a static method returning an object of the class. Essentially that's a constructor; so, Obtain() can just be the constructor for the class.

This is how a raw resource is supposed to be used:

void ExampleLeakyRawUsage() {
    RawResource resource = RawResource::Obtain();
    resource.Use();
    DoSomethingElse();
    resource.Use();
    resource.Release();
}

That's pretty straightforward, but if Use() or DoSomethingElse() can throw, the Release() will never get called. It's also possible to have a dozen places in your code where you use the resource, and you could easily miss a Release in one of them; or just miss it along one code path if there is complicated code in the middle. This leaks the resource, which is bad.
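The leak is easy to reproduce. In this sketch the resource is faked with two counters, DoSomethingElse always throws, and LeakedCount is a made-up helper; all three are assumptions for the demonstration, and none of it is a real API.

```cpp
#include <cassert>
#include <stdexcept>

static int obtained = 0; // hypothetical counters standing in for a real resource
static int released = 0;

class RawResource {
public:
    static RawResource Obtain() { ++obtained; return RawResource(); }
    void Use() {}
    void Release() { ++released; }
};

void DoSomethingElse() { throw std::runtime_error("fire alarm"); }

void ExampleLeakyRawUsage() {
    RawResource resource = RawResource::Obtain();
    resource.Use();
    DoSomethingElse();   // throws, so the two lines below never run
    resource.Use();
    resource.Release();
}

// Returns how many resources were obtained but never released by one call.
int LeakedCount() {
    int before = obtained - released;
    bool caught = false;
    try {
        ExampleLeakyRawUsage();
    } catch (const std::runtime_error &) {
        caught = true;
    }
    assert(caught);
    return (obtained - released) - before;
}
```

Each call to LeakedCount() should report one resource that was obtained and never released.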

If you want to use the resource correctly, you have to do something like this:

void ExampleRawUsage() {
    RawResource resource = RawResource::Obtain();
    try {
        resource.Use();
        DoSomethingElse();
        resource.Use();
    } finally { // not C++
        resource.Release(); // guaranteed
    }
}

That's a lot of boilerplate every time you want to use the resource. Here's a very similar function:

void AnotherExampleRawUsage() {
    RawResource resource = RawResource::Obtain();
    try {
        DoAnotherThing();
        resource.Use();
        DoSomethingElseEntirely();
        resource.Use();
    } finally { // not C++
        resource.Release();
    }
}

Not really very different. Followers of the DRY (Don't Repeat Yourself) principle immediately look for a way to abstract out the duplication. And it's pretty straightforward, if you have a background in functional programming. But the technique takes a little getting used to for a dysfunctional programmer. But it shouldn't; if you have almost-duplicate code, you turn it into a function, with the differences as parameters. This is the same thing.

Everything is the same except for the body of the try block. So make that a parameter to a function, and use that function to eliminate the redundancy. When I say "make the body of the try block a parameter", I mean a parameter which is a function:

void AlmostARiflFunction(void useResource(RawResource &resource)) {
    RawResource resource = RawResource::Obtain();
    try {
        useResource(resource);
    } finally { // not C++
        resource.Release();
    }
}

Here "useResource" is a function which is passed to "AlmostARiflFunction". And, in turn, "AlmostARiflFunction" passes the resource to "useResource", which then, well, uses the resource.

The usage looks like this, with anonymous functions (which in real life I usually call lambdas or lambda functions) being passed in.

void ExampleAlmostRiflUsage() {
    AlmostARiflFunction([](RawResource &resource) {
        resource.Use();
        DoSomethingElse();
        resource.Use();
    });
}

void AnotherExampleAlmostRiflUsage() {
    AlmostARiflFunction([](RawResource &resource) {
        DoAnotherThing();
        resource.Use();
        DoSomethingElseEntirely();
        resource.Use();
    });
}

The repetition is vastly reduced. If you're not familiar with it, the function arguments demonstrate the syntax for anonymous functions in C++11. C++ always has elegant syntax; the no-op anonymous function is "[](){}", but you can abbreviate that to "[]{}" for even better readability. Does sarcasm come across in a blog?
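Since "finally" is not C++, here is one way the same function might be compiled as standard C++11: a sketch that swaps the "finally" clause for a catch-all that releases and rethrows, at the cost of writing the release twice. The counters and the OutstandingAfter helper are assumptions for the demonstration, and the callback is a template parameter so that capturing lambdas are accepted.

```cpp
#include <cassert>
#include <stdexcept>

static int obtained = 0; // hypothetical counters standing in for a real resource
static int released = 0;

class RawResource {
public:
    static RawResource Obtain() { ++obtained; return RawResource(); }
    void Use() {}
    void Release() { ++released; }
};

template <typename UseFunc>
void AlmostARiflFunction(UseFunc useResource) {
    RawResource resource = RawResource::Obtain();
    try {
        useResource(resource);
    } catch (...) {
        resource.Release();  // the "finally" work on the exceptional path
        throw;               // rethrow the original exception
    }
    resource.Release();      // the "finally" work on the normal path
}

// Returns how many resources are still unreleased after one call.
int OutstandingAfter(bool fail) {
    int before = obtained - released;
    try {
        AlmostARiflFunction([fail](RawResource &resource) {
            resource.Use();
            if (fail) throw std::runtime_error("fire alarm");
            resource.Use();
        });
    } catch (const std::runtime_error &) {
        assert(fail); // only the failing variant should end up here
    }
    return (obtained - released) - before;
}
```

The duplicated Release() call is the price of doing this without a destructor.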

RIFL usability depends on anonymous function usability; C++ isn't great, but it's better than C which doesn't have anonymous functions at all; or MATLAB which doesn't support multi-statement anonymous functions.

Higher-order functions (i.e., functions taking functions as parameters) are the bread of functional programming (the butter is immutability), so I'm sure RIFL is already widely used in functional programming, and probably better named. But let me elaborate on how best to implement it.

The good thing about AlmostARiflFunction is that, as long as you don't obtain a RawResource on your own, and only use AlmostARiflFunction to get access to one, you cannot fail to release it; or rather AlmostARiflFunction guarantees that it will release it for you, which is even better.

The bad thing is that nothing stops you from releasing it yourself, which can cause trouble. Or continuing to use it after you release it yourself. The obvious fix is to hide the Release method.

class RiflResource {
public:
    static void WithResource(void useResource(RiflResource &resource)) {
        RawResource resource = RawResource::Obtain();
        try {
            RiflResource riflResource(resource);
            useResource(riflResource);
        } finally { // not C++
            resource.Release();
        }
    }
    void Use() { raw_.Use(); }
private:
    RiflResource(RawResource &raw) : raw_(raw) {}
    RawResource &raw_;
};

We simply provide a wrapper class that provides Use() but does not provide Release() or Obtain(). And we can make the RIFL function a static method of the class, to tie everything together. You can think of the RIFL function as a wrapper for both Obtain() and Release().

void ExampleRiflUsage() {
    RiflResource::WithResource([](RiflResource &resource) {
        resource.Use();
        DoSomethingElse();
        resource.Use();
    });
}

Now we have guaranteed that Release() is called exactly once, and there is no way to use the resource after it is released.

Also, note that the using code now has no references to the raw resource. We presumably can't remove the raw resource from the language, but as long as we only use it through the RIFL class, we don't have to worry about releasing it (or even explicitly obtaining it).

In the same way that you get a stack frame by calling a function, and free it up by returning (often implicitly by falling off the end of the function), we obtain the resource by calling the RIFL function, and it is released when our anonymous function ends.

In my next post, I'll give some examples of RIFL in Go and Perl. And I promise to get to real C++ and RAII soon.