Saturday, November 18, 2006

 

Abstraction

The two biggest complaints I usually have when I start working with a new codebase are (a) functions and methods that are too long, and (b) lack of abstractions. The two complaints are related: overly long functions usually come about because the writer doesn't use any abstractions. Why don't programmers want to use abstractions?

Some programmers believe that code with a lot of "layers of abstraction" will be less efficient than code "written for the bare metal." This belief is often false. For an explanation of how code with good abstractions can be more efficient than low-level bit-banging code, see Bill Venners's interview with Bjarne Stroustrup. Stroustrup's FAQ debunks some other myths about the relationships between C++'s higher-level features and efficiency. If you are one of those people who thinks that using a C-style array is better than using a std::vector, std::list, or std::map, please read those articles.

Some programmers just don't know how to use abstractions. They think in terms of moving bytes around in memory; they don't think in higher-levels, and their code reflects that thinking. It's difficult to change the way people think, but there are some techniques one can use to recognize opportunities for abstraction.

A rule that works well for me is "Every time you make a design decision, express it in code as an abstraction." For example, if you decide you need a list of employee ID's in your application, and the employee IDs will each fit in a 32-bit unsigned integer type, don't just sprinkle uses of std::list<unsigned int> everywhere. Instead, put these definitions in your code:

    typedef unsigned int EmployeeID;
    typedef std::list<EmployeeID> EmployeeIDList;

Many programmers think that doing something like this just provides a convenient shorthand for writing code that uses the list, and it makes it easy to change the representation later. But convenience is not the only benefit. Here, we have documented in code the decisions to represent an employee ID as an unsigned int and to use a std::list as a container for them.

All programmers make dozens or hundreds of design decisions every day. Whenever you make one, try to make the code reflect your thinking, rather than just writing a comment (or writing nothing at all). Often, this will lead to definition of a new class, or a new function, or some other abstraction that makes the intent obvious.

Another good rule is "Keep functions to ten lines or less." I am truly amazed when I find functions that are hundreds (or thousands) of lines long. I'm amazed because the people who write them must have god-like intelligence if they can actually understand what they have written. I can't understand such functions until I start breaking them apart into bite-sized chunks.

There are exceptions to the "ten lines or less" rule, but they are rare. Any time somebody has to hit Page Up or Page Down to try to comprehend what a function does, the writer of that function has screwed up.

A related rule is "Don't indent more than two levels in a function." For example, when I see this:

    for (int i = 0; i < NUM_COLS; ++i) {
        for (int j = 0; j < NUM_ROWS; ++j) {
            if (a[i][j] < 0) {
                if (IsOdd(a[i][j])) {
                    a[i][j] = -(a[i][j]);
                }
            }
        }
    }

I'd like to rewrite it like this:

    for (int i = 0; i < NUM_COLS; ++i) {
        ProcessColumn(a[i]);
    }

And then I'd add a ProcessColumn function, which in turn would call a ProcessElement function to handle the innermost operation.

Some people would say this is making things too complicated, because you'll end up with more functions and more lines of code, but I think the result is a lot easier to understand, and a lot easier to verify as correct. That's what abstractions do for you.


Comments: Post a Comment

<< Home

This page is powered by Blogger. Isn't yours?