
Reusable Software

Why is software so expensive and time-consuming to build? To most people, it seems a bit strange that the first copy of a piece of software costs $1 million and takes a year to develop, but the second copy costs next to nothing and can be delivered almost instantly. The problem is, software is completely unforgiving. A program may consist of millions of bits—ones and zeroes—and if a critical bit is a one instead of a zero (or vice versa), the entire program may fail to run. There are very few human endeavors in which one-part-per-million accuracy is a requirement.
 
Looking at the challenge of building software, it would seem one good approach would be reusable components. It would be so much easier if you could take a proven piece of software and connect it to another proven piece and presto: cheaper software, delivered more quickly, with a higher degree of reliability. It’s a manager’s (and an engineer’s) dream come true.
 
People like to draw parallels with other industries. For example, we have standardized hardware, like nuts and bolts. No manufacturer makes its own nuts and bolts. It buys them, selecting a standard set of qualities (length, diameter, thread pitch) and materials (e.g., stainless steel) from a company that specializes in making nuts and bolts. You should be able to buy software components the same way, no?
 
When UNIX was developed at Bell Labs in the 1970s, one of its core ideas was that a program should do one thing well. Another was that the output of one program could become the input of another, a mechanism called “piping,” inspired by the image of software as plumbing.
 
Let’s say you wanted to know the number of times the word “the” occurred in a document. The document becomes the input to a program (named “tr,” short for “translate”) that can replace every space with a newline, printing each word in its input on its own line. Another program (cleverly named “sort”) sorts lines of text into alphabetical order. A third (“uniq”) takes sorted input and eliminates (or, optionally, counts occurrences of) duplicate lines; because “uniq” only compares adjacent lines, sorting first makes its job a lot easier. And finally, a program named “grep” prints the lines of text that match a particular pattern.
 
The UNIX command line lets you string all these programs together to count the instances of “the” in a document. The “tr” program takes your document and spits out each word on its own line. That output is connected by a pipe to the input of “sort,” which puts the lines into order (so all occurrences of “the” are now grouped together). The output of “sort” is piped to “uniq,” which outputs a line with each unique word and its associated count. And finally, the output of “uniq” is piped to “grep,” which prints the single line containing the word “the” and its count. UNIX experts in the audience: Please forgive me for glossing over some details here.
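 
For the curious, here is roughly what that pipeline looks like typed out (a minimal sketch; “document.txt” is a stand-in for your own file, and the exact “tr” options vary a bit from system to system):

    tr -cs '[:alpha:]' '\n' < document.txt | sort | uniq -c | grep -w the

Each vertical bar is a pipe. Here, “tr” turns every run of non-letter characters into a newline, the -c option tells “uniq” to count its duplicates, and the -w option tells “grep” to match “the” only as a whole word.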
 
The idea of connecting the output of one program to the input of another is simple and powerful, and was one reason for the enormous success of UNIX (and its descendant, Linux). Of course, using the UNIX tool set took (a) familiarity with the tools, and (b) some thinking about how to string them together. So, ordinary users were pretty lost when trying to count words in their documents.
 
Obviously, one can bundle the various programs and pipes together to create a monolithic “word count” program suitable for average users. Microsoft Word has a built-in word count command for just that reason. But the problem with monolithic programs is that it’s difficult—in some cases, impossible—to extract the one function you need for use in some other way.
 
There are a couple of ways in which software is “reused” today. First, general-purpose applications like Microsoft Word and QuickBooks are used by lots of different companies. QuickBooks is reusable in this way because accounting is pretty much the same no matter what business you’re in (although Intuit, the maker of QuickBooks, did introduce some industry-specific versions of its products). Microsoft simply keeps adding features to Word to accommodate users. That makes the program huge, and most people use only a fraction of its full feature set, but it’s got everything you could possibly want (if only you can find it).
 
For programmers, software reusability comes in the form of software libraries. These are functions you can call from your own program to perform particular tasks. The popular C programming language has a relatively Spartan standard library, whereas the Python language has a very extensive library. The advantage of “standard” libraries is that they’re pretty well tested. There are also libraries of code developed by third parties, some of which are good and some of which are less so. That’s one of the reasons software reuse is hard: As with UNIX pipes, you have to know what library functions are available, and you have to be clever about fitting them together. And then you may end up having to debug library code you didn’t write. Most software developers would rather debug their own code.
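 
To make that concrete, here is a minimal sketch of the same word count done with Python’s standard library (again, “document.txt” is just a stand-in for a real file):

    from collections import Counter
    import re

    with open("document.txt") as f:
        # Pull out the words; lower() folds "The" and "the" together,
        # which the shell pipeline above does not do.
        words = re.findall(r"[A-Za-z]+", f.read().lower())

    print(Counter(words)["the"])  # occurrences of the word "the"

The Counter class, a ready-made piece of the standard library, does the work of “sort” and “uniq” in a single call; that’s exactly the kind of reuse this paragraph describes.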
 
As with most things, human behavior (the common perception that available tools are too hard to find, understand or use) is the biggest obstacle to greater software reuse.

Author

  • Michael E. Duffy is a 70-year-old senior software engineer for Electronic Arts. He lives in Sonoma County and has been writing about technology and business for NorthBay biz since 2001.
