Failing (Gracefully)
(Alternate title: Failing Productively)
I posted some snippets from a recent interview with Fred Brooks that ran in the August issue of Wired (by the way, I’m working through his latest collection of essays, The Design of Design).
I’ll repost the relevant bits here:
KK: You say that the Job Control Language you developed for the IBM 360 OS was “the worst computer programming language ever devised by anybody, anywhere.” Have you always been so frank with yourself?
FB: You can learn more from failure than success. In failure you’re forced to find out what part did not work. But in success you can believe everything you did was great, when in fact some parts may not have worked at all. Failure forces you to face reality.
I think this is an important lesson. I’ve written about this topic before in a post about, of all things, World of Warcraft.
From the Wired article:
Where traditional learning is based on the execution of carefully graded challenges, accidental learning relies on failure. Virtual environments are safe platforms for trial and error. The chance of failure is high, but the cost is low and the lessons learned are immediate.
To expand on this: in software, I think it’s important to have lots of little failures. They are the only way to separate the solutions that work from the ones that don’t (hopefully on the path to a solution that does!). In my book, failure is good; it’s a necessary part of the learning process (if I’m not failing, I’m probably not doing anything interesting or challenging). I expect to fail, and I expect the developers I work with to fail. My estimates even account for failure. The important thing, however, is to actually examine your failures and understand why you failed. More than that, it’s important to understand how to fail. The key is to fail early, in small, isolated scenarios, and to extract from each failure some notion of what will and won’t work; we call this iterating, or prototyping, or iterating with prototypes. Then, on a macro scale, examine your work once a project is done, identify what went wrong, what was painful, and what could have been done better, and actually make the effort to improve.
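To make that concrete, here is the sort of small, isolated experiment I mean (a hypothetical sketch in Python; the file name and scenario are invented for illustration): a few lines that test a single assumption before any real design depends on it.

```python
# sandbox/json_roundtrip.py -- a throwaway experiment (hypothetical)
# Assumption under test: a dict survives a JSON round-trip unchanged.
import json

original = {1: "one", 2: "two"}
restored = json.loads(json.dumps(original))

print(original)   # {1: 'one', 2: 'two'}
print(restored)   # {'1': 'one', '2': 'two'}
print("assumption holds?", original == restored)  # False: JSON object keys are always strings
```

Thirty seconds of work, one small failure, one lesson learned cheaply, and a flawed assumption that never makes it into the design.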
Brooks also expands on this in The Design of Design. In chapter 8, “Rationalism versus Empiricism in Design”, he writes:
Can I, by sufficient thought alone, design a complex object correctly? This question, particularized to design, represents a crux between two long-established philosophical systems: rationalism and empiricism. Rationalists believe I can; empiricists believe I cannot.
The empiricist believes that man is inherently flawed, and subject repeatedly to temptation and error. Anything he makes will be flawed. The design methodology task, therefore, is to learn how to determine the flaws by experiment, so that one can iterate on the design.
Brooks boldly states: “I am a dyed-in-the-wool empiricist.” I’m in Brooks’ camp; I’d definitely consider myself an empiricist. It’s evident in my sandbox directory, where hundreds of little experiments live that I use to rapidly iterate on an idea and isolate the failures (one such experiment is sketched below). If you’re an empiricist, then, as Brooks implies, iterative models of design and development come naturally. I find it more productive to go through a series of quick, small prototypes and experiments to identify the failures than to discover one big failure (or lots of small failures) late in a project! As much as we’d like software engineering to be a purely mechanical process, like an assembly line in an automotive plant, I don’t think that can ever be the case.
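For instance, when I have a hunch about which of two approaches performs better, I don’t argue it out rationally; I measure. A hypothetical sketch (the file name is invented, and timings will vary by machine):

```python
# sandbox/concat_vs_join.py -- another throwaway experiment (hypothetical)
# Hunch under test: repeated += is about as fast as str.join for building strings.
import timeit

def concat(parts):
    s = ""
    for p in parts:
        s += p
    return s

parts = ["x"] * 10_000

print("+=  :", timeit.timeit(lambda: concat(parts), number=200))
print("join:", timeit.timeit(lambda: "".join(parts), number=200))
```

The measurement either confirms the hunch or kills it in minutes; either way, the failure stays small and isolated.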
So then it follows: if designers and developers work best with an empiricist view of the world, why do we continue to design, plan, budget, and schedule projects using a waterfall approach? Why do we continue to use a model that does not allow for failure in design or implementation, yet cannot actually prevent failure? Brooks’ answer, in chapter 4, “Requirements, Sin, and Contracts”, is sin:
The one-word answer is sin: pride, greed, and sloth… Because humans are fallen, we cannot trust each other’s motivations. Because humans are fallen, we cannot communicate perfectly.
For these reasons, “Get it in writing.” We need written agreements for clarity and communication; we need enforceable contracts for protection from misdeeds by others and temptations for ourselves. We need detailed enforceable contracts even more when the players are multi-person organizations, not just individuals. Organizations often behave worse than any member would.
So it seems that the necessity for contracts best explains the persistence of the Waterfall Model for designing and building complex systems.
I find that quite disappointing and pessimistic, and yet full of truth.
On a recent project, we failed to launch at all, even after months of designing, design reviews, sign-offs, and discussions. I had already started writing some framework-level code, fully anticipating that the project would start within weeks once the design had been scrutinized ad nauseam and “finalized”. The client insisted on a rigid waterfall approach and wanted to see the full solution in design documents up front. As absurd as it sounds, by that point the client had already spent more on design artifacts (documents and UML diagrams) than they had budgeted for delivery (development, testing, validation, and deployment). It was an impossible objective to start with, but we obliged as an organization, despite my internal protests. Tedious, micro-level designs were constructed and submitted, but to what end? The project was scheduled to go live this April. It is now August, and after a change of vendors, it isn’t even close to getting off the ground. Instead of many micro-failures along the path to success, this client’s fear of failure (embodied by their goal of designing out all of the risk) has led them down the path to one big failure.
So the question is: how can we overcome this? How do you negotiate and write a contract to build a solution iteratively? How do you build the relationship of trust that breaks down the sins and the communication barriers? Brooks touches on various models and why they work, but doesn’t offer much insight or guidance on how to overcome the “sins” while still working within an enforceable contract. This, I think, is an important lesson not just for individuals, but for organizations: a certain level of failure must be acceptable, and in fact encouraged. That is essentially what iterative design and development means: iterate quickly and find what does and doesn’t work. Make many small mistakes early instead of finding big mistakes in your design or assumptions later.
Footnote: I’m still working through the book and, so far, it has been a great read.