Sunday 21 April 2013

Three Good Questions

Good questions are important. It's often fairly straightforward to find a good answer, if you're given a good question. It's a lot harder to find a good question, given nothing. Over years of doing system design, I found three "meta" questions particularly useful. These have sometimes helped me to imagine things that might go wrong during the construction or operation of a system. Maybe they could help you too.

I was reminded of these "meta" questions by an email from Aaron Sloman following my post on Learning to Program. He said:

"I could have added that I noticed something different when teaching philosophy. Doing conceptual analysis well requires the ability to dredge out of your vast store of knowledge examples that illustrate, test or refute, some concept, distinction, or thesis.

"I found over many years of teaching that I could teach some students to produce examples merely by doing it myself in discussions: they somehow got the idea and were able to do something similar. With other students they seemed to lack the ability to access their long-term memory in an appropriate way, and I was not able to think up a way of teaching them that helped them to develop that ability. I suspect this is related to the ability to do well in the Jeopardy game, but is more complicated, and less a matter of speed.

"More importantly it may also be related to some high level requirements for being a good system designer: too often systems are designed with flaws that the designer should have noticed without waiting for users to discover them. That requires the ability to trawl through what you already know, and I fear some designers are very bad at that, and design bad systems."

So maybe my Three Good Questions help me to "trawl through what I already know", and to imagine what might be. Anyway, here are the questions:
  • Does it scale? Usually it doesn't. When you take a prototype system that works fine in the lab and try to scale it up, you almost always run into problems. Sometimes these are latency or bandwidth problems. (A system which works fine in the lab on a lightly loaded local-area network is going to feel very different to users on a congested wide-area network.) Sometimes there are problems with algorithms that have unfortunate growth characteristics; there's a small sketch after this list of how easily that kind of problem stays hidden in lab-sized tests. There might even be an accidental reliance on particular people who are available to make the prototype system work, but who will not be available in a larger deployed system. (For example, I am suspicious that the currently over-hyped "Massively Open Online Courseware" will fail to scale in exactly this way.) There are lots of ways that a small system can fail to scale, and only one of them needs to apply for the problem to be fatal.

  • Does it have a start-of-day problem? A "start-of-day problem" is a familiar concept to system designers, but it is not a term in everyday use. A system has a start-of-day problem if it can reproduce itself once it exists, or keep itself going once it is going, but needs an entirely different mechanism to get it going from nothing. For example, think of the problem of powering up a ship from nothing. When its main engines are running, electricity comes from generators driven by those engines. (In some ships, like the QM2, that's actually all that the main engines do: power transmission to the propellers is entirely electric.) However, the control systems and starter motors for the main engines need electricity to work. What provides that electricity if the main engines are not already going? This is a start-of-day problem, and it is solved in this case by having auxiliary generators to provide that electric power. (But those have a start-of-day problem of their own, and need further mechanisms to start them from cold ...) There's a toy model of this circular dependency in one of the sketches after this list. We see this problem over and over in system design. For example, it is traditional to write compilers in their own language. But in that case, how do you compile the first compiler for that language? (By having a bootstrap compiler written in something else, which you run once and then never need again.)

  • What are the security implications? If we are doing system design properly, then we think about security all the way through, not just at the end. We think about exactly what security guarantees we are going to provide, and to whom. We think about the traditional security properties of Confidentiality, Integrity and Availability (and maybe a few others). But this question asks you to go a step further, and to consider the unintended consequences which might result from correctly implementing the security guarantees that you think you want. For example, to prevent an online password-guessing attack, it's usual to lock out an account after, say, three wrong attempts. That's great for confidentiality, because a locked-out attacker can't now get access to your data. However, the unintended consequence is that neither can you! The lock-out mechanism which delivers better confidentiality has also delivered worse availability: if all the attacker in fact wanted was a denial-of-service attack, you have handed it to him on a plate. (One of the sketches after this list shows just how cheap that attack is.)
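
To make the growth-characteristics point concrete, here's a small sketch of my own (the functions, record counts and planted duplicate are all invented for the illustration) of a duplicate-finding routine that looks fine on lab-sized data and collapses as the data grows, next to a version that scales:

    import time

    def find_duplicates_quadratic(records):
        """Fine in the lab: compares every pair of records, so work grows as n squared."""
        dups = []
        for i in range(len(records)):
            for j in range(i + 1, len(records)):
                if records[i] == records[j]:
                    dups.append(records[i])
        return dups

    def find_duplicates_scalable(records):
        """Scales: one pass over the records, remembering what has already been seen."""
        seen, dups = set(), []
        for r in records:
            if r in seen:
                dups.append(r)
            else:
                seen.add(r)
        return dups

    if __name__ == "__main__":
        for n in (1_000, 2_000, 4_000, 8_000):
            data = list(range(n)) + [0]            # one planted duplicate
            t0 = time.perf_counter()
            find_duplicates_quadratic(data)
            t1 = time.perf_counter()
            find_duplicates_scalable(data)
            t2 = time.perf_counter()
            # Doubling n roughly quadruples the first time and barely moves the second.
            print(f"n={n:>5}: quadratic {t1 - t0:.3f}s, scalable {t2 - t1:.4f}s")

Both versions give the same answer; only the growth rate differs, and lab-sized tests never reveal it.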
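
The start-of-day problem can be seen as a cycle in a dependency graph. Here's a toy model of my own of the ship example (the component names and the graph shape are just an illustration, not how a real ship is wired), with the auxiliary generator as the extra mechanism that breaks the cycle:

    # Each component needs all of the listed resources before it can start.
    needs = {
        "main_engines":    ["electricity"],
        "main_generators": ["main_engines_running"],
        "aux_generator":   [],                     # can be started from cold
    }

    # Each resource is available if any one of its providers can be started.
    providers = {
        "electricity":          ["main_generators", "aux_generator"],
        "main_engines_running": ["main_engines"],
    }

    def component_can_start(component, starting=frozenset()):
        if component in starting:
            # We are already in the middle of trying to start this component:
            # a circular dependency, i.e. a start-of-day problem.
            return False
        return all(resource_available(r, starting | {component})
                   for r in needs[component])

    def resource_available(resource, starting):
        return any(component_can_start(p, starting) for p in providers[resource])

    if __name__ == "__main__":
        print(component_can_start("main_engines"))   # True: the auxiliary generator breaks the cycle
        providers["electricity"].remove("aux_generator")
        print(component_can_start("main_engines"))   # False: nothing can start from cold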
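
And here's a minimal sketch of the lock-out trade-off (my own toy code, not any real authentication system): the attacker never needs to know the password at all; a username and three wrong guesses are enough to shut the legitimate owner out.

    MAX_ATTEMPTS = 3

    class Account:
        def __init__(self, username, password):
            self.username = username
            self._password = password
            self.failed_attempts = 0
            self.locked = False

        def log_in(self, password):
            if self.locked:
                return "account locked -- contact the administrator"
            if password == self._password:
                self.failed_attempts = 0
                return "logged in"
            self.failed_attempts += 1
            if self.failed_attempts >= MAX_ATTEMPTS:
                self.locked = True       # better confidentiality, worse availability
            return "wrong password"

    if __name__ == "__main__":
        victim = Account("alice", "correct horse battery staple")
        # A denial-of-service attacker needs no password at all:
        for _ in range(MAX_ATTEMPTS):
            victim.log_in("anything")
        # The legitimate owner, with the right password, is now shut out too.
        print(victim.log_in("correct horse battery staple"))   # account locked -- ...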

As I was writing this, it struck me that the idea of trying to produce a counter-example is also a very strong technique in system design. To phrase this as a "meta" question, we might ask: if you think it can't work, can you prove it can't work? Sometimes, by trying (and failing) to prove that it can never work, you can actually solve the original problem. For example, I was once working on a project with an apparently insurmountable architectural problem. If we couldn't solve it, the project we were working on would almost certainly be closed down. We'd been trying to solve it for a couple of months, but with no success. So one Friday afternoon, I decided to finally put the issue to rest and tried to prove to a couple of colleagues that a solution was impossible. I didn't manage to do that, because over the space of about an hour, by trying to construct that proof I was finally able to find a solution to our problem.

System design is always an exercise in imagination. We often know the answer, but we don't know that we know it until someone asks the right question: a useful and productive question which calls forth the answer in a flash of insight. Good system designers distinguish themselves by being able to ask exactly those fruitful questions. Can you teach that? I'm not sure.
