All around the web you can find blogs which were once tended with
care, day by day, week by week. Then one day ... nothing. You're left
wondering what happened. Did the author literally get hit by a bus? Or what?
Rather than leave you in a similar state of puzzlement, I thought
it more polite to sign off explicitly. I have (hopefully) not been hit
by a bus, but I've become a whole lot busier. I was quite surprised to
see that, according to "wc" anyway, this year I have written nearly
30,000 words on this blog. But for the foreseeable future,
I'm going to be rather too busy with other things to keep that up.
So I think it's better for now to just stop.
I've enjoyed writing these posts, and I've enjoyed the discussions
that they sparked, mostly by email behind the scenes. (Drop me a line
if you have any thoughts about them. I'm sure I'll have time for
that.) I think I understand agile development much better now through
writing about it here. I'm glad I got various ideas about teaching
programming down in black and white.
I was surprised by the popularity of the robots game. I was also
surprised by how hard it is to write up
recipes that I cook all the time in a form that might work for other people.
And I've still got a list of things that I intended to write about
but didn't: thoughts on the difference between programming
languages and programming systems; an "absentee review" and
retrospective on The Mother of All Demos; a development of Meta-II
called "metaphor", which I've put on GitHub but not really
explained. At the end of the day I think
I have made some progress following up my two initial trains of
thought: that we
could do programming a whole lot better, and that we could learn
something from another group of people who produce things every day,
the people who work in kitchens. But I think there's a lot more to be
said, a lot more to be discovered.
So, I hope you enjoy the posts that remain here. (They will probably make more
sense if you start from the beginning.) I think I'll be back one day
to add some more. But I
may be some time ...
Sunday, 10 November 2013
Sunday, 3 November 2013
"Film Night"
Here are videos of some interesting talks. Grab a bag of popcorn
— or more healthily, a bag of pork scratchings — and
settle in to enjoy ...
Recent Developments in Deep Learning
Geoff Hinton, 30 May 2013
www.youtube.com/watch?v=vShMxxqtDDs (65 minutes)
Neural nets have been around for a long time, but people could not get them to work as well as they had hoped. The explanation for this was that learning by backpropagation would never give better results because:
- It required labelled training data.
- Learning time was very slow with deep nets.
- It got stuck in local optima.
Monads and Gonads
Douglas Crockford, 15 January 2013
http://www.youtube.com/watch?v=b0EF0VTs9Dc (49 minutes)
Yet another monads tutorial? Is it any more comprehensible than the others? It's certainly more entertaining, if only because Crockford asserts that you don't need to understand Haskell, you don't need to understand Category Theory, and if you've got the balls, you don't even need static types. (As you might expect, he uses JavaScript. I really enjoyed this talk, but I found that I had to re-do his JavaScript in Python to make sure I had understood it right.)
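To make that last remark a little more concrete, here is a minimal sketch of the kind of re-implementation I mean: a Maybe-style monad in Python, with the "unit" and "bind" operations that every monads tutorial revolves around. This is only my own illustration, not Crockford's code; his JavaScript examples differ in detail.

    class Maybe:
        """A value that may be present (Just) or absent (Nothing)."""

        def __init__(self, value=None, has_value=True):
            self.value = value
            self.has_value = has_value

        @classmethod
        def unit(cls, value):
            # unit (Haskell's "return"): wrap a plain value in the monad
            return cls(value)

        @classmethod
        def nothing(cls):
            return cls(has_value=False)

        def bind(self, f):
            # bind: apply f (which itself returns a Maybe) to the wrapped value,
            # or short-circuit if there is no value to apply it to
            return f(self.value) if self.has_value else self

        def __repr__(self):
            return "Just(%r)" % self.value if self.has_value else "Nothing"

    def safe_div(x, y):
        return Maybe.nothing() if y == 0 else Maybe.unit(x / y)

    # Chaining with bind: the "plumbing" of a possibly-missing value is handled
    # once, inside bind, rather than with an if-test after every step.
    print(Maybe.unit(10).bind(lambda x: safe_div(x, 2)).bind(lambda x: safe_div(x, 5)))  # Just(1.0)
    print(Maybe.unit(10).bind(lambda x: safe_div(x, 0)).bind(lambda x: safe_div(x, 5)))  # Nothing

The point of the exercise is that once bind handles the book-keeping, the steps themselves can be chained without ever checking for a missing value in between.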
Mission Impossible: Constructing Charles Babbage's Analytical Engine
Doron Swade, 8 May 2012
http://www.youtube.com/watch?v=FFUuN-ZRLz8 (87 minutes)
This is longer than the others, but there's really no heavy-lifting in this talk. Swade outlines the very ambitious project to complete the design and construction of Charles Babbage's Analytical Engine. (If you want to support the effort, you can make a contribution at plan28.org.)
Sunday, 27 October 2013
Recipe: Chicken Español
This recipe was reverse-engineered from a Tesco ready-meal.
Ingredients
3 | Chicken breasts |
2T | Olive oil |
2t | Paprika |
9 slices | Thinly sliced chorizo |
120g | Cheddar cheese, grated |
420g | Cream cheese |
| Salt |
| Ground pepper |
For the sauce: |
1 | Onion, finely chopped |
2 | Cloves garlic, crushed |
3T | Olive oil |
1 | Tin of chopped tomatoes (440g) |
pinch | Sage, dried |
pinch | Thyme, dried |
pinch | Oregano, dried |
250ml | Red wine |
| Salt |
| Ground pepper |
(1T = one tablespoon = 15ml; 1t = one teaspoon = 5ml) |
Method
On the day before you want to cook this, first make a sideways slit in each chicken breast, making a kind of pocket. Then put the chicken breasts in a plastic bag with the olive oil, paprika and a few grinds of salt and pepper. Squidge this around until the breasts are all covered with this marinade, then tie the bag and leave it in the fridge overnight.
To make the sauce, fry the onion and garlic in olive oil over a medium heat until the onion is turning brown. Add the chopped tomato, sage, thyme and oregano. Then add the red wine. Add salt and pepper to taste. Cook for around 30 minutes until thickened. (Reduce heat as necessary. Blend with a hand-blender if you want it to be smoother.)
While that is going on, assemble the chicken with its other ingredients. Each chicken breast cooks in the oven in its own container, which should be about twice as big as the breast itself. (Replicating in some sense the original ready-meal experience.) Assemble each container as follows: first put 100g of cream cheese in the base. On top of that, sprinkle 20g of grated cheddar and a couple of grinds of salt and pepper. Now take a breast and, into the "pocket", place 40g of cream cheese mixed with 20g of grated cheddar. Place the breast in the container and cover it with 3 slices of chorizo. Place the containers in an oven pre-heated to 200°C (= gas mark 6 = 400°F) and cook for around 30 minutes.
To serve, lift out each breast onto a plate, stir around the creamy sauce in its container and pour this sauce around the breast. Beside this, place a similar quantity of the tomato-and-wine sauce. (For a vegetable, steamed spinach works very well and makes a nice colour combination.)
Discussion
You will find that you have too much of the tomato-and-wine sauce: the above quantity would work for twice as much chicken. However, it's less fiddly to make a bigger batch. For convenience you can make the sauce well ahead of time and even freeze it and re-heat it. Although it's probably not necessary to give each breast its own little container, I think it probably is necessary to confine the area over which the creamy sauce can spread, so that it doesn't burn and the breasts don't dry out.
Sunday, 20 October 2013
Democratic Development
(This post is another follow-up to Agile is Antifragile.)
I recently finished The Democracy Project by David Graeber. It's an excellent book, in part a personal history of the Occupy Wall Street movement, and in part an explanation of how to organise groups and solve problems using the consensus processes of "direct democracy". To my surprise, I realised that Graeber's insights can also be applied to software development. I think these insights explain not only what Agile development is really trying to do, but also reveal a cornucopia of organisational techniques that we can reuse to improve our practice of Agile development.
In Agile development, the team is supposed to organise itself — "horizontally" as Graeber would say — rather than being directed "vertically" by some manager. But exactly how? Practically every Agile process mandates short daily meetings, but practically no Agile process suggests in detail how those meetings should be run, or exactly how they lead to the team organising itself. What's really going on here? And why is "horizontal" better than "vertical"? As Graeber explains, what should be going on is decision making by consensus:
This is how consensus is supposed to work: the group agrees, first, to
some common purpose. This allows the group to look at decision making
as a matter of solving common problems. Seen this way, a diversity of
perspectives, even a radical diversity of perspectives, while it might
cause difficulties, can also be an enormous resource. After all, what
sort of team is more likely to come up with a creative solution to a
problem: a group of people who all see matters somewhat differently,
or a group of people who all see things exactly the same? (Graeber, p203)
The reason for adopting consensus in a group is that "horizontal" decision making produces better decisions. And the key thing to notice is that the point of every meeting is to decide what to do next, in order to further the common purpose of the group. (There can still be some disagreement. In particular, there is no need to agree on "why". Agreement on "what" is sufficient.) Graeber goes on to explain:
The essence of consensus process is just that everyone should be able
to weigh in equally on a decision, and no one should be bound by a
decision they detest. In practice this might be said to boil down to
four principles:
- Everyone who feels they have something relevant to say about a proposal ought to have their perspectives carefully considered.
- Everyone who has strong concerns or objections should have those concerns or objections taken into account and, if possible, addressed in the final form of the proposal.
- Anyone who feels a proposal violates the fundamental principle shared by the group should have the opportunity to veto ("block") that proposal.
- No one should be forced to go along with a decision to which they did not assent.
You might question whether these principles of consensus are really what Agile development is about. You might say, yes, these principles are going to lead to better decisions. But the developers are, at the end of the day, paid staff. If someone in authority decides to impose a decision, the staff will have to go along with it even if they do detest it. Maybe so. But I think this actually means that consensus is an excellent touchstone for whether or not you are really doing Agile development. I think that at the point when someone outside the team does impose a decision, at that moment you know that you are no longer doing Agile development.
Graeber goes on to explain a particular widely-used consensus process for moderately sized groups which grew out of several different traditions, including Quakerism and radical feminism. I won't go through it here (it's outlined on p214-215) and as Graeber says, maybe in a small group that kind of formality is not needed, if everyone understands the consensus principles. But it does help, I think, to have concrete processes to consider and use in larger groups. Everything invented in the world of Agile development seems to be foreshadowed in the world of "direct democracy". For example, a "Scrum-of-Scrums" meeting is almost exactly the same as a "spokescouncil". I think we can learn a lot from these people.
(If you want to see what some of these formal processes look like in more detail, you can find a good overview in The Five Fold Path of Productive Meetings by Starhawk.)
Sunday, 13 October 2013
Recipe: Spare Ribs
This recipe is a bit long-winded, but the results are excellent.
Ingredients
1.5kg | Spare ribs |
60g | Lard |
1t | Fennel, ground |
1t | Sichuan pepper, ground |
1 | Cao guo (or "false cardamom") |
1 | Piece of cassia bark (finger-sized) |
2 | Star anise |
1 | Onion, finely chopped |
3 | Cloves garlic, finely chopped |
2 | Slices of root ginger, finely chopped |
500ml | Water |
500ml | Chicken stock |
2t | Salt |
(1T = one tablespoon = 15ml; 1t = one teaspoon = 5ml) |
Method
Bring a large pan of water to the boil. (I use a 10 litre stock-pot.) Add the ribs, bring the water back to the boil and leave the ribs to cook for a further 5 minutes. Skim off the scum which rises to the surface. Lift out the ribs and transfer them to a slow-cooker.
Heat the lard in a frying pan and when hot add the fennel, Sichuan pepper, cao guo, cassia bark and star anise. After 1 minute add the onion and cook this until it is brown (around 5 or 10 minutes). Add the garlic and ginger when the onion starts to turn brown: they don't want to cook for so long, only a minute or two. When it's ready, tip the contents of the frying pan into the slow-cooker with the ribs.
Use the 500ml of water to deglaze the frying pan, and add this to the slow-cooker too. Also add the chicken stock and salt to the slow-cooker.
Cook at a gentle heat for 3 hours, moving the ribs around every hour or so.
Lift the ribs out onto a roasting tray and strain the liquid from the slow-cooker through a sieve into a saucepan. Skim off the fat into a container and keep it to hand. Reduce the sauce on a high heat.
Place the roasting tray containing the ribs in an oven pre-heated to 200°C (= gas mark 6 = 400°F). After 15 minutes, turn the ribs and baste them with some of the skimmed fat. Return to the oven for a further 10 minutes.
Discussion
This recipe evolved from a combination of other recipes, mostly those of Kenneth Lo and Ken Hom. It's not clear to me which of the fiddly details is really necessary to produce an excellent result, and I haven't gone to the trouble of scientifically experimenting to find out. I think that frying the spices and then basting the ribs with the oil at the end must make a significant difference, because that should transfer fat-soluble flavours more reliably to the final product.
As an alternative to putting the salt in with the stock-pot ingredients, you can instead grind salt over the ribs when you place them in the roasting tray, and then again on their other side after you turn them over and baste them with fat.
Sunday, 6 October 2013
Time, Value and the dark side of Agile
(This post is a follow-up to Agile is Antifragile.)
What is the best way to organise a group of people who are working together to make some software? This must be the central question of software engineering, but we tend to think about it in a very narrow way. It's rather odd that we are happy to talk about processes, from Waterfall to Agile, but we don't like to talk about the kind of practical problems that always arise when people come together to make something for money.
These practical problems are standard fare in business schools and in some mainstream engineering courses, where they are discussed under the label Organisation Theory. This in turn has its roots in ideas of "political economy" that go back over 200 years. Once you understand these ideas, you can see clearly the cause of some tensions that arise in software development. So, let's see if I can introduce you to these ideas — I don't expect in this post to get to the twentieth-century fashions started by Taylor and Mayo, but we should be able to cover enough of the nineteenth century ideas to understand the different feelings people have about Agile and to recognise its dark side.
Where to start? Lisp-hacker Paul Graham, in his essay How to Make Wealth, uses the following example to explain how wealth or "value" can be created by people out of nothing:
"Suppose you own a beat-up old car. Instead of
sitting on your butt next summer, you could spend the time restoring
your car to pristine condition. In doing so you create wealth. The
world is — and you specifically are — one pristine old car the
richer. And not just in some metaphorical way. If you sell your car,
you'll get more for it. In restoring your old car you have made
yourself richer. You haven't made anyone else poorer."
Graham uses this parable to explain where the wealth comes from when people work for themselves in a start-up company: it comes from people spending their time working on something that's useful to themselves and to others. This is scarcely a new thought, and although it would be more traditional to say that "value" comes from "socially necessary labour time", Graham's explanation of wealth is essentially the same as the traditional one.
However, there is a subtlety here which Graham glosses over, a subtlety that the old-time political economists were well aware of: it's the issue of how things change when, rather than working for yourself, you work for someone else. Following the culinary theme of this blog, let's invent our own parable, with people working for someone else in the kitchen of a restaurant.
In our make-believe restaurant there is a proprietor who pays the rent and provides the furniture and most of the kitchen equipment. Every day the proprietor buys the ingredients for the meals and pays the wages of the staff. In return, the staff work a certain number of hours each day, during which they transform those raw ingredients into the meals which customers buy. Value has certainly been created, and in return for this newly created value, the customers leave money after they finish their meals, money which is collected by the proprietor.
The key point here is to notice where the new value comes from: it comes from the staff who spend their time preparing and cooking the food. Not from the proprietor, who merely acts as a facilitator or catalyst by providing the money that makes their work possible. In Graham's car-refurbishment example, imagine that over the summer you keep the stripped-down car in your uncle's garage as you do the work. Does the fact that he is providing the garage mean that he is directly creating value? No. He is facilitating the creation of value, but unless he works on the car with you, that's all he is doing. Without him, you would not be able to create value, but without you, he would just have an empty garage and there would be no newly created value, no new wealth.
Now we come to the central tension between the proprietor and the restaurant staff: how does the newly created value get shared out? Assuming the restaurant makes a profit, then after necessary running expenses are paid, how much of the remaining money goes to the staff as wages and how much to the proprietor? They both have a moral right to that money, since they both have to play their part in order for the new value to be created. There is an unavoidable tension here: they are both in the right.
From the point of view of the proprietor, wanting to keep more of the money for themselves, there is no reason to pay the staff more than the current going-rate. Look around: what are other restaurants paying? You really don't need to pay any more than that. If you do, you are certain to find people knocking on your door asking to work for you, regardless of what fraction of the takings you keep for yourself and what fraction you distribute to your staff as wages. And of course, if you pay less than the going rate, your staff will be tempted away by better offers elsewhere. You won't actually have a restaurant if all your staff leave.
So, the necessary level of wages is determined by the world outside the restaurant, and provided the proprietor pays those wages, they can take whatever money is left for themselves. We would naturally expect the proprietor to try and take more rather than less, and there are essentially two main methods to achieve this: the proprietor can either try to increase the productivity of each member of staff — but still pay the same wages — or they can try to "de-skill" the work, then hire less-skilled employees who can be paid less to produce the same value.
The classic way to achieve the first of these aims is to increase productivity by the "division of labour", described by Adam Smith in the eighteenth century and celebrated today on the back of the British 20 pound note. The idea is that if particular people specialise in particular tasks, then they will perform these tasks quicker than if they continually switch from one thing to another. This is the essence of the twentieth-century factory, with jobs reduced to fitting a particular part or tightening a particular bolt all day. And in the kitchen we do see division of labour to some extent, with specialists in sauces and desserts, for example.
More interestingly, and less often noticed, the division of labour can also achieve the second of the proprietor's aims: it can change the nature of the work, de-skilling it and allowing the proprietor to pay reduced wages, at least to some of the staff. This was well understood by computing hero Charles Babbage, who in the 1830s took time off from building the difference engine to write a treatise on political economy, based on the observations he made while touring the factories and workshops of England. In On the Economy of Machinery and Manufactures (1832), Babbage explains:
"... the master manufacturer, by dividing the work to be executed
into different processes, each requiring different degrees of skill or
force, can purchase exactly that precise quantity of both which is
necessary for each process; whereas if the work were executed by one
workman, that person must possess sufficient skill to perform the most
difficult, and sufficient strength to execute the most laborious, of
the operations into which the art is divided." (pp175-176)
Babbage took Smith's example of a pin factory and went one better in his analysis: he detailed the precise division of work and the wages paid to each of the workers. Some processes in pin manufacture required both skill and strength; some were so simple that a child could do them — and since this was Victorian England, a child did do them. So the highest paid worker on the staff was a man, at 5s. 4½d. per day; the lowest paid a boy, at 4½d. per day. Which meant, as Harry Braverman points out in Labor and Monopoly Capital (1974):
"If the minimum pay for a craftsman capable of performing all
operations is no more than the highest pay in the above listing, and
if such craftsmen are employed exclusively, then the labour costs of
manufacture would be more than doubled, even if the very same
division of labor were employed and even if the craftsmen produced
the pins at the very same speed as the detail workers." (p80)
[italics in original]
This explains why it makes sense for the proprietor of the restaurant to employ a less skilled person just to peel potatoes and wash the dishes. The proprietor could hire a skilled cook to do this work, but it will be cheaper to hire someone less skilled, preferably the least capable person who can still do those jobs. A skilled cook would be more expensive because their wage is determined by what they could earn elsewhere, outside this restaurant, not by what they do inside. And exactly the same is true of any job done for a wage — a point which (finally) brings us back to software development.
The proprietor of a software development organisation would obviously like to benefit in the same way as the proprietor of the restaurant. They would like to use division of labour, so that they can reduce their wage-bill by hiring a staff of cheaper, less-skilled people who can each only just do their own particular jobs. Wherever people make things in return for wages there will be this pressure. Software is no different.
But there's a catch. In software, is this really possible? Division of labour makes sense in a mass-production environment, but is software development such an environment? The proprietors of the software world clearly wish it were so, but that wish can only come true to the extent that software development is predictable and routine — and for the most part it isn't. Just like other forms of engineering, software development is around 70% repair and maintenance. Repair is never routine, and in software even the initial production is often unpredictable, because it is actually mostly design work, not manufacture in the normal sense. (This is perhaps the main way in which coding differs from cooking: coding is always about process creation, cooking is usually about process execution.)
However, this practical difficulty doesn't prevent software proprietors from yearning for predictability. It might even explain their unreasonable enthusiasm for the Waterfall process over the years. Any process that promises predictability is also implicitly promising a less skilled staff and a lower wage-bill. In contrast, an Agile process is implicitly demanding a more skilled staff and hence a higher wage-bill — because when you don't believe in prediction you need a flexible, broadly competent staff who can step up and do things that nobody expected. (Of course, to the extent that a particular project really is unpredictable, the Agile enthusiasts will be correct when they claim that in the end their way is actually better, because the less skilled staff would actually take longer and cost more, if they finished at all.)
What started as a tension between the proprietor and the staff over how to share out the newly created value has turned into a tension over skills and competence. Developers might naturally be expected to take pride in their craft skills, building them up over the years like any artisan. Perhaps the best expression of this desire can be found in open-source software, where there is never any attempt to make the work of creating software routine, predictable or de-skilled. But the difference here is that open-source software is written by hackers for themselves, not for a wage.
Agile processes promise the developers an opportunity to practise and grow their craft skills. Agile processes promise the proprietor nothing but an extra expense, accepted only because predicting the future is too hard. But some Agile processes implicitly make another promise to the proprietor, a promise with a dark side. And I think this dark side is perhaps why some developers are right to be suspicious of the rhetoric of Agile development.
Remember the two ways the proprietor can make more money? I've been concentrating on de-skilling, but there's also the other way: the proprietor can keep more money for themselves if they can increase the productivity of each member of staff, but still pay the same wages. With a given division of labour and a given level of skill, the proprietor can keep more money for themselves if they can get their staff to work for longer each day or to work with greater intensity. And when we deconstruct the terminology used by the most popular Agile process, Scrum, we find an unfortunate allusion to work at a greater intensity.
For example, the week-by-week cycle of development is called an "iteration" in practically every other software process, including Extreme Programming. In Scrum this time period is called a Sprint. What associations does that conjure up? Well when you sprint you make an extreme effort to go as fast as possible, an effort that you can only keep up for a short time. That's what "sprint" means in everyday life.
The list of unimplemented features is just called the "user stories" in Extreme Programming. In Scrum this list is called the Backlog. What does "backlog" mean in everyday life? It's the stuff that you should have already finished. The metaphor is clear: Rush! Rush! Finish it off quickly! You're late already.
Notice that I'm not saying that this is what these names should mean, or that this is what Scrum practitioners want them to mean, it's just that words come with everyday meanings, and people will tend to understand them this way. Imagine if you went to a restaurant and your waiter told you that the proprietor had introduced a new kind of organisation in the kitchen, called Scrum. The kitchen staff now have a "backlog" of dishes to prepare and they are "sprinting" to get them done. As a customer, I think you would take the metaphor at face value. Now imagine that you are a customer or a manager of a software development team using Scrum. Is there any reason for you not to take the metaphor at face value?
So this is the dark side of Agile: the danger that people outside the development team will hear these metaphors and misunderstand them — as a promise of work at a greater intensity and nothing more. Thinking that, apart from that, Agile is business as usual. Still yearning to make more money out of predictability, lower skills and lower wages. Not understanding that the point of Agile is to be Antifragile.
It's easy to imagine, if you were a developer in such an environment, how you would feel, how you would resist. I think when Agile fails, it must often fail in this way.
Sunday, 29 September 2013
Recipe: Tangerine-Peel Chicken
This recipe was originally based on one by Kenneth Lo, which he says
was adapted from the Chengtu Dining Rooms, Chengtu.
Ingredients
4 | Chicken thighs (say 600g) |
1 | Medium onion, sliced |
2 | Slices root ginger, finely shredded |
1t | Salt |
2t | Light soy sauce |
1T | Rice wine |
200ml | Oil for semi-deep-frying |
For the sauce: | |
1 | Small red pepper (say 100g), finely shredded |
1 | Dried chilli, finely shredded |
2T | Dried tangerine peel, broken into small pieces |
2T | Oil for frying |
1T | Light soy sauce |
3T | Chicken stock |
1t | Sugar |
1t | Sichuan pepper, lightly crushed |
(1T = one tablespoon = 15ml; 1t = one teaspoon = 5ml) |
Method
Chop the chicken thighs through the bone into two or three pieces. Leave the skin on.
Combine the sliced onion, shredded root ginger, salt, soy sauce and rice wine in a bowl. Add the chicken pieces and rub this marinade into the chicken. Leave in the fridge for at least 1 hour.
Prepare the sauce ingredients now: shred the red pepper and chilli (discard the seeds) and break the tangerine peel into small pieces.
Shake the chicken pieces free of the onion and ginger. Semi-deep-fry in two batches in a wok, using about 200ml oil. (Semi-deep-frying only really works in a wok, where you have a deep-enough pool of hot oil in the middle and you keep turning the chicken pieces and splashing over the hot oil.) Cook until the chicken pieces are quite brown. (This might be around 5 minutes. You want to get the internal temperature up to 70°C.) Put the chicken pieces to one side.
Pour out your hot oil into a suitable container. Clean the wok in the sink and dry it in the usual way. Now make the sauce: heat 2T of oil in the wok. When hot, add the red pepper, chilli and tangerine peel. Stir-fry for one and a half minutes over a medium heat. Add the remaining ingredients and stir together for one minute.
Return the chicken pieces to the pan and mix with the sauce. Mix and turn for two minutes over a medium heat.
Discussion
This recipe might in some ways be closer to the original than Kenneth Lo's version, since I've substituted Sichuan pepper and rice wine where he uses crushed peppercorns and sherry. On the other hand, my chicken pieces are larger — in his recipe they are "bite sized" and I'm sure that's more authentic.
Sunday, 22 September 2013
Putting the 'P' in CAP
Brewer's CAP Theorem says that in a distributed system you can
guarantee at most two of Consistency, Availability and Partition
Tolerance. So there's a trade-off: when you are designing your system
you will have to decide which two of these three properties you want to
always maintain, and which one you are prepared to drop. However,
there appears to be a bit of confusion about what it would really
mean to drop partition tolerance. Is that even possible? (For example,
see You Can't Sacrifice Partition Tolerance by Coda Hale.)
In fact you can "trade off" partition tolerance and build a system that guarantees both consistency and availability. This is exactly the design decision that was made by the engineers who built traditional telephone networks. However, this decision wasn't a "trade off" in the usual sense, where you get to save money on one thing and spend it on something else — instead they had to spend quite a lot more money to build the unusually reliable nodes and communication links that let them drop partition tolerance as a whole-system property.
To see how this works, I'll first give a (very brief) explanation of what the CAP Theorem says, in order to pave the way for a (fairly brief) explanation of the techniques you can use to build reliable sub-systems. If you haven't come across the CAP Theorem before, I think the nicest introduction is Brewer's CAP Theorem by Julian Browne. (Some of what I say here is also based on Perspectives on the CAP Theorem by Seth Gilbert and Nancy Lynch, which is a more detailed overview; it's also worth reading CAP Twelve Years Later: How the "Rules" Have Changed, which is a retrospective and commentary by Eric Brewer himself.)
Consistency: This is close to the distributed systems idea of "safety". A system is "safe" if it never says anything wrong, if every response sent to any client of the system is always correct. In systems of any complexity, this often amounts to saying that every request must appear to all clients to have been executed atomically (in a single instant) by some single central node.
Availability: This is close to the distributed systems idea of "liveness". A system is "live" if every request from every client eventually receives a response. In practice there's a lower limit on how quick a response could be (light-speed delay across whatever fraction of the distributed system is necessary for that response) and there's an upper limit, after which clients or people get bored, decide that the system has failed and decide to take remedial action.
Partition Tolerance: We say that a system has partition tolerance if it behaves as intended by its designers in the face of arbitrary message delay or message loss in the underlying communications system. Usually this will be because of a "network partition" where some link or node fails, but in best-effort networks this can also include congestion, for example due to packets being dropped on a congested port. A major practical problem with partition tolerance is that very often the different parts of a distributed system will disagree about whether or not there currently is a partition.
The CAP Theorem says that you can build a distributed system with any two of these three properties. Traditional "ACID" SQL databases choose to drop availability: they delay responses to clients until they are certain to be consistent. More avant-garde "BASE" NoSQL systems choose to drop consistency: under pressure they give fast but possibly out-of-date responses and patch things up later. And old-fashioned telephone networks drop partition tolerance: they use nodes and communication links which are so reliable that the distributed system (almost) never sees message loss or arbitrary delay. But how do you do that?
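To make the three choices a little more concrete, here is a toy sketch in Python. It is entirely my own illustration, not code from any real database: a single replica deciding how to answer a read while it suspects a partition. A "CP"-style replica refuses to answer rather than risk being wrong; an "AP"-style replica answers from its local, possibly stale, copy.

    # Toy illustration of choosing C or A during a suspected partition.
    # "CP" and "AP" are shorthand labels of my own, not real product settings.
    class Replica:
        def __init__(self, mode):
            self.mode = mode              # "CP" (drop availability) or "AP" (drop consistency)
            self.local_value = None       # the last value this replica has seen
            self.partitioned = False      # True while peers are unreachable

        def read(self):
            if self.partitioned and self.mode == "CP":
                # Consistent but not available: refuse rather than risk a stale answer.
                raise TimeoutError("cannot confirm the latest value during a partition")
            # Available but possibly inconsistent: answer from the local copy.
            return self.local_value

    replica = Replica("AP")
    replica.local_value = 42
    replica.partitioned = True
    print(replica.read())   # 42, fast, but possibly out of date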
The usual pragmatic solution in any situation where a component might fail is to replicate that component. For example, a plane with two engines should be able to reach its destination if one of them fails. In our case things are slightly more interesting because if we replicate nodes, and these check each other, what's to stop a failed node wrongly accusing a correct node of having failed? A bad node could shut down a good node! To get over this problem, we build nodes into fail-stop pairs with a separate checker:
This sub-system works roughly like this: a request comes into the checker on one (or both) of the links on the right. The checker forwards this request to both node1 and node2. These nodes are exact duplicates, and work separately to produce what should be identical responses. The checker makes sure that these responses match, in which case it sends that response back to the client. But if the responses don't match, the checker stops and refuses to communicate on any of its ports. Making this sub-system run again is a maintenance action. (When this architecture is used in a railway-signalling system, the checker might physically blow a fuse to make certain that it won't work again without maintenance.) In addition to making sure that the results from node1 and node2 match, the checker also sets a time limit for their responses, and similarly fail-stops if this limit is exceeded. (So at the lowest level, the checker enforces availability or "liveness" as well as consistency between node1 and node2.)
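As a rough sketch of that behaviour, here is a toy checker in Python. This is my own illustration of the fail-stop-pair idea, not code from any real railway or telephone system: the checker forwards each request to both nodes, enforces a time limit, compares the two responses, and permanently fail-stops on any disagreement or missed deadline.

    from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

    class FailStopPair:
        """Toy checker: two duplicate nodes plus a comparator that fail-stops."""

        def __init__(self, node1, node2, time_limit=1.0):
            self.node1, self.node2 = node1, node2   # callables: request -> response
            self.time_limit = time_limit            # seconds allowed for each answer
            self.stopped = False                    # once True, only maintenance restarts us
            self._pool = ThreadPoolExecutor(max_workers=2)

        def handle(self, request):
            if self.stopped:
                raise RuntimeError("sub-system is fail-stopped; maintenance required")
            f1 = self._pool.submit(self.node1, request)
            f2 = self._pool.submit(self.node2, request)
            try:
                r1 = f1.result(timeout=self.time_limit)
                r2 = f2.result(timeout=self.time_limit)
            except FutureTimeout:
                self._fail_stop("a node missed its deadline")   # the availability check
            if r1 != r2:
                self._fail_stop("the nodes disagreed")          # the consistency check
            return r1

        def _fail_stop(self, reason):
            self.stopped = True       # the real thing might physically blow a fuse here
            raise RuntimeError("fail-stop: " + reason)

    pair = FailStopPair(lambda x: x * 2, lambda x: x * 2)
    print(pair.handle(21))    # 42; any mismatch or timeout stops the pair for good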
There are a lot of subtleties here. To defend against software errors, it is preferable to have two different implementations of the system in node1 and node2. To defend against hardware failure, it is preferable to have different hardware in node1 and node2, or at least to have a different encoding for data on the interfaces to the two nodes. (In this case the checker ensures that responses are equivalent, rather than bit-identical.) Each node may also run "routining" code in the background which continually checks on the consistency of its internal data, to guard against bit-errors in memory. If a node finds such an error it logs this problem and then simply stops itself. (The checker will then cleanly fail the whole sub-system when it subsequently doesn't get a response to a request.)
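The "routining" idea can also be sketched in a few lines (again my own toy in Python, with invented names): keep a checksum alongside every piece of internal data, and have a background routine that re-verifies the checksums and stops the node on the first mismatch.

    import hashlib

    class RoutinedStore:
        """Toy node store that audits its own data and stops itself on corruption."""

        def __init__(self):
            self._data = {}        # key -> (value, checksum recorded when the value was written)
            self.stopped = False

        @staticmethod
        def _checksum(value):
            return hashlib.sha256(repr(value).encode()).hexdigest()

        def put(self, key, value):
            self._data[key] = (value, self._checksum(value))

        def routine(self):
            """Background audit: log the problem and stop cleanly on any mismatch."""
            for key, (value, recorded) in self._data.items():
                if self._checksum(value) != recorded:
                    print("routining: corruption detected at", key)
                    self.stopped = True   # the checker fails the pair when we go silent
                    return

    store = RoutinedStore()
    store.put("subscriber-42", "busy")
    store._data["subscriber-42"] = ("idle", store._data["subscriber-42"][1])  # simulate a bit-error
    store.routine()    # detects the mismatch and stops the node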
And what happens if the checker fails? It should be possible to build a checker which is more reliable than a node, because it is usually much simpler than a node. However, it's going to fail sooner or later. If it completely stops, that's actually ok, but it might fail in some more insidious way, and not detect a subsequent failure in node1 or node2. Depending on how paranoid you are feeling, you might therefore double-up the checker, so that both responses get passed through the first checker and checked again by the second checker. (Or more ingeniously, you might apply these same techniques recursively inside the checker.)
With this architecture, we solve one of our most tricky problems: reliably detecting partitions of one node. We can combine two of these fail-stop pairs into a master-slave combination continually exchanging I-am-still-alive signals, usually in the form of the data updates needed to keep the slave's data-structures in sync with the master. If the master fails it will stop cleanly; the slave will notice this after a short, predictable delay and will take over from the master. (An alternative architecture which is sometimes used at the lowest level is to have three nodes rather than two and to arrange for a two-out-of-three majority vote at the checker. This requires a more complex and therefore more error-prone checker, but has the advantage that when one node fails, recovery is immediate and the remaining two good nodes can continue as a fail-stop pair.)
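A similarly hedged sketch of the master/slave arrangement (my own Python toy; the timings and names are invented): the slave treats every data update from the master as an I-am-still-alive signal, and because the master fail-stops cleanly, a fixed period of silence can safely be read as "the master is gone".

    import time

    class StandbyNode:
        """Toy slave that promotes itself after the master falls silent."""

        def __init__(self, takeover_after=3.0):
            self.takeover_after = takeover_after   # seconds of silence we tolerate
            self.last_heartbeat = time.monotonic()
            self.role = "slave"

        def on_update(self, update):
            """Every data update from the master doubles as a heartbeat."""
            self.last_heartbeat = time.monotonic()
            # ... apply the update to keep our data structures in sync ...

        def tick(self):
            """Called periodically; the delay before take-over is short and predictable."""
            silent_for = time.monotonic() - self.last_heartbeat
            if self.role == "slave" and silent_for > self.takeover_after:
                self.role = "master"
                print("standby node taking over as master")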
In this way we can build resilient processing nodes and we can use the same techniques to build resilient communications nodes which forward data from one link to another. And then by a combination of link replication, forward-error-correction, bandwidth reservation and automatic fail-over from one link to another we can ensure that failures on a few links cannot impede traffic in the communication network for more than a short period. (It is customary to "over-dimension" these systems so that they have considerably more capacity than their predicted peak load.) If this architecture is taken to the extremes seen in traditional telephone switches, it's even possible to completely decommission and rebuild a switch while all the time it carries its rated traffic.
So you can "trade off" partition tolerance, but it's actually rather costly. It's rather curious that by making the sub-systems more fragile, we can make the whole system more robust. There's an almost biological feel to this — it's a bit like apopotosis or "programmed cell-death", where a cell detects that it's going a bit wrong and rather than turn into a cancer it commits suicide very cleanly. It's also rather curious that the properties we enforce at the lowest level of the checker are consistency and availability — exactly the properties that we want to have at the top level.
In practice, as noted by Gilbert, Lynch and Brewer in the papers I mentioned earlier, in real systems we never trade off all of consistency, availability or partition tolerance. In practice, we compromise a little on one to gain a little more of another, and we make different compromises at different times or for different purposes. But if you see a system which appears to be getting more than its fair share of both consistency and availability, look a little closer: it must be based on low level resilience using the sort of techniques I've described here.
In fact you can "trade off" partition tolerance and build a system that guarantees both consistency and availability. This is exactly the design decision that was made by the engineers who built traditional telephone networks. However, this decision wasn't a "trade off" in the usual sense, where you get to save money on one thing and spend it on something else — instead they had to spend quite a lot more money to build the unusually reliable nodes and communication links that let them drop partition tolerance as a whole-system property.
To see how this works, I'll first give a (very brief) explanation of what the CAP Theorem says, in order to pave the way for a (fairly brief) explanation of the techniques you can use to build reliable sub-systems. If you haven't come across the CAP Theorem before, I think the nicest introduction is Brewer's CAP Theorem by Julian Browne. (Some of what I say here is also based on Perspectives on the CAP Theorem by Seth Gilbert and Nancy Lynch, which is a more detailed overview; it's also worth reading CAP Twelve Years Later: How the "Rules" Have Changed, which is a retrospective and commentary by Eric Brewer himself.)
Consistency This is close to the distributed systems idea of "safety". A system is "safe" if it never says anything wrong, if every response sent to any client of the system is always correct. In systems of any complexity, this often amounts to saying that every request must appear to all clients to have been executed atomically (in a single instant) by some single central node.
Availability This is close to the distributed systems idea of "liveness". A system is "live" if every request from every client eventually receives a response. In practice there's a lower limit on how quick a response could be (light-speed delay across whatever fraction of the distributed system is necessary for that response) and there's an upper limit, after which clients or people get bored, decide that the system has failed and decide to take remedial action.
Partition Tolerance We say that a system has partition tolerance if it behaves as intended by its designers in the face of arbitrary message delay or message loss in the underlying communications system. Usually this will be because of a "network partition" where some link or node fails, but in best-effort networks this can also include congestion, for example due to packets being dropped on a congested port. A major practical problem with partition tolerance is that very often the different parts of a distributed system will disagree about whether or not there currently is a partition.
The CAP Theorem says that you can build a distributed system with any two of these three properties. Traditional "ACID" SQL databases choose to drop availability: they delay responses to clients until they are certain to be consistent. More avant-garde "BASE" NoSQL systems choose to drop consistency: under pressure they give fast but possibly out-of-date responses and patch things up later. And old-fashioned telephone networks drop partition tolerance: they use nodes and communication links which are so reliable that the distributed system (almost) never sees message loss or arbitrary delay. But how do you do that?
The usual pragmatic solution in any situation where a component might fail is to replicate that component. For example, a plane with two engines should be able to reach its destination if one of them fails. In our case things are slightly more interesting because if we replicate nodes, and these check each other, what's to stop a failed node wrongly accusing a correct node of having failed? A bad node could shut down a good node! To get over this problem, we build nodes into fail-stop pairs with a separate checker:
This sub-system works roughly like this: a request comes into the checker on one (or both) of the links on right. The checker forwards this request to both node1 and node2. These nodes are exact duplicates, and work separately to produce what should be identical responses. The checker makes sure that these responses match, in which case it sends that response back to the client. But if the responses don't match, the checker stops and refuses to communicate on any of its ports. Making this sub-system run again is a maintenance action. (When this architecture is used in a railway-signalling system, the checker might physically blow a fuse to make certain that it won't work again without maintenance.) In addition to making sure that the results from node1 and node2 match, the checker also sets a time limit for their responses, and similarly fail-stops if this limit is exceeded. (So at the lowest level, the checker enforces availability or "liveness" as well as consistency between node1 and node2.)
There are a lot of subtleties here. To defend against software errors, it is preferable to have two different implementations of the system in node1 and node2. To defend against hardware failure, it is preferable to have different hardware in node1 and node2, or at least to have a different encoding for data on the interfaces to the two nodes. (In this case the checker ensures that responses are equivalent, rather than bit-identical.) Each node may also run "routining" code in the background which continually checks on the consistency of its internal data, to guard against bit-errors in memory. If a node finds such an error it logs this problem and then simply stops itself. (The checker will then cleanly fail the whole sub-system when it subsequently doesn't get a response to a request.)
And what happens if the checker fails? It should be possible to build a checker which is more reliable than a node, because it is usually much simpler than a node. However, it's going to fail sooner or later. If it completely stops, that's actually ok, but it might fail in some more insidious way, and not detect a subsequent failure in node1 or node2. Depending on how paranoid you are feeling, you might therefore double-up the checker, so that both responses get passed through the first checker and checked again by the second checker. (Or more ingeniously, you might apply these same techniques recursively inside the checker.)
With this architecture, we solve one of our most tricky problems: reliably detecting partitions of one node. We can combine two of these fail-stop pairs into a master-slave combination continually exchanging I-am-still-alive signals, usually in the form of the data updates needed to keep the slave's data-structures in sync with the master. If the master fails it will stop cleanly, the slave will notice this after a short, predictable delay and will take over from the master. (An alternative architecture which is sometimes used at the lowest level is to have three nodes rather than two and to arrange for a two-out-of-three majority vote at the checker. This requires a more complex and therefore more error-prone checker, but has the advantage that when a one node fails, recovery is immediate and the remaining two good nodes can continue as a fail-stop pair.)
In this way we can build resilient processing nodes and we can use the same techniques to build resilient communications nodes which forward data from one link to another. And then by a combination of link replication, forward-error-correction, bandwidth reservation and automatic fail-over from one link to another we can ensure that failures on a few links cannot impede traffic in the communication network for more than a short period. (It is customary to "over-dimension" these systems so that they have considerably more capacity than their predicted peak load.) If this architecture is taken to the extremes seen in traditional telephone switches, it's even possible to completely decommission and rebuild a switch while all the time it carries its rated traffic.
So you can "trade off" partition tolerance, but it's actually rather costly. It's rather curious that by making the sub-systems more fragile, we can make the whole system more robust. There's an almost biological feel to this — it's a bit like apopotosis or "programmed cell-death", where a cell detects that it's going a bit wrong and rather than turn into a cancer it commits suicide very cleanly. It's also rather curious that the properties we enforce at the lowest level of the checker are consistency and availability — exactly the properties that we want to have at the top level.
In practice, as noted by Gilbert, Lynch and Brewer in the papers I mentioned earlier, in real systems we never trade off all of consistency, availability or partition tolerance. In practice, we compromise a little on one to gain a little more of another, and we make different compromises at different times or for different purposes. But if you see a system which appears to be getting more than its fair share of both consistency and availability, look a little closer: it must be based on low level resilience using the sort of techniques I've described here.
Sunday, 16 June 2013
Summer Recess
Summer is nearly upon us. When I started this blog in freezing January I had
the idea that it could be a vehicle to help me figure out some half-baked
ideas. I think that's worked out quite well, although my posts have
often been a bit on the long side, more like short essays.
But now I need to get some code written and I also need to go on holiday. Writing essays doesn't really help with either of those. So I've decided that this blog can have a "long vacation" or summer recess.
See you in the autumn.
Sunday, 9 June 2013
The Robots Game
(This post is another follow-up to
Learning
to Program.)
When stage magician James Randi was doing cabaret work, he had the habit of opening his act with a newspaper tearing-and-reassembly routine. With the music playing, he would stride out in front of the audience with a piece of newspaper folded under his arm. He would unfurl the newspaper and tear it up into pieces. Then he would shuffle the pile of pieces, give it a flip and it would all be back together again. The music would stop, he would throw the newspaper aside and address the audience, saying: "I don't know how that trick's done and I don't care."
Now the funny thing about this, as Randi explains, is that there was in fact something about the routine that he didn't know. The rest of the act went better when he started out this way, but for a long time he didn't know why. "Most magicians," says Randi, "don't know why their tricks work. They know they work because they've done them and done them repeatedly, and sometimes they do things on stage that they don't quite understand, but they know that they work."
What was Randi's opening routine really doing? "I started doing it and I found it worked very well," he says. "But I only realised years into it that I was sizing up the audience. I would be looking around the audience while tearing up the newspaper and seeing people who were paying attention and others that maybe had a few too many drinks, or they were distracted in some way and were talking to the person beside them." Who should he call on from the audience to take part in subsequent routines? The way the newspaper routine worked was that it told him the answer to that question and made the rest of his act work better.
I've opened my programming teaching for several years now with "The Robots Game". It seems to work very well for me, but I'm not entirely sure why. So, in this post I'll describe the game and its rules and then I'll speculate on how it might work as the opening routine for a programming course.
Equipment
This is a board game for six players so for each group of six students you will need:
The board, printed out on a large piece of card or foam-board:
The six "robot" playing pieces, numbered 1 to 6:
Printed move-sheets for the players to write directions, and pencils/pens:
You will also need (as you can see in some of the photos) a normal six-sided dice.
Rules
(1) This is a race game played by two teams. The winning team is the first to get all their "robots" from the start to the finish and off the board. (Note: the winner is the first team to get all their robots off the board, not just the first one.)
(2) Each team consists of three players and each player controls one specific robot. Each robot has an identifying number. As you can see from the photos, my robot playing pieces have the numbers on the top, with 1, 2, 3 in one colour for one team and 4, 5, 6 in another colour for the other team.
(3) The robots start facing into the board on the six starting squares on one end. (The pieces must have some way to show what direction they are facing. My robot playing pieces have arrows on the top, but in a previous version I used little Lego figures with numbers stuck to their backs.)
(4) Each player starts with their own blank move-sheet on which they will write directions for their robot. (After your students arrange themselves into two teams of three, get each student to write the number of their robot on their move-sheet, so there's no doubt as to which is which.)
(5) The game consists of a series of turns, continuing until one team has won. Each turn proceeds as follows:
(5 a) Every player writes on their move-sheet up to 10 moves for their robot. As you can see, the move-sheets I use have the numbers 1 to 10 down one edge, so the moves for each turn can be written vertically down one column. Valid moves are Forward (F), Back (B), Right (R) and Left (L). It's fine to not use all 10 moves. (It's even fine to not use any of your moves, in which case your robot will just stay in the same place.) Naturally, players on the same team will wish to coordinate their moves with each other, but hide their plans from the other team.
(5 b) When everyone has finished writing their moves, someone throws the dice. Depending on what number comes up, the robots execute their moves in sequence starting with that number. Thus, for example, if the dice shows 4, then all robot 4's moves will be executed first, followed by all robot 5's, then 6, then 1, then 2 and finally 3. (Thus when writing moves, the players cannot be sure exactly where the other robots will be, even on their own team.)
(5 c) To execute the moves for each robot, the player controlling that robot reads out the sequence of moves for that turn from their move-sheet. Someone else actually moves their robot (preferably someone from the other team). The reason for this is that people can make mistakes writing down their moves, and rather than giving them the opportunity to correct them, we want to ensure that the robot executes the moves that they actually wrote down.
(5 d) Each move is executed as follows: Forward attempts to move the robot forward one square in the direction it is facing. This succeeds if that square is an unoccupied green square or if the shunting rule applies (see 5 e). Otherwise the move fails. (For example if the square in front is a brick-wall.) Back attempts to move the robot one square backwards and succeeds or fails in the same way as for a forwards move. Right and Left turn the robot in place through 90-degrees clockwise or anti-clockwise, respectively. These moves always succeed.
(5 e) Shunting. If a robot attempts to move into a square occupied by another robot, this will succeed provided that (i) the moving robot has just crossed one empty square ("it has momentum") AND (ii) there is an empty green square on the other side of the stationary robot ("it has somewhere to go"). When a shunting move succeeds, the moving robot afterwards occupies the square where the stationary robot was, and the stationary robot is shunted along one square in the obvious way. (Note that part (i) means that you can only shunt another robot one square. Continuing to move in the same direction will fail. You can do two shunts in a turn, but you have to back up and take another run-up at the stationary robot.)
(5 f) All moves of a particular robot are executed in order, irrespective of whether its earlier moves that turn succeeded or failed. Thus a robot might use several moves trying and failing to go forwards ("spinning its wheels") then turn and succeed in going forwards in a different direction. (But probably not end up in the place its controller intended.)
(6) Exit squares and winning. The four squares with outward-facing arrows at the end of the board are exit squares. When a robot moves onto one of these squares it is immediately removed from play, and takes no further part in the game. Moves are made as normal each turn for the remaining robots, skipping over robots which have been removed from play. The first team to have all three robots out of play in this way is the winner.
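For anyone who prefers to read the move rules as code, here is a minimal sketch in Python of rules (5 d) to (5 f), including shunting. The dictionary-based board and robots, the all-green test board and the "ran" momentum flag are my own stand-ins rather than anything from the printed game; I've taken the simple reading that turning uses up a run-up, and exit squares and the dice-determined turn order aren't modelled.

    # Directions and 90-degree turns for the robot playing pieces.
    DIRS = {"N": (0, -1), "E": (1, 0), "S": (0, 1), "W": (-1, 0)}
    TURN = {"R": {"N": "E", "E": "S", "S": "W", "W": "N"},
            "L": {"N": "W", "W": "S", "S": "E", "E": "N"}}

    def execute_moves(robot, moves, board, robots):
        """Run one robot's column of moves in order, whether or not earlier moves succeeded."""
        for m in moves:
            if m in ("R", "L"):                      # turns always succeed
                robot["facing"] = TURN[m][robot["facing"]]
                robot["ran"] = False                 # turning loses any run-up
                continue
            dx, dy = DIRS[robot["facing"]]
            if m == "B":                             # Back is Forward reversed
                dx, dy = -dx, -dy
            tx, ty = robot["x"] + dx, robot["y"] + dy
            occupant = next((r for r in robots
                             if r is not robot and (r["x"], r["y"]) == (tx, ty)), None)
            if board.get((tx, ty)) != "green":       # wall or off the board: the move fails
                robot["ran"] = False
            elif occupant is None:                   # plain move onto an empty green square
                robot["x"], robot["y"] = tx, ty
                robot["ran"] = True                  # we now have momentum for a shunt
            else:                                    # shunting: need momentum and somewhere to go
                bx, by = tx + dx, ty + dy
                behind_free = board.get((bx, by)) == "green" and not any(
                    (r["x"], r["y"]) == (bx, by) for r in robots)
                if robot["ran"] and behind_free:
                    occupant["x"], occupant["y"] = bx, by
                    robot["x"], robot["y"] = tx, ty
                robot["ran"] = False                 # either way, the run-up is used up

    board = {(x, y): "green" for x in range(8) for y in range(8)}   # toy all-green board
    r1 = {"x": 0, "y": 0, "facing": "E", "ran": False}
    r2 = {"x": 2, "y": 0, "facing": "E", "ran": False}
    execute_moves(r1, "FF", board, [r1, r2])    # the second F shunts r2 from (2,0) to (3,0)
    print(r1["x"], r1["y"], "|", r2["x"], r2["y"])    # 2 0 | 3 0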
Discussion
The game is quite fun and doesn't take very long to play — usually around a quarter of an hour or less. It's almost always quite close at the end, because of course it's a race between the last robot in each team. There's plenty of opportunity for delaying tactics and clever blocking moves near the exit by the team which is behind, provided they don't just individually run for the exit as fast as possible.
But turning back to the idea from James Randi, how does this game work? It seems from my experience to be doing something useful, but how does it really work as an opening routine for a programming class? Perhaps first of all, I think it lets me give the impression to the students that the rest of the class might be fun. Lots of students don't seem to like the idea of programming, so perhaps playing a team game like this at the start of the class surprises them into giving it a second chance.
I think also that there is an element of "sizing the audience up" — it's a way to see how the students interact with one another, to see who is retiring and who is bold, who is methodical and who is careless. The people who like clever tricks in the game seem often to be the people who like clever tricks in programming. There is also some evidence that facility with mental rotation is correlated with programming ability. (See Spatial ability and learning to program by Sue Jones and Gary Burnett in Human Technology, vol.4(1), May 2008, pp.47-61.) To the extent that this is true, I might be getting a hint about who will have trouble with programming from seeing who has trouble making their robot turn the correct direction.
I'm keen to point out to the students, after the games are over, that the game really has some relation to programming. I point out that they have been writing sequences of instructions, little programs for the robots, which then get blindly executed. I point out that although computers seem clever, they really have no more intelligence than the wooden playing pieces in the game. Computers just execute their moves billions of times faster, and that makes them seem smart.
I also ask if anyone found that their robot ever did something that they didn't want it to do. (Inevitably there are a few cases where people just wrote down the wrong move and their robot trundled off in an unexpected direction.) I point out that this is an ordinary thing in programming. It's going to happen again, it happens to the best of programmers, so don't be embarrassed. Just learn how to play the detective, to find and fix your inevitable mistakes. Computers, like the robots, don't do what we want. They do what we say.
(James Randi's explanation of his newspaper routine is from the first Security and Human Behavior Workshop, which took place in July 2008.)
(Added 27 July 2013.) I've had some requests for an image of the game board. Here is a 5673 x 4046 pixel image, which should print nicely at any reasonable size.
Sunday, 2 June 2013
Recipe: Vinegar and Pepper Soup with Sliced Fish
This sounds a bit bizarre, but it's really nice — though you probably
want to eat it one little bowl at a time, interspersed with other
dishes in a Chinese meal, rather than all by itself. The recipe is
originally from Kenneth Lo.
Ingredients
250g | Fish fillet (cod is good, any white fish is ok)
1t | Salt
1T | Cornflour
1 | Egg white
3 | Spring onions
1 litre | Stock
3 slices | Root ginger (skin removed, about 2mm thick)
0.5t | Ground white pepper
4t | Soy sauce (light)
5T | Vinegar (white wine vinegar works well)
(1T = one tablespoon = 15ml; 1t = one teaspoon = 5ml)
Method
Cut the fish into thin slices and rub with salt and cornflour. Then wet with the egg-white and leave for 30 minutes.
Chop the spring onion into 5mm pieces.
Put the stock and the root ginger slices in a pan and bring to the boil. Simmer for 5 minutes.
Add the fish slices, chopped spring onion, white pepper, soy sauce and vinegar. Simmer for another 5 minutes.
Discussion
I suppose that being a fish soup, we should really use a fish stock, but I never have. I would use the brown chicken stock that I described here. Since this is a clear soup you probably want to filter the stock through a sieve lined with wet muslin to make it look more respectable. (If you are really keen, you might want to try doing this from frozen, leaving it to drip slowly into a bowl in the fridge. The result is better but it takes a long time: a day or two.)
Sunday, 26 May 2013
Agile is Antifragile
Earlier this year I read the excellent
book Antifragile
by Nassim Nicholas Taleb. He's a bit full-of-himself and has a
somewhat quirky writing style, but the ideas in the book are so good
that this is easy to forgive. (And in fact, when I had finished the
book, I turned around and read it through again, which I never
do, so that must say something about the content.)
I've been thinking about the connection between Taleb's ideas and software development for some time — something like five years in the case of my recent essay Bugs with Long Tails — but his previous books don't make it quite so clear how to put into practice the maxim that one should "invest in preparedness, not prediction". In the face of uncertainty, what exactly does it mean to be well prepared?
Taleb has invented the new term "antifragile" as a label for those things which are not merely robust to uncertainty, but actually relish it and benefit from the uncertainty. For example, a tourist who follows a schedule is fragile, because if the bus doesn't arrive it's a disaster. By contrast a "flâneur" who makes no decisions ahead of time doesn't care. If the bus doesn't arrive, it's an opportunity to see things and do things that would never have been on any schedule. Better than robust, they are antifragile: when the unexpected happens, it's an advantage.
Antifragile cherishes "optionality": the ability to do something if it benefits you without the obligation to do it when it doesn't benefit you. You don't need to predict what will happen in the future, provided that you have placed your bets so that you incur a small fixed loss but in return you have the chance to make a large gain. (And because you have an option, not an obligation, if it goes wrong you get to walk away. You never suffer a large loss.) When losses are certain but small and gains uncertain but large, a trial-and-error approach will in the long term bring more than survival: it will bring success.
The problem with this approach is that it takes a kind of moral courage. People hunger for apparent predictability, for a reassuring chain of small gains even at the risk of a huge loss that wipes them out. Day-to-day it's harder to endure the drip-drip of the small losses incurred by the antifragile approach, even when you consciously recognise the long-term benefits. (See Thinking, Fast and Slow by Daniel Kahneman for an explanation of this curious effect.)
In business, the people responsible for each part of a firm will tend to "play it safe": they will try to ensure that they usually do make a small gain, even if they run the risk of a large loss. But from the point of view of the whole business this is a poor choice: it would be much better if the people in charge of each part accepted small losses, provided that in return they had a fair chance of a large gain and no chance of a large loss. They usually don't make this choice, because they don't want to be blamed for the small failure if their bet doesn't pay off. Since they avoid blame, they get to keep their jobs for now, but maybe not for long, because the business as a whole becomes fragile.
A business which is fragile has no choice but to try to predict the future. But in many cases, predicting the future or even just estimating the probability of future events is much harder than most people think. And as I have argued in Bugs with Long Tails, there is evidence that software development is one of those cases where prediction is too hard. We can't do it. And if we can't do it, we shouldn't try. We should "invest in preparedness, not prediction". We should be antifragile. What does that mean in software? It looks a lot like what people usually call "agile".
I'll give just a few examples here, because I'm sure you can fill in more for yourself:
- Don't try to guess what the customer/product-owner wants. Instead of trying to predict what you can't predict, insist that they are always available to say what they want. (Or admit that they don't yet know, in which case you can help them find out.)
- Big projects are known to have worse cost over-runs than small projects and this is because they rely so much on prediction. So agile development splits big projects into short iterations/sprints a few weeks in length, each of which delivers some finished product, worthwhile on its own. A few of these iterations may fail, but that's fine. Learn your lesson and move on: small, isolated losses; large overall gains. To paraphrase Kent Beck: "The purpose of planning is not to predict the future, but to know that you are working on the most important thing right now."
- In Extreme Programming there are the principles "You Aren't Going To Need It" and "Do The Simplest Thing That Could Possibly Work". The least productive development work is building code that is never needed. But we don't have to predict what code we need right now, because we already know. For code that we don't need right now, we have to predict what people in the future will need our code to do, how efficient it must be in time and space, and lots of other details that we currently know nothing about. Very often we are wrong in these predictions. Better to make a bet that we will never need that code and accept the small loss of changing the code later if it proves inadequate. The big payoff if we win the bet is less code and less overall work.
Another relevant example of antifragility can be found in the architectural work of Christopher Alexander. (Though it's curious that Taleb doesn't mention Alexander, especially since he does mention How Buildings Learn by Stewart Brand.) It's well known that in the 1990s, the same people in the object-oriented programming community who came up with the idea of agile development also worked on "patterns", inspired by Alexander's books The Timeless Way of Building and A Pattern Language: Towns, Buildings, Construction.
Less well-known in the software development world is that Alexander became dissatisfied with patterns because they did not reliably generate the kind of "living structure" of buildings that he describes in The Timeless Way of Building: buildings that have the "Quality Without A Name". This was a problem, but a problem with a happy ending, because he describes the solution in his massive follow-on work The Nature of Order. (Over 2000 pages in four volumes: Book 1, Book 2, Book 3, Book 4.)
Alexander has even more to say in this magnum-opus than Taleb in Antifragile, but the key point for us here is the idea of a "generative sequence". The patterns are useful, but to do their work properly they must be used in a disciplined way, in a specific generative sequence where nothing has to be predicted ahead of time, but decisions only made based on what is visible right now, in the situation at hand. For example, Alexander gives a 24-step generative sequence for building a Japanese tea house (vol. 2, p303-304) starting with a secluded garden. Within this sequence, step 12 is to lay out a particular path, step 13 to choose a stone water basin and place it on this path, and so on. The trick is that having made a decision at one step it is never necessary to go back and revisit an earlier decision, nor is it necessary to consider a later decision ahead of time. There is no planning.
It's surprising that this approach works at all. Why should it be possible to find generative sequences which allow things to be designed step by step and never go back to revisit an earlier decision? And yet it does seem to be possible to find these sequences, and they do seem to be able to construct things which have the "Quality Without A Name". Expert knowledge therefore encompasses not just the patterns, but the generative sequences that make them work and the principles used to discover generative sequences. (And most of The Nature of Order is about these principles.) As Alexander notes:
"The significance of generated structure lies in the concept of
mistakes. Fabricated plans always have mistakes — not
just a few mistakes, but tens of thousands, even millions of
mistakes. It is the mistake-ridden character of the plans which marks
them as fabricated — and that comes from the way they are
actually generated, or made, in time. Generated plans have few
mistakes." (vol. 2, p186)
Generated structures don't avoid mistakes — their construction involves many mistakes. But these are cheap mistakes, easy to notice and quick to correct. Always corrected before going on to the next step. Planned structures probably involve the same number of mistakes, but a plan doesn't give you any feedback, so every mistake persists and is built into real life.
We borrowed the idea of patterns from Alexander. We could frame agile development as another borrowing, an application of Alexander's idea of generated structure in the world of software development. We still seem quite a long way from having reliable generative sequences in software, apart from the tacit craft-knowledge embodied in individuals and small groups, but maybe in the future we can do better. Maybe we could see the agile "movement" as a search for such generative sequences. However, I think an even better framing would be to see both agile development and generative sequences as instances of antifragility. This perspective emphasises what gives them both their power: avoiding plans and prediction when we can't know enough to predict correctly.
(I've noticed that I'm not the only one to have seen a link between Taleb's ideas and agile development: see Agile=Antifragile. If you'd like to see Taleb talking about his ideas, you might like to watch this video filmed at the New York Public Library. You can also find a much better account of how Alexander's ideas relate to software in Part One of Patterns of Software by Richard Gabriel.)
Sunday, 19 May 2013
Recipe: Lamb kebabs
This is really two recipes: the lamb kebabs themselves and a
yoghurt-and-cucumber thing to go with them.
Ingredients
500g | Minced lamb
1 | Egg
1t | Ground coriander
1t | Ground cumin
1T | Paprika
0.5t | Cayenne pepper
0.25t | Salt
pinch | Ground pepper
(1T = one tablespoon = 15ml; 1t = one teaspoon = 5ml)
Method
Blend all the ingredients except the mince together, then thoroughly mix this into the mince and set aside in the fridge (for at least 2 hours, but all day if possible).
Split the mixture into 6 equal pieces and form each piece into a sausage shape on a kebab skewer. Grill for 10 minutes, then turn over and grill for another 10 minutes. (Of course, in summer you can cook them on the barbecue.)
Yoghurt and Cucumber
This goes very well with these kebabs. It's a bit like tsatsiki.
Ingredients
500g | Greek yoghurt
1 | Cucumber: peeled, de-seeded and chopped
1T | Fresh coriander, chopped finely
1T | Fresh mint, chopped finely
1T | White wine vinegar
0.25t | Salt
pinch | Ground pepper
(1T = one tablespoon = 15ml; 1t = one teaspoon = 5ml)
Method
Combine all the ingredients and mix well. Chill in the fridge for a couple of hours. Serve.
Sunday, 12 May 2013
A Day at elBulli
I'm looking at my copy of A Day at elBulli, a very thick book by Ferran Adrià which gives a minute-by-minute account of a typical day at the restaurant elBulli. I don't suppose that I get this book down from the shelf more than once a year, but I think more than any other book it was this one which convinced me that software development has a lot more in common with cooking than with architecture.
elBulli was a rather unusual restaurant. (It closed in 2011.) It opened for only 160 days a year, summer through to winter, serving one sitting of 50 guests each day. The kitchen staff of about 40 prepared a fixed "tasting menu" of around 30 miniature dishes for each guest, served by a front-of-house staff of about 25. (Although tiny, each dish generally took more effort to prepare than most main courses in most restaurants. Fifty customers a day might sound easy; think of it as 1500 complex main courses a day and you have a better idea of how much effort was involved.)
They served around 8000 guests a year, but they got many, many more requests for reservations. Around 2 million. And yet they only charged around €200 per head. Clearly they could have charged much more; clearly the main purpose was not to make money, but rather to research new recipes and techniques. It was a research restaurant. Who would have thought such a thing was possible?
When I first read the minute-by-minute descriptions in the book, I was reminded somehow of happy experiences working in an Extreme Programming development team. Everyone doing hands-on work to some extent, everyone conscious of working together to make something, always working with one eye on the clock. I was also struck by the discipline, similar to XP:
- Simple, direct organisation, with stand-up meetings at the start and end of the day.
- Clear up as you go. Even though it's late and everyone is tired, everything is left neat and tidy at the end of the day, ready for a fresh start tomorrow.
- Process, timing and organisation are everything. But the physical artifacts used to track progress are very simple: order-sheets, plans for mise-en-place, shopping lists.
What lessons can software developers learn by looking at a restaurant like this? First, maybe, that both have customers, and that in both cases the customer's experience reveals very little of what it takes to produce that experience. What does the customer see? Only the front-of-house.
There are front-of-house staff and kitchen staff. Though the kitchen is the heart (without the kitchen there is nothing) most guests don't see it, and know almost nothing about the production of their food. This certainly has echoes in software development. The customer is oblivious to what goes on behind the scenes, and the effort to make it work. But strangely, in software the front-of-house staff often set the terms for the kitchen staff despite having little hands-on knowledge about the production processes.
Another lesson is that process is key. It's all about process. But not the kind of stupid "big process" checklist in some binder somewhere that everyone ignores. No, the key is the precise minute-by-minute, hour-by-hour knowledge of how to actually do things the best way. Sometimes this is written down, but often not.
At elBulli, most kitchen staff did "process execution". A few, particularly Ferran Adrià himself, focused on process creation, figuring out what dishes to make and how best to make them. Every day inventing and perfecting processes, refining recipes and the techniques for making existing dishes. Experimenting with new ones. This is an obsession with process: hundreds of trials, failures, tweaks and variations to perfect the process for each dish. And from the point of view of the customer, this long effort and these many failures were completely hidden, though the final dishes could not be served without them.
The meta-idea here is to have complete control of process, a process even for creating processes. Now with software development, for all our claimed enthusiasm for process, are we really in the same league as these guys? I don't think so. We just don't see this kind of perfectionism very often, and when we do see it (for example in Extreme Programming) it is thought of as something odd and bizarre. Not like elBulli, a magnificent example of the art, which others admire, respect, and aspire to emulate.
Have a look at A Day at elBulli. It's worth considering: what if we applied that kind of attitude to software?
Sunday, 5 May 2013
Recipe: Red-cooked Pork
This recipe was originally based on one from Cheap Chow: Chinese
Cooking on Next to Nothing by Kenneth Lo, but it's diverged a
little bit since then. The result is quite similar and as Kenneth Lo
says: "The combination of meat, fat, skin and gravy is probably about
the most savoury and appetizing thing in the whole realm of Chinese
culinary creation".
Ingredients
1.5kg | Belly pork slices (each about 1cm thick)
6T | Sunflower oil
5t | Light soy sauce
180ml | Medium-dry sherry
180ml | Water
1t | Five-spice powder
2t | Sugar
2 | Star anise
(1T = one tablespoon = 15ml; 1t = one teaspoon = 5ml)
Method
Preheat oven to 150°C (= gas mark 2 = 300°F).
Heat the oil in a wok or frying pan, and brown the belly pork slices in batches, a few at a time. Place them in a large casserole.
Mix all the other ingredients and pour this sauce into the casserole too. Arrange the belly pork slices so that they are packed closely together. Cook in the oven for about 4 hours, turning the pieces over every hour, so that they cook evenly.
Discussion
I think everyone has a slightly different recipe for red-cooked pork. The Kenneth Lo recipe in Cheap Chow (p48) takes the meat in a single piece, rather than sliced, and cooks it for 1 hour with only a quarter of the sauce in the above recipe, then another 1 hour 30 minutes after adding another quarter of the sauce (and no five-spice powder). Fuchsia Dunlop in Revolutionary Chinese Cookbook (p78) uses fresh ginger, star anise, chilli and cassia bark for flavour, and simmers the dish in a wok for 50 minutes rather than in the oven. (Which leaves it a bit chewy for my taste.)
Sunday, 28 April 2013
Completely Reliable Conclusions
(This post is another follow-up to Learning to Program, which in turn was a kind of follow-up to Not a Tool, but a Philosophy of Knowledge.)
I've been reading A Revolution in Mathematics? What Really Happened a Century Ago and Why It Matters Today (Notices of the AMS, vol.59 no.1, Jan 2012, pp31-37) and a stack of other papers by mathematician Frank Quinn. He describes how, since the early twentieth century, professional mathematicians have worked in a particular way, the way they found most effective. Shockingly, Quinn points out that this is not how students are taught to do mathematics (in the USA) at any level up to and including a large part of undergraduate material. Instead, because of a misunderstanding by maths educators, students are taught outdated and inferior methods from the nineteenth century. This difference, although it seems both obvious and remarkable once pointed out, has remained unnoticed by mathematicians and educators until very recently.
Quinn goes on to argue that this unnoticed difference could to a large extent explain the poor and declining results of mathematics education in the USA. (And to the extent that other countries use the same poor methods, we should expect the same poor results elsewhere too.) The problem has become worse in recent years, says Quinn, largely because those in control of maths education believe that their methods are just fine, and believe that success will come with more vigorous application of the same poor methods. So rather than change these methods, the authorities have successfully championed "more of the same", leading to a "death spiral" from which escape appears uncertain.
I find Frank Quinn's ideas very convincing. I've done enough maths tuition that from personal experience I recognise some of the concrete problems of student understanding that he describes. It's rather frightening to see these problems explained as merely symptoms of a larger systemic problem. I think Quinn's ideas are crucial for the future of maths teaching, but I believe there are lessons here for us too, as we consider how best to teach programming. It's possible that we too have been going about things the wrong way, misunderstanding the fundamentals, and that the problems we face in teaching programming are exactly the same problems that Quinn describes. If so, the good news is that we can immediately draw on Quinn's experience and suggestions of how to do it better. I'll return to this thought later, but first let's examine more closely what he says about the difference between nineteenth and twentieth century methods in mathematics.
Up until the late nineteenth century, a mathematical proof depended to a large extent on physical intuitions and on understanding what a mathematical model "really" meant:
The conventional wisdom is that mathematics has always depended on
error-free logical argument, but this is not completely true. It is
quite easy to make mistakes with infinitesimals, infinite series,
continuity, differentiability, and so forth, and even possible to get
erroneous conclusions about triangles in Euclidean geometry. When
intuitive formulations are used, there are no reliable rule-based
ways to see these are wrong, so in practice ambiguity and mistakes
used to be resolved with external criteria, including testing against
accepted conclusions, feedback from authorities, and comparison with
physical reality.
(See A
Revolution in Mathematics?, p31.)
The great revolution in mathematics, which took place from about 1890 to 1930, was to replace intuitive concepts and intuitive proofs with systems of explicit rules, like the rules of a game, without worrying about what they "really" meant. To some people, including philosophers and elite mathematicians with excellent intuition, this seemed like a loss, but it brought with it an amazing benefit: arguments based on consistent application of these rules led to completely reliable conclusions. This meant that mathematicians could build a much more complex edifice of rules and proofs than would ever have been possible in the nineteenth century, safe in the confidence that however complex, it was all still correct. The new methods also opened up mathematics research to ordinary mathematicians, not just to super-stars with extraordinary intuition. Of course, constructing the required proofs still required imagination and experience, but there was now a systematic way to proceed when you got stuck:
When someone reaches his personal limits of heuristic reasoning and
intuition, the reasons for failure are obscure and there is not much
that can be done about it. This is why advanced mathematics was
limited to a few extraordinary people up through the nineteenth
century, and why students feel stupid when they reach their limits
today. The great discovery of the early twentieth century was that
basing mathematics on disciplined reasoning rather than intuition
makes it accessible to ordinary people. When people reach the limits
of good basic logical skills then the failures are localized and can
usually be identified and fixed. There is a clear, though disciplined
and rigorous, way forward. Experts do eventually develop powerful
intuitions, but these can now be seen as a battery, charged by
thousands of hours of disciplined reasoning and refinement.
(See
Reform Mathematics Education ..., p11.)
As programmers, we can recognise that debugging a proof under the twentieth century mathematical regime is very much like debugging a program: "failures are localized" and so, with disciplined reasoning, "they can usually be identified and fixed". Twentieth century methods are "error-displaying", in the sense that if a mathematical argument produces a false conclusion, then it will be possible to find the error, because the error will be in the mis-application of some rule. Mistakes happen; potential proofs usually have errors, in the same way that programs, when first written, usually have bugs. But if the steps of a proof are set out in precise detail (similar to the precise detail you need in a computer program) then you will always be able to find exactly where a rule was broken. This in turn will often suggest an alternative approach or a fix that will mend the proof.
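To make the analogy concrete, here is a minimal sketch in Python (my own example, not Quinn's) of what "error-displaying" working looks like in code: the derivation is written out step by step, each step labelled with the rule used, and a checker points at the first step that breaks.

```python
# A derivation written out in error-displaying form: a list of values that
# should all be equal, each annotated with the rule used to get there.
derivation = [
    ("start", (17 + 8) * 4),
    ("evaluate the bracket: 17 + 8 = 25", 25 * 4),
    ("multiply out: 25 * 4 = 100", 100),
]

def check(derivation):
    """Return the first step whose value doesn't match the previous one."""
    for i in range(1, len(derivation)):
        rule, value = derivation[i]
        if value != derivation[i - 1][1]:
            return "error at step %d (%s)" % (i, rule)
    return "every step checks out"

print(check(derivation))   # every step checks out
```

If any single step had broken a rule, the checker would name that step and nothing else: the mistake is localised, which is exactly what makes it findable and fixable.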
(And of course, another link with computing is that it's one thing to define a system of rules, it's another thing to be sure of applying them completely reliably, with no mistakes. How can you construct social and physical systems which guarantee this, given only fallible humans and unreliable physical artifacts? Nothing in the real world is perfect. What custom and technique could in practice guarantee complete reliability? Mathematicians have worked out their own answers to that question, but that question is exactly the concern of computer science! That question calls forth all of computer science from low-level machine design, through compilers, operating systems, programming and interaction design, to psychology and organisation theory. It is scarcely a coincidence that the mathematicians John von Neumann and Alan Turing were computer pioneers. They wanted to see their method embodied in machinery.)
So, standard practice in modern "core" mathematics revolves around systems of formal rules and the completely reliable conclusions that people can draw from them. What about mathematics education? In what sense is it still using nineteenth century methods and why exactly is that bad?
With nineteenth century methods intuition is key, and mathematics education has concentrated on intuitive understanding first and skill at applying formal rules second (or never). One problem with this is that correct intuition in mathematics comes from working with rules and internalising them. Trying to hand students an intuition first is very dangerous, because they will often get the wrong intuition, or just be confused. And confusion is perhaps the better outcome, because intuitions once gained can be very hard to change. Quinn cites Do naïve theories ever go away? by Dunbar et al. (in Thinking with Data: 33rd Carnegie Symposium on Cognition, Ed. Lovett and Shah, 2007). The problem is that subsequent learning doesn't seem to actually correct an earlier misunderstanding, it only modifies the misunderstanding: "even when conceptual change appears to have taken place, students still have access to the old naïve theories and ... these theories appear to be actively inhibited rather than reorganized and absorbed into the new theory" (Dunbar et al., 2007, p.202). A bad intuition is dangerous because it is so sticky and difficult to change.
In the USA, with its popular "reform math" curriculum, this emphasis on "understanding" has gone hand-in-hand with a drastic decline in the practical abilities demanded of students at every level. Expertise in disciplined reasoning, with careful step-by-step application of formal rules, is seen by the nineteenth century educational authorities as at best a secondary concern. Since it is unimportant, the authorities think it can be dropped with no loss. But this is a mistake which has dire consequences. Quinn comments that, as a university maths lecturer, he is responsible for developing the math skills needed by engineering students. However:
Our goals for student learning are set by what
it takes to deal effectively with the real world, and can't be
redefined. The problem is that, as compared with fixed real-world
goals, useful preparation of incoming students has been declining for
thirty years. The decline accelerated 10-15 years ago and the bottom
has almost dropped out in the last five years.
(See
Reform Mathematics Education ..., p3, written in 2012.)
Since rigorous thinking is not emphasised by the educational authorities, it is scarcely surprising that students have internalised the principle that precision is not important. Quinn gives an example of how this typically plays out on the ground:
Recently a student came to try to get more partial credit. He had put
a plus instead of a comma in an expression, turning it from a vector
to a real number. "But there was an example that looked like this, and
anyway it is only one symbol and almost all the others are right." He
had never heard the words 'conceptual' and 'error' used together; it
made no sense to him and he would not accept it as a justification for
a bad grade. (See
Reform Mathematics Education ..., p4.)
What, you might ask, has this got to do with learning to program computers? Well, first of all, these are the same people we are trying to teach how to program! Quinn's example is eerily similar to the experiences I've had with some programming students, people who seem unable to comprehend that complete precision is necessary. There is such a gap of understanding that in many cases it seems impossible to cross it. You can think that you have communicated, but then you look at what they are doing later and you see that they didn't get it at all. They are not working precisely to find their error. They are still just superstitiously shuffling the symbols in their program, hoping that they will hit the jackpot and their program will appear to work.
What can we do? Maybe, if Quinn is right, not much. Maybe for them the race is lost. Maybe we would have had to start a decade or more earlier, with very different maths teaching in school. And so, if we really think "everyone should learn to code", maybe that's where we should start today, before it's too late.
Secondly, in programming education we may by chance have thrown the baby out with the bathwater, just as the reform maths educators did with calculators in schools. In the old days, before calculators, children had to learn to do long multiplication, for example 431 × 27, using pencil and paper. This was a bit of a chore, now largely dropped in favour of tapping the problem into a calculator, which gets the answer more reliably. However, it turns out that the children were learning, by accident, a lot more than how to multiply long numbers. They were learning to set out their working in an error-displaying form, essentially a twentieth century proof that their answer was correct. They had to be absolutely precise in their working, but if they made a mistake, they could check their working, find the mistake and correct it. Their teachers could periodically take in this work and confirm that their pupils were working accurately and in a proper error-displaying way. On top of all that, children were intuitively learning something about the mathematical structure of numbers by working with them, so that when they came to polynomials they found that working out (4x² + 3x + 1) × (2x + 7) was not much different to working out 431 × 27. (In fact it's a bit simpler, because there are fewer carries.) To someone with a calculator, they are entirely different.
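To see how close the two really are, here is the same working written out for both (my own arithmetic, using the numbers above):

431 × 27 = (431 × 20) + (431 × 7) = 8620 + 3017 = 11637

(4x² + 3x + 1) × (2x + 7) = (8x³ + 6x² + 2x) + (28x² + 21x + 7) = 8x³ + 34x² + 23x + 7

The shape of the working is identical: form the partial products, then add them up.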
I wonder if, in the way we try to teach programming nowadays, we may have fallen into some similar traps, by not realising what students accidentally learned in the past. For example — and here's a heretical thought — are languages with GOTO really as bad as we imagine for novice programmers? Dijkstra claimed that "It is practically impossible to teach good programming to students that have had a prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration". But don't forget this was just a claim: where's the evidence? Dijkstra himself obviously first learned to program in a language with only GOTOs, as I did, and I'm fairly happy that it did us no lasting damage. In fact I think it forced us to think about our code and to work with it in a particular detailed way, and this earlier practice may have served us well later, even when we programmed in structured languages.
The question of whether structured languages are better later, for expert coding, is not the point. Clearly for doing sums, using a calculator is better later than using pencil and paper. But for the reasons outlined above, it's not better earlier. The question in either case should not be just "can students do the old problems more reliably?". Of course students with calculators get the answer more reliably than students without, but they don't learn those other early skills which support future learning. Could this, perhaps, be the case with us too? (Although this view is far from mainstream, I am relieved that I'm not the only person to suggest it. See Where Dijkstra went wrong: the value of BASIC as a first programming language.)
And finally, whether we are computer scientists promoting "computational thinking" or core mathematicians promoting twentieth century methods, it strikes me that we actually want the same thing. Perhaps we should make common cause. The easiest way to save core mathematics might be through computer science. After all, since the mathematicians did us the favour of founding our discipline, the least we could do in return would be to help them save theirs.
(There's lots more interesting stuff on Frank Quinn's education webpage. Well worth a read.)