Story Points are useless for what people think they are for

One of the main functions of Project Management (indeed, Management In General) is to try to get some predictability out of a delivery process. People want to know when stuff will be done, and the question is not an unreasonable one. Indeed, in many fields you will get good and reliable answers to such questions – the arrival time of your FedEx parcel, for example. However, the field of professional software development has a poor reputation in this regard, and we will consider one aspect of that in this article.

So how do you predict when stuff will be done? Well, one common (and common sense) approach is to break down your big chunk of stuff into multiple smaller chunks of stuff, estimate how long each of those smaller chunks will take, then munge up all the small estimates to answer the original question. Broadly speaking this is the approach taken by many software development teams, the majority of whom run Scrum, the currently dominant management methodology.
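As a minimal sketch of that break-down-and-add-up approach, the snippet below sums per-chunk estimates into a single figure. The chunk names and numbers are made up for illustration, and `total_estimate` is a hypothetical helper, not part of any Scrum tooling.

```python
# Hypothetical chunks of work with made-up estimates in days.
estimates = {"login page": 3, "payment form": 5, "email receipts": 2}


def total_estimate(chunks):
    """Add up the small estimates to answer the original question."""
    return sum(chunks.values())


print(total_estimate(estimates))
```

Note that the total is only as good as the individual estimates that go into it, which is the question the rest of this article examines.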

So what does The Scrum Bible have to say about this? The 2016 Scrum Guide says your Product Backlog must be estimated (the word gets 9 mentions in the 17 pages of the Guide), and it says who should do the estimating (the Development Team), but it doesn’t say how or in what units. The most common method used is Planning Poker, a topic for another article. The most common unit of estimation is Story Points, and here we will discuss how successful these Story Point estimations actually are.

It can be difficult to explain to people, especially Senior Management, exactly what Story Points are. They are a measure of Effort, not Scheduling, but the classic unit of effort is the man-day. It is quite natural to ask how Story Points relate to man-days, and often the answer is confused and/or confusing, especially when it turns out there are even different types of man-days (real vs ideal – again, a topic for another article).

But let us brush all that aside for now and assume that Senior Management is either on board or does not object, and your team is producing Story Point estimates and using them for forecasting in Classic Scrum Mode. Does it actually work on its own terms? In other words, ‘do Story Point estimates correlate well with how long it takes to get those things done’?

Let’s graph up some actual measurements and see. On one axis of our graphs we will have Story Point estimates (these are typically limited to certain values, like 1, 3, 5, 8, 13...). On the other axis we will have calendar time: the time these chunks of work take to get from Start State (typically: going into development) to End State (hopefully: done). This is often known as Cycle Time or Lead Time, but nobody can ever agree on the difference between the two terms, so I usually label this axis Time In Progress, or TIP (by analogy with Work In Progress, or WIP, a common Kanban term).

Scrum Theory leads you to expect a strong positive relationship between Story Points (SP) and Time In Progress (TIP): the higher the estimate, the longer it takes to deliver. On average, work estimated at 8 SP should take eight times longer than work estimated at 1 SP. Is that what we find in practice?
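If you want to run the same check on your own team's data, something like the following would do. It is a minimal Python sketch, not the method used to produce the graphs in this article: the tickets are made-up illustrative values, and `time_in_progress_days`, `median_tip_by_estimate` and `pearson` are hypothetical helper names. It computes Time In Progress in calendar days per ticket, groups the results by Story Point estimate, and computes a Pearson correlation coefficient between the two measures.

```python
from collections import defaultdict
from datetime import date
from math import sqrt
from statistics import mean, median

# Hypothetical, made-up tickets: (story_points, started, finished).
# Replace these with an export from your own tracker.
TICKETS = [
    (1, date(2016, 3, 1), date(2016, 3, 4)),
    (1, date(2016, 3, 2), date(2016, 3, 16)),
    (3, date(2016, 3, 3), date(2016, 3, 5)),
    (5, date(2016, 3, 7), date(2016, 3, 17)),
    (8, date(2016, 3, 8), date(2016, 3, 11)),
    (8, date(2016, 3, 9), date(2016, 3, 30)),
]


def time_in_progress_days(started, finished):
    """Calendar days between Start State and End State."""
    return (finished - started).days


def median_tip_by_estimate(tickets):
    """Map each Story Point value to the median Time In Progress."""
    buckets = defaultdict(list)
    for points, started, finished in tickets:
        buckets[points].append(time_in_progress_days(started, finished))
    return {points: median(days) for points, days in sorted(buckets.items())}


def pearson(xs, ys):
    """Pearson correlation coefficient, between -1 and 1."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


points = [t[0] for t in TICKETS]
tips = [time_in_progress_days(t[1], t[2]) for t in TICKETS]
print(median_tip_by_estimate(TICKETS))
print(round(pearson(points, tips), 2))
```

A coefficient near 1 would support the Scrum Theory expectation; a value near 0 would indicate no useful relationship. If every ticket has the same estimate or the same duration, the variance is zero and `pearson` will raise a `ZeroDivisionError`, so a real script would want to guard against that.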

Below I present four such graphs, two from teams in organisations where I have worked, two from the interwebs. Note that the X and Y axes are the same way round in my graphs, and flipped in graphs 3 and 4, but all tell the exact same tale:

Graph 1: estimation-sp-vs-tip-big-data-2016 (Source: Agile Fixer)

Graph 2: estimation-sp-vs-tip-front-end-2016 (Source: Agile Fixer)

Graph 3: estimation-sp-vs-tip-andrew-pattison (Source)

Graph 4: estimation-sp-vs-tip-michael-dubakov (Source)

The conclusion is clear, and for Scrumdamentalists, startling: there is no useful relationship between the two measures. In other words, the SP estimate on a chunk of work is utterly useless for forecasting when it will be done, which in Scrum Theory is the main (only?) reason for doing them in the first place.

I present here evidence from four teams, but have seen this pattern elsewhere over the years, though sadly I do not have the evidence to hand. And whenever I have seen people graph this up with real data, this is exactly what is found.

What do we do with this conclusion? Every activity in delivery should be subject to a cost-benefit analysis. If an activity turns out to have a higher cost than benefit, it should cease (or at least be reduced as much as possible), and/or be done in a different way that produces a net plus for the organisation.

Does this mean that Story Point estimating is a complete waste and should cease immediately? An interesting question indeed. My thoughts:

  1. the villain here is more likely to be ‘estimating in general’ rather than specifically ‘estimating in Story Points a la Scrum’ – but the graphs I have to hand show SPs as their unit, hence the focus here
  2. if the cost of even inaccurate estimates is low, the urgency of eliminating this activity is similarly low. So if you can do this activity quickly and easily, maybe there’s no reason to change. But bear in mind that there are other potential costs: misleading forecasts can undermine faith in delivery teams and the entire delivery process, which is a serious cost indeed.
  3. if a Story-Point-estimation process has benefits/purposes other than delivery forecasting, it may still pass a cost-benefit analysis.

In fact I have successfully run many teams without estimating at all, whether in Story Points or some other unit – I am far from wedded to this as a value-adding activity. And where my teams have done such estimating, we have a) done it quickly, and b) derived other value from it, so it was still a net plus.

And of course abandoning sacred processes is itself hard. The organisation’s need to have a clue when stuff will be done is not going to go away, so if Story Point estimating won’t serve, you probably need to have something else at hand. Once more there’s much material here for other articles, but for now I offer this conclusion:

Story Point estimating a la Scrum can be shown, empirically, to have little or no relationship to actual delivery times. It is therefore useless as a forecasting measure. If that is the only or principal reason for the practice, you should give serious consideration to reducing or eliminating it entirely, as the time spent is wasted and would be better used for other activities.

Project Management 101: Effort != Scheduling

Effort and Scheduling (alternatively: Effort and Duration) are not the same, as every Project Manager knows (or should know). It is startling how many people confuse the two, leading to all kinds of mayhem.

Here Mr Frog has asked a question about effort, and received an answer, but he then makes the canonical error of thinking it is easy to map effort onto a calendar to derive a schedule.

effort-scheduling

If you’re asking about how much work something is, that is Effort. The unit for human effort is person-days, or some variant thereof (person-weeks, person-months etc etc), and its calculation is trivial multiplication if you know both inputs:

# of people   Length of time for each   Total effort
2             1 day                     2 person-days
1             2 days                    2 person-days
2             1 week                    2 person-weeks
5             2 months                  10 person-months
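The same multiplication can be written as a few lines of Python. This is only an illustrative sketch; `total_effort` is a hypothetical helper name, and the rows simply repeat the figures from the table.

```python
def total_effort(people, duration_each):
    """Effort = number of people x length of time each person works."""
    return people * duration_each


# Rows from the table: (number of people, time for each, unit of time).
rows = [(2, 1, "day"), (1, 2, "day"), (2, 1, "week"), (5, 2, "month")]

for people, duration, unit in rows:
    effort = total_effort(people, duration)
    print(f"{people} people x {duration} {unit}(s) each = {effort} person-{unit}s")
```

The unit of the result is always a person-unit of time, never a calendar date, which is the distinction the rest of this article turns on.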

If you are asking about calendar dates (often, when something will start, or – more importantly – when it will finish), the topic is Scheduling, and the unit is calendar days (or weeks, months etc). The calendar below shows 2 calendar weeks, not to be confused with 2 person-weeks as above; you might be able to do 2 person-weeks of work in an hour, if you have enough people. 2 calendar weeks take 2 weeks, by definition.

calendar-2-weeks

There is of course some relationship between Effort and Scheduling for any given piece of work, but it is not a simple one, depending on such things as resource availability, possibilities for parallelisation, and the massive uncertainties present in all complex work.
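To make that concrete, here is a deliberately naive Python sketch of going from Effort to a best-case duration. Everything in it is an assumption made for illustration: `naive_duration` is a hypothetical function, `availability` stands for the fraction of each person's time actually available, and `max_parallel` stands for how many people can usefully work on the task at once. It ignores uncertainty entirely, so treat the result as a lower bound rather than a forecast.

```python
def naive_duration(effort_person_days, people, availability=1.0, max_parallel=None):
    """Best-case working days under a deliberately naive model.

    availability: fraction of each person's time spent on this work.
    max_parallel: cap on how many people can usefully work at once.
    """
    useful_people = people if max_parallel is None else min(people, max_parallel)
    return effort_person_days / (useful_people * availability)


# 2 person-weeks = 10 person-days, assuming a 5-day working week.
EFFORT = 10

print(naive_duration(EFFORT, people=1))                    # one person, full time
print(naive_duration(EFFORT, people=2))                    # two people, fully parallel
print(naive_duration(EFFORT, people=2, availability=0.5))  # two people, half time each
print(naive_duration(EFFORT, people=10, max_parallel=2))   # extra people cannot help
```

The same Effort yields different durations depending on the inputs, which is why a date cannot simply be read off an effort figure.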

It doesn’t help matters when people confuse the inputs and outputs of such calculations, which is why it is imperative to keep these two concepts separate, and to be cautious and show your working when trying to derive one from the other.

Terrible Scrum jargon

Every field of knowledge has its own jargon, which can be as helpful to insiders (bringing clarity and precision) as it can be irritating to outsiders (bringing obfuscation and exclusion).

Where your field is truly specialist, such jargon may be justified. While a lay person might be confused by a particle physicist’s talk of gluons, anti-particles and wave functions, those words were invented to describe concepts that did not previously exist, and the jargon of quantum mechanics is no more bizarre than the world it tries to describe.

In the more commonplace field of managing software development teams, the underlying concepts are people, their roles, and how they organise themselves and talk to each other, for which vocabulary already exists. When you are trying to improve communications, it’s best to avoid management speak or silly terminology.

The practice of Scrum, the most common Agile methodology, has accumulated a few of these terrible jargon words. The very term ‘Scrum’ itself is an odd one – if you’ve ever talked to a veteran front-rower in a Rugby Union team, they’ll tell you that rugby scrums are a nasty underworld of eye gouging and worse.

rugby-scrum

But perhaps there’s something in the metaphor of ‘8 people all pushing in the same direction’, so we’ll give that one a pass for now. But there are some truly terrible examples of Scrum jargon that we should really do without. Let’s start with the three worst offenders:

1) Grooming (the Backlog)

This word was bad enough before the Jimmy Savile scandal rocked the UK, but these days it’s even harder to keep a straight face. If you have to do something to the Backlog, prepare it, tidy it, organise it, whatever – don’t groom the damn thing.

2) Impediments

This word abounds in Scrum stand-up meetings, in the 3rd question that every team member is supposed to answer. It’s not a common English word, with a stuffy, old-fashioned feel about it, and is mostly associated with speech impediments. The word ‘blocker’ is preferable, but best of all is to ask ‘do you think you’ll have any problems?’ or ‘will you need any help?’.

3) Pigs & chickens

It’s hard to believe this one ever got any traction. It’s based on a joke – which even if well told is still a lousy joke – about the difference between a pig and a chicken’s involvement in the classic English breakfast of bacon and eggs. The chicken is ‘involved’, while the pig is… wait for it… ‘committed’. Hilarious, huh?

This is then used in the Scrum context to distinguish between core team members who must attend vital meetings and participate in decision-making, and peripheral people whose attendance is not mandatory and might not even be tolerated.

While you might sometimes want to make this distinction, I would advise against doing so using words that are not just utterly confusing, irrelevant, and based on an obscure and humourless joke, but also downright insulting: ‘chicken’ means coward, and ‘pig’ is an insult in a vast array of cultures and religions. A (very nice) English colleague of mine once referred to his Spanish and Portuguese team-mates as ‘pigs’ with this innocent meaning in mind, but they were genuinely outraged and never forgave him.

While these are the worst offenders, two more terms commonly used within Scrum are worth (dis)honourable mention. It has become so familiar to refer to chunks of work in Scrum teams as Stories that it’s hard to remember how bizarre this term sounds when you first encounter it. Although there is a historical reason for this usage (material for a future article), the fact remains that the everyday meaning of the word ‘Story’ has nothing to do with the world of professional software development, and its usage in this context only adds to confusion.

A closely associated term is the infamous estimation unit known as the Story Point. I’m sure many readers will have experienced the struggle of trying to explain what on earth ‘Story Points’ are to a bewildered senior stakeholder, who cannot understand the relevance of this apparent nonsense to his simple question “when will it be done?”. But Story Points are at least an honest attempt to confront very real difficulties with estimation and forecasting, and it’s not as if these problems disappear when you switch unit.

To finish off, it is worth reflecting that Scrum, at its heart, is a system for organising teams around some meetings and some documents. At least ‘meetings’ and ‘documents’ are uncontroversial terms in a work setting, right? But somehow in much of the Scrum literature these meetings have been elevated to the point where they are known as ceremonies, which you may well think is a somewhat pompous description of a bunch of people having a chat around a table in an office. And the documents that are the input/output of such meetings are themselves known as artifacts (this is actually the word used in the Scrum Guide itself), which suggests that some Scrum advocates might have got a little too emotionally involved in the Indiana Jones series of movies.

All in all, while this article is written somewhat tongue-in-cheek, there is a real price to be paid in the workplace for using terminology that is unclear, silly, vain, or otherwise unfit for purpose. If we want clarity of vision and direction for our teams, a good place to start is to use plain speaking, rather than obfuscation and jargon.