Story Points are useless for what people think they are for

One of the main functions of Project Management (indeed, Management In General) is to try to get some predictability out of a delivery process. People want to know when stuff will be done, and the question is not an unreasonable one. Indeed in many fields you will get good and reliable answers to such questions – the arrival time of your FedEx parcel, for example. However the field of professional software development has a poor reputation in this regard, and we will consider one aspect of that in this article.

So how do you predict when stuff will be done? Well, one common (and common sense) approach is to break down your big chunk of stuff into multiple smaller chunks of stuff, estimate how long each of those smaller chunks will take, then munge up all the small estimates to answer the original question. Broadly speaking this is the approach taken by many software development teams, the majority of whom run Scrum, the currently dominant management methodology.

So what does The Scrum Bible have to say about this? The 2016 Scrum Guide says your Product Backlog must be estimated (the word gets 9 mentions in the 17 pages of the Guide), and it says who should do the estimating (the Development Team), but it doesn’t say how or in what units. The most common method used is Planning Poker, a topic for another article. The most common unit of estimation is Story Points, and here we will discuss how successful these Story Point estimations actually are.

It can be difficult to explain to people, especially Senior Management, exactly what Story Points are. They are a measure of Effort, not Scheduling, but the classic unit of effort is the man-day. It is quite natural to ask how Story Points relate to man-days, and often the answer is confused and/or confusing, especially when it turns out there are even different types of man-days (real vs ideal – again, a topic for another article).

But let us brush all that aside for now and assume that Senior Management is either on board or does not object, and your team is producing Story Point estimates and using them for forecasting in Classic Scrum Mode. Does it actually work on its own terms? In other words, ‘do Story Point estimates correlate well with how long it takes to get those things done’?

Let’s graph up some actual measurements and see. On one axis of our graphs we will have Story Point estimates (these are typically limited to certain values, like 1, 3, 5, 8, 13...). On the other axis we will have calendar time: the time these chunks of work take to get from Start State (typically: going into development) to End State (hopefully: done). This is often known as Cycle Time or Lead Time. Nobody can ever agree on the difference between the two terms, so I usually label this axis Time In Progress (TIP), as an analogy to Work In Progress (WIP), a common Kanban term.
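For anyone wanting to collect the same measurement, here is a minimal sketch of how Time In Progress could be computed. It assumes you can export a start date and an end date per work item from your tracker; the function name and the sample dates are hypothetical, not taken from any real team.

```python
from datetime import date


def time_in_progress(started: date, finished: date) -> int:
    """Calendar days from Start State (into development) to End State (done)."""
    if finished < started:
        raise ValueError("finished date precedes started date")
    return (finished - started).days


# Hypothetical work item, for illustration only.
print(time_in_progress(date(2016, 3, 1), date(2016, 3, 15)))  # 14
```

Note this counts calendar days, weekends included, which matches the calendar-time axis described above.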

Scrum Theory leads you to expect that there should be a strong positive relationship between SP and TIP: the higher the estimate, the longer it takes to deliver. On average, work estimated at 8 SPs should take eight times longer than work estimated at 1 SP. Is that what we find in practice?
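One rough way to test the "eight times longer" expectation against your own data is to compare the median TIP for each estimate value with the median for 1 SP. The sketch below assumes your data is a list of (story_points, tip_days) pairs; the function names are my own, and the sample numbers are made up purely to exercise the code, not measurements from any team.

```python
from collections import defaultdict
from statistics import median


def median_tip_by_estimate(items):
    """Map each Story Point value to the median Time In Progress in days."""
    buckets = defaultdict(list)
    for story_points, tip_days in items:
        buckets[story_points].append(tip_days)
    return {sp: median(days) for sp, days in sorted(buckets.items())}


def expected_vs_observed(items, baseline_sp=1):
    """Pair the ratio that proportionality predicts with the ratio observed.

    Assumes the baseline estimate is present and its median TIP is not zero.
    """
    medians = median_tip_by_estimate(items)
    base = medians[baseline_sp]
    return {sp: (sp / baseline_sp, m / base) for sp, m in medians.items()}


# Made-up illustrative pairs: (story_points, tip_days).
sample = [(1, 4), (1, 6), (3, 5), (3, 9), (8, 6), (8, 10)]
for sp, (expected, observed) in expected_vs_observed(sample).items():
    print(f"{sp} SP: expected x{expected:.1f}, observed x{observed:.1f}")
```

If the estimates were proportional to delivery time, the two columns would roughly agree for every estimate value.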

Below I present four such graphs, two from teams in organisations where I have worked, two from the interwebs. Note that the X and Y axes are the same way round in my graphs, and flipped in graphs 3 and 4, but all tell the exact same tale:

[Graph 1: estimation-sp-vs-tip-big-data-2016. Source: Agile Fixer]

[Graph 2: estimation-sp-vs-tip-front-end-2016. Source: Agile Fixer]

[Graph 3: estimation-sp-vs-tip-andrew-pattison. Source]

[Graph 4: estimation-sp-vs-tip-michael-dubakov. Source]

The conclusion is clear, and for Scrumdamentalists, startling: there is no useful relationship between the two measures. In other words, the SP estimate on a chunk of work is utterly useless for forecasting when it will be done, which in Scrum Theory is the main (only?) reason for doing them in the first place.
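If you would rather put a number on "no useful relationship" than eyeball a scatter plot, a rank correlation is one option. The sketch below is my own choice of technique, not something the graphs above were produced with: it computes Spearman's rank correlation in plain Python, on the assumption that Story Points are better treated as ordinal than as a true numeric scale. All function names are hypothetical.

```python
from statistics import mean


def ranks(values):
    """Assign 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / (var_x * var_y) ** 0.5


def spearman(story_points, tip_days):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(story_points), ranks(tip_days))
```

Call it as spearman(list_of_estimates, list_of_tip_days). A result near +1 would mean bigger estimates reliably take longer; a result near 0 would mean the estimate tells you next to nothing about delivery time. Story Point data has lots of ties, since only a handful of values are allowed, which is why the ranking step averages tied positions.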

I present here evidence from four teams, but have seen this pattern elsewhere over the years, though sadly I do not have the evidence to hand. And whenever I have seen people graph this up with real data, this is exactly what is found.

What do we do with this conclusion? Every activity in delivery should be subject to a cost-benefit analysis. If an activity turns out to have a higher cost than benefit, then it should cease (or at least be reduced as much as possible), or be done in a different way that manages to produce a net plus for the organisation.

Does this mean that Story Point estimating is a complete waste and should cease immediately? An interesting question indeed. My thoughts:

  1. the villain here is more likely to be ‘estimating in general’ rather than specifically ‘estimating in Story Points a la Scrum’ – but the graphs I have to hand show SPs as their unit, hence the focus here
  2. if the cost of even inaccurate estimates is low, the urgency of eliminating this activity is similarly low. So if you can do this activity quickly and easily, maybe there’s no reason to change. But bear in mind that there are other potential costs: misleading forecasts can undermine faith in delivery teams and the entire delivery process, which is a serious cost indeed.
  3. if a Story-Point-estimation process has benefits/purposes other than delivery forecasting, it may still pass a cost-benefit analysis.

In fact I have successfully run many teams without estimating at all, whether in Story Points or some other unit – I am far from wedded to this as a value-adding activity. And where my teams have done such estimating, we have a) done it quickly, and b) derived other value from it, so it was still a net plus.

And of course abandoning sacred processes is itself hard. The organisation’s need to have a clue when stuff will be done is not going to go away, so if Story Point estimating won’t serve, you probably need to have something else at hand. Once more there’s much material here for other articles, but for now I offer this conclusion:

Story Point estimating a la Scrum can be shown, empirically, to have little or no relationship to actual delivery times. It is therefore useless as a forecasting measure. If that is the only or principal reason for the practice, you should give serious consideration to reducing or eliminating it entirely, as the time spent is wasted and would be better used for other activities.
