The Perils of Project Parallelisation

In previous articles we have been at pains to stress the key point that a team only provides value to an organisation when it finishes work, which for a software team generally means ‘having working and tested code in the production environment’. To maximise this value we need to focus ruthlessly on getting stuff finished, and preventing unfinished work from piling up.

A common and seemingly logical response to such delivery pressures is to try to bring forward the start date on important work (and there’s never a shortage of that – sometimes almost everything seems important). After all, the sooner you start, the sooner you’ll finish – right? In simple situations this may be true, as this Gantt chart illustrates:

Gantt Charts 1 - start earlier, finish earlier

What about more complex and realistic situations? If we have three projects, is the optimal approach to bring all the start dates forward as far as possible and run them concurrently?

Gantt Charts 2 - too much parallelisation

It turns out that in these situations, the adage “the sooner you start, the sooner you’ll finish” may NOT be true, however intuitive it seems*. In fact, starting too much too soon can lead to catastrophically non-productive teams, and a huge waste of time and money.

Let’s illustrate this with a fun example. We are building Christmas Trees, and it is a four-step process. Only when all four steps are complete do we have a finished product that we can sell, generating value for our organisation – before that point nothing can be sold, and no value at all has been generated.

Xmas Tree 0 - steps to complete

Now imagine we have four projects, to build four differently coloured trees: one yellow, one red, one green and one blue. Across the four products that makes a total of 16 steps. The temptation is to start all four projects as soon as possible, working on all of them in parallel, in the hope that this will bring forward their end dates (the thing that matters) as well. Let’s see if that’s what actually happens.

First let’s visualise doing the projects in series, with a smiling customer greeting each finished tree:

Xmas Tree production line 1 - working in series

Now let’s compare what happens if we get all four tree projects started as quickly as possible, and keep all four going in parallel until all are completed. Given a fixed resource base, this means people will have to continually switch between projects as we go:

Xmas Tree production line 2 - working in parallel

All we have done is reorder the same 16 steps, so the total time to complete all four projects is the same. Did we achieve anything by parallelising? Well yes, but nothing good – we have in fact massively delayed delivery to our customers of three of the four projects (yellow, red and green), while blue remains unchanged. In other words, over-parallelising has made 75% of our projects take longer.
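The arithmetic behind the two production lines can be checked with a short Python sketch. It assumes the parallel schedule is strict round-robin (one step on each tree in turn), which is one plausible reading of the diagram rather than its exact layout, and it counts time in whole steps:

```python
def finish_times(schedule):
    """Return the step number (1-based) at which each project's final step completes."""
    last_step = {}
    for step_number, project in enumerate(schedule, start=1):
        last_step[project] = step_number  # overwritten until the project's final step
    return last_step

TREES = ["yellow", "red", "green", "blue"]
STEPS_PER_TREE = 4

# In series: all four steps of one tree before starting the next.
series = [tree for tree in TREES for _ in range(STEPS_PER_TREE)]

# In parallel (assumed round-robin): one step on each tree in turn, four times over.
parallel = [tree for _ in range(STEPS_PER_TREE) for tree in TREES]

print(finish_times(series))    # {'yellow': 4, 'red': 8, 'green': 12, 'blue': 16}
print(finish_times(parallel))  # {'yellow': 13, 'red': 14, 'green': 15, 'blue': 16}
```

Both schedules end at step 16, but only blue is delivered at the same time; yellow, red and green each arrive later under the parallel schedule.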

Xmas Tree production line 3 - series and parallel compared

And in fact reality would be much worse than this. Writing code is an intense intellectual activity, where true productivity comes in those precious periods where you manage to ‘get in the zone’, for some people an almost trance-like state where you are juggling multiple variables and possible paths in your short-term memory. It is perhaps comparable to playing chess, or writing a novel. It can take a long time to get into that mental state, and interruptions can be fatal, especially long ones. Post-interruption, you can’t simply hop straight back to where you were in that complex mental process; you may have to start again from scratch, and might never get that inspiration back in quite the same way.

This means that context switching has a big efficiency penalty. Let’s reflect that in our diagrams; the parallel example has a ton of context switching, further delaying all the deliverables. Compared to working in series (where there will be context switching only once a project is complete), we can see the terrible effects of excessive parallelisation on delivery timelines. Now all four projects are seriously delayed by working in parallel:

Xmas Tree production line 4 - series and parallel compared + context-switching
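To see why all four projects now slip, the earlier step-counting can be extended with a context-switching penalty. This is a minimal sketch: the cost of half a step per switch (`SWITCH_COST`) and the round-robin parallel order are illustrative assumptions, not measured figures:

```python
def finish_times_with_switching(schedule, switch_cost):
    """Return when each project finishes if every change of project costs `switch_cost` units."""
    clock = 0.0
    previous = None
    finished = {}
    for project in schedule:
        if previous is not None and project != previous:
            clock += switch_cost  # time spent getting back "in the zone" on another project
        clock += 1.0  # one unit of productive work on this step
        finished[project] = clock  # overwritten until the project's final step
        previous = project
    return finished

TREES = ["yellow", "red", "green", "blue"]
STEPS_PER_TREE = 4
SWITCH_COST = 0.5  # assumed: each switch wastes half a step's worth of time

series = [tree for tree in TREES for _ in range(STEPS_PER_TREE)]
parallel = [tree for _ in range(STEPS_PER_TREE) for tree in TREES]

print(finish_times_with_switching(series, SWITCH_COST))
# {'yellow': 4.0, 'red': 8.5, 'green': 13.0, 'blue': 17.5}
print(finish_times_with_switching(parallel, SWITCH_COST))
# {'yellow': 19.0, 'red': 20.5, 'green': 22.0, 'blue': 23.5}
```

Working in series pays for 3 switches in total, while the round-robin schedule pays for 15, so even blue, the last tree, now arrives later when everything is run in parallel.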

And we are ignoring other factors still. The biggest learning opportunity in product development occurs when you deliver a working product to your customer and get real market feedback. Working in series, we bring these learning opportunities forward, and the learning is compounded each time. In contrast, in a ‘release everything at the end’ big-batch approach, the learning opportunities are deferred, and any mistakes made in manufacturing the first tree will already have been repeated in the other three.

Let’s factor this into our example. A team working in series has, through some mix of customer feedback and introspecting on their experiences, discovered feature improvements (a plant pot, star decorations) and simplifications/savings (a tree that can be manufactured in three steps rather than four):

Xmas Tree 6 - product evolution

Including this in our diagram shows even greater benefits of working in series. Not only is the speed-to-market gap bigger than before, because a process improvement is leveraged in subsequent production runs, but there is now a quality gap too, thanks to the faster learning cycle. Working efficiently in series, you get stuff sooner, and it is better stuff:

Xmas Tree production line 5 - series and parallel compared + ideas

These examples illustrate the real life penalty of not focusing ruthlessly on getting things done and delivered in a sensible, priority order. All of these gains come through focus and discipline, not working extra hours; they are the epitome of “working smarter, not harder”. They are also a direct realisation of key Agile Manifesto principles: deliver working software frequently, and working software is the primary measure of progress.

Of course, none of this is meant to imply that no parallelisation whatsoever is permitted – as ever, the extremes of the argument are absurd. A measured level of parallelisation, where resources that are no longer needed for a first goal drop naturally down to a second, is often optimal:

Gantt Charts 3 - just right

But of the two extremes, it is a rare thing to find an organisation that works too much in series. It is far more common to find teams obsessed with starting but not finishing, where focus is not maintained, attention is allowed to wander, and precious resources are frittered away for little reward. Try the following cartoon out on your tech team: the chances are that the reaction will be a somewhat rueful smile…

Multi-tasking - new framework side project

In our next article we will see how the concepts discussed here apply not just at the strategic level (roadmaps, programmes, projects), but also at the tactical level (bugs, stories etc – tickets that appear on an Agile Board), where limiting WIP (Work In Progress) is critical to delivery success.

 

*see Essential Kanban Condensed

An Agile Story – a guide to this website

The number of articles here on Agile Fixer is growing all the time. This article is designed to help you navigate to those that may be of particular interest.

Once upon a time, in a wonderful far-off land called “Agile”, there lived a man, who told a tale…

So I work in a biggish company, and we have quite a few software teams. I know some of the tech guys personally, and it’s my job to know how much they collectively cost – and it is a lot. We’ve had a patchy history of building apps and websites in this organisation. Software development seems very hard to do well, with many teams struggling with their processes, and unable to give convincing answers to even the most basic questions about how much work something involves and when it will be done.

In the last few years a lot of our teams have started doing a thing called ‘Agile’, which apparently is a better way than whatever they did before. But it’s certainly no magic bullet – it seems to introduce a lot of confusing jargon; for example, some of these teams now say ‘when something will be done’ using a made-up unit called ‘Story Points’, which doesn’t seem to be any real improvement if I’m honest. Perhaps part of the problem is all the big, long meetings these expensive techie people spend their time in (many of the developers actually complain bitterly about this) – I sure hope they’re using this time wisely.

Agile heaven - it doesn't fix everything

Actually these techies have one meeting that seems a bit different from the others – it’s a quick one in the morning that is often held in an open space, with a lot of people standing up. You only see the techies doing this, nobody else in the organisation; it’s a bit odd, really.

When you watch them in these morning meets nobody seems to take notes, and they’re almost always gathered around a whiteboard, or a big TV screen. Apparently they put a visualisation of the team’s work up on that screen; some teams reckon this visualisation is really critical and think carefully about how to do it well, while others seem to be much more slapdash about it. One of the senior tech managers told me that the teams that take these visualisations more seriously seem to work much better than the rest, both internally and with the rest of the organisation. I wondered why they don’t just make all the teams follow best practice, but there seems to be an ideological reason behind these very different attitudes.

Curious, I chatted to a guy from one of these ‘better’ teams. He gave me a run-through of what’s on their big screen, which is called an ‘Agile Board’. It’s a two-dimensional grid, basically. There’s a whole bunch of columns, which show you how work is progressing. I asked how they choose the columns, and was told it varies from team to team, but that one way to think about it is you first need the steps to get work ready for the developers to start, and then you need the steps for the coding itself, and the testing, until it’s all delivered to the customer. Makes sense.

I told them that in our business unit we don’t have any of that. But perhaps we should – we’re often in a bit of a mess ourselves, truth be told. Our managers try to prioritise our work, but it just seems to create more noise rather than helping. “That’s why we use these Agile Boards – to make it really clear what everyone should be working on” said my tech friend.

So I went and had a closer look at these boards. They’re not just divided into columns, they have rows (called ‘swimlanes’) as well – this seems to be a key part of how they sort out what their real priorities are. One of their key concerns is to make sure people get stuff properly finished before picking up something new (they have some techniques to help with this) – a bit of that discipline would help in my area for sure.

I’m not saying we’re bad or lazy workers, mind – often it’s our bosses’ fault, since they cancel or change work after we’ve already started on it, wasting a ton of time. We should be a lot stricter about what work gets started and what doesn’t. My tech friend said his team have an excellent way of doing exactly this, which they call a ‘triaging process’.

The more we talked about these boards, the more interesting they seemed. The tech guys reckon that work spends most of its life just sitting around, waiting, rather than being acted upon, and if you highlight and clamp down on these waiting periods, you become a more effective team overnight. This really is starting to sound like “working smarter, not harder”.

I’m gonna have to have a think about how to apply some of this thinking to my own team. Apparently recruitment are using a similar system to track the state of their hires, so it’s definitely not just for techies!

To be continued (as more articles are added)…

Agile Board design – buffer states

You’re a day away from a release deadline, and disaster strikes; two of your four QAs, half the QA team, are struck down with a nasty stomach bug. We need to sort out immediately what that means for our release: what testing work has already started and now needs to be reassigned, and what hasn’t even been started yet, and will therefore probably be dropped from this release entirely. Does our Agile Board help with such decisions by making it instantly obvious what has started, and what hasn’t?

This is the kind of common real-world situation that can be helped by the good use of “buffer states” on your board, and that is the topic of this article.

As we know from previous articles, an Agile Board shows the workflow of a team, the series of steps its work goes through on the journey from conception to finished. Along the way the work is passed along a virtual production line, with various different combinations of people engaging in different value-adding activities, until we get the finished product.

The ideal is that the delays between (and within) each of these value-adding activities are minimal or even nil, the appropriate metaphor being a slick sprint relay race:

baton-passing-in-relay-race

Unfortunately the real world is rarely anything like this. A good team might approach this ideal in dealing with an Emergency, where everything is sacrificed for speed whatever the cost in disruption elsewhere, but for normal work there will most certainly be gaps where work sits idle. In fact, when measured, most work spends a shockingly high percentage of its life sitting around unattended, waiting to be picked up. The metrics of work idleness*, ‘touch time’ and ‘flow efficiency’, will be the topic of a future article, but for now it should be clear that reducing such idleness to a minimum is one of the main objectives of a good team, and visual management of idle work is essential to that goal.

This means separating out such idle states and making them visible as columns on an Agile Board, appearing as ‘in-between’ columns or ‘buffer states’ that separate out the columns for the main value-adding activities.

If you’ve ever worked with an Agile Board you have likely come across buffer states, even if you didn’t think of them that way. Even the simplest board of all, “To Do — Doing — Done” is really nothing more than a couple of buffer states around a single activity, and it can be rewritten to a format that makes this explicit:

agile-board-simplest-with-equivalent-expressed-as-buffer-states

Let’s apply this thinking to the example workflow sketched out across two previous articles (here and here), to see what we can learn. After such an exercise we will have a deeper understanding of this workflow, and will have a conceptual framework for further tweaking and improvement. Here it is again, a complete workflow across a dozen columns:

agile-board-complete-workflow

These columns tell us that we started with an idea, and then applied the following list of discrete activities to it:

  1. Triaging
  2. Ticket Prep
  3. Development
  4. QA
  5. Sign-off
  6. Deployment to live

So what happens if we take this list, and just wrap each item in buffer states? Let’s do that for the first two adjacent activities, Triaging and Ticket Prep, and see what results:

buffer-states-around-triaging-and-ticket-prep

The first thing to notice here is that we end up with two buffer states next to each other, which makes no practical sense. A ticket which has completed Triage simply is ready for Ticket Prep, so the two columns Triage Complete and Ready for Ticket Prep are duplicates. One of the redundant columns must be removed; it doesn’t matter much which one goes, but it’s nonsense to have both.

With this in mind, let’s see what happens if we apply just a pre-activity buffer, ‘Ready for X’, before each of the six main activities in the workflow. This gives us the 14-column workflow below, with the buffer states highlighted; collectively, they now show all the states where no value-adding work is being done.

buffer-states-added-to-whole-workflow
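The two construction rules above (wrapping each activity on both sides, versus a single ‘Ready for X’ buffer in front of each) can be compared mechanically. In this sketch the six activity names come from the list above, while the bookend columns `"Idea"` and `"Done"` and the `" Complete"` naming of post-buffers are assumptions made for illustration:

```python
ACTIVITIES = ["Triaging", "Ticket Prep", "Development", "QA", "Sign-off", "Deployment to live"]

def is_buffer(column):
    """A buffer column is a waiting state, named either 'Ready for X' or 'X Complete'."""
    return column.startswith("Ready for ") or column.endswith(" Complete")

def fully_wrapped(activities):
    """Naive approach: a pre-buffer and a post-buffer around every activity."""
    columns = []
    for activity in activities:
        columns += [f"Ready for {activity}", activity, f"{activity} Complete"]
    return columns

def pre_buffered(activities, first="Idea", last="Done"):
    """A single 'Ready for X' buffer before each activity, between (assumed) bookend columns."""
    columns = [first]
    for activity in activities:
        columns += [f"Ready for {activity}", activity]
    return columns + [last]

naive = fully_wrapped(ACTIVITIES)
duplicates = [(a, b) for a, b in zip(naive, naive[1:]) if is_buffer(a) and is_buffer(b)]
print(len(duplicates))  # 5: a redundant pair of buffers between each pair of adjacent activities

board = pre_buffered(ACTIVITIES)
print(len(board))  # 14
```

Dropping the post-buffers removes every redundant pair at once, which is why the pre-activity form is a convenient default.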

Comparing this to the original workflow, two useful steps have been lost, Handshaking and Code Review. These are neither buffer states, nor a main activity state. Instead, both of these are reviewing steps – Handshaking is a review of the Ticket Prep activity (to see if it was done to the required standard), and Code Review is the same for the Dev step. Let’s insert these two states back into the workflow, renamed to be consistent with the other steps as “Ticket Prep Review” and “Dev Review”. This gives us the 16-state fully-buffered workflow below:

buffer-states-and-review-states-added-to-whole-workflow
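Adding the review states amounts to slotting each one in immediately after the activity it checks. The sketch below uses a hypothetical 14-column starting list (its exact column names, including the `"Idea"` and `"Done"` bookends, are assumptions) and reproduces the count of 16:

```python
# Assumed 14-column pre-buffered workflow; names are illustrative.
PRE_BUFFERED = [
    "Idea",
    "Ready for Triaging", "Triaging",
    "Ready for Ticket Prep", "Ticket Prep",
    "Ready for Dev", "Dev",
    "Ready for QA", "QA",
    "Ready for Sign-off", "Sign-off",
    "Ready for Live Deploy", "Live Deploy",
    "Done",
]

# Each review state immediately follows the activity whose output it checks.
REVIEWS = {"Ticket Prep": "Ticket Prep Review", "Dev": "Dev Review"}

def add_review_states(columns, reviews):
    """Insert each review column directly after the activity it reviews."""
    result = []
    for column in columns:
        result.append(column)
        if column in reviews:
            result.append(reviews[column])
    return result

board = add_review_states(PRE_BUFFERED, REVIEWS)
print(len(board))  # 16
```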

Now let’s compare this to the original 12-state workflow, with its (three) buffer and (two) review states highlighted:

agile-board-complete-workflow-with-buffer-states-marked

So which of the two is better? The differences come down to column naming (which is a matter of taste), and whether every discrete activity is buffered. Is it actually better to buffer every activity?

Arguably it is. In a previous article on triaging, we saw that buffer states make it easier to describe the precise meaning of each column, as you are no longer munging both waiting and doing into a single column. Is that sufficient to declare we should always buffer every activity on a board? Not quite; there is a countervailing pressure, that Agile Boards with too many columns can become cumbersome and hard to scan and read, whether they are physical or implemented in software.

Physical boards with 16 columns are relatively rare; you need a mighty wall to display one on, and there aren’t too many of those available in most offices. For software boards, the limitation is the legibility of a screen with many columns on it. If everyone in your office has monitors with huge resolution, or (perhaps soon!) Minority Report style VR screens, you won’t be affected by this limitation. But most of us will sadly run out of wall inches or readable pixels, and so there is some reason to limit the number of columns where possible, unless you wish to split the board in two (which can be done – I will cover this in a future article).

Let’s return to our original 12-state workflow, and see what buffer states are and are not included. It turns out that none of Triaging, Ticket Prep or Sign-off are buffered, and Deploying to Live is effectively unbuffered as well, with one column (‘Ready for Live’) representing both buffer and activity itself. Activities that are buffered are Dev and QA. What are the reasons for treating different activities in this way?

In the real teams where this very Agile Board setup worked well, the unbuffered activities of Triaging, Sign-off and Live Deploy were relatively quick, uncontroversial and unlikely to go wrong, and so were able to function with a less detailed board representation. Coding (Dev) and testing (QA) are almost always, by contrast, multi-factorial and highly complex activities, drawing on the bulk of the team’s manpower and budget, and with equivalently high need for transparency, hence the decision to buffer those columns.

Between Triaging and Ticket Prep a buffer might have been helpful, but both activities were owned by the same Product sub-function, and so tended to be run by the same small group of people. The resulting good interpersonal communications and the ease of ‘handing work over to yourself’ meant that extra detail on this part of the board was unnecessary to smooth team operations.

So it turns out that the buffering decisions on this Agile Board were well fitted to its context; there is a rationale for buffering some activities but not others. A fully buffered Agile Board may be a wonderful thing – but if you can reduce column count without compromising the flow of work, that is also fine. And the only test that matters is the empirical one of whether your setup works well in practice.
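The trade-off between column count and detail can be made concrete with a board builder that buffers only a chosen subset of activities. This is a sketch, not a reproduction of the 12-column board above: it ignores review states and the combined ‘Ready for Live’ column, and the activity names and `"Idea"`/`"Done"` bookends are assumptions:

```python
def build_board(activities, buffered, first="Idea", last="Done"):
    """Add a 'Ready for X' buffer only before the activities listed in `buffered`."""
    columns = [first]
    for activity in activities:
        if activity in buffered:
            columns.append(f"Ready for {activity}")
        columns.append(activity)
    columns.append(last)
    return columns

ACTIVITIES = ["Triaging", "Ticket Prep", "Dev", "QA", "Sign-off", "Live Deploy"]

# Buffer only the complex, expensive activities, as in the teams described above.
lean = build_board(ACTIVITIES, buffered={"Dev", "QA"})
full = build_board(ACTIVITIES, buffered=set(ACTIVITIES))
print(len(lean), len(full))  # 10 14
```

Each buffered activity costs exactly one extra column, so the decision can be taken activity by activity, weighing the transparency it buys against the space it uses.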

Finally, I will show a slightly different kind of board setup which you sometimes see in the literature and in the workplace. Many Agile Board software systems do not allow sub-columns (which is why my examples show simple columns that are the height of the whole board), but some software systems do, and of course on a physical board you can have any setup you like within the available space. Allowing sub-columns leads to slightly different board layouts such as the following, taken from this book:

agile-board-with-sub-columns

Whatever you think of the choice of just three main activities, this board design does neatly buffer each activity with a sub-column titled ‘Ready’, and is a common layout that you should be aware of as a variation.

In conclusion, we have drawn attention to important ways that Agile Board columns can be classified into principal activity, buffer and review states. We have seen that where activities are complex and expensive, you may expect a greater benefit from going into more detail on your board by adding buffer and possibly review states. Where activities are relatively quick and straightforward, you may wish to forgo such details and see if a single column for the activity will serve your needs. As ever, it is the context that decides, and it is up to you to find what setup and tweaks work optimally for your teams.

*Note: this is idleness of work, emphatically not the quite different concept of the idleness of people. The former must always be cracked down on, but the latter is a more nuanced thing; indeed, having some slack built in to people’s normal work practices is essential to allow for responsiveness. This will, again, be a topic for a future article. 

Triaging Part 2 – practical details that make the difference

In a previous article we introduced the core concepts of triaging: the origins of the term in emergency medicine, what it means to have triaged a ticket in the world of software development, and the basic metrics around doing it well. In this follow-on article, we will look at a few of the subtleties in this very first part of a team’s workflow, the steps from ticket creation through to finishing triage.

So what are the ways these steps can vary between teams? Let’s consider a few:

Who can create tickets?
Teams can and should formulate (then publicise, and enforce) their own rules around this. At one extreme, it could be “anyone at all”, even people outside the organisation. Or you could be more restrictive: only people inside the organisation, only people in this particular department, only members of this team, only selected members of this team, or only one particular person.

There’s a trade-off here: the smaller the number of people who are allowed to create tickets, the easier it is to standardise the ticket creation process around your optimal set of requirements (eg what data fields must be filled in so a ticket is triageable), but the more you need to work to establish good comms methods for allowing outside people’s work requests to be translated into tickets by these Creators. If, on the other hand, Ticket Creation is open to many people, the team needs to carefully police the quality of created tickets, and publicise guidelines around doing so. There’s no right answer here – each team has to figure out their own arrangements by an empirical process of trial and error.

Who can triage?
Again, this is contextual, but a typical arrangement is that the Triage Team is a combination of internal team management roles, people with such titles as ‘Tech Lead’, ‘QA Lead’ and ‘Product Owner’.

What needs to be recorded?
Agile doesn’t mean chaos or never writing anything down, but there’s no point being wasteful and requiring unnecessary documentation either. With triaging it is often sufficient to take less than a minute to make just a single comment (eg in a Jira ticket) along the lines of “Ticket triaged by Rob and Jenny – bottom of the backlog”. That simple audit trail can save a mountain of pain later.

How do you keep on top of your triaging?
Given that we want Triage Time and Untriaged Count to be as near to zero as possible at all times, we need untriaged tickets to show prominently somewhere, as a reminder that there is triaging to do. Jira, arguably the most powerful work management system commonly used by software teams, tends to dump minimally-filled-out new tickets right at the bottom of the backlog, where they might not get looked at for a very long time. If you are in that situation, you need to have a robust team process to combat Jira’s tendency to hide these new tickets away.

When I run tech teams, I often take on the task of scanning the bottom of the Jira backlog several times a day for recently-created, untriaged tickets. I then move such tickets into the ‘Triaging’ state and stick them (temporarily) high up the backlog so they get noticed by everyone on the team’s Agile Board. I comment on or assign them so my Triage Team are additionally notified by email to make the priority call, and as appropriate I use other comms methods (F2F, IM etc) to make sure they know of the incoming wounded. None of this is very onerous if done “little and often”, and it has worked very well in multiple teams.
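The sweep described above can be sketched in a few lines of Python. This is not Jira code: the `Ticket` class, the one-day look-back window and the ticket keys are all hypothetical, and the status names are borrowed from the board columns in this article:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Ticket:  # hypothetical stand-in for a ticket in your work management system
    key: str
    status: str
    created: datetime

def sweep_for_untriaged(backlog, now, window=timedelta(days=1)):
    """Move recently created 'Idea' tickets into 'Triaging' and lift them to the top."""
    promoted, rest = [], []
    for ticket in backlog:
        if ticket.status == "Idea" and now - ticket.created <= window:
            ticket.status = "Triaging"  # now visible as work for the Triage Team
            promoted.append(ticket)
        else:
            rest.append(ticket)
    return promoted + rest  # new ranking: untriaged arrivals first, then everything else

now = datetime(2024, 1, 10, 12, 0)
backlog = [
    Ticket("APP-1", "Ticket Prep", now - timedelta(days=30)),
    Ticket("APP-2", "Dev", now - timedelta(days=12)),
    Ticket("APP-3", "Idea", now - timedelta(hours=2)),  # the new arrival at the bottom
]
result = sweep_for_untriaged(backlog, now)
print([t.key for t in result])  # ['APP-3', 'APP-1', 'APP-2']
```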

What options are there for representing triaging on an Agile Board?
In an earlier article we looked at one simple pattern for representing the early part of a team’s workflow, where triaging itself is represented by a single step, with no buffer states on either side:

agile-board-basic-triage-setup

The meaning of the columns in this case would be:

Idea: a ticket has been created in the system, nothing more. It may or may not be fit for the next step in its life. If not, this could be because it contains mostly garbage, or perhaps because it hardly contains anything at all. The creator, with or without help, must rectify this before the ticket can progress.
Triaging: in the absence of any buffer states, this column contains tickets that have been assessed as ‘fit to be triaged’ but where that process has not yet started, and those currently undergoing triaging.
Ticket Prep: in the absence of any buffer states, this column now represents tickets that ‘have been triaged and no more’, and tickets where in addition ‘more detailed ticket prep has already started’.

An alternative Agile Board setup, with buffer states either side of the main Triaging activity, might look like this:

agile-board-advanced-triage-setup

This time the column semantics are as follows:

Idea: a ticket has been created in the system, nothing more. Nobody other than its creator necessarily knows what’s in it.
Ready for Triage: this “pre-state buffer” means that someone has checked the information in the ticket, and verified that it is sufficient for a triage decision. If such a check reveals that a ticket has failed to meet this standard, that ticket stays in the Idea state and is appropriately reassigned and commented.
Triaging: the triage itself is happening now.
Triage Complete: a “post-state buffer”. Triaging has finished, meaning that the ticket has been placed in an appropriate slot in the team’s priority-ordered list of work. Nothing more has yet happened.
Ticket Prep: the next major step in the ticket’s life after triaging has now begun.
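The column semantics above behave like a small state machine, which can help when configuring workflow rules in board software. In this sketch the transition table is read from the five column descriptions; the rule that a ticket may not skip a column, and the `move` helper itself, are illustrative assumptions:

```python
# Allowed forward moves between the five columns of the buffered triage setup.
TRANSITIONS = {
    "Idea": {"Ready for Triage"},
    "Ready for Triage": {"Triaging"},
    "Triaging": {"Triage Complete"},
    "Triage Complete": {"Ticket Prep"},
}

def move(current, target):
    """Move a ticket to `target`, refusing any move that skips a column."""
    if target == current:
        return current  # e.g. a ticket that fails the readiness check stays in Idea
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot move a ticket from {current!r} to {target!r}")
    return target

state = "Idea"
state = move(state, "Ready for Triage")  # someone has checked it is triageable
state = move(state, "Triaging")
state = move(state, "Triage Complete")
print(state)  # Triage Complete
```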

This extra detail on the board has a cost and a benefit. Having a large number of columns can cause issues both with physical boards (are they wide enough to fit all the columns?) and software boards (does your monitor have enough pixels to display all columns legibly?). The benefit is that you are more accurately able to represent what is really going on – see how much easier the column semantics were to explain in the second example above. Teams must figure out by trial and error what columns work best for them, but I often find that you can pare columns down around the activity of Triaging, but definitely need buffers around some downstream activities such as Dev and QA. Buffer states will be discussed in more depth in a future article.

How exactly do I collect Triage metrics?
The two metrics introduced in the previous article, Triage Time (average time to get new tickets triaged) and Untriaged Count (how many current tickets haven’t finished triage yet), are read off your Agile Board, and so of course depend on how the board is set up. In the two examples we are discussing, the detailed setup contains three columns where tickets would count as untriaged, and the simple setup just two. The diagram below shows how the columns map to each other across the two variants, and the “Triage Line” indicates the point at which triaging has been completed in each case:

triage-complete-shown-on-different-board-setups
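Both metrics reduce to simple calculations once you know which columns sit before the Triage Line. In the sketch below the untriaged column sets follow the two setups described above; the tickets, and the assumption that you can recover creation and Triage-Line-crossing timestamps from your board’s history, are illustrative:

```python
from datetime import datetime, timedelta
from statistics import mean

# Columns that still count as untriaged (left of the Triage Line) in each setup.
UNTRIAGED_COLUMNS = {
    "simple": {"Idea", "Triaging"},
    "detailed": {"Idea", "Ready for Triage", "Triaging"},
}

def untriaged_count(ticket_columns, setup):
    """Untriaged Count: how many current tickets sit in a pre-Triage-Line column."""
    return sum(column in UNTRIAGED_COLUMNS[setup] for column in ticket_columns)

def triage_time(history):
    """Triage Time: mean delay between creation and crossing the Triage Line."""
    return timedelta(seconds=mean([(done - created).total_seconds() for created, done in history]))

# The same five tickets on each board variant (Ready for Triage maps to simple
# Triaging, and Triage Complete maps to simple Ticket Prep).
detailed_board = ["Idea", "Ready for Triage", "Triaging", "Triage Complete", "Ticket Prep"]
simple_board = ["Idea", "Triaging", "Triaging", "Ticket Prep", "Ticket Prep"]
print(untriaged_count(detailed_board, "detailed"), untriaged_count(simple_board, "simple"))  # 3 3

start = datetime(2024, 1, 10, 9, 0)
history = [(start, start + timedelta(hours=2)), (start, start + timedelta(hours=4))]
print(triage_time(history))  # 3:00:00
```

Because the Triage Line sits at the same logical point on both boards, the two setups report the same Untriaged Count even though they split the work into different columns.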

To conclude, if you care about smooth and efficient software delivery, there’s much important work to be done well before the first line of code is written. It’s notable that many teams focus only on modelling and tweaking the delivery part of their workflow, and sorely neglect the upstream steps, such as triaging. This neglect only stores up problems that come to light during delivery itself, when they are much more costly to fix. Hopefully between this article and the previous one you have the guidance you need to put a robust Triage Process in place in your teams; the rewards for doing so will be felt throughout the rest of your work.

Triaging Part 1 – why it matters, and how to do the basics

Triaging is a critical step in the lifecycle of any piece of work. I touched upon it in a previous article, and here we go into it in more depth.

The term has a medical origin. Deriving from the French triager, meaning ‘to separate out’, it is best to imagine either a military hospital near the front (known to many as a MASH unit), or the Emergency Department of a civilian hospital. New wounded are brought to the unit at unpredictable times, in unpredictable quantities.

The initial processing of these new arrivals is urgent, and must be done quickly and efficiently; there are, after all, limited resources of medical care to go round, so what is available must be directed to where it will bring the most reward (lives saved) for the effort. To achieve this goal, the wounded are ‘triaged’, which means divided into three categories:

  • the worst cases – they will die anyway. These ones should be made as comfortable as possible, but they are actually not the highest priority, as medical intervention will make no difference.
  • the minor cases – they will survive anyway. Obviously these are not the highest priority.
  • the ones where immediate action will make the difference between life and death. It is these that are the highest priority – they must be prepped for surgery (or whatever is appropriate) as fast as is humanly possible. Every second counts.

Thus medical triaging amounts to a quick prioritisation exercise. It is done on the basis of limited information (you can’t spend hours investigating each new wounded case, you have perhaps minutes or seconds), but you obviously can’t do it on the basis of no information. The skilled triager knows exactly how much information is needed to make the triage call; he gathers that amount (no more, no less), makes the decision, then the patient is moved on to the next stage of their care, and the triage team moves on to the next wounded person, until all have been triaged.

Note that the triage team does not administer treatment – that is the next step. In some cases, that next step might follow one second later, but it is still conceptually a next step, possibly performed by a different person in a different place. The triager has to keep going with the rest of the triage as their highest priority, because the next case could be even worse – until you’ve looked, you just don’t know.

Triage is therefore complete when every single new case has been sorted into the correct category, in other words when the untriaged count = 0. Note that even if the helicopter brings in just one single wounded soldier, you still can’t afford any delays in starting triage, since you have no idea what state that person is in. That single guy could be your top general, right on the boundary between life and death, and a lackadaisical triage process means the chance to save him could be lost.

All of this has parallels in the world of software development, except – hopefully – there’s less blood. Instead of wounded soldiers being helicoptered in to base, the raw material for triaging is new tickets. Starting as soon as possible after new tickets are created, the triaging itself means sorting them into their correct position in the priority-ordered list that is the product backlog. Once a ticket has been placed, the triage team moves on to the next one, until untriaged count = 0, when triaging is complete (for now), and the next steps can begin.

The one disanalogy between medicine and tech triage is that a new ticket might not have enough information to be triageable – some people write absolute gobbledygook in new tickets, or barely write anything at all. In that case you send it back to source for more information, and try to triage it again when it returns. There’s no medical equivalent of that – you wouldn’t send a soldier back to the Front to get their wound explained more clearly.

What is the tech equivalent of the three medical triage categories above? One way to think about it is that you are choosing between 6 broad slots in the backlog for a new ticket. For each new ticket, you need to decide whether the right place for it is…

  1. …right at the very top of the backlog, ie it is the most important thing of all. A ticket that lands in this position is quite likely to be an emergency, so your team needs a way to deal with emergencies.
  2. …a specific slot in the top quarter of the backlog. If so, pick the precise slot in this quarter with some care, as much of the work this high up will have been started already, and is progressing across your Agile Board. This new ticket might need some extra attention to get it to Ready for Dev quickly, and once there you might then put it right at the top of the R4D column, meaning it’s the very next thing the devs pick up whenever one becomes free.
  3. …somewhere in the second quarter of your backlog. If so, don’t bother being so precise about exactly where in this section it drops – it’s likely to be a bit of time before this work starts advancing across your board, and the backlog will probably have been revised several times before that happens.
  4. …somewhere in the third quarter of your backlog. Even less precision is merited here. Some of this stuff will never get built at all, frankly.
  5. …somewhere in the last quarter of your backlog. Most of this stuff will never get built, and you should consider carefully whether this ticket is even worth preserving at all. If not, you are at the next option…
  6. …the rubbish bin, ie the ticket is worthy only of being thrown away. Maybe it’s a dumb idea, maybe it’s not such a bad idea but there’s just no chance of this happening on any timescale that people remotely care about. Depending on what ticket management system you are using, you may actually delete it, or perhaps just resolve as invalid, or won’t fix, or similar.
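The six triage outcomes above can be sketched as a function that places a new ticket into a priority-ordered backlog. This is only an illustration under stated assumptions: the `Slot` names and the `triage` function are hypothetical, the backlog is a plain list with index 0 as the highest priority, and for the three lower quarters the sketch simply drops the ticket at the start of the chosen quarter, since the text says precision is not worth the effort there.

```python
from __future__ import annotations

from enum import IntEnum


class Slot(IntEnum):
    """The six broad triage outcomes (hypothetical names)."""
    TOP = 1             # the most important thing of all, probably an emergency
    TOP_QUARTER = 2     # a specific, carefully chosen slot in the top quarter
    SECOND_QUARTER = 3  # roughly placed in the second quarter
    THIRD_QUARTER = 4   # roughly placed in the third quarter
    LAST_QUARTER = 5    # roughly placed in the last quarter
    BIN = 6             # thrown away (deleted, or resolved as invalid / won't fix)


def triage(backlog: list[str], ticket: str, slot: Slot, index: int | None = None) -> list[str]:
    """Return a new priority-ordered backlog (index 0 = highest) with `ticket` placed."""
    n = len(backlog)
    if slot is Slot.BIN:
        return list(backlog)  # the ticket is discarded; the backlog is unchanged
    if slot is Slot.TOP:
        position = 0
    elif slot is Slot.TOP_QUARTER:
        # Precision matters this high up, so the triager must choose the exact slot.
        if index is None or not 0 <= index <= n // 4:
            raise ValueError("top-quarter tickets need an explicit index within the top quarter")
        position = index
    else:
        # Lower down, precision is not worth the effort: drop the ticket at the
        # start of the chosen quarter (an arbitrary choice for this sketch).
        quarter = slot - Slot.TOP_QUARTER  # 1, 2 or 3
        position = quarter * n // 4
    return backlog[:position] + [ticket] + backlog[position:]


backlog = [f"T{i}" for i in range(1, 9)]
print(triage(backlog, "NEW", Slot.LAST_QUARTER))
# ['T1', 'T2', 'T3', 'T4', 'T5', 'T6', 'NEW', 'T7', 'T8']
```

Returning a new list rather than mutating the backlog keeps each triage decision easy to inspect before it is committed.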

These 6 choices are shown in the pictorial example below. In this case, a new ticket ends up being slotted into the bottom quarter of the backlog, option 5:

triaging-process

Once triage is complete (well done! untriaged count = 0), your next duty is to make sure all interested parties know of this triage decision, including whoever requested/created it. In some systems (eg Jira) it may be sufficient to add them as Watchers to the ticket so that they automatically receive email notifications of updates; in other contexts, other comms may be necessary. When these parties do find out how you’ve prioritised their work, they may be fine with it, or may vigorously disagree, but it’s best to have those discussions out in the open immediately, rather than to leave potential future conflicts stewing away in the background.

It’s in everyone’s interest – the team, suppliers, customers, stakeholders – to have a good triage process: there may be some horror lying in wait in that unopened ticket! The metrics around this process are a) ‘Triage Time’, the time it takes for newly created tickets to be triaged, and b) ‘Untriaged Count’, the number of tickets in the system that have not been triaged yet, and are therefore of unknown priority. The aim is to get both metrics as low as possible. Oddly, I have never seen the Triage process specifically called out as a key measure of how well a team is running, and yet in my experience there is a very strong relationship between being a good team, and doing the basics well – and triaging is one of the basics for sure.
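The two triage metrics can be computed from each ticket’s creation and triage timestamps. The sketch below assumes a hypothetical `Ticket` record with a `triaged` timestamp that stays `None` until triage happens; it summarises Triage Time as a median over triaged tickets, which is one reasonable choice among several (a mean or a maximum would also work).

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Ticket:
    key: str
    created: datetime
    triaged: datetime | None = None  # None: not triaged yet, priority unknown


def untriaged_count(tickets: list[Ticket]) -> int:
    return sum(1 for t in tickets if t.triaged is None)


def median_triage_time(tickets: list[Ticket]) -> timedelta | None:
    """Median creation-to-triage time over tickets that have been triaged."""
    times = [t.triaged - t.created for t in tickets if t.triaged is not None]
    return median(times) if times else None


start = datetime(2024, 1, 8, 9, 0)
tickets = [
    Ticket("A", start, start + timedelta(hours=1)),
    Ticket("B", start, start + timedelta(hours=3)),
    Ticket("C", start),  # still waiting for triage
]
print(untriaged_count(tickets))     # 1
print(median_triage_time(tickets))  # 2:00:00
```

Note that an untriaged ticket’s waiting time is still growing and is not counted in the median, which is why the Untriaged Count needs watching alongside it.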

In the next article we’ll examine the steps from ticket creation to triage completion in more detail. For now we can summarise the best practice lessons for triaging as: do it quickly and efficiently, don’t let untriaged tickets lie around, aim to get your untriaged count back to zero daily if possible, and let the concerned parties know the results of your triage as soon as you have done it. If you do all this, you’ll be one of the relatively rare teams that have a tight Triage Process, and you’ll have taken a big step towards becoming a good team.

Agile Board usage – why tickets shouldn’t move left

So a ticket has failed QA, and the fail’s a bad one – it needs to be fixed for sure. A dev (probably the one who coded it in the first place) needs to change the code, and then it needs re-reviewing, re-testing, and hopefully will pass second time round.

This is one of the most common situations in a software development team (unless you’re the one team in the world who manages, without cheating, to produce no bugs at all). How should these steps in the life of this ticket be represented on an Agile Board?

There are two basic patterns that can be used. We’ll start with the most commonly used pattern, the ‘Looping Method’, which can be summarised as ‘move the ticket left, back to the In Dev column, for the code amends, and then to the right through Code Review, Ready for QA etc, just as it did on the first iteration. In fact, every time a developer needs to work on that ticket, it jumps left to In Dev for the new code changes, then moves across again to the right’.

Note that there’s nothing special about the In QA column we are using in this example. The Looping Method applies equally if the bug had been found in a different column, such as Sign-off, or Ready for Live; the ticket would just have to jump that much further to the left to land back in ‘In Dev’, so the loop is bigger.

A pictorial representation is below. The ticket reaches In Dev for the first time and is assigned to a developer to write the initial code. It moves rightwards through Code Review and Ready for QA, then a tester picks it up at ‘In QA’, assigns it to themselves and starts testing. The nasty bug is found, and the ticket jumps three columns to the left and is reassigned to a Dev. This ticket will now loop between columns until every nasty bug is fixed, and it can finally move on beyond QA. At that point more bugs might be found, and more looping will start.

tickets-move-left

It’s easy to see why teams might work in this way – it’s just doing the same thing on the second, third or fourth iteration of coding as happened the first time. Is there any problem here?

In fact there is. Due to this looping, tickets are jumping horizontally whenever a different sub-team (often, but not always, Dev) needs to work on them next. As a result you are now unable to use the triangulation method of reading a board, as this method uses both horizontal (‘Done-ness’) as well as vertical (‘Priority’) planes to work out the correct order of tickets. But there is no alternative to triangulation to reliably order all tickets on a board, so the Looping Method destroys a key property of a good Agile Board, that it broadcasts to everyone what to pick up next.

how-to-read-fail

In fact the triangulation method tells us that moving tickets left is functionally equivalent to de-prioritising them, as moving left means a ticket is ‘less Done’, and therefore less urgent. But a moment’s thought tells us that it is simply not true that finding a bug deprioritises a ticket, so on the Looping Method our board is Not Telling The Truth – a big problem.
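The claim that moving a ticket left demotes it can be checked with a small sketch of the triangulation reading. It assumes a single swimlane, a hypothetical set of column names, and that the reading goes column by column from the rightmost column, top to bottom within each column; the ticket names are invented for illustration.

```python
from __future__ import annotations

# Hypothetical columns, left to right (least Done to most Done).
COLUMNS = ["R4D", "In Dev", "Code Review", "Ready for QA", "In QA"]


def triangulate(board: dict[str, list[str]]) -> list[str]:
    """Order tickets from most to least important.

    Assumes the reading runs column by column from the rightmost column,
    top to bottom within each column (each list is top-to-bottom).
    """
    order: list[str] = []
    for column in reversed(COLUMNS):
        order.extend(board.get(column, []))
    return order


# Ticket T has just failed QA.
before = {"In Dev": ["A"], "Code Review": ["B"], "In QA": ["T"]}
# Looping Method: T jumps back to In Dev (placed at the top of that column).
after_looping = {"In Dev": ["T", "A"], "Code Review": ["B"], "In QA": []}

print(triangulate(before).index("T") + 1)         # 1
print(triangulate(after_looping).index("T") + 1)  # 2
```

Even when T is put at the very top of In Dev, ticket B now outranks it, so the board reports a lower priority for T although nothing about its real importance has changed.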

Let’s consider which of the following options we would prefer an available Dev to pick, other things being equal: a) fix a bug on a ticket they had previously coded, b) work on other code that hasn’t reached QA a first time, or c) start something entirely new? The answer is clearly a), to fix the just-found bug. There’s a good chance of getting such a ticket finished and off the board with relatively little extra effort – it genuinely is closer to Done. Remember, it is only finishing tickets that provides any value to the organisation, so it would be crazy to deprioritise a ticket just because it needs some dev attention.

In fact the behaviour we generally want to encourage on finding a bug is for developers and testers to work in close collaboration, with tight feedback cycles of testing, recoding, reviewing and retesting, until the bug is fixed. If each step suffers delays, even relatively minor bugs can take an absolute age to fix, with both parties context switching each time, and quite possibly the underlying codebase changing considerably as other tickets flow to the right. Deprioritising tickets will lead to loss of focus and increase in delays, so that’s the very last thing you want to do to a ticket when it fails QA.

Indeed, if you allow or encourage Devs to start on new code in preference to fixing already-written code that just failed QA, the team will quickly end up with a mountain of unfinished, buggy work piled up in the middle of their board. Such a team – great at starting things, but lousy at finishing them – can only be very unproductive.

Of course, not all teams using the Looping Method sink that far. Good practices outside the Agile Board – strong leadership, effective meetings and internal comms – might compensate for a deficiency in how the board works. But good board usage is designed to make all this far, far easier. What is the alternative practice we can use here?

The alternative is the ‘Tickets Don’t Move Left’ method (TDML)*. It says simply that when a ticket requires the help of some other part of the team to move onwards, you leave it where it has reached on the board and reassign. This way the board continues to represent Done-ness accurately, so the triangulation method works and you can always use your board to figure out the Most Important Thing To Do.

Let’s see this method in diagrammatic form. The first two parts are exactly as before, but when the ticket reaches QA and the bug is found, it stays in the In QA column and is reassigned to the Dev for re-coding:

tickets-dont-move-left

And then when it’s time for a second round of Code Review on the recoding, the ticket again stays where it is, but is reassigned to a different Dev for that review. And then it will go back to the QA for retesting, again, without moving column:

tickets-dont-move-left-2

This is really a trivial change in board usage. Team members just have to get used to relying on the Assignee field to know what to pick up, and understand that they might need to pick up from any column on the board, rather than only ever bothering to check one particular part of the board (for Devs this would typically be the In Dev and Code Review columns).
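Under TDML, a failed ticket is handed over by changing its assignee, and each person finds their work by filtering the whole board on that field. The sketch below is illustrative only: the `Ticket` fields, column names, people and the `hand_over` / `my_queue` helpers are all hypothetical, and the ordering assumes the same right-to-left, top-to-bottom triangulation reading.

```python
from __future__ import annotations

from dataclasses import dataclass

COLUMNS = ["R4D", "In Dev", "Code Review", "Ready for QA", "In QA"]  # hypothetical


@dataclass
class Ticket:
    key: str
    column: str
    row: int       # 0 = top of its column
    assignee: str


def triangulation_key(t: Ticket) -> tuple[int, int]:
    # Further right sorts first; within a column, higher up sorts first.
    return (-COLUMNS.index(t.column), t.row)


def hand_over(ticket: Ticket, person: str) -> None:
    """TDML: the ticket stays in its column; only the assignee changes."""
    ticket.assignee = person


def my_queue(board: list[Ticket], person: str) -> list[str]:
    """Everything assigned to `person`, from any column, most important first."""
    return [t.key for t in sorted(board, key=triangulation_key) if t.assignee == person]


board = [
    Ticket("T1", "In QA", 0, "quinn"),
    Ticket("T2", "In Dev", 0, "dana"),
    Ticket("T3", "Code Review", 0, "sam"),
]
hand_over(board[0], "dana")     # T1 fails QA and goes back to its developer
print(board[0].column)          # In QA
print(my_queue(board, "dana"))  # ['T1', 'T2']
```

Because T1 never leaves In QA, it automatically sits at the head of the developer’s queue, ahead of the new code in T2.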

I have moved teams from the Looping Method to TDML method in a single day. It really isn’t that hard, and the payoff is enormous; we stop working in a way that continually destroys a board’s ability to clearly represent the priority of work. The triangulation method works, so you get the huge benefit of easily reading your board and knowing what to pick up next.

how-to-read-success

On top of this, there is a more subtle attitudinal change when you become accustomed to looking across the whole board to find your work. A large part of the Agile mindset is to have everyone working and thinking as a team, rather than the siloed mentality of a Waterfall process, where in strict order Analysts do X, then Devs do Y, then QAs do Z, and the success of the thing-as-a-whole is always someone else’s problem.

The correct team attitude is that the entire workflow is a shared responsibility. In a truly Agile team it’s not your narrow job title that defines what you do; your job is to use all your skills to help the team get its highest priority work all the way to ‘Done’. And TDML is the perfect visual representation of that approach, where everyone is co-operating fluidly with everyone else, as need dictates, to maximise the value of the whole, with a collective sense of ownership of the flow of work.

In that sense the board usage recommendation of this article is very powerful. Though TDML is on one level no more than a minor tweak to how teams use an Agile Board under some circumstances, it has a strong resonance with how team members need to think and act such that they are genuinely a team, working together constructively towards a common goal.

*a caveat to the TDML rule: as with all rules, there is a superior rule which takes precedence: ‘Common Sense Always Wins’. So if a ticket is ever simply in the wrong place on a board, you should always correct it and Make The Board Tell The Truth, whether that involves the ticket moving North, South, East or West. Even the best rule can be taken too dogmatically; it doesn’t make it a bad rule, it just means you need to use Common Sense at all times.

Note: I am indebted to David Shrimpton for introducing me many years ago to the key concepts in this article.

Agile Board design – the theory and practice of swimlanes

In previous articles I have discussed the theory and the practice of choosing vertical columns for an Agile Board to represent the workflow steps that tickets pass through as they move across the board from left-to-right. But a board is a two-dimensional grid, and we now turn to the other, vertical plane. For some reason, rather than the conventional terminology of columns and rows, the jargon in Agile Board World is to talk of columns and swimlanes, so that is our topic here.

swimlanes-real

We have argued that using the vertical plane to show priority is a better method than using priority flags, and that one-ticket-wide columns on an Agile Board implement this beneficial principle. You can move a ticket up to show an increase in priority, but other tickets have to move down to make room – your board is both showing and enforcing a trade-off decision:

priorities-on-a-board

This example shows tickets moving within a single swimlane. But what if you have multiple swimlanes, stacked vertically one on top of the other? In this case you must stick by the rule that vertical position is always and only used to show priority, and that means we must set up our boards such that higher swimlanes have a greater priority than lower ones. This restricts the meaning we can give to swimlanes on a board.

Let’s consider the hypothetical board below, which has 14 tickets on it, distributed across three swimlanes. I have used the triangulation method to show the order of tickets. Whatever meanings we give to the red, yellow and green swimlanes, it must be the case that ticket 4 is more important than ticket 5, and ticket 11 is more important than 12.

priorities-in-swimlanes

Very often you will find Agile Boards are set up with swimlanes corresponding to functional areas, and this may conflict with the principle that vertical position shows priority. If it does conflict, you should not use your swimlanes to show functional areas (there are other means of filtering down to the functional level, which I will cover in a future article).

I recently spent some time in a team building an enterprise level identity system, and three of our functional areas were 1) registration, 2) sign-out and 3) user uplift. As it turns out, there was not a clean and strict mapping of priorities onto these functional areas; some registration work was relatively important, and some was not, and the same applied to the other two functional areas. We therefore did not set up a swimlane for each functional area, as the board would have been falsely indicating the priority of some tickets.
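Whether a proposed swimlane mapping respects the ‘vertical position shows priority’ rule can be tested mechanically: read the backlog from most to least important and check that tickets never climb back into a higher lane. The ticket names and the `lanes_respect_priority` helper below are hypothetical, loosely modelled on the identity-system example.

```python
from __future__ import annotations


def lanes_respect_priority(backlog: list[str], lane_of: dict[str, str],
                           lane_order: list[str]) -> bool:
    """True if, reading the backlog from most to least important, tickets never
    move back up to a higher swimlane (i.e. lane rank never decreases)."""
    ranks = [lane_order.index(lane_of[t]) for t in backlog]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))


# Hypothetical tickets, most important first.
backlog = ["reg-1", "signout-1", "reg-2", "uplift-1"]
lane_of = {"reg-1": "Registration", "reg-2": "Registration",
           "signout-1": "Sign-out", "uplift-1": "User uplift"}
functional_lanes = ["Registration", "Sign-out", "User uplift"]

print(lanes_respect_priority(backlog, lane_of, functional_lanes))  # False
```

Here the second registration ticket is less important than a sign-out ticket, so a Registration lane above a Sign-out lane would misstate its priority.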

So what should you map your swimlanes to, so that you don’t break the vertical position shows priority principle? There are no doubt many possible answers, but I will consider two swimlane patterns below that between them cover a large percentage of cases.

First let’s consider a team that is in control of its own delivery cadence, and either is already doing, or aspires to move in the direction of, Continuous Delivery; in other words, they release work to Live in frequent small batches. What swimlanes might be chosen by such a team?

This team’s board needs to cope with both normal, planned work, and the all-too-common situation that something super-urgent crops up, requiring normal planned work to be dropped immediately until the crisis is over. Such emergency situations are typically due to the discovery of a nasty bug in Production.

Since Emergencies are the highest of all possible priorities, they go in the highest lane up the board, with ‘normal work’ sitting below it, as here:

swimlane-example-continuous-deployment

How do these swimlanes get used in practice? We of course want visual representation of work and preferred team behaviour to dovetail neatly and reinforce each other, and this swimlane layout can certainly do so.

Team rules around emergencies are usually something like: 1) verify any claim that new work really deserves ‘emergency’ status, 2) if it does, everybody who can help with the emergency does so immediately and only returns to other tasks when they can do no more on the emergency work.

A top-of-the-board Emergency Lane is admirably suited to this. Any ticket in that lane is above the fold on a computer screen, and is the very first thing any user’s eyes fall on. A new ticket arriving in this lane cannot be missed, and is a clear signal for the appropriate group of people to verify whether the ticket is a genuine emergency; if it is not, they move the ticket out of the top lane. For tickets that stay in the Emergency Lane, that is again a strong visual signal that work exists that needs full and immediate attention, with team members resuming other work lower down the board only when the emergency allows.

The disruption caused by emergency tickets is considerable, even worse than normal context switching, since a) there is usually no option to get the prior work to a nice, neat point before dropping it, and b) working on an emergency often leaves you in a frazzled mental and emotional state, not easy to switch back out of. This makes it critically important to ensure the Emergency Lane is treated with proper respect, ie for genuine emergencies only. Such disruption is only merited in extreme cases.

There can be a temptation for people to try to slip tickets into the Emergency Lane just to get stuff done faster, and the team need to be on their guard against any freeloading on this crisis-management process. The very term “Emergency Lane” is helpful; less impactful alternative names such as “Expedite” don’t provoke such an emotional response. Saying you want to “expedite” a ticket is really quite different from looking a colleague in the eye and declaring that your work is a genuine emergency!

This two lane pattern of Emergency Lane and Normal Prio Lane works, without much modification, for the majority of modern software development teams. It also applies to most teams outside the world of software – I have successfully run a marketing and a finance team using this pattern of Agile Board, for example.

However, I use a different board pattern in situations where releasing continually is impossible, meaning you cannot avoid having less frequent, big-batch releases, which can be seen queueing one after the other a considerable time in advance. A canonical example would be an iOS team, where the release process has a major bottleneck that cannot be overcome; namely, the submission and approval process of the App Store, which takes around a week, during which time you can make no further changes to the submitted binary without the clock starting again.

Typically iOS teams respond to this constraint by aiming for one or two releases to the App Store a month, and it would be quite normal to have company personnel actively working across perhaps the next three iOS releases. This would be reflected in the structure of swimlanes on the iOS team Agile Board:

Swimlane example - iOS.png

This hypothetical team has version 1.2.1 already out in the App Store, and the main focus of the team is currently on the next release, v1.3. As the end of this release approaches, team members who are freed up will gradually shift on to the following release, v1.4. Naturally this is of lower priority (as reflected by the lower position of its swimlane), and if anything comes up in v1.3 people will switch back to that. The position of the swimlanes helps team members make the right decisions – it’s more important to get the v1.3 binary completely finished and into the App Store queue this week than to work on some functionality in v1.4 that isn’t shipping till next month.

And this pattern repeats. Perhaps the planned release after v1.4 is a major new version, v2.0. Some people (perhaps Product, Architects, Designers…) are already starting to put some preliminary work into that one, and that can be neatly shown on the board. But if they were needed to work on v1.3 or v1.4, you would generally expect them to interrupt work on v2.0 to do so, and the board shows that visually – those lanes are higher up, so that work is higher priority. You can use the triangulation method and trust what it tells you.
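In this release-based pattern the swimlane order follows the release order: the earliest pending release sits at the top. A minimal sketch, assuming plain dotted numeric version labels (anything like ‘1.4-beta’ would need extra parsing); the `release_lane_order` helper is hypothetical.

```python
from __future__ import annotations


def release_lane_order(versions: list[str]) -> list[str]:
    """Order release swimlanes top to bottom: the earliest pending release first.

    Assumes plain dotted numeric versions such as 'v1.3' or '2.0'.
    """
    def as_tuple(version: str) -> tuple[int, ...]:
        return tuple(int(part) for part in version.lstrip("v").split("."))

    return sorted(versions, key=as_tuple)


print(release_lane_order(["v2.0", "v1.4", "v1.3"]))  # ['v1.3', 'v1.4', 'v2.0']
print(release_lane_order(["v1.10", "v1.9"]))         # ['v1.9', 'v1.10']
```

Comparing versions numerically rather than as strings matters once a team passes v1.9: a plain string sort would put ‘v1.10’ above ‘v1.9’ and invert the priority of those two lanes.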

In summary, these are two practical swimlane patterns that together cover the needs of many teams. One is for small-batch release setups, where the version number of your tech product changes frequently, and it is hard to predict ahead of time exactly which version a given piece of work will end up in. The other is for large-batch release setups, where these things are much more likely to be known up front. In both cases the swimlanes respect the rule of using vertical position to show priority, and all the advantages of being able to easily read the board by triangulation are retained.

It is of course always possible that any given team may empirically discover alternative swimlane arrangements that work for them, but to abandon the use of the vertical plane to show priority is usually a retrograde step, and should be avoided, as it is in the examples given here.

How to read an Agile Board

In previous articles we have discussed what an Agile Board might look like for a typical software development team, but this will bring no rewards unless we know how to use it properly, for example in a properly conducted stand-up meeting.

A major function – arguably, the prime function – of an Agile Board is to make team priorities absolutely clear, so that all team members can be sure that they are working on What The Team Needs Most. For this to happen, everybody needs to be reading the board in the same consistent way, and that way must be simple, or people will get it wrong.

This article will explain what that simple method is.

Let’s start with some examples. Below is a simplified cut of an Agile Board, with just two tickets showing. Let’s say you are a team member reading this board, and as it happens your skill set and the state of those tickets means you are able to assist on either one of them. Question: which one should you pick to work on first, X or Y?

ticket-choice-vertical

The correct answer is pick X before Y, because in a properly functioning Agile Board, the vertical plane is reserved to show priority. Higher up means higher priority, and X is higher up, so you work on X first, as below:

ticket-choice-vertical-plus-answer

There is little controversy here; I have asked this question to real people hundreds of times, and everyone basically gets this right first time.

Now let’s try the same exercise for a different cut of the board. Again, you are a team member reading this board, and you are able to assist on both the tickets that are showing. Which one should you pick first, X or Z?

ticket-choice-horizontal

There is an equally clear answer here, which the great majority of people also get right first time. The answer is to pick Z before X, because Z is further to the right, ie it is closer to being Done, and ‘Done’ is what counts, the thing that delivers value to the organisation. Sadly in life it usually is very easy to start things, but it can be hard to finish them off, so an effective team concentrates on getting stuff finished before new stuff is started, and this desirable behaviour is promoted by reading the board in this way.

ticket-choice-horizontal-plus-answer

The third and final quiz question is a combination of the two above. Again, other things being equal, in which order should someone pick up the following three tickets?

ticket-choice-both-planes

The answer is pick Z before X, and X before Y. This is a simple combination of the two rules above, which leads to reading the board in a kind of triangulation method, from top-right (where you’ll find the most important tickets on a board) to bottom left (where you find the least important):

ticket-choice-both-planes-plus-answer

Now let’s apply this triangulation method to a more realistic board setup. This is exactly what a team is trying to do in every Stand-up meeting, so that it can discuss the tickets in priority order. Applying the triangulation method to this board we can easily label the tickets on it from 1 (the most important) down to 17, the very least. The ‘Done’ tickets are ignored in this exercise – by definition, if they are Done, there is no work left to do on them.
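The numbering exercise can be expressed as a short function. This sketch assumes one swimlane, columns listed left to right, tickets listed top to bottom, and a reading that goes column by column from the right, top to bottom within each column; the `number_tickets` helper is hypothetical, and the sample board assumes X and Y share a column in the three-ticket quiz.

```python
from __future__ import annotations


def number_tickets(columns: list[tuple[str, list[str]]]) -> dict[str, int]:
    """Number tickets 1 (most important) upwards by triangulation.

    `columns` runs left to right; each ticket list runs top to bottom.
    Assumes the reading goes column by column from the right, top to bottom,
    and skips the Done column, since Done tickets have no work left.
    """
    numbering: dict[str, int] = {}
    for name, tickets in reversed(columns):
        if name == "Done":
            continue
        for ticket in tickets:
            numbering[ticket] = len(numbering) + 1
    return numbering


# A board shaped like the three-ticket quiz: Z is right of X, and X is above Y.
board = [("In Dev", ["X", "Y"]), ("In QA", ["Z"]), ("Done", ["D"])]
print(number_tickets(board))  # {'Z': 1, 'X': 2, 'Y': 3}
```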

ticket-choice-full-board

And how do you apply this rule to a board which has Swimlanes? Since the correct use of Swimlanes (discussed in detail in a previous article) maintains the principle of using the vertical plane for priority (i.e. higher up Swimlanes are higher priority than lower swimlanes), it is still easy to apply the triangulation rule and again number every ticket on the board from most important (1) to least (17):
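With swimlanes, the only change is an outer loop over the lanes from top to bottom, so every ticket in a higher lane outranks every ticket in a lower one. The lane, column and ticket names below are hypothetical, and the within-lane reading makes the same right-to-left, top-to-bottom assumption as before.

```python
from __future__ import annotations


def number_with_swimlanes(lanes: list[tuple[str, list[tuple[str, list[str]]]]]) -> dict[str, int]:
    """Number tickets 1 (most important) upwards across stacked swimlanes.

    Lanes run top to bottom; within each lane, columns run left to right and
    tickets top to bottom. A higher lane always outranks a lower one.
    """
    numbering: dict[str, int] = {}
    for _lane_name, columns in lanes:
        for column_name, tickets in reversed(columns):
            if column_name == "Done":
                continue
            for ticket in tickets:
                numbering[ticket] = len(numbering) + 1
    return numbering


lanes = [
    ("Emergency", [("In Dev", ["E1"]), ("In QA", []), ("Done", [])]),
    ("Normal", [("In Dev", ["N2"]), ("In QA", ["N1"]), ("Done", ["D1"])]),
]
print(number_with_swimlanes(lanes))  # {'E1': 1, 'N1': 2, 'N2': 3}
```

An emergency ticket still In Dev comes before a normal ticket that has already reached QA, which is exactly what the lane order is meant to say.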

ticket-choice-full-board-with-swimlanes

Sometimes, when applying the triangulation method, a team member will voice an objection: what the board is saying is not correct! And indeed in a well-functioning team there is a bi-directional effort going on at all times:

  1. work in the order the board tells you to work
  2. make sure the board is always telling the truth about the order to work on

Any team member must be empowered to challenge the signals the board gives out if there is any doubt. Far better to do so than to slavishly follow what the board says even if it is potentially wrong, or to do what you believe is correct without reflecting that choice on the board, leaving your teammates in the dark as to what you’re doing and why.

So a comment such as the following is an entirely legitimate statement to hear in a good team: “the board is telling me to pick up ticket 512, but surely I should be doing ticket 197 instead, for reasons a, b, c, d…”. If, on examination, reasons a, b, c and d stand up, then as far as possible the board should be amended to reflect the Real Truth, and that team member has done everyone a service. It is everyone’s duty to Make The Board Tell The Truth!

With the whole team taking collective ownership of their Agile Board, ensuring that The Board Always Tells The Truth, that team can jointly and individually use the triangulation rule to work out what to do next, and everyone ends up working on What The Team Needs Most. And the end result of that is a team that is continually aligned and with every chance of delivering effectively, with all the increased motivation to individuals and value to the organisation that such a result entails.

Why priority flags don’t work, and what to use instead

The ability to show the priority of a given piece of work is pretty important, I think all would agree. But what is the best way of recording and showing this information? Unfortunately the most common method, using a priority flag, is far from the best.

Most users of work-managing software will encounter plenty of priority flags, usually set by some kind of dropdown control. There’s no consistency between systems in terms of the vocabulary or number of choices that are offered; sometimes the choices are mere numbers, sometimes words, sometimes a mix. And even within a specific set of users of a single system it is rare to have universal agreement on what a ‘2’ or a ‘Minor’ actually means, or when each should be applied. One man’s Blocker is another man’s P4, it often turns out.

priority-dropdowns

And this vagueness often (always?) leads to a dangerous phenomenon – ‘priority inflation’, analogous to grade inflation, where far too many tickets end up in the very top priority buckets, rendering your prioritisation system useless.

This phenomenon is easily explained by simple psychology. When you are entering work into a system, it’s usually because you want it to be done. Let’s say you have 5 different priority levels to choose from; you know that a prio 5 ain’t gonna get done before the Heat Death of the Universe, and probably a prio 4 is in the same boat. So really you’re now choosing between prios 1, 2 and 3. But you know that everybody else thinks that way too, so it’s probably wise to make your task at least a 2, just to be safe. And if there’s the slightest bit of genuine pressure for that piece of work, the temptation to put a 1 is almost overwhelming.

After all, there’s rarely anything in a priority flag system that limits the number of allowed P1s, such as a “one in, one out” rule. So why not chuck it in the top buckets, as everyone else is doing?

Priority buckets.png

And so priority flags lead to priority inflation. In the worst case virtually everything ends up as a P1 – or perhaps there are so many P1s that it’s utterly unclear what really needs to be done first. So the whole system has collapsed.

What is the alternative? It’s simple; instead of a priority flag, you maintain a priority-ordered list of work, with the most important thing at the top. When a new item comes in (or an existing item is reordered), other items have to move to make room for it. In other words, you have to make a trade-off decision – “sure, this thing can go up …but something else has to go down”.
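The trade-off built into a priority-ordered list shows up directly in code: moving one item up necessarily pushes the items it passes down one place each, and no two items can ever share a position. A minimal sketch, assuming item names are unique; the `reprioritise` helper is hypothetical, and the colours echo the board example later in this article.

```python
from __future__ import annotations


def reprioritise(items: list[str], item: str, new_position: int) -> list[str]:
    """Move `item` to `new_position` (0 = top). Every item it passes shifts
    down one place, so raising one item always lowers others."""
    rest = [i for i in items if i != item]
    return rest[:new_position] + [item] + rest[new_position:]


column = ["red", "green", "yellow"]        # top to bottom
print(reprioritise(column, "yellow", 0))   # ['yellow', 'red', 'green']
```

A flag field has no equivalent constraint: setting every item to P1 is a perfectly valid state for the data, which is how priority inflation goes unnoticed.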

priority-ordered-list

Trade-off decisions are hard. But prioritisation in a world of limited resource is genuinely difficult, and it’s better to face up to reality than to hide from it. Any worthwhile system for representing priority should force decision-makers to confront trade-offs, rather than allowing a pseudo-decision. In priority-ordered lists, there is no such thing as ‘two items that have the same priority’, so the trade-off decision has to be made. In a priority flag system, by contrast, you can have half your total backlog categorised as a P1, and nothing tells you that you have a big problem – until the customer finds that their deadline has passed and their P1 work never got done.

In the world of software development teams, you do often find priority-ordered lists. They exist at the strategic level, ie the product backlog and/or strategic roadmap. They also exist at the tactical level, the Agile Board, as long as each column on the board is only one-ticket wide. In such a setup, a ticket’s increasing priority is shown by moving it up (the yellow ticket), but the board demands that the necessary trade-off decision is faced up to, and displayed for the world to see – the red and green tickets have to go down.

priorities-on-a-board

So one of the advantages of representing your work on an Agile Board with one-ticket-wide columns is that you can show priority the good way – every column is a priority-ordered list. This will turn out, as we will see in a future article, to be a key ingredient in reading the entire board correctly, when we don’t just have to worry about one column on its own, but need to be able to prioritise columns of tickets against each other.

Agile board design example – columns part 2 (delivery)

In a previous article we outlined the steps for a hypothetical, generic software development team to get work from its starting state of ‘Idea’ to the key intermediate state, ‘Ready for Dev’ (R4D):

getting tickets ready | delivering
Idea | Triaging | Ticket Prep | Hand-shaking | R4D | bla… | Done

This article will examine what steps might follow, all the way to the nirvana that is ‘Done’, which means that code is tested and working in production, in use by real customers of the business, and hopefully generating some value for the organisation.

The next step after R4D is, uncontroversially, ‘In Dev’. Is this just ‘writing the code’? In fact it is that, and more. For this step, as for all steps in the workflow, the team needs to explicitly agree on, and then adhere to, its own criteria for what it means to have successfully completed the step. Typically for ‘In Dev’ these include meeting team rules around coding standards, writing appropriate unit and integration tests, version control usage and so on.

Whatever specifics the team decides on, the point at which a ticket goes into the ‘In Dev’ state is a critical point in its lifecycle. As a general rule, much more work, and work that is much more difficult and costly to undo, happens during and after this stage than before it. This matters greatly when, as inevitably happens, someone proposes amending the specification or the priority of a given piece of work.

These decisions should be made on the basis of a cost-benefit analysis, and it is precisely at ‘In Dev’ that the cost element rises considerably, raising the bar for approving such changes. This means that a ticket should never be moved into ‘In Dev’ frivolously, as doing so represents a step-change in the team’s commitment of effort. Promoting the right kind of discipline around this is one of the key functions of the pre-R4D states on the board.

getting ready | ready | delivering
Idea | bla… | R4D | In Dev | ? | ? | ? | ? | ? | Done

What happens after ‘In Dev’? Coding is so important and so difficult to get right that it is usually worth rigorously double-checking, and we call that out in our workflow as a separate step, here named ‘Code Review’. How that reviewing happens is again for the team to decide. Sometimes a quick look from one other person is sufficient, but at other times you might need multiple people to check multiple times before work passes muster. Tools such as GitHub’s Pull Request review system are often used to make this intra-dev communication easier, and a single PR can end up with dozens of comments from multiple people. It’s usually best practice to summarise the results of that conversation in a more widely available place, where those unable or unwilling to use GitHub can see the outcome; the obvious place is the tool used to generate the Agile Board, such as Jira.

Some teams consider that the practice of pair-programming – where two programmers collaborate closely at the same screen/keyboard to write code together – counts as ‘Code Reviewing As You Go’, and therefore any code that results from pairing needs no further checking stage. But it’s a rare team that applies this practice to absolutely every piece of work (I’ve only known one to do so in nearly 20 years in software development), and so it’s a rare team that cannot benefit from a separate ‘Code Review’ step on their Agile Board.

getting ready | ready | delivering
Idea | bla… | R4D | In Dev | Code Review | ? | ? | ? | ? | Done

When a ticket has passed Code Review, the Dev sub-team considers it ‘code complete’; in other words, their work is done. Sadly this is often ludicrously far from the truth, and people can be delusional about how much work still remains, with multiple steps needed to get code merged, tested, approved and deployed. This cartoon puts it well:

code-complete-is-not-done


In our hypothetical team, after Code Review it is the turn of the QA (Quality Assurance), aka Testing, sub-team to do their stuff. Given that the QA team is often overstretched, and is hardly ever able to pick up every ticket the very moment it passes Code Review, we next need a buffer state called ‘Ready for QA’, the inbox for the QA team.

getting ready | ready | delivering
Idea | bla… | R4D | In Dev | Code Review | Ready for QA | ? | ? | ? | Done

The QA sub-team picks from this ‘Ready for QA’ column, in priority order of course. QA work is often a mix of coding high-level acceptance tests and manual testing. How much of each depends on what is being tested (front-end and mobile apps need more manual testing than pure server-side code with no UI), the skills of the testing team, and the technical maturity of the team overall.

For this example workflow we will include all of this under a single step, ‘In QA’, though this step can be broken out into separate sub-steps for Automated and Manual QA if required. If a team aspires to a Behaviour-Driven Development (BDD) model, some or even all of the Automated Tests might have been written in parallel with, or even before, the production code written at the ‘In Dev’ stage. In my experience few teams manage anything like this BDD ideal in practice, but a team that does this, or is determined to try, would amend its workflow model accordingly, perhaps by adding an ‘Automated Tests’ step after R4D and before ‘In Dev’. For this example, though, I will stick with the simpler workflow and its single ‘In QA’ step.

getting ready | ready | delivering
Idea | bla… | R4D | In Dev | Code Review | Ready for QA | In QA | ? | ? | Done

What does it mean to finish this step? As ever, each team must develop and follow its own rules, but a common approach is that finishing this step means that the code has been validated against its specification by the QA sub-team; that relevant bugs have been raised (as separate-but-linked tickets and/or as part of the ticket being tested); that bugs above some agreed impact threshold have been fixed, retested and are now passing; and that the ticket as a whole is therefore considered deployable to the Live environment by both of the technical sub-teams, QA and Dev.
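Teams sometimes find it useful to write exit criteria like these down in a form that can be checked. The Python sketch below assumes a hypothetical ticket model (a `Ticket` holding a list of `Bug` records, each with a numeric impact score and a pass/fail flag) and a team-chosen impact threshold; none of these names come from any real tracker's API. It only checks what has been recorded: whether the right bugs were raised, and whether QA and Dev genuinely agree the ticket is deployable, remain human judgements.

```python
from dataclasses import dataclass, field


@dataclass
class Bug:
    title: str
    impact: int    # hypothetical scale: higher means more severe
    passing: bool  # has the fix been made and retested successfully?


@dataclass
class Ticket:
    key: str
    validated_against_spec: bool = False
    bugs: list[Bug] = field(default_factory=list)


def qa_complete(ticket: Ticket, impact_threshold: int) -> tuple[bool, list[str]]:
    """Return (done, reasons) for one team's hypothetical 'In QA' exit criteria."""
    reasons = []
    if not ticket.validated_against_spec:
        reasons.append("not yet validated against its specification")
    for bug in ticket.bugs:
        # Bugs below the agreed threshold may stay open as linked tickets.
        if bug.impact >= impact_threshold and not bug.passing:
            reasons.append(f"blocking bug still failing: {bug.title}")
    return (not reasons, reasons)


ticket = Ticket(
    "ABC-1",  # hypothetical ticket key
    validated_against_spec=True,
    bugs=[
        Bug("Typo in footer", impact=1, passing=False),   # below threshold
        Bug("Checkout crashes", impact=5, passing=True),  # fixed and retested
    ],
)
print(qa_complete(ticket, impact_threshold=3))  # (True, [])
```

Returning the reasons, not just a yes/no, keeps the conversation about why a ticket is stuck in ‘In QA’ visible to the whole team.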

At this point a quick discussion of environments is in order. In these articles I am keeping discussion of technical considerations to a minimum, as they are complex and contentious in their own right. Nevertheless it is almost universally true that business value is only provided when code is accessible to real users in a Live (or ‘Production’) environment, and that all the preceding steps we have been discussing happen in various lower environments. Exactly which environments exist, and what they are called, varies widely, although it is (unsurprisingly) a common pattern that QA steps happen on an environment called ‘Test’, where code from multiple different developers has been merged. The very purpose of both the QA steps and the Test environment is to give us enough confidence to deploy the new code to Live without panicking that our users’ world will collapse as a consequence. The dire consequences of not having separate Test and Live environments are the subject of much dark humour in the industry.

So, post-QA, are we ready to deploy to Live? In some teams, yes. In others, the Product function prefers a formal Sign-off stage. If everything prior to Sign-off has been performed correctly, this should be a formality, and (like any step) if it is found to generate little value then the team should consider a process simplification, removing that step both from what they do and how they model it on an Agile Board. But sometimes a Sign-off step is required, and it is not prima facie unreasonable for the person who requested the work (ideally that very individual, often a BA or Product Owner) to confirm whether they’ve got what they asked for, neatly completing a request-response loop and avoiding much potential embarrassment from the wrong thing going Live.

getting ready | ready | delivering
Idea | bla… | R4D | In Dev | Code Review | Ready for QA | In QA | Sign-off | ? | Done

Note that calling out a formal Sign-off step does not have to mean a team is going down a stratified, siloed, somewhat Waterfall way of working. As we have stated in previous articles, a board is necessarily a simplified model of a workflow. One of the most obvious simplifications is that it represents steps in a single, linear flow, with a particular team sub-function usually having the major responsibility for each step. But in well-functioning, well-integrated teams, the reality may be far more fluid; pairs or small groups from different sub-functions can and often should work together at multiple stages of the flow. It is a sign of good teamwork when you hear comments such as “that ticket can go straight to Sign-off from Code Review – Dev X and QA Y have been pairing on it the whole time, so it has already been tested”, or a Product person says “that one can count as signed-off already; I went through it all at QA Z’s desk yesterday when she was testing the last part”. I will never complain when, without cheating, a ticket shoots through some stages of its life on its journey to Done.

Now, after Sign-off, the ticket is indeed Ready for Live. I have known server teams with completely automated continuous deployment pipelines where this step (and several others) is unnecessary: an automated script on a CI server deploys any code whose tests turn a dashboard green on a lower environment, so no human intervention is required, there would be no time for it even if someone wanted to intervene, and there is therefore no point representing such steps on an Agile Board. But equally there are many teams where the work and process required for a Live deployment is substantial enough, and separate enough, to justify a ‘Ready for Live’ column on the board. Once we’ve added that, we have a complete workflow from ‘Idea’ to ‘Done’, and the resulting Agile Board looks like this:

agile-board-complete-workflow

Calling something ‘Done’ means there is nothing more for this team to do. Some teams (perhaps with a sub-optimal tech pipeline) in fact do need to do further explicit checks in the Live environment itself, and in this case they may need to add further columns to represent that work, but in this hypothetical team we will take ‘deploying to Live’ as the last thing that this team needs to do.
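Since a board is a model of an ordered workflow, it can be sketched as an ordered set of columns. The Python sketch below uses this example team's columns and a hypothetical `describe_move` helper that distinguishes normal forward moves, legitimate forward skips (such as a paired ticket going straight from Code Review to Sign-off) and backward moves, which this sketch labels as rework. The labels and the policy are illustrative assumptions, not a prescribed rule.

```python
from enum import IntEnum


class Column(IntEnum):
    """The example board's columns, in left-to-right order."""
    IDEA = 1
    TRIAGING = 2
    TICKET_PREP = 3
    HAND_SHAKING = 4
    R4D = 5
    IN_DEV = 6
    CODE_REVIEW = 7
    READY_FOR_QA = 8
    IN_QA = 9
    SIGN_OFF = 10
    READY_FOR_LIVE = 11
    DONE = 12  # nothing more for this team to do


def describe_move(current: Column, target: Column) -> str:
    """Classify a proposed move of a ticket between columns.

    Hypothetical policy: forward moves are normal, forward skips are allowed
    but called out (so the team can confirm nobody is cheating), and backward
    moves are treated as rework.
    """
    if target == current:
        return "no move"
    if target < current:
        return f"rework: {current.name} -> {target.name}"
    skipped = [c.name for c in Column if current < c < target]
    if skipped:
        return f"skip: {current.name} -> {target.name}, skipping {', '.join(skipped)}"
    return f"advance: {current.name} -> {target.name}"


print(describe_move(Column.R4D, Column.IN_DEV))
# advance: R4D -> IN_DEV
print(describe_move(Column.CODE_REVIEW, Column.SIGN_OFF))
# skip: CODE_REVIEW -> SIGN_OFF, skipping READY_FOR_QA, IN_QA
print(describe_move(Column.IN_QA, Column.IN_DEV))
# rework: IN_QA -> IN_DEV
```

Using an `IntEnum` makes the left-to-right order of the board explicit, so comparisons such as "is this move forwards?" fall out of the column values themselves.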

In fact, from the organisation’s perspective, work on what you have just deployed is rarely truly finished. Someone should be checking on the take-up amongst real users of that lovely new feature you spent two months building – if it is widely used, perhaps it should be enhanced. If it’s not used at all, perhaps it should be redesigned or removed. This is the Lean Startup Build-Measure-Learn loop, where all of these workflow steps are just the ‘Build’ part.

build-measure-learn-cycle

But the Measure and Learn steps often play out over long timescales and usually involve relatively few people from the software development team itself. So while that monitoring and thinking could (and probably should) be modelled on some Agile Board, it is usually not put onto the development team’s board – at least until the loop has completed, new work arrives in the form of new ticket ideas all the way over on the left of the dev team’s board, and the cycle begins again:

build-measure-learn-board-setup

So now that our hypothetical dev team has a complete board, the next step is to make sure we use the board in the right way. And that means we need to know how to read it properly, which is the subject of the next article.