Clients who want confidence that an agreed set of functionality will be delivered on time and within budget will benefit from the development team using standard deviation buffers.

I discovered Standard Deviation Buffers while reading Agile Estimating and Planning by Mike Cohn.

Software development projects are often delivered late, so planning needs a strategy that tackles this problem directly. One common but very subjective way of handling it is to add _padding_, but the customer often removes this because they can't see what they are paying for. The request to remove padding is justified: padding is just another estimate, but an estimate of what exactly? Padding needs to be objective, based upon some measure of uncertainty, so that instead of being removed it can be continually re-assessed. I say _uncertainty_ rather than _accuracy_ because I think it's important to recognise that we can't talk about the accuracy of an estimate until the work is done, but there is a way of pinpointing the uncertainty. So let's call it a Buffer and calculate it using Standard Deviation.

There are two general categories of estimators - those who always come up short and those who are always conservative. To beat this we use planning poker. When we play planning poker in the traditional manner the team often comes to a consensus on, say, an 8 for a story, even though they've been discussing how it's probably a 5: "let's go with 8 because we're not entirely certain". To deal with this we play 2 cards each. The first is your best case estimate and the second your worst case estimate, which means that every player is giving us a measure of their uncertainty about the story. In a typical game you get card pairs that might range from a 3 and an 8 to a 5 and a 13, and the team might then come to a consensus at something like 5 and 8.

This is useful because in the traditional game a single estimate is just as likely to be exceeded as not, so on average we would expect to have about 50% confidence in it. In practice, though, the confidence behind a single estimate normally sits somewhere between 50% and 90%. By being explicit about why the 2 estimates are needed, best case and worst case, we can suppose that the best case is a 50% confidence estimate and the worst case a 90% confidence estimate. We then use some simple statistics, standard deviation, to account for the fact that the uncertainty will vary from story to story.

After the poker game we will have a list of stories with 2 estimates for each. Put the two sets of estimates into Excel in adjacent columns and set a cell below them to calculate the standard deviation across both columns (the STDEV function will do this). There's no need to understand the maths, although it is simple and you can look it up elsewhere. Finish by doubling the standard deviation and adding it to the total of the best case estimates.

So for a list like this:

Story 1 best=3, worst=5
Story 2 best=8, worst=13
Story 3 best=3, worst=8
Story 4 best=2, worst=5

we get

total best=16
total worst=31

2 x stDev = 7 (rounded)

schedule duration including buffer = 16+7 = 23
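
For anyone who would rather script this than use a spreadsheet, here is a minimal Python sketch of the same arithmetic. It assumes the standard deviation is the sample standard deviation taken over all the best and worst case values pooled together (which is what reproduces the 7 above), and that the buffer is rounded to a whole number. The names `stories` and `buffered_schedule` are my own, purely for illustration.

```python
from statistics import stdev

# (best case, worst case) estimates from the example list above
stories = {
    "Story 1": (3, 5),
    "Story 2": (8, 13),
    "Story 3": (3, 8),
    "Story 4": (2, 5),
}


def buffered_schedule(estimates):
    """Return (total_best, buffer, schedule) for a list of (best, worst) pairs."""
    total_best = sum(best for best, _ in estimates)
    # Assumption: pool every best and worst value into one data set and
    # take the sample standard deviation of that pooled set.
    pooled = [value for pair in estimates for value in pair]
    # Buffer is twice the standard deviation, rounded to a whole number.
    buffer = round(2 * stdev(pooled))
    return total_best, buffer, total_best + buffer


total_best, buffer, schedule = buffered_schedule(list(stories.values()))
print(total_best, buffer, schedule)  # 16 7 23
```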

There are more wonderful things you can do with this. For example, suppose stories 1 and 2 have completed and took 4 and 9 days, 13 in total, so both were late against their best case estimates. We would like to review the plan and see whether we're still on schedule. We can recalculate the buffer, which will now be 5 because the only remaining stories are 3 and 4. We've spent 13 days of the 23, which means we've got 10 left. Add up the best case numbers for stories 3 and 4 (3+2 = 5) and the new buffer (5) and we get 10, so we can confirm we're on schedule with the sprint even though the first 2 stories took longer than their best case estimates.
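
The mid-sprint check can be sketched the same way. This is only an illustration under the same assumptions as before (sample standard deviation over the pooled remaining estimates, buffer rounded to a whole number); `buffer_for` and the variable names are hypothetical.

```python
from statistics import stdev


def buffer_for(estimates):
    """Twice the sample standard deviation of the pooled estimates, rounded."""
    pooled = [value for pair in estimates for value in pair]
    return round(2 * stdev(pooled))


original_schedule = 23           # 16 best case + buffer of 7
actuals = [4, 9]                 # what stories 1 and 2 really took
remaining = [(3, 8), (2, 5)]     # (best, worst) for stories 3 and 4

days_spent = sum(actuals)                            # 13
days_left = original_schedule - days_spent           # 10
remaining_best = sum(best for best, _ in remaining)  # 5
new_buffer = buffer_for(remaining)                   # 5
days_needed = remaining_best + new_buffer            # 10

on_schedule = days_needed <= days_left
print(days_left, days_needed, on_schedule)  # 10 10 True
```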

Sometimes when the schedule is recalculated mid-sprint, the buffer overlaps the sprint end date. When this happens I say the schedule is on amber. If the buffer starts after the sprint end date then we're on red. In both cases we look for ways to bring the status back to green (which I'm not getting into now, as it's a different subject).
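
One way to express the colour status in code is sketched below. It assumes the buffer "starts" once the remaining best case work is done and that everything is measured in days until the sprint end date. The function name `schedule_status` and its parameters are hypothetical, and treating a buffer that starts exactly on the end date as amber is my own choice.

```python
def schedule_status(remaining_best, buffer, days_left):
    """Classify a recalculated schedule as green, amber or red.

    remaining_best: total of the best case estimates still to do
    buffer:         the recalculated buffer for the remaining stories
    days_left:      days remaining until the sprint end date
    """
    if remaining_best + buffer <= days_left:
        # The whole buffer fits before the sprint end date.
        return "green"
    if remaining_best <= days_left:
        # The buffer starts on or before the end date but runs past it.
        return "amber"
    # The buffer would only start after the sprint end date.
    return "red"


print(schedule_status(5, 5, 10))  # green
print(schedule_status(5, 5, 8))   # amber
print(schedule_status(5, 5, 4))   # red
```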

So these buffers are great for coming up with a plan that _feels_ safe across the team, without depending on the individual team members, individual velocities, team velocities, the skill set required or the volume and nature of the stories. As a bonus it leaves us with a useful means of checking progress during the sprint. Also, when it comes to measuring the success of a sprint, it can all be done against the buffer: was the buffer accurate or not, and if not, what contributed to that inaccuracy? There's no need to review all the estimates, just the ones that caused problems in the buffer.

Credit to Mike Cohn for introducing me to this in his book Agile Estimating and Planning.