Points! Points! Points!

BETA v01.00.53 - This is not official scrum guidance, it's all personal opinion drawn from my own experiences. 

Story. Points.

Story points... just seem to be one of those things teams get hung up over. Most teams seem to have anxieties about pointing.

The kinds of questions typically discussed are...
  • Do points represent time, effort, complexity, or all of the above?
  • Can we really point it a 1?
  • If we have to test it and deploy it, it can't be a 1; surely it must be at least a 5?
  • Why can't it be a 4?
  • Why do we have to use Fibonacci?
  • Why can't it be a 5 and a 2?
  • Should we point Dev and QA actions separately?
  • Should we point UX?
  • Can we point writing user stories?
  • Can we point writing ACs?
  • Can we point technical tasks?
  • I can do that in 3 points, but others will need 8.
  • How do we work out what we can bring into sprint if we can't point everything?
  • We said it was a 5, but it really was a 13; shall we add +8 pts now we're done?
  • On and on...
These are all questions I have previously explored with the teams.

However, after a recent epiphany regarding points, I came to realise there is a simple strategy that takes care of this kind of pointing anxiety.

Keep it simple

Pointing has always seemed a little hit-and-miss, and team retrospective actions often swung back and forth like a pendulum.

It wasn't until I worked with one particular team that I started to question the practice of pointing altogether. These chaps had taken their pointing strategy to places I'd never imagined: so complex, and so deadly accurate, that it needed an Excel spreadsheet to enable it.

If it ain't broke, don't fix it? 

On this occasion, whilst the team was delivering points with metronomic precision, there was no sight of a working product, and it was my number one probation KPO to change this. It was on this journey that I had my epiphany (for want of a better word): their troubles all stemmed from how they were using story points.

I'd always pointed effort regardless of what it was; if the team were working on it, then we pointed it. This team were doing the exact same thing, but their Excel ingenuity had suppressed or explained away all the warning signals.

What I learnt... and this is such a radical departure from my past that I feel nervous even suggesting it, but...
Story points, it turns out, are only for user stories. 
With this new insight it became apparent how this team wasn't delivering anything, and looking further back, how my previous teams struggled with fluctuations in delivery.

What are we doing here?

If the primary measure of a scrum team is to deliver working software that meets the needs of its customers... then this is productive work, and anything else should be proactively avoided.
  • Technical debt is like borrowing on a credit card: we're buying time with a short-cut.
    It should be used strategically and sparingly, and always repaid before the interest sky-rockets.
  • Defects should absolutely be exceptions. No excuses.
  • Framework upgrades, new tooling, CI initiatives: these solve our problems, not the customer's.
Any team focused on introspective actions such as defects, technical debt or refactoring is undoubtedly busy, but they're not the least bit productive. Only delivering user stories should be considered productive work. Introspective tasks solve our problems, not the customer's, and as such we should pay for them in plain terms: lost productivity (reduced velocity) and deferred delivery of stories are the price we pay to make the structural improvements that will deliver long-term competitive gains. They're investments in us, effectively.

We get rewarded when we meet the customer's needs

I am no longer recommending to my teams that we point everything. I would argue (through posts like this) that we only point actions that deliver features our customers actually asked for, aka user-stories.

Pointing all effort was the underlying cause of many issues I'd faced in every Scrum team. 
  • It projects an artificially high impression of the team's productivity,
    a measure we then use to forecast against user-story delivery alone!
  • It rewards inefficiencies, waste and non-productive work
  • The pressure of under-delivering lowers morale and self-esteem
And how does it do all this? Well, I'll get on to that a bit later in the post... 

How do we forecast and plan sprints when only stories are pointed?

If you've got a backlog of relatively sized user stories and 'unpointed' tasks, then no matter how good or bad your team's practices are, your velocity will squarely reflect how much value you can and do deliver. In this state, your velocity is a reliable forecasting tool based on your current level of productivity.

And if you want to deliver more, then you need to improve your velocity. This is where retrospectives are vital to a scrum team: to identify waste, inefficiencies and bad practices, and remove them, one at a time, systematically.

Use your velocity to observe whether your process improvement initiatives have actually improved your productivity... How? Well, if they have, you will have spent more time delivering real value, and more value is more stories, and more stories is more story points.

In short, your velocity goes up, because you've done more, in the same amount of time. 

Simples.
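As a back-of-the-envelope sketch of that forecasting arithmetic (the sprint numbers and backlog size below are invented purely for illustration):

```python
from math import ceil

# Each sprint records the points of completed *user stories* only;
# defects, tech-debt and other tasks are unpointed and contribute nothing.
completed_story_points = [21, 18, 24]  # last three sprints (invented numbers)

# Velocity: average story points delivered per sprint.
velocity = sum(completed_story_points) / len(completed_story_points)

# Forecast: how many sprints to deliver a backlog of relatively sized stories?
backlog_points = 105
sprints_needed = ceil(backlog_points / velocity)

print(velocity, sprints_needed)  # 21.0 5
```

Because only user stories carry points, a sprint lost to defects or tooling simply shows up as a lower number here; nothing needs adjusting or explaining away.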

Story points are for user stories, not the other stuff you shouldn't be doing

So, back to the main point of this post... Stop pointing everything.

Pointing any activity that is technical in nature, and not an expression of a customer's needs (a user story), can insidiously result in systemic inflation: an inexplicable cycle of delivering more and more points while simultaneously delivering less and less actual value.

I'm talking utter nonsense, right? I've attended a few seminars over the years, and on a couple of occasions the concepts surrounding relative sizing have been discussed. The theory goes that humans are fantastic at relative comparisons, with the caveat that the items being compared are on the same scale.
  • Comparing the size of two buildings... Easy.
  • Comparing a group of different dogs... Easy.
  • Comparing horses to dogs... Easy. Oh, er, hang on. Great Dane or Shetland Pony... Er...
  • Comparing buildings to marine mammals?
The key take-away here is that relative sizing works when the items under comparison are on the "same scale".
  • The Shard is taller than the London Eye.
  • The London Eye is taller than Big Ben.
  • A blue whale is longer than a London bus?
    Er, hang on, I need to investigate.
And so, comparing stories within a backlog, stories that effect changes to our product and meet the needs of the same customers, is relative sizing along the same scale.
  • This looks like a 5, because from what I can tell it seems more or less the same as the thing we did on Story 'xxxxx', which was a 5.
  • This looks like a 1; it's literally a superficial one-line change, and as changes to our system go, it couldn't be simpler. It's a 1.
  • This looks like a 13; it's going to impact every part of the system, we'll need to involve almost everyone at some point, and it will require a lot of testing and regression testing... It's very much like Story 'xxxxx', and that was a 13.
Relative sizing is just that: "this is smaller than/the same as/bigger than that".
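The sizing conversation in the bullets above boils down to "find the most similar recently delivered story and borrow its points". A toy sketch of that idea (the story names and their sizes are invented for illustration):

```python
# Recently delivered stories and the points we gave them (invented examples).
reference_stories = {
    "one-line copy change": 1,
    "new field on profile form": 5,
    "cross-system audit trail": 13,
}

def relative_size(similar_to: str) -> int:
    """Size a new story by borrowing the points of the most similar
    recently delivered story: smaller/same/bigger, nothing cleverer."""
    return reference_stories[similar_to]

# "From what I can tell, it's more or less the profile-form story."
print(relative_size("new field on profile form"))  # 5
```

No task breakdown, no hour estimates; just a lookup against work the team has already delivered.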

Points and velocity are not a KPO

Have you ever been told that your team's velocity is your KPO? Are people on the margins of the team constantly asking about your team's velocity, passing judgement and making comparisons to other teams?
  • Points are not estimates. 
  • Points are not a commitment.
  • Points do not have a 1:1 relationship with time, energy, money.
  • They are not called Complexity points
  • They are not called Risk points
  • They are not called Time points
  • Points only have a relationship with other points, along the same scale of comparison.
  • Velocity can only be compared with recent sprints by the same team.
Points are artefacts of the exercise of relative sizing: a fast and acceptably imprecise exercise intended to give a team enough confidence to move forward and deliver value through software in short(er) release cycles.

That's what matters, first and foremost: delivering functionality the customer wants to use. Every other activity the team performs must justify its existence by how well it helps us achieve this aim.
Let working software be your most valued metric.

Relative sizing consistency will improve and mature as your team matures; as you collectively understand each other, your customers' needs, your codebase, your infrastructure, your weaknesses, your insurmountable corporate frictions. Over a short period of time your ability to size relatively and consistently will become fast and effortless, which means you get to spend even more time delivering value, and even less time attempting to find certainty in workshops and planning sessions.
But fear not: I promise you will get it wrong, and more so in the early stages of adoption, because, as they say, the devil is in the detail. On the whole, though, you'll get it right more often than you get it wrong, so don't sweat it; incorporate your improved understanding into your next sizing exercise and mature with time. You'll save more time, and deliver more targeted value to your customers over the product's life-cycle, through Scrum with relative sizing, accepting the occasional stumble, than you ever would by waterfalling the whole project.

That's all it is; that's how it works. A brief conversation to examine the outline of the feature, get a sense of what we're looking to achieve and the scale of ambition, and then give it points relative to work recently delivered.

Relative sizing is fast, and precise enough to plan and forecast.

The idea behind Scrum is to get product delivered by favouring incremental delivery over nth-degree precision and planning. All too often, the cost of correcting a misunderstanding in Scrum is far less than the upfront costs of waterfall, and when a mistake gets into the waterfall apparatus, the cost of correction can be immense.

Comparing along different scales... 

Scrum isn't new any more; it's been around for 20 years or so, and in that time it has been the subject of many studies testing its efficacy and looking to make it better.
Relative sizing takes advantage of the fact that we're good at making relative comparisons with little effort, but surprisingly bad at estimating, irrespective of how much effort we exert.
Let's get back to relative sizing, and to one study in particular that underlines my practice of only pointing user stories.

A user story is an expression of a particular desired outcome from the customer's perspective. It's conceptual, it's vague, it expresses an outcome. We often enrich the user story with acceptance criteria, but these are essentially checklists ensuring we don't miss anything really important. We then size these desired outcomes against outcomes we delivered recently, in other user stories. In comparing user stories we're comparing outcomes relatively; that's the scope of our comparison.
When we consider technical debt, defects, bugs, refactoring etc., we're not considering something conceptual like an outcome of desired functionality. We're considering something that already exists, something we've already built, something we know inside out, and the fact that we've got to fix it. We know its composition, its code, how we test it; we know it extremely well, it fills our mind with detail, and any comparison or consideration we make takes all of this into account.

This is where the 30% inflation comes in: our minds automatically incorporate all of the complexity we already perceive. We don't consider the outcome, we consider the inner workings and steps. Thus, when sizing, non-story items are attributed points up to 30% higher than other items in the backlog.

Thus, your backlog and sprint backlog will contain lots of items that aren't actually relatively sized on the same scale. In fact, they're not relatively sized at all.

Your sprint backlog is affected even more so, as it will always contain a varying mixture of story and non-story items. This means your velocity will fluctuate and become unpredictable, which undermines its value as a forecasting tool.
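To make that inflation arithmetic concrete (the sprint compositions are invented, and the 30% figure is applied here purely for illustration):

```python
# Two sprints of equal real effort; invented numbers for illustration.
# Sprint A: user stories only. Sprint B: half the effort went to non-story
# work, which (per the ~30% inflation effect) was pointed higher than
# the equivalent story work would have been.
INFLATION = 1.30

sprint_a_story_points = 20                   # all 20 points were user stories
sprint_b_story_points = 10                   # only half of sprint B was stories
sprint_b_nonstory_points = 10 * INFLATION    # inflated non-story points

velocity_a = sprint_a_story_points                             # 20 points of value
velocity_b = sprint_b_story_points + sprint_b_nonstory_points  # 23 "points"

# The mixed sprint *looks* more productive (23 > 20) despite delivering
# half the customer value, and the headline number jumps around with the
# story/non-story mix, which is what wrecks the forecast.
print(velocity_a, velocity_b)
```

The same team, the same effort, but the headline velocity depends on the sprint's composition rather than on what the customer received.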

Your team only has ONE velocity

I've never heard of a team having more than one velocity, but it's often the first solution proffered when I put my argument forward. "Well, we can have more than one velocity"... And how will we do this? Well, let's break out the Excel spreadsheets again; let's calculate our different velocities and forecasts based on the infinite combinations of task types and sprint-backlog compositions. Great fun, and what a way to spend your time. Let's introduce the concept of "degrees of confidence", and personal velocities.

Or, perhaps, let's just keep it simple.

Defects, technical debt, refactoring: these are things we should be avoiding, are they not? They're not productive, they're waste steps, and as such they should hurt our headline velocity. Pointing them, IMHO, does more harm than good:
  • It obfuscates the team's true productivity
  • It cloaks waste and inefficiency under the guise of value-bearing productive tasks
  • It promotes inflation due to the different scales of comparison
  • It delivers a false illusion of achievement and delivery
  • Forecasts projected against the product road-map become optimistic, incorporating the inflated non-story points
  • It necessitates additional processes and manipulations, and obscures the real velocity
  • It makes it very difficult for the team to apply retrospective improvements and measure the outcomes

Over promising, under delivering

In a previous post I talk about the issues of velocity becoming divorced from reality, and in many respects this is the same mechanism at play. The inflated sizes of non-stories artificially boosted the team's velocity, which was then used to forecast. Subsequently those forecasts became overly optimistic, ultimately leading to disappointment and tension when the team didn't hit its expectations.

Not hitting our anticipated goals is definitely going to be a hot topic in the team's retrospectives, but the team's visibility of the issues will be masked by the manufactured velocity. Any decisions made here are much more likely to be flawed from the outset. How can the team identify an issue, and measure the success of any mitigation, if their benchmark is invalid?

So when I've seen this happen previously, the team just got into a panic, and the retrospectives became more about blame.
  • Individual velocities were introduced and used to "incentivise" developers into going the extra mile (unpaid overtime).
  • Sprints became labelled as "commitments" that the team must deliver.
    Once again, by going the extra mile (unpaid overtime).
  • It was assumed that developers were over-pointing, so pointing was taken out of their hands.
  • Scrum was continuously blamed for being inefficient.
  • Spreadsheets were out in force to provide manufactured confidence.

Further enhancements to post:
  • De-emphasise effort as the point of comparison; focus on the "outline" of something.
    Is it bigger, smaller or the same, without breaking it into tasks?
  • Sizing irons out the concept of individual contributions.