Iteration is a well-understood concept in Agile/Lean software development, where a method is used to create a sequence of improving approximate solutions. Iterations produce increments of functionality that can be validated by customers over time, ensuring we continue to take small steps in the right direction. But can the same ideas apply to the Kanban Method, where each iteration produces an incremental (or minimum viable) change to the process, using the Kanban Method itself as the model to recognise improvement opportunities?
“The Kanban Method is a change management method. It describes a process for driving change in an organization and that process has sufficient detail as to be repeatable.” – Anderson
The Kanban Method starts with making work and its flow visible and moves through the following steps:
- Visualize – make the invisible work and its workflow visible
- Limit WIP – implement a virtual kanban system
- Manage flow
- Make management policies explicit
- Implement organizational feedback using quantitative measures of demand and capability
- Improve collaboratively using models and the scientific method to implement a “guided” approach to evolution
Each of these steps can be implemented in a number of ways but is never considered complete: there is always a way to improve visualisation, limit WIP further, manage flow better and so on. One way to explore options for the next minimum viable change is to look for specialised practices that sit underneath each of the generalised practices in the Kanban Method. For example, a team might choose to visualise wasted effort using an image of a waste basket on the board, showing stories that the team had to stop working on before they could be completed. Visualising waste is a specialised practice of visualising invisible work and workflow.
Specialised practices are more temporal than generalised practices, being more important at certain times when certain conditions exist. When work is being stopped in flight, the team needs to find the root cause of the problem and take action. Once the data is visible and the system has been changed, the specialised visualisation has less value, unless the adverse conditions return.
Specialised practices create a more granular roadmap of possible directions, representing smaller experiments which are safer to fail and easier to amplify. Each specialised practice helps achieve a generalised practice such as visualising work or limiting work in progress.
Generalised and specialised practices always remain coherent with principles of flow.
The example below shows how a team might work through some more specialised practices in a first pass through the Kanban method.
Green denotes where the team has implemented practices as part of the Kanban Method on the first pass through. Red highlights some future changes the team are considering. This does not represent all the practices available to the team, just the ones they are aware of at a point in time.
Although there is an implied initial sequence, the depth of a Kanban implementation comes from greater specialisation under the core practices. In this case the team has visualised work and workflow including blockers but consciously or unconsciously chosen not to visualise rework/waste at this point in time.
WIP refers to the total software inventory in the system at a point in time; in this case the team has constrained WIP using both personal and activity state limits. Limiting WIP can be achieved through constraints but also through other means such as reducing batch size. Batch size has two flavours: production batch size, the size of individual units of work in an activity state, and transport batch size, the number of items that move between activity states at a time. Waterfall is effectively a 100% transport batch size, so it is important to optimise for both and not overlook transport batch size, which can have the greatest economic effect.
The team plans to reduce transport and production batch size in the future but realised that they need to tackle transaction costs first as the cost of moving code between environments is too high to justify breaking up work any smaller at present. In this case, without lowering transaction costs first, reducing batch size would lead to a worse economic outcome.
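The interaction between transaction cost and batch size can be sketched with a simple model: each item's average cost is a holding cost that grows with batch size plus a fixed transaction cost amortised across the batch. The function name and all figures below are hypothetical, chosen only to illustrate why transaction cost must fall before small batches pay off.

```python
# Illustrative model: average cost per unit of work for a given batch size.
# All numbers are hypothetical.

def cost_per_item(batch_size, transaction_cost, holding_cost_per_item):
    """Average cost per item = holding cost + amortised transaction cost."""
    # Each item waits, on average, for half the batch to be processed.
    holding = holding_cost_per_item * batch_size / 2
    # The fixed cost of moving the batch is shared across its items.
    transaction = transaction_cost / batch_size
    return holding + transaction

# With a high transaction cost, small batches are punished...
print(cost_per_item(1, transaction_cost=800, holding_cost_per_item=10))   # 805.0
print(cost_per_item(10, transaction_cost=800, holding_cost_per_item=10))  # 130.0

# ...after lowering the transaction cost, small batches win.
print(cost_per_item(1, transaction_cost=20, holding_cost_per_item=10))    # 25.0
print(cost_per_item(10, transaction_cost=20, holding_cost_per_item=10))   # 52.0
```

The model is crude, but it captures the team's reasoning: with the current high transaction cost, smaller batches make the economics worse, not better.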
Batch size and queues overlap in the visualisation to represent their close relationship, reducing batch size automatically reduces queues.
1 1 1
1 1 1 = 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1
1 1 1
One large unit of work = queue of small units of work
A large unit of work could equally be represented as a queue of its constituent units of work if broken up. A large transport batch size can also be thought of as a queue on a regular cadence, e.g. a regular pattern of 4 weeks of development followed by 4 weeks of testing creates a transport batch size problem as work builds up in the test-ready queue on a regular 4-week cadence.
Attaching more granular or specialised practices to the Kanban Method creates a map of improvement opportunities which can be explored over time. Revisiting the six generalised practices of the Kanban Method on a regular basis leads to greater insights for improvement opportunities, which can then be undertaken collaboratively.
In effect we are continually iterating through the Kanban Method looking for improvement opportunities. Each iteration yields a minimum viable change. Sometimes we can see opportunities but need to address other areas before we can tackle them directly. Other times agents in the system change the system at the same time we do, leading to unexpected results which necessitates the ability to dampen failed experiments and move forward in another way.
Continually iterating through the Kanban method helps identify new opportunities for improvement by exploring increasingly more specialised practices. The structure forms a map of opportunities as it helps focus on different areas and asks different questions. Like Taiichi Ohno we are in an infinite loop, always trying small improvements against an evolving system, never satisfied with a fixed target.
Pattern Name: Release Steps
Bob’s web development team builds up a critical mass of functionality to be released every 6 months as the transaction cost of release is too high to release more frequently. No one is aware of the real cost of releasing every 6 months. Everyone is too busy developing software to take time out to improve the deployment and release process.
Summary: Reducing the build-up of software inventory reduces waste and creates flow.
Context: You work in a team with release cycles measured in months or years.
Problem: You need to show that the cost of holding software inventory is greater than the investment required to continuously deliver small units of work.
Forces: You need to convince your organisation of the benefit of delivering little units of work more frequently. You need to uncover the reasons why people believe this is not currently possible and make small improvements that show benefits whilst moving towards continuous delivery.
Essence of the solution: Use data to show the true cost of holding software inventory, this cost gives an approximate budget and payback function for beginning to implement continuous delivery.
More about the solution:
Drop in throughput over time – Capture data from your current process that shows the time it takes for each unit of work to move from request to release. Cumulative flow charts can be used to visualise the cost of releasing large units of work infrequently by showing any drop in throughput/velocity when units of work are released. Below, a line has been drawn indicating the throughput with a zero transaction cost of release. It shows that 40 stories rather than 33 would have finished development in 100 days, a loss of about 17.5%; if the team costs 1,000,000 to run per year then the cost of lost productivity is about 175,000 per year.
If you do not have data to start with, the cost of release steps can be calculated by talking to team members to find the manual effort required for each release that could be automated. By multiplying the total hours spent per release by the average hourly rate and the number of releases, we identify the internal cost to the team in a different way. For the above team we might find that 4 team members take 16 hours per release at 100 per hour. Over a year this comes to about 110,000.
Another cost is the loss in revenue caused by holding software inventory. On a 21-day release cycle we can assume half the software is completed by day 10.5, meaning we perpetually lose 10.5 days of revenue on average. If we assume the team generates software that is more valuable than the cost of the team, then the lost revenue is at least 1,000,000/365 * 10.5, which is approximately 29,000 per year.
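The three back-of-the-envelope estimates above can be reproduced in a short calculation; the figures come from the worked example in the text, and real teams should substitute their own numbers.

```python
# Reproducing the three rough costs of a high transaction cost of release.
# All figures come from the worked example above.

team_cost_per_year = 1_000_000

# 1. Lost throughput: 33 stories finished in 100 days where 40 would have
#    finished with a zero transaction cost of release.
lost_fraction = (40 - 33) / 40                         # 17.5%
cost_lost_productivity = team_cost_per_year * lost_fraction
print(cost_lost_productivity)                          # 175000.0

# 2. Manual release effort: 4 people x 16 hours at 100/hour per release,
#    with roughly 17 releases a year on a 21-day cycle.
releases_per_year = 365 / 21
cost_manual_effort = 4 * 16 * 100 * releases_per_year  # ~110,000 per year

# 3. Cost of delay: on a 21-day cycle, finished work waits 10.5 days on
#    average before it can earn, so 10.5 days of value are perpetually lost.
cost_of_delay = team_cost_per_year / 365 * 10.5        # ~29,000 per year
```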
With economic data the cost of the improvement can be weighed against the benefits. If the payback time to make the changes leading to continuous delivery can be shown to be short enough it becomes a simple economic decision.
Resulting context: Once you have the funding take small steps and measure progress. Find others in the organisation or community that have reduced transaction cost of release for a similar technology stack and ask for advice. Use good reference material for the implementation.
Patterns describe a problem for a given context and offer a solution based on experience that has been consistently successful. A pattern language of flow, if it existed, would attempt to identify problems with the flow of work in software development from request to release. The existence of problems could be confirmed with the use of empirical data and different kinds of visualisations. Solutions are used to inform better economic decisions and create the catalyst for change.
Patterns start life as strategies that help build the right thing, improve quality, increase throughput or reduce the time it takes from the request of a feature until its delivery. A single pattern may have many implementations; like waves on the ocean, no two are identical.
A pattern language is not just a collection of patterns, it describes how patterns are connected and highlights paths through the language that improve flow. The connections resemble a network showing sequence, forces and resulting context.
Christopher Alexander’s patterns created complete buildings with one pass through the network. In a complex adaptive system, cause and effect can only be determined in retrospect, so a single pass through the language to create a complete roadmap of change would not be possible. A pattern language of flow would be more useful if used iteratively, each pass through the language could be used to help recognise improvement opportunities at that point in time. The pattern language could also be used in reverse to understand the evolutionary potential of a team, product or organisation.
Many organisations that have adopted Agile/Lean practices still operate with large batch sizes in parts of their development pipeline and release cycles measured in months. Often reducing batch size and shortening release cycles needs considerable investment due to the technical debt inherent in many organisations' deployment processes. Stakeholders in continuous improvement are often outside the software development teams and have little understanding of the cost vs. benefit of making the improvements. Outlined below are some arguments and issues that need to be overcome in order to make small batch sizes and short release cycles a possibility.
Even when organisations manage to have a small development batch size they often have a large release batch size, as the transaction cost of release is too high to allow for more frequent releases. Continuous Delivery offers a comprehensive answer to this often complex problem. Continuous Delivery is built upon the practices of Continuous Integration and Configuration Management, where the system is never more than a few minutes away from a working state and all environmental data is version controlled. An instance of a Deployment Pipeline is created on every commit, which lives until compilation fails or a single automated unit, integration, smoke, deployment, acceptance or performance test fails. Suites of tests can be run on production-like environments that can be rebuilt from scratch from source-controlled artifacts faster than they could be fixed by hand. Deployments are fully automated and the Deployment Pipeline chains the separate suites together to run in sequence. Release Candidates are then available to be pulled into exploratory manual testing and then one-click released into production by operations/devops.
For most organisations there will be a significant cost in putting in place the mechanisms to continuously deliver software in a reliable, repeatable and automated way. Most organisations are unaware of the cost of manually deploying and releasing software and maintaining “works of art” environments. The payback period is almost certainly shorter than expected, often less than a year.
Another reason many organisations are unwilling to release small batch sizes is the belief that a critical mass of functionality is required to make a release viable when replacing an existing system. This argument is based on the premise that concurrent use of new and old functionality is not possible, or more expensive than replacement in its entirety. When replacing existing applications there are other strategies apart from big-bang cutovers that can be employed. Strangler applications and clever use of abstraction allow for the slow replacement of legacy system functionality until the legacy system can be switched off. Creating the ability to surface new applications inside legacy applications means small increments of business value can be released from the outset rather than building up a critical mass of functionality before the first release.
Many organisations do not differentiate between types of change. Some software changes are the result of business process change, which usually requires training; other changes may be more intuitive to those who use the software. When software is changed in small increments over short iterations without an associated business process change, functionality can be discovered without impairing the customer's ability to use the system. Managed change should concentrate on complex functionality and business process change. Business process change can then be decoupled from the software change and education can begin before the software is released. Differentiating between the types of change creates more flexibility in how change is managed, allowing training material and documentation to be built with a high-level view and detail added just in time to match the development process.
Lack of constancy of purpose
“Create constancy of purpose toward continual improvement of product and service, with a plan to become competitive and to stay in business.” – W. Edwards Deming
Introducing systemic change to lower the transaction cost of release is not going to be a quick fix; it requires constancy of purpose and dedication to improvement. Working in projects with short-term goals does not provide a sufficiently long-term view to make such changes: projects have a finite end, whereas continuous improvement does not. Undertaking these types of initiatives requires allocating resources to provide for long-range needs rather than only short-term profitability.
Managers within organisations also need to act as leaders, spending time on innovation, research and education and helping promote and enact ideas that solve problems of this nature. Plugging leaks is not improvement, long release cycles need to be recognized as a root cause of poor quality which requires action. You cannot save money if you are more worried about money than you are about quality.
Finance, accounting and planning
Batch funding leads to batch thinking, where scope is relatively fixed at the outset and success becomes a function of adherence to cost and schedule with some minor scope change. This model ignores the fact that software is a knowledge-creating business and we simply do not know what to build up front. It also does not consider whether the software delivered under budget is valuable compared to its cost.
One shot funding ends up with either a bag full of money or nothing. Because it is hard to get funding approved, you ask for as much as you can get and justify it with elaborate and detailed plans containing lots of expensive guesswork. Once a project has the money there is little motivation to spend it on organisational imperatives to reduce batch size and lower cycle times.
Managing and delivering when batch size is small and scope is dynamic scares people. The problem here is mostly visibility, which is why smaller units of work need to be traceable to higher level goals and work needs to be structured in a way that represents stakeholder needs rather than process focused technical delivery. When working with small batch sizes and small releases, progress can be measured in smaller cycles which in turn creates more frequent decision points.
Piecewise funding is a model used by venture capital backed startups. If used more by large organisations it would support smaller batch sizes and cycle times whilst measuring success against customer focused criteria rather than conformance to plan.
The graph below from Dean Leffingwell’s book “Agile Software Requirements” shows the relationship between market value of a feature and return on investment for shorter release cycles.
We do not get any return or save any cost until the software is in production. This is compounded by the fact that technological innovation in the marketplace quickly decreases the value of potentially shippable software over time; the shorter the depreciation cycle, the higher the cost of delaying a release. Another compounding effect is that each release generates a return, so the more often software is released the more these returns accumulate.
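The compounding effect of release frequency can be sketched with a toy model: each release only starts earning from the day it ships, so splitting the same annual value across more releases puts more of it into production sooner. The function name and all figures below are hypothetical, purely for illustration.

```python
# Hypothetical illustration: a team produces 120,000 of value per year.
# More frequent releases get each increment earning sooner.

def return_earned(releases_per_year, annual_value=120_000, horizon_days=365):
    """Total return over one year when value ships in equal releases."""
    value_per_release = annual_value / releases_per_year
    cycle = horizon_days / releases_per_year
    total = 0.0
    for i in range(1, releases_per_year + 1):
        # The i-th release ships at the end of cycle i and earns for the
        # remainder of the year.
        days_earning = horizon_days - i * cycle
        total += value_per_release * (days_earning / horizon_days)
    return total

print(round(return_earned(1)))    # 0 -- an annual release earns nothing this year
print(round(return_earned(4)))    # 45000
print(round(return_earned(12)))   # 55000
```

Even ignoring depreciation, the same output released monthly instead of annually returns tens of thousands more within the year.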
Knowledge now exists in the industry to make low transaction costs of release and small batch size a real possibility for any organisation, allowing organisations to show more tangible economic benefits from Agile and Lean adoption or to put in place fundamental building blocks for the successful adoption of Agile/Lean practices.
Queueing theory began in the early twentieth century, when a statistician named Agner K. Erlang started worrying about the performance of telephone switches. Later, industrial engineering spawned the Cumulative Flow Diagram (CFD) as a way of visualising queues.
Many software teams do not measure queues on the critical path so miss out on valuable insights that could help them reduce the time it takes to bring a customer request to production.
Kanban teams use both the CFD and Statistical Process Control (SPC) chart to support empirical continuous improvement. But what about using a CFD for Scrum?
Below is a CFD for a mature Scrum team who work in two-week sprints. The vertical axis shows stories; the horizontal axis shows the dates the stories passed through each activity state. The chart is cumulative: each story is added to the total number of stories in each activity state as it flows through the pipeline.
The team has completed one release and is currently in its second release of a product. Stories flow through the following states:
- Created – when a story is first created with at least the basic format
- Analysis – the planning cycle preparing for the next sprint
- Task defined – when tasks are created and hours estimated
- In-progress – when the first task is started for a story
- Completed – when the story has finished development and testing
- Accepted – when the story has been accepted by the product owner
- Released – stories are in production and generating revenue or saving money
The data behind the graph shows the total number of stories that have passed through each state on a given day:
The first thing to note is the regular sprint pattern visible in the chart. Every two weeks (the team's sprint length) we can see tasks being defined in the sprint planning meeting, causing the orange step at the start of each sprint. In a successful sprint the Task Defined, In-Progress, Completed and Accepted lines converge to a point, showing nearly all planned work being completed. Both sprints 2 and 4 end with nearly all planned work completed. This is a good result; a Scrum team that finishes all its stories every sprint may be adding contingency into their sprint commitment, as there will always be natural variation in the way we work that makes it difficult to finish exactly as planned. This natural variation, due to imperfect information at the start of each sprint, suggests that a sprint should end with the fewest concurrent stories in flight possible given the context of the team.
We can see in sprint 2 the team started all of its planned work near the beginning of the sprint so may have suffered from context switching. Starting all the work early usually ends up with many unfinished stories at the end of the sprint so there may be another explanation as we can see the planned work was nearly all finished. The team may have testers that begin work writing automated acceptance tests or preparing test data early in the sprint meaning a higher number of stories are started concurrently. In this case the CFD guides the team to potential areas for improvement but deeper insights come from an intimate knowledge of how the team works.
Apart from the first sprint, this team manages to get completed work accepted by the product owner quickly; this is a sign that the team and product owner work closely in creating stories with clear acceptance criteria and that the product owner and team communicate frequently.
In sprint 3 the team finished their planned work and needed to define more tasks halfway through the sprint. For some reason this work was not started until sprint 4.
We can see how the BAs ride the bow wave, doing their analysis in the current sprint for the next sprint, as shown by the area in blue.
The purple curve at the top indicates the rate of story creation during the release; it does not tell us how much detail is being added when the stories are created. This release has approximately 56 stories (82 – 26 = 56). The table below shows the percentage of stories added at stages of the release:
| % of release complete | Stories created | % of total stories |
The purple curve on the CFD shows the innovation rate for the product during the release. This team adds work in a healthy way for a stable team working in a known domain with known technology, as we do not see large uneven steps. Lots of large uneven steps would typically indicate uneven bursts of work and surprises for the team. Here we see 30% of the product's design emerge over the last 80% of the release, which shows the team is able to respond to new information, leading to better design. Occasional large steps in the innovation rate are not in themselves unhealthy; we know from information theory that events that are less probable contain more information.
The CFD prompts us to ask questions; diving into more detail based on an understanding of the context of the team helps us to optimise flow.
Note how the line at the top of the previously released area in green does not meet the accepted line at the start of the first sprint in the new release. Some of the completed and accepted work did not make the release and will now have to sit as software inventory until the next release. It does not sit in Accepted for free, nor does it generate revenue or save cost for the organisation until it is released. A rough cost to the organisation can be calculated using the area between the Accepted and Released curves.
The team takes on around 6 stories per sprint, based on the intersection of the straight line indicating velocity and the sprint boundary lines; the cumulative total at the start of the sprint can be deducted from the cumulative total at sprint end. This velocity metric is based on stories, not story points, which are an estimated unit of measurement. The consistent number of stories per sprint and constant velocity indicate the stories are likely to be around the same size and reasonably small in effort, provided the team size is also small. The team works at a sustainable pace with a stable velocity, but the velocity is not improving significantly over time. We would need to check that stories remain relatively similar in size over time before drawing this conclusion with confidence.
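The velocity calculation described above is just a difference of cumulative counts at successive sprint boundaries. The counts below are hypothetical, not read from the chart.

```python
# Per-sprint velocity from a CFD: subtract the cumulative completed count
# at the start of each sprint from the count at its end.
# Hypothetical cumulative totals at each sprint boundary:
completed_at_boundary = [0, 6, 13, 19, 25]

velocity = [end - start
            for start, end in zip(completed_at_boundary,
                                  completed_at_boundary[1:])]
print(velocity)   # [6, 7, 6, 6] stories per sprint
```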
All of the observations made can be compared to what could be considered ideal flow. If ideal flow is defined by low Work In Progress (WIP), small batch size, short Lead and Cycle times and increasing Throughput then we would see:
Low WIP – vertical distance between lines converging to a stable minimum
Batch size reduction or increased velocity/throughput – slope of the velocity/throughput line increasing over time
Reduction in lead and cycle times – horizontal distance between lines reducing over time
Large steps and flat horizontal lines indicate impediments to flow or lack of flow.
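As a sketch of how these observations can be made empirically, the following derives WIP, average throughput and an average cycle time from cumulative CFD counts. The data is hypothetical and the variable names are illustrative.

```python
# Hypothetical cumulative CFD data: stories that have entered "In-progress"
# and "Completed" by each day.
started   = [0, 2, 4, 6, 6, 8, 10, 12]   # cumulative stories started
completed = [0, 0, 1, 3, 5, 6,  8, 11]   # cumulative stories completed

# WIP on each day is the vertical distance between the two curves.
wip = [s - c for s, c in zip(started, completed)]
print(wip)   # [0, 2, 3, 3, 1, 2, 2, 1]

# Average throughput: completed stories per day over the period.
throughput = completed[-1] / (len(completed) - 1)   # about 1.57 stories/day

# Little's law then gives an average cycle time estimate in days.
avg_wip = sum(wip) / len(wip)
print(avg_wip / throughput)
```

Flat stretches in `completed` and a widening `wip` list are the numerical equivalents of the flat lines and diverging bands described above.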
CFDs have been around since long before the advent of Kanban and Scrum; they are methodology agnostic and can generate insights for any software development team, provided you have sufficiently granular data. CFDs tell us about flow through the system and where queues are building up and hurting the team. They provide valuable insights into the patterns and trends a team experiences over time and provide the basis for empirical continuous improvement. Empirical data helps teams identify and isolate practices that are not working and gives teams the confidence to try new ideas, measure their success or failure and continue learning.
As a software development team we want to be able to let our customers know what will be delivered and by when, so they can predict how to generate revenue and cost savings for the organisation. Traditionally teams commit to delivering a number of features in a certain time period, but with Kanban we make a detailed commitment on a feature-by-feature basis. In order for the business to be comfortable with this approach, we need to be able to accurately predict how long a unit of work takes from request to release (Lead time). We can also measure other parts of the pipeline, such as the time from when we start on a unit of work until it is ready for release (Cycle time). By focusing on reducing Lead and Cycle times we can deliver more valuable software over a given period of time, i.e. increase throughput.
On a motorway, if we want to get more people from point A to point B, several options are available: add capacity to the road system, get the cars to drive closer together, carry more people per car or get everyone to drive faster. Adding capacity by building a wider road does not fix other problems such as poor driving or road design, inappropriate vehicles, lack of feedback to drivers, uneven demand at rush hour and long queues around other bottlenecks in the route. Getting the cars to drive closer together, carrying more people per car or getting everyone to drive faster all have safe limits which, once exceeded, impede flow.
Adding more cars to an already congested route only makes things worse. In fact it makes things worse at an increasing rate as we can see when a traffic jam forms on the motorway and vehicles grind to a halt.
The same ideas are true for software development yet many teams push more work into the development pipeline in the vain hope that they will go faster.
Little’s law gives the following relationship.
Cycle Time = Average Work In Progress (WIP) / Average Throughput
Where WIP is the amount of work started but not yet complete and Cycle Time is the average time in the system for a unit of work once started. Throughput is the rate at which units of work are processed. Little’s law holds for the entire development pipeline or any part of it.
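A minimal sketch of the relationship, with hypothetical team numbers:

```python
# Little's law: average cycle time = average WIP / average throughput.

def cycle_time(avg_wip, avg_throughput):
    """Average time a started unit of work spends in the system."""
    return avg_wip / avg_throughput

# 12 items in flight, finishing 2 per day:
print(cycle_time(12, 2))   # 6.0 days on average

# Halving WIP at the same throughput halves cycle time:
print(cycle_time(6, 2))    # 3.0 days on average
```

Note the lever this exposes: a team cannot directly will its throughput upwards, but it can directly limit WIP, and Little's law says cycle time falls with it.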
Everything in the development pipeline depreciates over time; more WIP means everything is in the pipeline longer and therefore the induced cost is higher. Requirements go stale: for example, user stories are considered to have a half-life of about 30 days as the system they relate to changes. The more detail in a story that sits waiting for development, the greater the cost of updating it over time. Code that sits for a long period before testing may mean developers have to reconfigure machines to fix defects or change context to understand that area of the code again.
When you start a piece of work there is a mental set-up time; this is especially true in software development, where it can take time to build up enough context about the problem to become productive in finding a solution. Switching between areas of work means a loss of productivity, like a computer thrashing between processes without actually processing anything.
Carrying too much WIP in software development is like putting more cars on a congested motorway and hoping they get to their destinations faster.
So how do we improve Lead and Cycle Times (see Reinertsen)?
Allow for late decisions in software development
Many valuable design changes occur through knowledge created during the development of a product. By designing everything up front we stifle innovation and do not arrive at the optimal solution. This is like empowering drivers to change route based on road conditions versus mandating a route for drivers before they start their journey and making them stick to it no matter what.
Work sitting in queues causes delays
By reducing queue size we can reduce Lead and Cycle Time. Continuous deployment is an example of a technical practice that reduces the build up of completed work, reducing the queue at the end of the pipeline which reduces Lead and Cycle Time. Queues often appear near bottlenecks in a traffic system such as the Auckland harbour bridge, where the constrained capacity of the bottleneck is prohibitively expensive to increase. The harbour bridge uses a “tidal flow” system, where the traffic direction of two of the centre lanes changes daily to provide an additional lane for peak period traffic and helps to improve flow. This shows us that adding more capacity is not the only answer to reducing queues around bottlenecks.
Reducing variability leads to more predictable teams
High levels of variability make it more difficult to make promises to customers that can be kept. More variability also leads to more work in the prioritisation process for teams and customers as work becomes stuck. Reducing blocked time per unit of work reduces variability and gets work flowing again. In a traffic system we use traffic lights to reduce the variability of time spent waiting at intersections. When traffic flow gets to a certain level, mechanisms like roundabouts become less effective, as one side may be blocked for a long period, which creates long queues. Traffic lights give everyone a turn when traffic flow is too heavy for a roundabout.
Reducing batch size
Reducing batch size, the size of the units of work, leads to a reduction in Cycle Time. A motorway full of motorcycles is less likely to be congested than a motorway full of huge trucks as motorcycles are more manoeuvrable. If we overload the vehicles we are also more likely to suffer breakdowns and accidents which will reduce flow. The internet is another system that shows us the benefit of reducing batch size. By sending messages in small packets, part of the message arrives at the destination whilst other parts are still being sent. It would be much slower if the whole message was sent at once from node to node.
Apply WIP constraints
Apply WIP constraints to limit the amount of software allowed in the system. Little's law shows us that working on too many items at once negatively affects Cycle Time. In London the congestion zone was effective, although not popular, in reducing the number of vehicles in the inner city traffic system, effectively reducing WIP and improving flow.
Accelerate feedback
Accelerating feedback ensures teams build the right thing by learning from customers and responding fast to market conditions. Shorter feedback loops mean we build more of the right thing and less overall. This is like giving drivers navigation systems that tell them where the problems are so they can be avoided.
Lead and Cycle Times are such powerful metrics because they provide a holistic measure from the customer's perspective. Improvements in either do not tend to have negative effects in other unexpected areas, and they can be used across teams as any team can measure them.
As Taiichi Ohno said of the Toyota Production System: “All we are doing is looking at the time line, from the moment the customer gives us an order to the point when we collect the cash. And we are reducing the time line by reducing the non-value adding wastes.”
In other words if you are reducing Lead and Cycle Times you are heading in the right direction. With knowledge of Kanban, queues and flow, Lead and Cycle time information becomes even more powerful.