Thursday, 23 August 2012

Distributed testing

Cost effective scaling


As part of Project A, we purchased some fairly hefty liquid-cooled i7 workstations to plough through some very lengthy build and test cycles.

Both of these workstations are performing fantastically, as you might expect for a 4.3GHz i7. Each machine nearly halved the time it takes to complete the build and test phases of the pipeline.

However, not even the combined super-awesomeness of these two liquid-cooled monsters can handle the amount of work we're now throwing their way. As we slowly increase the size of the development team, both build frequency and queue lengths are slowly increasing. 

I want to tackle this niggle before it becomes a burning issue, which won't be too long given how quickly we're expanding. 

The easy answer is to get another liquid-cooled i7 and add it to our existing build farm, but before I approach the IT guy and ask him for "another" monster, I thought I'd explore an idea I've been mulling over for a good while.

Surplus power

In most organisations, developer workstations are typically above average specification. 

What I'm wondering is: can I use PowerShell and its background jobs to take advantage of all that spare capacity sitting under each developer's desk? When you think about it, all those i7 workstations dotted throughout the office, just sitting there idling away, are a colossal reservoir of power.

PowerShell is the key to unlocking this untapped processing power.
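At its simplest, the idea looks something like this. A minimal sketch, assuming PowerShell remoting is already enabled on the workstations; the machine names and the work inside the script block are purely illustrative:

```powershell
# Hypothetical pool of developer workstations (names are placeholders)
$pool = 'DEV-WS01', 'DEV-WS02', 'DEV-WS03'

$jobs = foreach ($machine in $pool) {
    # Kick off a background job on each remote workstation
    Invoke-Command -ComputerName $machine -AsJob -ScriptBlock {
        # ...a slice of the real work would go here...
        "Hello from $env:COMPUTERNAME"
    }
}

# Wait for every remote job to finish and collect the results
$jobs | Wait-Job | Receive-Job
```

Because `Invoke-Command -AsJob` returns immediately, all the workstations run their slice in parallel, and the controlling script simply gathers the output at the end.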

The potential is enormous and the cost savings equally so.
  • Rapid build and deployment cycles
    The quicker a build completes, the quicker the feedback for the developer. 
  • Tools and licenses
    Developer workstations already have all the software a build agent would ever need, installed and licensed. And, in some cases, that bypasses the need to get a *special* license for build servers!
  • Infinitely scalable
    If the build pool is derived from the number of developer workstations, then as each new developer joins the team, there will be another powerful workstation joining the pool! All the tools installed, licensed and ready to go.

What's my plan?

My deployment scripts are already capable of farming-out and load-balancing the deployment packages. So, I want to reuse this mechanism for pushing out parts of the build cycle to each developer machine, taking a small slice of their redundant power for the greater good.

There are several stages in the current build pipeline:
  1. Clean
  2. Build
  3. Unit tests
  4. Code coverage
  5. Quality analysis
  6. Static analysis
  7. Databases
  8. Packaging
As a proof of concept, I'm going to take the 3rd step first, and see how the performance improves by farming out the unit tests.

Two reasons for this: firstly, the unit tests have no inter-dependencies, and secondly, they are very easy to orchestrate, being nothing more than a linear list of jobs to process. The clean and build cycles are far more complex, as they have to navigate the minefield of build dependencies.

How will it work?

Right now, I only have a rough idea in my head of how this might work, and it's largely based upon how the deployment mechanism works.

My intention will be to clean and build the workspace on the build agents first. Then, each test container will be packaged up and farmed out to the available pool of developer machines. 

There are 120 test containers at the moment and the build agents take just under 2 minutes to run all the tests in the current parallel execution model. I'm hoping that by pushing out 5 test containers to each available developer workstation that I'll be able to reduce the testing cycle by a factor roughly equal to the number of machines in the pool.

I've currently got about 12 workstations at my disposal, so I'm hoping to reduce unit tests from 2 minutes to 20 seconds! High hopes, and we shall see about that :) 
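The partitioning itself is trivial. A sketch of slicing the container list into batches of 5 (the container names are made up):

```powershell
# Slice a list of test containers into batches of 5 per workstation
$containers = 1..120 | ForEach-Object { "Tests.Container$_.dll" }
$batchSize  = 5

$batches = for ($i = 0; $i -lt $containers.Count; $i += $batchSize) {
    # The leading comma keeps each slice as a single array element
    ,($containers[$i..([Math]::Min($i + $batchSize, $containers.Count) - 1)])
}

$batches.Count   # 120 containers / 5 per batch = 24 batches
```

With 12 workstations in the pool, 24 batches means each machine would take two waves of work, which is where the hoped-for speed-up comes from.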

More specifically, the compiled output and test containers will remain on the build agent, and the PowerShell jobs executing remotely on the developer workstations will make UNC connections back to the build agent. I think (although I will compare and contrast) that this will be quicker than first copying the files to each developer workstation and pushing the results back over afterwards.
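A rough sketch of that remote execution model; the build agent name, share path, and test runner are all assumptions for illustration:

```powershell
# Assumed UNC path back to the build agent's drop share
$drop       = '\\BUILD01\drops\latest'
# The slice of test containers assigned to this workstation
$containers = 'Tests.Container1.dll', 'Tests.Container2.dll'

$job = Invoke-Command -ComputerName 'DEV-WS01' -AsJob -ScriptBlock {
    param($share, $tests)
    foreach ($t in $tests) {
        # Run each container directly over the UNC path, so nothing is
        # copied down to the workstation first
        & "$share\tools\nunit-console.exe" "$share\$t"
    }
} -ArgumentList $drop, $containers

$job | Wait-Job | Receive-Job
```

Note that the remote script block accessing a UNC share is exactly the second network hop that trips over default remoting security, which is the next problem to solve.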

There will be issues with multi-hop security, so CredSSP policies will need to be enabled on the developer workstations, and I'll have to involve the network guy to make sure we're not leaving ourselves wide open to abuse.
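For reference, the basic CredSSP enablement looks something like this (machine names are placeholders, and since this widens the credential-delegation surface, it's exactly the part the network guy needs to sign off on):

```powershell
# On the build agent (the client delegating credentials for the second hop):
Enable-WSManCredSSP -Role Client -DelegateComputer 'DEV-WS*' -Force

# On each developer workstation (the server receiving delegated credentials):
Enable-WSManCredSSP -Role Server -Force

# Remote jobs must then request CredSSP explicitly:
Invoke-Command -ComputerName 'DEV-WS01' -Authentication CredSSP `
    -Credential (Get-Credential) `
    -ScriptBlock { Test-Path '\\BUILD01\drops\latest' }
```

Without CredSSP, the remote job runs under a credential that cannot be passed on again, so the UNC connection back to the build agent would fail with an access-denied error.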

This should be fun! 

How will I know if this has been a success? 

I've been keeping some quite detailed statistics on the build and deployment refinements over the past year.


The illustration above demonstrates how each evolution of the build and deployment pipeline has contributed to an overall reduction in the cost of creating and deploying packages. 

So, the plan will be to farm out the Unit Tests to the developer workstations, and then compare the results against the current times.

I'd rather not speculate that something was *just* better; I'd like it to actually be better, and I'd seriously want to know by how much it's better. I have high hopes for this distributed build project, but I'll wait for the results before I get too excited.

I shall post regularly on this project; I hope you enjoy it.