Wednesday, 31 October 2012

Hyperbuilds

As an extension, a spin-off, and a plain old "Can I do this?", I extended the work I'd done recently on the Distributed Tests project to see if the same principles would yield any benefit when compiling our platform.

The short answer is: ohhh-yeah!

Bit of background first

Our platform consists of 40+ Visual Studio solutions with 80 or so build configurations. These build configurations have an exact build order that must be strictly adhered to.

This was always going to be the fun part: working out a parallel execution plan and then distributing it!
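To give a flavour, here's a minimal sketch of how build "waves" can be derived from a dependency map; the solution names and dependencies below are made up:

# A minimal sketch: group configurations into waves, where everything in a
# wave can build in parallel once the previous waves have finished.
# Solution names and dependencies are hypothetical.
$dependsOn = @{
    'Core.sln'     = @()
    'Services.sln' = @('Core.sln')
    'Web.sln'      = @('Core.sln', 'Services.sln')
}

$scheduled = @()
while ($scheduled.Count -lt $dependsOn.Count) {
    $wave = foreach ($sln in $dependsOn.Keys) {
        if ($scheduled -contains $sln) { continue }
        # Ready when every prerequisite is already scheduled in an earlier wave.
        $unmet = @($dependsOn[$sln] | Where-Object { $scheduled -notcontains $_ })
        if ($unmet.Count -eq 0) { $sln }
    }
    Write-Host "Wave: $($wave -join ', ')"
    $scheduled += $wave
}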

The remotely executed jobs go looking for the solutions over a UNC path, which leads them right back to my computer. In production, this could be a NAS drive or any low-powered server; it just needs to be good at working with files.
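For the curious, each remote build boils down to something like this; a minimal sketch, with hypothetical host, share, and solution names:

# A minimal sketch of a remote build over a UNC path. CredSSP handles the
# second hop from the build agent back to the file share.
$cred = Get-Credential
Invoke-Command -ComputerName 'BUILDAGENT01' `
               -Authentication Credssp -Credential $cred `
               -ScriptBlock {
    param($solution)
    # MSBuild builds straight off the share; /m enables parallel project builds.
    & "$env:WINDIR\Microsoft.NET\Framework\v4.0.30319\MSBuild.exe" `
        $solution /p:Configuration=Release /m
    if ($LASTEXITCODE -ne 0) { throw "Build failed: $solution" }
} -ArgumentList '\\MYPC\Builds\Platform\SomeSolution.sln'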

How's it gone?

So far it's all been very good. We'd already solved the parallel execution plan some time ago, so the key problems to solve have been:
  • Getting MSBuild to work over a UNC path
  • Setting up the other servers to play nicely
The first set of results wasn't all that amazing; in fact I'd say disappointing. The platform built, but no quicker than on a single build agent. This, however, was hardly a fair or scientific test: the build agents were in active service and my host computer was busy performing a platform deployment. So, given the level of ambient activity, these results may in fact be very promising, but it's too early to tell.

What's next?

There are some as-yet-unresolved issues with two of the solutions, which can't be built over UNC paths. We're using components such as Code Contracts that don't seem to like UNC paths.

It may well be that we can never use remote builds, but even so, I think it would be worth pushing this approach to its limits and seeing where it eventually breaks.

I'll do another post later in the week/month to complete the story, and journal the steps more precisely with some script samples. 

Until then...

Wednesday, 17 October 2012

Continuous Delivery with BDD

As we progress through our process maturity plan for 2012, we're finally reaching the good stuff! All the drudgery has been automated away, and we can start to focus on the big-ticket wins!

Implementing BDD is one of the highlights of our Process Maturity Model (PMM). Moving to BDD will give us:
  • Self-documenting code
  • Tighter functional specifications
  • Better integration with the testing team
  • Automated acceptance testing
My colleague Stephen Newman rightly deserves the credit for implementing BDD with SpecFlow and WatiN.

My post, however, is going to focus on the very simple act of automating the acceptance-test artefacts of the BDD process.

Continuous Delivery

The great thing about BDD is that it weaves itself seamlessly into the development process; the acceptance tests are just artefacts of that process.

So with the testers and developers using BDD for feature development, I now have access to a large volume of acceptance tests which I can run post-deployment to UAT, leaving the testers free to concentrate on complex scenarios.

Should any of these tests fail then the build is automatically rejected, just as it is for Unit Tests and Static Analysis.

Anyhow, no waffle in this post; here are the scripts:

PowerShell

Test-Platform.ps1
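In outline, it's just a thin gate around the acceptance-test run; a minimal sketch with hypothetical container names and paths, not the production script:

# In outline only; the container and result paths are hypothetical.
$mstest = "${env:ProgramFiles(x86)}\Microsoft Visual Studio 10.0\Common7\IDE\MSTest.exe"

& $mstest "/testcontainer:C:\Builds\UAT\AcceptanceTests.dll" `
          "/resultsfile:C:\Builds\UAT\AcceptanceResults.trx"

# MSTest returns non-zero when any test fails; that's enough to reject the build.
if ($LASTEXITCODE -ne 0) { throw 'Acceptance tests failed: build rejected.' }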

Distributed testing simulation

Developing the distributed framework

In the midst of the chaos that is the last week before a release, there's not a right lot for a build and release manager to do! You'd be forgiven for thinking otherwise, but I'd fear for my life if I interrupted the testers during this busy phase, so I find myself more or less locked out of my environments until after we've gone live, and the dust has settled.

The Skills Matrix is a no-go since the team are busy bug hunting.
The FXCop implementation is likewise postponed.

Time then, to return to the distributed testing project.

Today, I finished the framework and rejoiced at the sight of my temp folder flooding with test results. So far this framework has only been run against my own PC, but I effectively treated it as a remote host, using a CredSSP-authenticated session. I'm very confident that little or no modification will be needed to bring in the rest of the hosts.

Change of host

I'd spoken to our developers a lot about this project (a few read this blog), and there's not much love for the idea of sharing their cores. Whilst some have argued against sharing their CPU on the grounds of it interrupting their own local builds, I suspect ever-so-slightly that it's got more to do with an unscheduled interruption during a lunch-time session of StarCraft or WoW at a crucial moment.

Anyhow, all I want to do is keep the devs happy, so I scrapped the idea of using their PCs and instead turned my focus towards our two build agents.

As I'd previously mentioned, our two build agents are liquid-cooled monsters, but they do sit idle when there's no build going on. Even during a build, it's only really the unit tests that can be orchestrated to near-saturate the CPU; the build and packaging stages seem to peak at about 70% CPU utilisation before some other bottleneck manifests. So, more often than not, the build agents have plenty of spare capacity.

So, my new plan, which is the same as the old plan, is to develop a distributed testing framework. The only difference is, I'll be load balancing over the build agents, instead of the developers own workstations.

The structure

Today, I effectively created four scripts:
  1. The client
  2. The invoker
  3. The receiver
  4. The command
Hmm, sounds familiar, doesn't it? But don't get too excited or hung up about it; it's similar to the Command pattern, but it is not the Command pattern, nor am I pretending it is.

DistributedTests.ps1
The Client? perhaps

Discovers all the test containers from our solutions. At present, it finds over 50, which is a nice number to work with. 

It then performs a random allocation of test containers to available hosts. I chose a random allocation to ensure that some of the heavier and lengthier tests don't go to the same host over and over. 

Once the tests have been allocated to hosts, they're packaged up into n PSJobs, where n is the number of hosts I have at my disposal. This effectively creates packaged commands that will be given to another process to invoke: one package per remote host.

Each PSJob takes the package it's given, and executes the RemoteAgentController.ps1 script.
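Roughly, the discovery, shuffle, and fan-out look like this; a minimal sketch, with hypothetical share paths and host names:

# A minimal sketch of the client's allocation step.
$hosts      = 'BUILDAGENT01', 'BUILDAGENT02'
$containers = Get-ChildItem '\\MYPC\Builds\Tests' -Filter '*Tests.dll' -Recurse

# Shuffle so the heavier containers don't always land on the same host...
$shuffled = $containers | Sort-Object { Get-Random }

# ...then deal them out round-robin, one package per host.
$packages = @{}
foreach ($h in $hosts) { $packages[$h] = @() }
$i = 0
foreach ($c in $shuffled) {
    $packages[$hosts[$i % $hosts.Count]] += $c.FullName
    $i++
}

# One PSJob per host, each handing its package to the invoker.
$jobs = foreach ($h in $hosts) {
    Start-Job -FilePath .\RemoteAgentController.ps1 -ArgumentList $h, $packages[$h]
}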

RemoteAgentController.ps1
The Invoker? possibly

This script is designed to run on the local host. It receives the test package from DistributedTests, which describes the work to be done.

This script is designed to be the point-of-contact for the remote tests once they're running, so there is one instance of this script per host; each instance runs as a PowerShell background job.

The remote host session is created and then given RemoteAgentReceiver.ps1 to process.
The test package is also passed up to the RemoteAgentReceiver session. 
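The core of the invoker is only a few lines. A minimal sketch, assuming CredSSP is already enabled on both ends:

# A sketch of the invoker's core; the credential handling is simplified.
param($remoteHost, $testPackage)

$cred    = Get-Credential   # in practice, a stored service credential
$session = New-PSSession -ComputerName $remoteHost `
                         -Authentication Credssp -Credential $cred

# Run the receiver in the remote session, handing it the test package.
$results = Invoke-Command -Session $session `
                          -FilePath .\RemoteAgentReceiver.ps1 `
                          -ArgumentList (,$testPackage)

Remove-PSSession $session
$results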

RemoteAgentReceiver.ps1
The Receiver? whatever gave you that impression?

This script runs on the remote test agent, within a CredSSP authenticated session.

Its purpose is to finally do some work. It churns through all the tests in the test package defined by DistributedTests.

Each test container will be performed locally, and the results packaged up and returned to the RemoteAgentController.

Using the same tried and tested technique as the current build process, it breaks out as many PSJobs as it can, and uses each one to process a single test container, collecting the results as it goes.
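A minimal sketch of that throttling loop; capping the job count at the core count is my assumption here:

# A sketch of the receiver's loop; $testPackage is the container list passed in.
param($testPackage)

$maxJobs = [Environment]::ProcessorCount
$jobs    = @()

foreach ($container in $testPackage) {
    # Keep no more PSJobs running than we have cores.
    while (@(Get-Job -State Running).Count -ge $maxJobs) {
        Start-Sleep -Milliseconds 500
    }
    $jobs += Start-Job -FilePath .\TestRunner.ps1 -ArgumentList $container
}

# Collect every result as the jobs drain, then tidy up.
$results = $jobs | Wait-Job | Receive-Job
$jobs | Remove-Job
$results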

TestRunner.ps1
The Command? you may very well think that

This script is performed by each PSJob created by RemoteAgentReceiver.
It just encapsulates the invocation of MSTest and NCover, reading the output files and packaging up the results.

Yes, finally, someone is doing some work! 

The results are passed back up the call chain.
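A minimal sketch of the command; MSTest's switches are the real ones, but NCover's console arguments vary by version, so treat that line as illustrative:

# A sketch of the command performed per test container.
param($testContainer)

$mstest = "${env:ProgramFiles(x86)}\Microsoft Visual Studio 10.0\Common7\IDE\MSTest.exe"
$trx    = [IO.Path]::ChangeExtension($testContainer, '.trx')

# NCover wraps the MSTest run, profiling it and emitting a coverage report.
& NCover.Console.exe $mstest "/testcontainer:$testContainer" "/resultsfile:$trx"

# Package the outcome for the trip back up the call chain.
New-Object PSObject -Property @{
    Container = $testContainer
    Passed    = ($LASTEXITCODE -eq 0)
    Results   = $trx
}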

Results handling

The results object is created at the business end of the chain, in TestRunner. After this, it's a matter of packaging and aggregation before the final results are echoed out to the host by DistributedTests.

The RemoteAgentReceiver packages up all its test results to return to the RemoteAgentController.

The RemoteAgentController will aggregate all result sets from all hosts, then return them to the DistributedTests caller.

DistributedTests then echoes the results to the screen, checks for any warning or failure counts, and passes or fails the build accordingly.
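As a sketch, that final gate amounts to this ($allResults being the aggregated set from every host; the name is illustrative):

# A sketch of the build gate at the end of DistributedTests.
$failed = @($allResults | Where-Object { -not $_.Passed })

$allResults | Format-Table Container, Passed -AutoSize | Out-Host
if ($failed.Count -gt 0) {
    throw "$($failed.Count) test container(s) failed: rejecting the build."
}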

In short, it does this... 

(Topology diagram)

Acceptance and Load Testing

In real-world practice this is likely to halve the test running times, which isn't a bad saving, but we're only talking a 2-minute gain. That alone isn't really worth the effort I've gone to so far, but there is a much bigger pay-off just around the corner.

Acceptance tests and load tests are traditionally performed in a distributed manner where possible. All too often, they're run from a single host due to the difficulties in orchestrating a distributed test. This framework can be used to conduct distributed acceptance and load tests, and in this light the potential time savings are simply enormous.

Developer workstations can be used overnight to perform browser-based acceptance tests from a nightly build. One machine using Chrome, another Firefox, another IE, and so on. The same applies to load testing.
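A sketch of how that allocation might look, with made-up host names, a hypothetical $acceptancePack container list, and assuming the controller grew an extra browser parameter:

# A sketch of browser-per-host allocation; everything named here is hypothetical.
$browserByHost = @{
    'DEVPC01' = 'Chrome'
    'DEVPC02' = 'Firefox'
    'DEVPC03' = 'IE'
}

foreach ($h in $browserByHost.Keys) {
    # Each workstation runs the same acceptance pack against its assigned browser.
    Start-Job -FilePath .\RemoteAgentController.ps1 `
              -ArgumentList $h, $acceptancePack, $browserByHost[$h]
}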

A speed boost for local builds

Local builds still take around 10 minutes to perform, and the unit-test phase takes about 4-5 minutes. If the unit tests can be farmed out to an idle build agent or two, we can expect a reasonable reduction in build times of 2-3 minutes.

The Scripts

DistributedTests.ps1

RemoteTestInvoker.ps1

RemoteTestReceiver.ps1

TestRunner.ps1


Helpers/QueueHelper.ps1

Helpers/TextHelper.ps1


Ah yes, the reason you're here and I've delayed you long enough!

Simulation results

The results were a little surprising: I wasn't expecting diminishing returns to bite at such a low level of concurrency, but six concurrent agents is where we achieved the shortest run time. Beyond that, the runs started to take longer and longer again.

Having looked at my specific test data, I should have expected this result. In the real-world situation, we'll have over 110 test containers, each of varying complexity, intensity, and duration. I'll perform the same method again and compare the real-world results against my simulation. If there is a similar pattern of diminishing returns, then at six concurrent agents we can expect to gain 50%. Perhaps :)

An overriding goal of mine is to push total build times under 10 minutes. If I can apply this distribution model to other aspects of the build pipeline, static analysis being a prime candidate, I might just achieve it!

Wednesday, 10 October 2012

Placeholder

Just filling a conspicuous hole in my blog where a post should be.

I've almost finished the distributed test simulation work, it's taking me longer to write the blog post than it did to create the framework!

The results are indeed promising. A 50% reduction has been achieved.