Wednesday, 8 May 2013

Building our platform in a .NET 4.5 environment

Now that it's mid-2013, what better time to move our platform to Visual Studio 2012!

Scale of effort

In one sense, this is a minor upgrade: the platform was already using the latest-n-greatest up until the advent of .NET 4.5 and VS 2012. Only a minor amount of work is expected from the development team. There are some differences, but the developers seem confident that it will be a largely trivial exercise.

From my perspective as the build and release manager, there has been a slight change to how MSBuild works in this updated framework, which for us has massive consequences.

Our entire build approach has, until now, rested upon the use of solution configurations.

The issue we are facing stems from the fact that we have an intentional circular dependency between our solutions. Our strategy has been to build up the platform in incremental steps, using the solution configurations to define these steps. The approach is analogous to bootstrapping.

Key to this approach was that the down-stream builds could access and reference the output assemblies generated by previous builds. And this is exactly what the recent updates to MSBuild have taken away.

MSBuild will no longer follow a project reference if the referenced project is not part of the current active configuration, even if it's already been built successfully. In every sense, the referenced project simply doesn't exist, and the build fails.

So, where do we go from here?

The easy way out

All of the quick fixes I could think of were unpalatable.
The best I could come up with was either:
  1. Create compound build configurations, a bit like a decorator pattern.
    Down-stream solutions would also include projects that were previously built by other configurations, so the last configuration was likely to contain everything.
  2. Use file references instead of project references.
The developers weren't particularly pleased with either of these options, and I can't say I blame them.

The compound build configurations were a maintenance headache in waiting, and using file references would strip away many layers of useful IntelliSense.

Project dependency

We are currently using a dependency discovery mechanism that generates a precise build plan to bootstrap the platform.  It seemed logical therefore to revisit this mechanism to work at a project level instead.

The new approach completely bypasses the solution configurations and works out a plan to bootstrap the platform project-by-project. When we bootstrapped using solution configurations we had about 500 build actions to perform, now we have just short of 900.

Perhaps counterintuitively, the revisions have resulted in a leaner and faster approach: less code, less complication and, as a result, significantly quicker.

Dependency discovery

Let's be honest: you probably didn't read anything above, and just scrolled right down to this point to see how we dug ourselves out of the MSBuild hole.

We express the solution structure of our platform using a PowerShell hashtable.

$buildPlan = (
@{
    solutions = (
        @{
            name      = "DataStorage"
            namespace = "Platform.Databases"
        },
        @{
            name      = "CoreFramework"
        },
        @{
            namespace = "Platform.Server"
            name      = "Application1"
        },
        @{
            namespace = "Platform.Server"
            name      = "Application2"
        },
        @{
            namespace = "Platform.Client"
            name      = "Application1"
        }
     )
})

With this information, we can then visit each solution in turn, and build up a dynamic map of project dependencies.
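
As a minimal sketch of that visit, the loop below walks a cut-down copy of the hashtable and derives a qualified name for each solution. The "namespace.name" label is only an assumption made for illustration (it is not necessarily how the real script keys its solutions), but it does show that an entry without a namespace, such as CoreFramework, needs handling.

```powershell
# Hypothetical, cut-down copy of the build plan shown above.
$samplePlan = @{
    solutions = @(
        @{ name = "DataStorage";  namespace = "Platform.Databases" },
        @{ name = "CoreFramework" },
        @{ name = "Application1"; namespace = "Platform.Server" }
    )
}

# Visit each solution in turn. A missing hashtable key simply returns $null,
# so the namespace check also covers entries that never declared one.
$qualifiedNames = foreach ($solution in $samplePlan.solutions) {
    if ([string]::IsNullOrEmpty($solution.namespace)) {
        $solution.name
    } else {
        "$($solution.namespace).$($solution.name)"
    }
}

$qualifiedNames
```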

So first we need to extract the projects contained within each Visual Studio solution file (.sln).
$solutionContent = Get-Content $solutionFile

$buildConfigurations += Get-Content $solutionFile | Select-String  "{([a-fA-F0-9]{8}-([a-fA-F0-9]{4}-){3}[a-fA-F0-9]{12})}\.(Release.*)\|Any CPU\.Build" | % {
        New-Object PSObject -Property @{
            Name = $_.matches[0].groups[3].value.replace("Release ","");
            Guid = $_.matches[0].groups[1].value
          }

    }  | Sort-Object Name,Guid -unique
This produces a list of GUIDs from the solution's "Release|Any CPU" configurations. From this, we can then extract the information about each project into an array, to be queried and iterated later.
$projectDefinitions = $solutionContent | 
      Select-String 'Project\(' |
        ForEach-Object {
          $projectParts = $_ -Split '[,=]' | ForEach-Object { $_.Trim('[ "{}]') };
          $configs = ($buildConfigurations | where  {$_.Guid -eq $projectParts[3]} | Select-Object Name)

          foreach ($config in $configs)
          {
              $sanitisedConfig = if ([string]::IsNullOrEmpty($config.Name)){"Release"}else{$config.Name}
              # Escape the dot so it is matched literally rather than as "any character"
              if ($projectParts[1] -match "OurCompanyPrefix\.")
              {
                  New-Object PSObject -Property @{
                    Name = $projectParts[1];
                    File = $projectParts[2];
                    Guid = $projectParts[3];
                    Config =  $sanitisedConfig
                  }
                  }
              }
          }
    } 
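
To make the two parsing steps above a little more concrete, here is a small self-contained sketch that runs the same regular expression and the same split/trim against a few hand-written solution file lines. The project name, path, GUID and the "Release Stage2" configuration are all invented for illustration; only the C# project type GUID is a real value.

```powershell
# Hypothetical lines as they might appear in a .sln file.
$sampleSolution = @(
    'Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "OurCompanyPrefix.Core", "Core\OurCompanyPrefix.Core.csproj", "{11111111-2222-3333-4444-555555555555}"',
    'EndProject',
    '    {11111111-2222-3333-4444-555555555555}.Release|Any CPU.Build.0 = Release|Any CPU',
    '    {11111111-2222-3333-4444-555555555555}.Release Stage2|Any CPU.Build.0 = Release Stage2|Any CPU'
)

# Step 1: which configurations build which project GUID.
$pattern = '{([a-fA-F0-9]{8}-([a-fA-F0-9]{4}-){3}[a-fA-F0-9]{12})}\.(Release.*)\|Any CPU\.Build'
$configs = @($sampleSolution | Select-String $pattern | ForEach-Object {
    New-Object PSObject -Property @{
        Name = $_.Matches[0].Groups[3].Value.Replace("Release ", "")
        Guid = $_.Matches[0].Groups[1].Value
    }
})

# Step 2: split the Project(...) line into type, name, file and GUID.
$projectLine = $sampleSolution | Select-String 'Project\(' | Select-Object -First 1
$parts = $projectLine.Line -split '[,=]' | ForEach-Object { $_.Trim('[ "{}]') }

# Configurations that include this project, matched on its GUID.
$projectConfigs = @($configs | Where-Object { $_.Guid -eq $parts[3] } | ForEach-Object { $_.Name })

"{0} ({1}) builds in: {2}" -f $parts[1], $parts[2], ($projectConfigs -join ", ")
```

Note that the plain "Release" configuration keeps its name here, because the Replace call only strips "Release " with a trailing space.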
With a list of projects extracted from the solution file, along with a relative path from the solution root, we can now inspect each Visual Studio project (.csproj) file to work out what its dependencies are.

This requires a little preparation, but it doesn't take much before we can extract the references out of the Visual Studio project (.csproj) XML file.
$projectDefinition = [xml](Get-Content $csProjectFileName)
$ns = @{ e = "http://schemas.microsoft.com/developer/msbuild/2003" }
$references = @();

External dependencies (assembly references)

First up, find the external references to projects in other solutions (DLLs).
These are straight assembly references, but dependencies nonetheless.
$references += Select-Xml -Xml $projectDefinition -XPath "//e:Project/e:ItemGroup/e:Reference" -Namespace $ns | % {$_.Node} | where {$_.Include -match "OurCompanyPrefix" -and $_.HintPath -notmatch "packages"}  | % {$_.Include}

Internal dependencies (project references)

Now, we can look around the current solution and find project references. (Other .csproj)
$references += Select-Xml -Xml $projectDefinition -XPath "//e:Project/e:ItemGroup/e:ProjectReference" -Namespace $ns | % { $_.Node.Name }
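
The two lookups above can be tried against a tiny in-memory project file. This is only a sketch: the XML below is a made-up fragment, not one of our real projects, but it follows the standard .csproj layout and uses the same XPath expressions and filters.

```powershell
# Hypothetical .csproj content; every name and path is invented for illustration.
$sampleProject = [xml]@'
<Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <ItemGroup>
    <Reference Include="System.Xml" />
    <Reference Include="OurCompanyPrefix.Core">
      <HintPath>..\..\Output\OurCompanyPrefix.Core.dll</HintPath>
    </Reference>
    <Reference Include="OurCompanyPrefix.Logging">
      <HintPath>..\packages\OurCompanyPrefix.Logging.1.0\lib\OurCompanyPrefix.Logging.dll</HintPath>
    </Reference>
  </ItemGroup>
  <ItemGroup>
    <ProjectReference Include="..\Data\OurCompanyPrefix.Data.csproj">
      <Project>{11111111-2222-3333-4444-555555555555}</Project>
      <Name>OurCompanyPrefix.Data</Name>
    </ProjectReference>
  </ItemGroup>
</Project>
'@

$ns = @{ e = "http://schemas.microsoft.com/developer/msbuild/2003" }

# Assembly references: keep our own assemblies, skip anything restored under "packages".
$assemblyRefs = @(Select-Xml -Xml $sampleProject -XPath "//e:Project/e:ItemGroup/e:Reference" -Namespace $ns |
    ForEach-Object { $_.Node } |
    Where-Object { $_.Include -match "OurCompanyPrefix" -and $_.HintPath -notmatch "packages" } |
    ForEach-Object { $_.Include })

# Project references: .Name resolves to the <Name> child element, not the tag name.
$projectRefs = @(Select-Xml -Xml $sampleProject -XPath "//e:Project/e:ItemGroup/e:ProjectReference" -Namespace $ns |
    ForEach-Object { $_.Node.Name })

$assemblyRefs + $projectRefs
```

Here System.Xml is dropped by the prefix filter and OurCompanyPrefix.Logging by the "packages" filter, leaving one assembly reference and one project reference.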

Post-Build Events

This is more of an edge case, but in our platform the developers have sometimes chosen to "push/copy" output to another project (or solution) after the build has completed, rather than using straightforward references (ask me not). So it's important to identify these post-build events and consider them as valid dependencies, otherwise we'll have MSBuild attempting to copy files to non-existent output folders.
$references += Select-Xml -Xml $projectDefinition -XPath "//e:Project/e:PropertyGroup/e:PostBuildEvent" -Namespace $ns | where {(!([String]::IsNullOrEmpty($_.Node.InnerText)))} | % {

            $postBuildEvents = $_.Node.InnerText.Split("`n")
            $projectsReferencedInPostBuildEvents = $postBuildEvents | Select-String "\(SolutionDir\)((\w|\.)*)" | % {$_.Matches[0].Groups[1].Value}
            if ($projectsReferencedInPostBuildEvents -ne $null)
            {
                Write-Output $projectsReferencedInPostBuildEvents | % { $matchedProject = $_; ($releaseConfiguation | ? {$_.File -match $matchedProject}).Name }  
            }

        }
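
For a feel of what that regular expression picks up, here is a small sketch run against an invented post-build event. The commands and project names are hypothetical; the pattern is the one used above, and it captures whatever word characters and dots immediately follow $(SolutionDir).

```powershell
# Hypothetical post-build event text, as it might appear in a .csproj file.
# The single-quoted here-string stops PowerShell from expanding $(...).
$postBuildEvent = @'
xcopy /Y "$(TargetPath)" "$(SolutionDir)OurCompanyPrefix.Web\bin\"
copy "$(TargetDir)*.config" "$(SolutionDir)OurCompanyPrefix.Service\Config\"
echo done
'@

# Each line that mentions $(SolutionDir) yields the folder name that follows it.
$targets = @($postBuildEvent.Split("`n") |
    Select-String "\(SolutionDir\)((\w|\.)*)" |
    ForEach-Object { $_.Matches[0].Groups[1].Value })

$targets
```

The captured folder names are then what gets matched against the known project files to turn them back into project names.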

Helpful information

Since we've got the XML project file open, we might as well take the time now to extract any other useful information:
$assemblyName = (Select-Xml -Xml $projectDefinition -XPath "//e:Project/e:PropertyGroup/e:AssemblyName" -Namespace $ns).Node.InnerText
$outputPath  = (Select-Xml -Xml $projectDefinition -XPath "//e:Project/e:PropertyGroup[contains(@Condition,'Release|')]/e:OutputPath" -Namespace $ns).Node.InnerText
So, we extract the compiled assembly name and the location MSBuild will drop its assemblies into.
This will be helpful in future, for NuGet packaging for instance.

One final action: let's just make sure we didn't pick up any duplicates. This wasn't necessary until post-build events started pushing output to projects that were also referenced directly.

I combine that action with recording the current project's dependencies.

$buildAction.project.dependencies += $references | where {(!([string]::IsNullOrEmpty($_))) -and ($_ -match "OurCompanyPrefix\.(.*)")} | % { $_.ToLower()} | Select -unique
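
A quick sketch of that filter on a hand-made reference list (the entries are invented) shows why the lower-casing matters: Select-Object -Unique compares strings case-sensitively, so the same project discovered with different casing would otherwise survive twice.

```powershell
# Hypothetical raw references, including an empty entry, a framework assembly
# and the same project discovered twice with different casing.
$references = @(
    "OurCompanyPrefix.Core",
    "ourcompanyprefix.core",
    "",
    "System.Xml",
    "OurCompanyPrefix.Data"
)

$dependencies = @($references |
    Where-Object { (!([string]::IsNullOrEmpty($_))) -and ($_ -match "OurCompanyPrefix\.(.*)") } |
    ForEach-Object { $_.ToLower() } |
    Select-Object -Unique)

$dependencies
```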

Dependency map

For this to make better sense, please make sure you have read my JIT compiler blog post, as it goes to great lengths to explain how I orchestrate PowerShell jobs to perform each build action in isolation.

The following script snippets demonstrate how I've modified the original JIT build engine to work per-project rather than per solution configuration.

The dependency discovery script examples (above), when applied across every solution, generate a hashtable that comprises every solution, every project and, crucially, the dependencies of each.

With this iterable object, we can now build up a new per-project JIT build plan.
foreach ($buildAction in $discoveredBuildPlan)
{
   # ... see snippets below
}
And, within this iteration, we can create a specific build action for the JIT engine to follow, complete with its dependencies.

First up, we examine the current project entry and work out what its dependencies are, which we convert into a string array of job-envelope references (not projects, but the actual PowerShell jobs we'll be using later).

This is a simple one-liner that appends the suffix ".build" to each dependency, because this is the naming convention I chose for the JIT engine. The result is a list along the lines of the example shown below the snippet.
$dependencyList += $buildAction.project.dependencies | %{ "$_.build" } 

  1. coreframework.build
  2. platform.server.application1.build
  3. platform.server.application2.build

Then, we create a new compile action object to represent and encapsulate all the information we'll need to build the project later.

What's important in the section below is the name of this action, the operation and its dependencies.
$compileAction = New-JobEnvelope $buildAction.solution.namespace $buildAction.solution.solution $buildAction.project.name $buildAction.project.definition "build"  $buildPlan.environment $dependencyList $buildAction.project.outputs 
The function New-JobEnvelope takes a lot of inputs, but it's also responsible for creating the unique key of this action, which is just the project's name and the action, e.g. MyProject.Build

That's why when I'm building up the string array of dependencies I append .Build to the name, as I'm expressing what other jobs must be completed first, before this job can be processed.
$buildAction.project.dependencies | %{ "$_.build" }

Then lastly, we add it to the list of actions in our build plan.
$buildPlan.actions += $compileAction
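
To illustrate why those ".build" keys are enough to drive the ordering, here is a deliberately simplified sketch. It is not the JIT engine itself (that is covered in the other post): the job objects, the Key and DependsOn property names and the sequential loop are all assumptions made for this example. It only shows the rule that a job becomes eligible once every key it depends on has completed.

```powershell
# Hypothetical jobs; keys follow the "<project>.build" naming convention.
$jobs = @(
    (New-Object PSObject -Property @{ Key = "coreframework.build"; DependsOn = @() }),
    (New-Object PSObject -Property @{ Key = "platform.server.application1.build"; DependsOn = @("coreframework.build") }),
    (New-Object PSObject -Property @{ Key = "platform.client.application1.build"; DependsOn = @("coreframework.build", "platform.server.application1.build") })
)

$completed = @()
$order = @()
while ($completed.Count -lt $jobs.Count) {
    # A job is ready when it has not run yet and none of its dependencies are outstanding.
    $ready = @($jobs | Where-Object {
        $candidate = $_
        ($completed -notcontains $candidate.Key) -and
        (@($candidate.DependsOn | Where-Object { $completed -notcontains $_ }).Count -eq 0)
    })
    if ($ready.Count -eq 0) { throw "Unresolvable or circular dependencies" }

    foreach ($job in $ready) {
        # A real engine would start a PowerShell job here; everything in
        # $ready could run in parallel.
        $order += $job.Key
        $completed += $job.Key
    }
}

$order
```

Each pass of the loop is effectively one "wave" of work, which is where the opportunity for running many jobs at once comes from.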

The final build plan

Moving from building by "Solution Configuration" to "Per-Project" has almost doubled the number of jobs to be performed. We previously had about 470 jobs, which has since grown to 900. In spite of the extra complexity and volume of jobs to manage, the build times have only been affected by 20 seconds in total. That rather surprised me: I thought there would be a much greater penalty to pay for not using MSBuild to resolve dependencies.

We'll soon be upgrading to new Xeon-based build agents, going from 12 to 24 simultaneous cores, so I am hoping to see a significant shortening of build times.

From previous examination, we now have ample opportunity to process our platform in parallel, and I'm confident that we could keep all 24 cores very busy for the duration of the build. I am going to predict that build times will almost halve, which should push us under the magic five-minute barrier!