Saturday, 8 September 2012

Orphan Annie

It seems I owe AppFabric an apology of sorts.

In a recent post about AppFabric, I gave it a good slating for just being plain rubbish, but it seems it wasn't entirely AppFabric's fault! In fact, it wasn't AppFabric's fault at all.

I would still maintain that it has some fairly opaque nuances, but on this occasion my general AppFabric ignorance and an aversion to reading documentation may have contributed greatly to my mid-week psychosis.

To recap

I was having a very bad day, trying to automate the installation, configuration and tear-down of an AppFabric cluster.
The principal aggravation was not being able to register/un-register Cache Host instances. AppFabric insisted that I could not do this because there was already one running, even though I thought I'd stopped it and de-registered it from the cluster.
Not understanding just how AppFabric works was the cause of my trouble.

What I learnt about AppFabric

I now understand just how AppFabric works, and it's actually very, very simple. So simple, in fact, that my failure was not grasping how simple it really was.
I had assumed that the cluster was some kind of super-controlling service that reached out across the network and exerted total control over its cache hosts.
Therefore, I assumed...
Stopping the cluster would stop the hosts... and, yes, it does.
Removing the cluster would remove the hosts... no, it does not.
In actual fact, the cache hosts appear to be completely self-organising.
The cluster appears to be nothing more than a text file (in our XML provider model), from which the cache hosts can learn their configuration. This is why AppFabric needs to use a UNC path for the cache hosts.
There are some PowerShell cmdlets that promote the idea that the cluster is an entity in its own right: New-CacheCluster, Remove-CacheCluster and Start/Stop-CacheCluster. However, in reality, all these cmdlets actually do is create, delete and iterate over the XML file on the UNC path. Perhaps these cluster cmdlets encapsulate the knowledge for correctly configuring this text file, but that's it.
There is no intelligent cluster host, no cluster service or cluster controller. Just a shared text file that enables the disparate cache hosts to form a team.
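To make that idea concrete, here is a minimal Python sketch of "the cluster is just a shared file". It is a toy model, not AppFabric's code: the function names, the ClusterConfig.xml file name and the XML layout are all made up for illustration, and a temporary directory stands in for the UNC share.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def new_cluster(config_path: Path) -> None:
    """'Creating the cluster' only writes an empty shared config file."""
    ET.ElementTree(ET.Element("cluster")).write(str(config_path))


def register_host(config_path: Path, host_name: str) -> None:
    """Registering a host only adds an entry to the shared file."""
    tree = ET.parse(str(config_path))
    ET.SubElement(tree.getroot(), "host", name=host_name)
    tree.write(str(config_path))


def registered_hosts(config_path: Path) -> list:
    """Read the host names back out of the shared file."""
    root = ET.parse(str(config_path)).getroot()
    return [host.get("name") for host in root.iter("host")]


def stop_cluster(config_path: Path, running: set) -> None:
    """'Stopping the cluster' walks the file and stops each listed host."""
    for name in registered_hosts(config_path):
        running.discard(name)


# A temporary directory stands in for the UNC share (hypothetical layout).
with tempfile.TemporaryDirectory() as share:
    config = Path(share) / "ClusterConfig.xml"
    new_cluster(config)
    register_host(config, "cache01")
    register_host(config, "cache02")

    running = {"cache01", "cache02"}  # services running on their own machines
    stop_cluster(config, running)

    hosts_after_stop = sorted(running)
    listed = registered_hosts(config)
```

In this model stopping the cluster works only because the file still lists the hosts. Nothing else knows which machines belong to the team.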

Orphaned hosts!

The conceptual but non-existent cluster is what gave rise to my troubles.
I thought I'd removed the cluster and all its dependents. In actual fact, I'd only removed the shared text file. The hosts were still running under the configuration they'd previously accepted.
And this is why I was having so much trouble with my scripts.
I had hosts that were still running, but I had no record of this because I'd removed the cache-cluster configuration. Or, as it should be referred to, the shared config file. 
I couldn't deregister the hosts, because they had no entry in the new shared config file.
I couldn't create a new host, or register it, because the original host was still running and blocking this process.

So to recap, my mistake was:

  • I didn't realise the cluster was just a shared text configuration file.
  • I removed the configuration file which effectively orphaned the cache hosts.
  • When I created a new config file:
    • The orphaned cache hosts couldn't be de-registered
    • The orphaned cache hosts were blocking any attempts to add new cache hosts and register them with the shared configuration file.
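The mistake above can be replayed in a few lines of Python. This is a sketch under assumptions: the Machine class and both functions are hypothetical, and the two failure rules simply encode the behaviour I saw (no entry means no de-register, a running host blocks a new registration), not anything taken from AppFabric's internals.

```python
class Machine:
    """A server that may have a cache host service running on it."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.service_running = False


def register_host(config: set, machine: Machine) -> None:
    # Assumption: a machine with a running cache host refuses a new one.
    if machine.service_running:
        raise RuntimeError(f"{machine.name}: a cache host is already running")
    config.add(machine.name)


def unregister_host(config: set, machine: Machine) -> None:
    # The shared config is the only record of who belongs to the cluster.
    if machine.name not in config:
        raise LookupError(f"{machine.name}: not listed in the shared config")
    config.remove(machine.name)


server = Machine("cache01")
config = set()                  # the shared config file, reduced to a set
register_host(config, server)
server.service_running = True   # the host starts and runs under this config

config = set()                  # "remove the cluster", then create a new one
still_running = server.service_running  # nothing told the host to stop

errors = []
for action in (unregister_host, register_host):
    try:
        action(config, server)
    except (LookupError, RuntimeError) as exc:
        errors.append(type(exc).__name__)
```

Both calls fail: the host is still running, but the new config has no record of it, which is the orphaned state described above.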

The steps I am now taking are quite simple:

  • Remove all caches from the cluster (makes the next steps quicker)
  • Stop the cache cluster
  • Visit each cluster host using a PSSession and 
    • de-register the host from the cluster
    • stop the cache host
    • remove the cache host
  • Kill the now empty cluster
  • Recreate the cluster
  • Add and register a cache host on the cluster host machine
  • Add all the caches
  • Visit each host using a PSSession and
    • Add the cache host
    • Register the cache host with the cluster
    • Add the cache admin feature
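The ordering of those steps is what matters, so here it is as a small Python sketch that only builds the plan as a list of (machine, action) pairs. The action strings are plain labels rather than real cmdlet names, the machine and cache names are invented, and I am assuming the cluster host machine is handled separately from the other hosts.

```python
def teardown_plan(cluster_machine, hosts, caches):
    """Every host is dealt with before the shared config is deleted."""
    steps = [(cluster_machine, f"remove cache {name}") for name in caches]
    steps.append((cluster_machine, "stop cache cluster"))
    for host in hosts:
        steps.append((host, "de-register cache host"))
        steps.append((host, "stop cache host"))
        steps.append((host, "remove cache host"))
    steps.append((cluster_machine, "remove cache cluster"))
    return steps


def rebuild_plan(cluster_machine, hosts, caches):
    """Recreate the shared config first, then let the hosts join it."""
    steps = [
        (cluster_machine, "create cache cluster"),
        (cluster_machine, "add cache host"),
        (cluster_machine, "register cache host"),
    ]
    steps += [(cluster_machine, f"add cache {name}") for name in caches]
    for host in hosts:
        steps.append((host, "add cache host"))
        steps.append((host, "register cache host"))
        steps.append((host, "add cache admin feature"))
    return steps


# Hypothetical machine and cache names.
teardown = teardown_plan("cluster01", ["cache01", "cache02"], ["sessions"])
rebuild = rebuild_plan("cluster01", ["cache01", "cache02"], ["sessions"])

for machine, action in teardown + rebuild:
    print(f"{machine}: {action}")
```

Keeping "remove cache cluster" as the very last teardown step is the whole point: by then no host is left running that only the deleted file knew about.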

In summary

  • AppFabric was working as designed.
  • It is a series of stand-alone cache instances that can arrange themselves into a team (cluster) by using a shared configuration file.
  • There is no entity that actually is a cluster.
  • As our cousins across the water might say, My Bad.
AppFabric, Microsoft. Sorry.