Monday 12 September 2011

Microsoft's Single Point of Failure

I ran the UK and North European business for a Service Provider called Genesys Conferencing for around three years. As MD, I had ultimate responsibility for the service hub, which was co-hosted in the UK with our largest reselling partner, Cable & Wireless.


One Saturday afternoon, when our bridges were normally pretty underused, I got a very irate phone call from C&W. Their conference service was down. It was Saturday - so what? As it happened, their own telecom network had suffered an outage, and the best way of mobilising their action team to fix it was to use our audio conference service. By coincidence, our service was also down.

In the aftermath, we looked at how we had set things up. The fault could be traced to a single router that had got itself into a loop and had caused an entire conference bridge with 800 ports to go down. A single router had caused a total denial of service. But because it was a Saturday, only one customer had been affected.

But when we analysed the problem we found that, of the 800 ports, most had been assigned to our two largest UK users: C&W and a bank. By doing so we had made that specific bridge our largest and most profitable service bridge. We had also made it our largest potential single point of failure.

We learned a big lesson - efficiency and high utilisation are highly profitable until things go wrong. We learned that we needed to spread our risk and scatter the two customers across a larger number of bridges. Then we built redundancy into the routers and the bridges.

We experienced outages again after that, but we never again disrupted more than a few users in each company, which reduced the 'hit' to any single company.
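
To put rough numbers on that lesson, here is a minimal Python sketch of the idea. Only the 800-port bridge comes from the story above. The customer names, the split of ports between them, the number of bridges and the worst_case_loss helper are all assumptions made up for illustration, not a record of how our hub was actually configured.

```python
# Illustrative only: bridge names and per-customer port counts are invented.

def worst_case_loss(allocation):
    """For each customer, return the largest fraction of its ports
    that the failure of any one bridge could take out.

    allocation maps bridge name -> {customer: ports on that bridge}.
    """
    # Total ports held by each customer across all bridges.
    totals = {}
    for ports_by_customer in allocation.values():
        for customer, ports in ports_by_customer.items():
            totals[customer] = totals.get(customer, 0) + ports

    # Largest share of a customer's ports sitting on any single bridge.
    worst = {customer: 0.0 for customer in totals}
    for ports_by_customer in allocation.values():
        for customer, ports in ports_by_customer.items():
            worst[customer] = max(worst[customer], ports / totals[customer])
    return worst


# Before: both large customers sit on one 800-port bridge.
concentrated = {
    "bridge_1": {"carrier": 400, "bank": 300},
}

# After: the same ports scattered over four bridges.
spread = {
    "bridge_1": {"carrier": 100, "bank": 75},
    "bridge_2": {"carrier": 100, "bank": 75},
    "bridge_3": {"carrier": 100, "bank": 75},
    "bridge_4": {"carrier": 100, "bank": 75},
}

print(worst_case_loss(concentrated))  # {'carrier': 1.0, 'bank': 1.0}
print(worst_case_loss(spread))        # {'carrier': 0.25, 'bank': 0.25}
```

In the concentrated layout one bridge failure takes out every port both customers have. In the spread layout the same failure costs each of them a quarter of their ports, at the price of running more bridges at lower utilisation.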

Fast forward to Microsoft's outage on Office 365 this last week and you can see that Microsoft, in deciding to exclude its channel from service delivery, has exposed its entire customer base to a single method of service delivery. Microsoft, even though it has experience in serving hosted applications, is not a business-grade service provider by trade - it is a software vendor. Hosting free services is a far simpler matter, as no one really cares if Hotmail goes down: a) it's free, b) we usually have multiple free email services and c) we can probably communicate some other way, like Facebook or Twitter.

It will be a lesson hard learned in terms of embarrassment and cost for Microsoft. But will they really learn? They have set out their strategy and it has backfired. Now will they change? I doubt it.
