The more a system does, the more you rely on it

This week we’ve had a situation where a customer we work for has been unable to operate the way they expect for nearly two days because of a single hardware failure. It caused serious disruption to their business and, while we worked hard to provide as many workarounds as possible, it must have affected sales.

How can this happen? Well, the customer has a Small Business Server (SBS) that they installed roughly six years ago. At the time they had an AS400 that ran their business software, and at that stage they only wanted a server for file storage and email for 20-odd users, so SBS seemed the perfect answer. Since then they have replaced their AS400 with a SQL-based Dynamics NAV system and taken advantage of the integration possibilities to, for instance, put 30-odd handheld terminals in their warehouse.

The trouble is that with Small Business Server, whenever a user logs on, their password is checked with the SBS server. With standard Windows networks we can share this process across several servers so that if one fails, one of the others can verify that the password is correct and allow access. SBS, however, because it’s designed for single-server networks, has been hobbled by Microsoft to stop this.


What this meant for our client was that although the other four servers they now have were up and running, they could not access them because their passwords could not be verified. Frustrating, to say the least. Very quickly we got them access to Dynamics NAV by switching to database security, but the warehouse terminals cannot work that way, so they ended up back on paper.

What caused this was a power supply failure in a five-year-old computer. It was on four-hour response maintenance with IBM, but after they first sent a motherboard and then finally a power supply from their spares base in Hungary, it took over 48 hours to get fixed. To be fair to IBM, all the big maintenance organisations work like this now, and the small ones are even more useless as they simply cannot carry the variety of spares needed.

We could have rebuilt the network if we had realised how long the repair was going to take, but in the heat of the situation you don’t know that. Should you do several hours’ work when you hope you’re only a couple of hours away from the fix? This is an especially difficult call when you know that going down this route will mean more work, and therefore more disruption, to change it all back again once the original problem is solved.

So the lesson relearned for me is that each business needs to be able to survive a single point of failure if having an application unavailable is going to cost serious money. That means a risk assessment, at least annually, that asks of each bit of kit: if this goes down, can we operate, and what’s the worst case for getting it back up and running?
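For anyone who wants to make that annual assessment a bit more mechanical, here is a minimal sketch in Python of the question above. It is only an illustration under stated assumptions: the component names (`sbs`, `nav-sql`, `wifi-a`, `wifi-b`), the hourly cost figures and the restore times are all hypothetical, not this client's real numbers. Each business function lists groups of interchangeable kit, and a component is flagged as a single point of failure when it is the only member of a group that some function needs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Function:
    """A business function and the kit it depends on."""
    name: str
    # Each frozenset is a group of interchangeable components;
    # the function works only if every group has at least one member up.
    needs: tuple
    cost_per_hour: float  # hypothetical estimate of money lost per hour down


def single_points_of_failure(functions, restore_hours):
    """Map each single point of failure to (affected functions, worst-case cost)."""
    report = {}
    for component, hours in restore_hours.items():
        # A function is affected if some group it needs contains only this component.
        affected = [
            f for f in functions
            if any(group == {component} for group in f.needs)
        ]
        if affected:
            cost = sum(f.cost_per_hour for f in affected) * hours
            report[component] = ([f.name for f in affected], cost)
    return report


# Hypothetical figures, for illustration only.
functions = [
    Function("Dynamics NAV", (frozenset({"sbs"}), frozenset({"nav-sql"})), 200.0),
    Function(
        "Warehouse terminals",
        (frozenset({"sbs"}), frozenset({"nav-sql"}), frozenset({"wifi-a", "wifi-b"})),
        150.0,
    ),
]
# Worst-case hours to get each bit of kit back up and running (assumed).
restore_hours = {"sbs": 48, "nav-sql": 8, "wifi-a": 4, "wifi-b": 4}

report = single_points_of_failure(functions, restore_hours)
for component, (names, cost) in sorted(report.items(), key=lambda kv: -kv[1][1]):
    print(f"{component}: takes down {', '.join(names)}; worst case ~{cost:,.0f}")
```

The point of the exercise is the ranking rather than the exact figures: anything that appears in the report with a large worst-case cost is where the insurance money should go first.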

There is maximum pressure on budgets at the moment, so it’s tempting to spend on stuff that’s going to push the capabilities of your system forward. Just watch that, if you consistently do that, something doesn’t unexpectedly happen which takes you backwards far too rapidly.

The good news is that Microsoft have, with the introduction of Windows Server 2008 R2, recently made the virtualisation technology we need to provide this kind of resilience very affordable. This tips the balance between risk and cost of insurance very much back in favour of not taking the risk.

And our role as advisors in all of this? Well, we had been talking to this client about upgrading their systems but obviously had not translated the technical risks into implications for their business well enough. It is difficult to get the balance right sometimes.

I hate the scare tactics used in the past by, amongst others, the year 2000 consultants and security vendors, but we have to make sure the risks are understood. The first point on that, I guess, if I’m honest, is making sure we stop and think the possibilities through for each client so that we truly understand what could happen whenever we change anything.

Technology can help businesses achieve levels of performance they could not remotely dream of achieving without it. If that technology then fails, it can also have very serious consequences.
