Ask any IT team what their worst nightmare is and they will likely recount tales of network outages, with thumb-twiddling users, panicked management and the embarrassing discovery of a preventable cause.
Yes, we have all been there.
But with every day that goes by, our reliance on IT systems grows, and with it the need to maintain the highest possible levels of availability.
Hyper-resiliency is the term emerging in the industry.
But how do we achieve this?
Take a look at our top five tips.
1. Use Network and Device Monitoring Software
This might seem obvious, and considering such solutions have existed for nearly two decades, one has to wonder why anyone would not use them.
In any case, they should!
Network and device monitoring tools collect statistical health information from the devices around your network and give you insight that you could not have extracted from the management consoles of the individual devices themselves.
Being able to see into the future has obvious benefits. Many network problems develop slowly, and these solutions give you visibility of them while there is still time to act.
This applies whether the problem lies in the switching environment, virtual devices, WiFi or the applications themselves.
There are countless statistics which show that predictive monitoring results in fewer outages, particularly those which affect operations.
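To make the idea of predictive monitoring concrete, here is a minimal sketch of trend extrapolation. It is not how any particular product works. The function name `hours_until_threshold`, the sample readings and the 90% threshold are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def hours_until_threshold(samples: Sequence[float], threshold: float,
                          interval_hours: float = 1.0) -> Optional[float]:
    """Estimate the hours until a metric reaches `threshold`.

    `samples` are evenly spaced readings, oldest first (for example disk
    usage in percent). A straight line is fitted by least squares and
    extended forward. Returns None when there are fewer than two samples
    or the trend is flat or falling.
    """
    n = len(samples)
    if n < 2:
        return None
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    # Least-squares slope: covariance(x, y) divided by variance(x).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var  # metric units gained per sample interval
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    # Solve slope * x + intercept = threshold, measured from the latest sample.
    x_hit = (threshold - intercept) / slope
    return max(0.0, (x_hit - (n - 1)) * interval_hours)


# Hypothetical hourly disk-usage readings, in percent.
disk_usage = [70.0, 72.0, 74.0, 76.0, 78.0]
print(hours_until_threshold(disk_usage, threshold=90.0))  # prints 6.0
```

A commercial monitoring tool will use richer models than a straight line, but the principle is the same: the warning arrives hours before the threshold is crossed, not after.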
Do you use a network monitoring tool? Take a look at the IT administrator's favourite, Ipswitch WhatsUp Gold.
2. Implement a Change Control Programme
Again, we are not setting the world on fire here. But how many of our readers only make changes to their environments that have been agreed by a change board and are carried out during a change window?
We would hazard a guess that it is very few.
Studies from Gartner show that up to 80% of all network outages are attributable to misconfiguration or to changes which were made without prior scrutiny or approval.
Needless to say, it should be high on your list of priorities to implement.
But when things still go wrong and unauthorised or unexpected changes continue to pose a problem, consider using a file integrity monitoring (FIM) solution.
FIM solutions can detect changes and even verify whether they were expected, often by integrating with ITSM tools such as the popular ServiceNow.
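The core idea behind file integrity monitoring can be sketched in a few lines: record a cryptographic hash of each file, then compare later. This is only an illustration using Python's standard `hashlib` module and says nothing about how a commercial FIM product is built. The helper names and the `switch.conf` file are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def take_baseline(paths: Iterable[Path]) -> Dict[str, str]:
    """Record the current hash of every monitored file."""
    return {str(p): sha256_of(p) for p in paths}


def detect_changes(baseline: Dict[str, str]) -> List[str]:
    """List files that are missing or whose contents differ from the baseline."""
    findings = []
    for name, expected in baseline.items():
        path = Path(name)
        if not path.exists():
            findings.append(f"{name}: missing")
        elif sha256_of(path) != expected:
            findings.append(f"{name}: modified")
    return findings


# Demonstration against a throwaway, hypothetical config file.
with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "switch.conf"
    config.write_text("vlan 10\n")
    baseline = take_baseline([config])
    unchanged_report = detect_changes(baseline)  # nothing has changed yet
    config.write_text("vlan 20\n")               # simulate an unplanned edit
    changed_report = detect_changes(baseline)

print(unchanged_report)
print(changed_report)
```

A real solution would go further and match each detected change against an approved change record, which is where the ITSM integration comes in.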
Do you use a file integrity monitoring solution? Take a look at NNT Change Tracker, favoured by many of the Fortune 500.
3. Use Automated Incident Response Solutions
It seems barely a conversation can go by today without a mention of AI or machine learning. But we believe that plain automation is the key to network and device availability.
With automation, you can react to developing events with corrective action much faster than a human member of the IT team could.
When a Windows service stops, automation can start it again or bring an alternative online to compensate.
When a switch hits 100% utilisation, an automation solution can detect this and send a command to the switch or another network device to divert traffic to a more resilient location.
The speed at which this can be achieved limits any impact to a minimum.
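The two examples above follow the same pattern: a check detects an unhealthy condition and a predefined action fires at once. Here is a minimal sketch of that pattern. The `Rule` class, the rule names and the simulated `state` dictionary are all hypothetical. In practice the checks and actions would call your monitoring tool, service manager or switch, none of which is modelled here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    name: str
    is_unhealthy: Callable[[], bool]  # check: returns True when action is needed
    remediate: Callable[[], None]     # action: runs as soon as the check fires


def run_once(rules: List[Rule]) -> List[str]:
    """Evaluate every rule once and return the names of those that fired."""
    triggered = []
    for rule in rules:
        if rule.is_unhealthy():
            rule.remediate()
            triggered.append(rule.name)
    return triggered


# Simulated environment standing in for real devices and services.
state = {"service_running": False, "switch_utilisation": 100}


def restart_service() -> None:
    state["service_running"] = True


def divert_traffic() -> None:
    state["switch_utilisation"] = 60


rules = [
    Rule("restart stopped service",
         lambda: not state["service_running"], restart_service),
    Rule("divert traffic from saturated switch",
         lambda: state["switch_utilisation"] >= 100, divert_traffic),
]

first_pass = run_once(rules)   # both conditions are unhealthy, both rules fire
second_pass = run_once(rules)  # conditions have been corrected, nothing fires
print(first_pass)
print(second_pass)
```

Running such a loop every few seconds is what gives automation its speed advantage: the response starts at the next check, not when somebody notices an alert.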
Network monitoring solutions such as Ipswitch WhatsUp Gold now include incident response features such as those described above. If you would like more information, book a call with one of our solutions specialists.
4. Use the Cloud
There are many benefits to using cloud infrastructure and services, but in relation to this topic the main one is the increase in availability.
Cloud providers invest in equipment and processes to ensure high levels of uptime, which they often advertise as 98% and above.
For smaller and more frugal organisations especially, the lengths to which cloud providers go to maintain such figures are beyond their own spending power or capability.
For this reason, use of the cloud for some of your more important systems is highly recommended.
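It helps to translate an advertised uptime percentage into hours of permitted downtime. The short calculation below does that. The 98% figure comes from the text above, while the other percentages are included purely for comparison and are not claims about any particular provider.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours_per_year(availability_percent: float) -> float:
    """Convert an availability percentage into hours of downtime per year."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for pct in (98.0, 99.0, 99.9, 99.99):
    print(f"{pct}% uptime allows {downtime_hours_per_year(pct):.1f} hours of downtime a year")
```

At 98%, that is roughly 175 hours, or a little over seven days a year, so it is worth checking exactly which figure a provider commits to before moving an important system.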
5. Have a Rehearsed Plan
Despite being the recommendation with the least cost, it is also the one we find most often eludes people.
Have a plan for when things go wrong.
When network connectivity is lost or a critical application or server is unavailable, stopping to decide what to do wastes precious time.
A well-thought-out and, ideally, rehearsed plan will return the organisation to operational health as quickly as possible, with the smallest possible impact.