Alexander Pope wrote about our inherent capacity to make mistakes, and conversely about the virtue of overlooking failure, in his 1711 poem An Essay on Criticism. Just before these words, Pope wrote “Good-Nature and Good-Sense must ever join”, which says to me that both error and forgiveness must always coexist.
But what happens if they can’t?
Some errors can cause irreparable damage that no amount of forgiveness can repair. Worries about those types of errors are the kind that keep people awake late at night.
Computers make errors too.
You may not have used the words ‘to err is human’, yet few of us would dispute that making mistakes is an inherently human trait. And while some of us might have talked about how computers are infallible, I think we also realize that computers are programmed by us humans; so in reality, it’s not that computers can’t make mistakes, they just make mistakes a lot faster.
For example, modern networks are basically built with special-purpose computing machines. These special-purpose devices are called routers, firewalls, etc. And since network infrastructure is everywhere, networking devices are everywhere too. Inside them are the same basic parts that are inside your phone, tablet, or laptop that you’re reading this article with. And like any other computer, those devices are programmed by humans.
But unlike your phone, the software that runs on these devices is far more complex to set up and control. And since ‘to err is human’, as you might expect, errors in setting up those devices happen all the time. And because they are computers, those errors can cause damage that eclipses a lifetime of error on a human scale.
Take, for example, this major Internet outage that occurred late last year.
Basically, what happened was that a human made a configuration change in a router that caused the router to start to tell all the other routers on the Internet to disconnect from a portion of the network.
The router was using a special message that is used to help prevent DDoS attacks, but the human set it up incorrectly. This type of human error isn’t unusual. In fact, in recent surveys network professionals acknowledge that anywhere between 75% and 97% of network outages arise from human error! Those folks must all work for some very virtuous bosses!
When Mr. Pope was writing about human error, he probably wasn’t thinking about the Internet. In my interpretation of his verse, he’s actually taking to task the literary critics of the time. Later in the poem, he adds “No Pardon vile Obscenity should find”, which to me means that there are certain errors that we shouldn’t overlook. As prevalent as human errors are in modern networks, why do they still occur with such frequency?
Because humans designed how networks operate.
As an example, I’m going to pick on one of my not-so-favorite networking protocols. It’s known as BGP, or Border Gateway Protocol. BGP is used to share route information between routers on different networks, and it has become the de facto way for two different networks to tell each other how to route data. Setting up BGP is complex.
No, it’s not just complex, it’s a nightmare.
It is so difficult that the ‘How-To’ configure BGP guide on a leading vendor’s website is over 27,000 words in length! BGP configuration errors happen multiple times a day. There’s even a website to help track when we get it wrong! Check out https://bgpstream.com/.
These issues have been going on so long it’s as if we have just given up. Was it Albert Einstein who said something about doing the same thing over and over and expecting a different result? Insanity!
Maybe we rethink how we network.
My dad is a stickler for the proper use of language. One of his favorite word plays was to catch people using the term ‘worry’ when they should be using the word ‘concern’.
Worry is what you do when you’re afraid something bad is going to happen and there’s nothing you can do about it. Worry is a useless emotion; it serves no purpose. Concern on the other hand is when you know something bad could happen, but you know you can do something about it.
We should be concerned about human error causing outages in our networks. These outages can be catastrophic, but we can do things to help eliminate them.
Here are a few ideas you might consider to help address these issues:
- Use automation instead of manual configuration. This will help eliminate typos and allow your teams to easily repeat what they know works.
- Use network slices to set up sandboxes for your developers. Let them experiment in a real-world network environment that is safely isolated from your production network.
- Stop using those complex networking devices. Instead of having to go through setting up hundreds of configurations for things that your teams don’t even need (does anyone still run AppleTalk?), why not use a software micro-function? You need routing, not an entire router.
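To make the first idea a little more concrete, here is a minimal sketch of the kind of check an automation pipeline could run before any route announcement reaches a device. It is only an illustration: the `ALLOWED_BLOCKS` list and the `validate_prefixes` function are hypothetical names I made up, the addresses are documentation ranges, and a real pipeline would need far more than this.

```python
import ipaddress

# Hypothetical allow-list: the address blocks this network is permitted to announce.
# These are reserved documentation ranges, used here purely as placeholders.
ALLOWED_BLOCKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def validate_prefixes(prefixes, allowed=ALLOWED_BLOCKS):
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    for text in prefixes:
        try:
            # strict=True rejects typos such as 203.0.113.5/24 (host bits set).
            net = ipaddress.ip_network(text, strict=True)
        except ValueError as exc:
            problems.append(f"{text}: not a valid prefix ({exc})")
            continue
        # Compare only against blocks of the same IP version, because
        # subnet_of() raises TypeError when IPv4 and IPv6 are mixed.
        inside = any(
            net.version == block.version and net.subnet_of(block)
            for block in allowed
        )
        if not inside:
            problems.append(f"{text}: outside the allowed address blocks")
    return problems


if __name__ == "__main__":
    # A proposed change with one good prefix and two mistakes.
    for problem in validate_prefixes(["203.0.113.0/25", "0.0.0.0/0", "203.0.113.5/24"]):
        print("REJECTED", problem)
```

The point is not the specific check. The point is that a machine can refuse an obviously wrong change every single time, which is something a tired human at a command line cannot promise.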
To err is human and forgiveness is admirable. However, maybe we should only forgive the same errors about…six times? Seven times seems too much. There are easily seven BGP configuration errors made every week. Let’s stop worrying about those errors and instead, have enough concern to take action and stop them from happening.