6/11/2015

How do I Deal With Major Mistakes?

Every professional, engineer or not, feels distraught when they mess up. However, while some professions have the luxury of failing with limited exposure, software engineers who mess up tend to get a large amount of negative attention very quickly. The last time I screwed up majorly, a Vice President at my organization (a really big one) noticed. And he wasn't happy about it. My boss wasn't happy about it. My team wasn't happy about it. I REALLY wasn't happy about it. Really, there was just lots of unhappy going around, because it was an expensive blooper that cost a good bit of additional development and operational overhead to correct. While I can't put an exact dollar value on what it cost my company, I estimate it was somewhere in the $10K (or more) range, mostly as lost opportunity cost and developer hours spent fixing it.

This wasn't my first screw-up, but (as of now) it's certainly my most expensive one. And yet, when the proverbial "pile" started growing, I kept my cool. Getting worked up and angry at myself could have made an already bad situation much worse - so instead I just hunkered down and figured out how to fix it.

My biggest mistake in this situation was trusting a deployment pipeline to give us the right answer about deploying the application we were working on. A critical bug had been discovered in production. We assumed we had fixed all of the edge cases that caused it, and we deployed confidently on that assumption. It turns out there were more edge cases. Lots of them. And our deployment pipeline had no knowledge of them, so the bug happened again. After we confidently said it was fixed. Ouch.

So, several things had to happen after my major mishap. First and foremost, though, I took ownership of it. Several people were directly or indirectly involved in writing this code and deciding to deploy it, but at the end of the day I had the final say-so that the solution was good enough to face our (internal) customers, and I signed off on it. So I found my boss and told him, as honestly as I could, exactly how this code got into production in a still-broken state. Taking responsibility for the error was the best way to maintain respect within my team and organization, and to show that my professional integrity, though sullied, was far from failing completely.

Next, we worked as efficiently as we possibly could to get the issue fixed. Thankfully it was a job that ran once a month, so between runs we had a little bit of time to clean the egg off of my face.

After that, we went through the process that led to the broken deployment. We laid out all of the decision points that could have led to the failure, and determined where I could change my own methods for reviewing software to help prevent errors like this in the future. What this basically amounted to was learning where the deployment pipeline wasn't reliable, and covering those gaps in any further deployment with a manual checklist.

At the end of it all, once I'd cleaned the egg off of my face (with my team's help) and grudgingly accepted the major ding in my quarterly goals review, the most important thing of all was to review the bad time as a learning experience.


"Recently, I was asked if I was going to fire an employee who made a mistake that cost the company $600,000. No, I replied, I just spent $600,000 training him. Why would I want somebody to hire his experience?" - Thomas John Watson Sr. - First CEO at IBM¹

It's possible to learn from success - at least as far as knowing the things that you can repeat to keep being successful. But the kind of learning that you gain from a failure is akin to the kind of learning you do when you put your hand on a hot stove for the very first time. It stays seared into long-term memory, and helps to propel greater things. We all need to stand on the shoulders of our past mistakes, knowing that we're strong enough to make different mistakes in the future. At the end of the day, if you work somewhere that you respect, and that respects your talent, you'll get the opportunity to prove you can be trusted. You'll also get the opportunity to prove that sometimes mistakes are just really expensive training exercises.


  1. If anyone has a source where this quote was originally captured, I'm happy to cite it here. This is just a citation of someone else who reused it. http://www.inc.com/murray-newlands/30-quotes-to-remember-when-recruiting-for-your-startup.html

JSON Jason