Two weeks ago I had a meeting in a nearby city. I found a parking spot on the street, set payment for ‘on-street parking services’ using my favorite parking app, and headed off to the meeting.
50 minutes later, the app decided that I had moved the car and terminated my payment, and my parking permit with it. It even sent me a message explaining how much money I had saved with this feature. Unfortunately, I was still in the meeting and couldn’t see it.
Only 4 minutes later, it turned out that my car hadn’t really moved: a parking enforcement officer saw it parked on the street and gave me a parking ticket.
Imagine my surprise when I got back to the car. On my end, I did everything right: I followed the instructions on the parking signs and paid using the official app… but still, through no fault of my own, I got a ticket.
Honestly, I wasn’t only surprised and disappointed, I was angry. I realized I would need to either pay the ticket or go through the hassle of contesting it, and I liked neither option. It was a mess that I didn’t create, but I still had to bear the consequences.
I was about to shame the app on Facebook when I decided to give them a last chance to fix it. I contacted their customer service through chat, and within 10 minutes the issue was resolved.
Guess what? At this point, I was no longer angry! Instead, I appreciated the quick resolution and the overall good customer service (they actually apologized several times during the conversation for the inconvenience they had caused, which is not a trivial experience with customer service in Israel). My overall satisfaction with them flipped: I went from being annoyed to being relieved. I forgave them for having the problem since they fixed it so quickly.
The Impact of Bugs on the Customer Experience
Now let’s look at it from a product management perspective: what I ran into was a bug in the algorithm responsible for detecting car movement. Bugs happen. I’m sure they will add it to their backlog for fixing, assuming it happens frequently enough.
In many companies, when customers complain about such issues, that’s exactly what happens: the customer service representative thanks them for the feedback, logs the issue and sends it to R&D or product for a later fix.
But looking at it from the customer’s point of view, I don’t really care about fixing the bug later. Whether it’s in the backlog or not — my experience now is the same. And the problem caused by the bug impacts my life right now. Sometimes, as in my parking case above, it would cost me time and/or money to fix it myself. In other cases, there is nothing I can do to fix the issue, and the impact will remain until the bug is fixed.
If you care about customer experience, you should take this into consideration and prepare in advance to make amends for future bugs’ impact.
In some cases, a customer service process is enough, but in most cases, 5-star customer service would include a resolution process at the product level as well.
This is especially true when machine learning is involved. Bugs in machine learning algorithms can take a long time to resolve. Often it is not even a matter of priority or assigned resources, which are things you as a product leader can control. It is simply how it is: training and retraining the algorithm takes time. And until the algorithm is updated, the issue is there, together with its impact.
In these cases, customer service resolution has to be powered by a product-level resolution process. There has to be some way for either customer service directly or R&D to tweak the algorithm results so that the impact is fixed now, even if the bug takes years to fix later (as in Google’s example below).
One way is to be able to override the algorithm’s output with a manually entered output. This is especially relevant for classification algorithms where the data is not too diverse, for example when the input is a small amount of text such as a title or a search query.
So if your search engine understands “yellow jacket” as the wasp, but you are actually a clothing shop and want to show real jackets that people can wear, in yellow, you can prepare in advance a means to tell the search engine: “I know you think it is a wasp, but for this one, trust me, it’s a jacket of yellow color”.
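To make the override idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the names `OverridableClassifier`, `set_override` and `toy_model` are hypothetical, and the toy model only stands in for whatever real algorithm your product uses. The point is just the shape of the mechanism: a lookup of manually entered answers that is checked before the model is asked.

```python
from typing import Callable, Dict


def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so overrides match small variations."""
    return " ".join(query.lower().split())


class OverridableClassifier:
    """Wraps a model and lets a human pin the output for specific inputs."""

    def __init__(self, model: Callable[[str], str]) -> None:
        self._model = model
        self._overrides: Dict[str, str] = {}

    def set_override(self, query: str, label: str) -> None:
        # "Trust me, for this query the answer is `label`."
        self._overrides[normalize(query)] = label

    def remove_override(self, query: str) -> None:
        # Drop the manual answer once the model itself has been fixed.
        self._overrides.pop(normalize(query), None)

    def classify(self, query: str) -> str:
        key = normalize(query)
        if key in self._overrides:
            return self._overrides[key]
        return self._model(query)


def toy_model(query: str) -> str:
    """Hypothetical stand-in for the real model; it repeats the mistake from the text."""
    return "insect" if "yellow jacket" in normalize(query) else "clothing"


search = OverridableClassifier(toy_model)
search.set_override("yellow jacket", "clothing")
print(search.classify("Yellow  Jacket"))  # the manual override wins over the model
```

In a real product the overrides would probably live in a database or an admin tool that customer service can edit, so that nobody needs a code deployment to fix one customer’s problem.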
This solution gives you a good way out but has limited scalability. More scalable solutions might give you a non-optimal resolution, but would still allow you to eliminate the damage from the bug.
Take for example Google Photos’ infamous gorilla incident from 2015. Their algorithm was obviously wrong. Their immediate solution was to remove the labels “gorilla”, “chimp”, “chimpanzee” and “monkey” from the algorithm. It is not optimal, because it doesn’t give you the correct answer for these labels; it simply ignores them. But it is still much better than the original impact of the bug.
Imagine Google hadn’t had the means to do so prepared in advance (which I am guessing they did have). In that case, when the bug was revealed, they might have tried to fix the algorithm quickly. Yet with this specific bug, despite having the most brilliant engineers and data scientists working on it, and despite having practically unlimited amounts of data to work with, it has still taken years to resolve.
Even if they had realized right away that fixing the algorithm was not feasible and that they needed a manual fix, without preparing for it in advance they would have had to start developing some kind of override mechanism, all under the immense pressure of the company’s image being slaughtered in the news. Not a happy experience for developers, product and company management.
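The more scalable kind of fix, suppressing whole labels instead of correcting individual answers, can be sketched in a few lines of Python. This is not Google’s implementation, which I have never seen; the function `suppress_labels`, the `BLOCKED_LABELS` set and the sample predictions are all hypothetical. It only shows how small such a mechanism can be when it is built before the crisis.

```python
from typing import Iterable, List, Set, Tuple

# A prediction is a (label, confidence score) pair coming out of the model.
Prediction = Tuple[str, float]

# Labels from the example in the text; in practice this would be configuration
# that can be changed without retraining or redeploying the model.
BLOCKED_LABELS: Set[str] = {"gorilla", "chimp", "chimpanzee", "monkey"}


def suppress_labels(
    predictions: Iterable[Prediction], blocked: Set[str]
) -> List[Prediction]:
    """Drop every prediction whose label is blocked, keeping the rest in order."""
    blocked_normalized = {label.lower() for label in blocked}
    return [
        (label, score)
        for label, score in predictions
        if label.lower() not in blocked_normalized
    ]


# Made-up model output for a single photo.
raw = [("gorilla", 0.91), ("person", 0.64), ("outdoors", 0.40)]
print(suppress_labels(raw, BLOCKED_LABELS))
```

The filter never produces a better answer for the blocked labels. It only guarantees that the harmful output is never shown, which is exactly the trade-off described above.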
So What Can You Learn From This?
First, think about the potential impact of bugs on the customer and make sure you fix the impact ASAP. The customer usually doesn’t care that it’s a bug. They do care about how it impacts them.
Think about the customer service resolution, and see if a product-level resolution is also needed.
Second, prepare in advance the means to fix such issues, so that in real time your people can activate the mechanism rather than develop it from scratch under pressure.
Pay special attention to it if your product includes machine learning algorithms, since resolving bugs there can take much longer than in other areas.
If it happens to Google, it can surely happen to you. You’d better be ready.