Disclaimer: The views here are my own opinions and may not work for everyone. They come from my experience working as a Support and Solutions Engineer.
In the ocean of issues and support tickets, it is not always easy to stay focused and think straight. Every now and then, we have to jump into an issue we have no clue about. Everyone has their own approach to tackling a case, but there are always rules of thumb to fall back on.
In this article, I have tried to present what I have learned in more than two and a half years as a solutions engineer. There is no silver bullet for solving support cases and customer issues, and I must admit I am far from perfect, but these ideas might help someone who strives to get better at their craft.
This article does not contain much technical material. It leans toward the philosophical side of support, which I feel is important to write about. So, if you are looking for something technical, please look elsewhere.
Understand the issue first
Oh, the cliché: understand the issue. Yes, this article is like all the others on troubleshooting. But I cannot emphasize enough how important it is to understand the issue. Very often, I find that I did not understand the issue up front and only realize it later, in the middle of debugging.
Any machine or problem domain we look into might have several issues, so before jumping into a case it is very important to understand what the issue actually is. I find the following approach helpful.
- Read the customer’s complaint email or ticket in as much detail as possible.
- Do not hesitate to ask the customer to elaborate if the initial request is not very helpful. Ask for screenshots or any other information that helps. You might even have to get on a call and ask the customer to demonstrate the issue.
- Understand the major problem the issue causes. What are the stakes, and how is the business affected? This matters because it sets the priority of the case, and you can allocate your time accordingly.
Locating where the problem resides
Once the problem is understood, the investigation starts. The approach will vary with the problem.
Usually, we have to start with whatever information we have and try to figure out what is going on under the hood.
For example, if a customer cannot run a search on your platform, start from the UI side and see how requests are being made and what the responses are. Then proceed to the web server logs, the search engine services, and the service logs.
Tips
- Usually, I find it very helpful to follow the request/data and see the response of each service in question. Doing this helps isolate the problematic service.
- Follow a binary search approach. If you know which services are involved in the issue, check how the data looks in the middle of the chain, then move to whichever half contains the problem. This helps narrow down the problem quickly.
- Look into the service logs, benchmarker logs, system metrics, network data, tcpdumps, I/O metrics, everything you can think of. This will help you see the flow of data and notice if anything is fishy.
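The binary search tip can be sketched in a few lines of Python. This is only an illustration, not a tool from any real platform: the stage names and the `output_ok` check are hypothetical, and the sketch assumes a single break point, meaning that once the data is wrong at one stage it stays wrong at every later stage.

```python
from typing import Callable, Optional, Sequence


def first_failing_stage(
    stages: Sequence[str], output_ok: Callable[[str], bool]
) -> Optional[str]:
    """Return the first stage whose output looks wrong, or None if all look fine.

    Assumes a single break point: every stage before it is healthy and
    every stage from it onward is not.
    """
    lo, hi = 0, len(stages)
    while lo < hi:
        mid = (lo + hi) // 2
        if output_ok(stages[mid]):
            lo = mid + 1  # data is still good here, so the fault is further along
        else:
            hi = mid  # data is already bad here, so the fault is here or earlier
    return stages[lo] if lo < len(stages) else None


# Hypothetical request path for a search feature, ordered from UI to storage.
stages = ["ui", "webserver", "search_api", "search_engine", "index"]

# Stand-in for manual checks (reading a log, replaying a request, and so on).
healthy = {"ui", "webserver"}

print(first_failing_stage(stages, lambda stage: stage in healthy))  # search_api
```

With five stages this takes two or three checks instead of up to five. In practice each `output_ok` call is a manual inspection, so saving checks is where the time goes.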
Understand the root cause of the problem
Hopefully, you have located where the issue is, or at least circled the problematic area. You might have found error logs or stack traces in the service, or some other hints beyond the symptom itself.
I would suggest holding back from restarting the problematic service or regenerating config as far as practicable. If possible, try looking into the problem without a restart. Collect and observe as much data as possible, and only restart once you have collected enough. After the restart, the problem might be solved, but it becomes really difficult to figure out what caused it.
Understanding the root cause is one of the hardest parts of the investigation. It is not straightforward, and sometimes it takes a long time to understand what actually caused the issue. I must admit I have often struggled with this and either had to ask someone for help or, because the issue was already solved, did not pursue the root cause. But it is always good to dive deep, understand what caused the problem, and make sure it does not happen again.
Here are some tips on how to figure out the root cause of the problem.
Tips
- Read the official manual for the product. This also helps a lot if you have no idea where to start.
- Collect as much relevant data as possible from the investigation.
- Look into the log or the error and try to understand the message. If the error originates from open source code, search the online forums. If the error comes from a service custom-built for the application, we might have to dive into the code itself.
- Try searching the company’s knowledge base. This is usually a great resource, and the issue might not be a new one. There is a good chance that someone has already encountered the issue and solved it. If that is the case, lucky you.
- Try replicating the case in a local environment. This helps in understanding the root cause and even in filing a bug report if necessary.
- Try discussing the behavior with the development team or system team.
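For the tip about reading the logs, a rough sketch like the one below can show which errors dominate before you start searching forums or the knowledge base. The log format here is an assumption (a timestamp, then a level such as `ERROR`, then the message), and the sample lines are made up for illustration; adjust the patterns to whatever your service actually writes.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed log levels that mark a problem line.
LEVEL = re.compile(r"\b(ERROR|FATAL|CRITICAL)\b")
# Numbers and hex ids vary between otherwise identical messages.
VARIABLE = re.compile(r"0x[0-9a-fA-F]+|\d+")


def error_signatures(lines: Iterable[str]) -> Counter:
    """Count error messages, grouping those that differ only in numbers."""
    counts: Counter = Counter()
    for line in lines:
        match = LEVEL.search(line)
        if match is None:
            continue  # not an error line
        message = line[match.start():].strip()  # drop the timestamp prefix
        counts[VARIABLE.sub("<n>", message)] += 1
    return counts


# Made-up sample lines in the assumed format.
sample = [
    "2024-01-01 10:00:01 ERROR search timeout after 3000 ms",
    "2024-01-01 10:00:05 INFO request ok",
    "2024-01-01 10:00:09 ERROR search timeout after 4500 ms",
    "2024-01-01 10:01:00 ERROR shard 7 unavailable",
]

for signature, count in error_signatures(sample).most_common():
    print(count, signature)
```

The most frequent signature is not always the root cause, but it is usually a sensible first string to search for.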
Communication with the stakeholders
Communication is key. We’ve heard this so many times. Why? Because it is key.
That did not help, did it? Well, think like a customer.
- You bought a software service at a hefty price. It ran well for some time, but now the search does not work.
- You asked for support. Someone said, "We are looking into it and will contact you soon." But it has been two days and no one has followed up.
Unless you tell them, the customer has no way of knowing that you are looking into the case. That is why communication is so important. Even very difficult cases can be pleasant to work on if you keep the stakeholders informed of your progress. If they know what you are looking into, they can offer their own insights, and that can really help you solve the case.
Also, make sure you communicate with your managers if you think something has blown up, if you need help, or if there is anything else you think might be important. After all, they are there to help you.
Tips
- Give regular updates to the customer. If possible, give them an ETA or tell them when you will be investigating more deeply.
- I usually like to give a summary of the case every now and then, because stakeholders are often not on the same page and a summary really helps everyone.
- Remember that, as the person looking into the problem, you are often the only one who knows what’s going on, and everyone is looking to you for output.
Support Engineer’s block
Very often we get stuck and do not know how to move ahead. When we are looking into a problem, sometimes there is no straightforward way to figure it out. But the key is to not give up and to keep looking.
What I usually find really helpful is talking to a colleague and walking through every detail I can. This lets your mind let go of everything it is holding on to and look for anything else worth checking. And of course, your colleague can give you more ideas and ways to investigate.
Here are some tips.
- Take a break. Staring at a case for hours on end usually makes your job harder.
- Talk to someone about the case. They will learn about the case too, and it gives you a chance to summarize it in your own mind.
- Do not hesitate to ask for help. Talk with other experienced people, developers and managers. There is always a way out.
- Tell the customer you are having a hard time figuring things out. Sometimes they can make your job easier with suggestions and advice. Ask them for one more session, and talk with other stakeholders.
Final Words
There is no single straightforward way of debugging and troubleshooting. The basic idea is to start with your usual diagnostic steps. Then focus on the problem, try things out, look into the knowledge base, and discuss with people.
The key is not to give up. Sometimes it feels like there is no way out, but just come back to it after a break. There is always a way out.
No issue, no problem is immortal. Good luck.