Wednesday, 11 October 2017

Infrastructure day - part one

Today was a zero-coding day; it was all about infrastructure problems.
It all started with some internal users experiencing slowness when using one service endpoint that was released recently.
The operation behind the new endpoint was expected to take around 3 seconds, as it involves payloads of up to 1 MB and is very CPU intensive.
But this morning users were reporting times of up to 30 seconds. No alert had been triggered, so I ran a quick query against the logs and found that requests were being processed in under 3 seconds.
My first thought was the serialization in the client, a VBA "app". It has been a long while since I last did anything in VB*, so I decided to try something different first: the network? It is never the network... but I had to check. I was able to reproduce the issue with one of the developers, with the application pointing to a subdomain that resolves to a load balancer, as in the production deployment. We then changed the application to go straight to one of the boxes, and the slowness was gone.
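A comparison like this one (through the load balancer versus straight to a box) can also be scripted rather than done by hand. The following is only a minimal Python sketch of the idea, not what we actually ran: the function names `median_seconds` and `fetch` are made up for illustration, and the URLs in the comments are placeholders, not our real hosts.

```python
import statistics
import time
import urllib.request


def median_seconds(action, repeats=5):
    """Call action() `repeats` times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        action()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def fetch(url, timeout=60):
    """Build a zero-argument action that downloads the full response body from url."""
    def action():
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()
    return action


# Hypothetical usage (placeholder URLs, assumed for illustration only):
# via_lb = median_seconds(fetch("http://service.example.com/health"))
# direct = median_seconds(fetch("http://box1.example.com/health"))
# print(f"load balancer: {via_lb:.2f}s, direct: {direct:.2f}s")
```

Taking the median of a few runs keeps a single slow request from skewing the comparison. A large gap between the two numbers points at something between the client and the box rather than at the service itself.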
And that was all the joy of it. After that, I had to go and talk to the network administrator who, of course, told me that the problem was in our applications. After three emails, one diagram and two more visits, I managed to convince them that there was a problem in the network. After looking at the DNS records and the F5, we found out that the subdomain was resolving to an external network address, which meant that our internal users' requests were traveling out of our network and back in, through enough network devices and hops to cause the slowness. Finally, we changed the internal DNS record to resolve to an internal virtual IP and moved on.
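In hindsight, a quick check of what the subdomain resolves to from inside the network would have pointed at the problem sooner. Below is a minimal Python sketch of such a check, assuming that "internal" simply means an address in a private range. The function names and the hostname in the comments are hypothetical, and this is not the tooling we used on the day.

```python
import ipaddress
import socket


def resolve_addresses(hostname):
    """Return the sorted, de-duplicated IP addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None)
    # info[4] is the socket address tuple; its first element is the IP string.
    return sorted({info[4][0] for info in infos})


def external_addresses(addresses):
    """Return the addresses that fall outside private (internal) ranges."""
    external = []
    for address in addresses:
        # Drop any IPv6 scope id (e.g. "%eth0") before parsing.
        ip = ipaddress.ip_address(address.split("%")[0])
        if not ip.is_private:
            external.append(address)
    return external


# Hypothetical usage (placeholder hostname, assumed for illustration only):
# leaked = external_addresses(resolve_addresses("service.example.com"))
# if leaked:
#     print("resolves to non-private addresses:", leaked)
```

Run from an internal machine, a non-empty result suggests that internal traffic is being sent to an external address, which is the situation described above.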
I have to say that the pain of dealing with IT was more bearable because I had my team's official DevOps person working with me. It feels good to have someone bridging both worlds to investigate and overcome this kind of problem.
