Handling a steady 10 requests per second at present, down from a peak of about 85/hour during NA time. No 4xx codes are being reported on the front-end graphs.
I’ve seen some issues, depending on which client you’re using, where it caches TLS certs and then can’t cope when the cert gets renewed, but that should return something like a 403 or 500, not a 404. The other possibility is that the client persists a connection, which then goes stale if the load balancer drops it after a while.
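To illustrate the stale-connection possibility, here is a toy Python model (no real sockets) of a client that reuses a keep-alive connection only while it has been idle for less than the load balancer's idle timeout. The `KeepAliveClient` and `PooledConnection` names, the 60 s timeout and the 5 s margin are all hypothetical; check what your own LB and HTTP client actually use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PooledConnection:
    """Toy stand-in for a keep-alive connection; no real socket involved."""
    opened_at: float
    last_used: float


@dataclass
class KeepAliveClient:
    # Hypothetical: assume the LB silently drops connections idle this long (s).
    lb_idle_timeout: float = 60.0
    # Reconnect slightly before the LB's limit to leave a safety margin (s).
    safety_margin: float = 5.0
    conn: Optional[PooledConnection] = None
    reconnects: int = 0

    def get_connection(self, now: float) -> PooledConnection:
        """Reuse the pooled connection unless it has probably gone stale."""
        stale = (
            self.conn is None
            or now - self.conn.last_used >= self.lb_idle_timeout - self.safety_margin
        )
        if stale:
            # No connection yet, or the LB has likely dropped it: open a fresh
            # one rather than sending the request down a dead connection.
            self.conn = PooledConnection(opened_at=now, last_used=now)
            self.reconnects += 1
        else:
            self.conn.last_used = now
        return self.conn


client = KeepAliveClient()
first = client.get_connection(0.0)     # opens a connection
second = client.get_connection(30.0)   # idle 30 s: reused
third = client.get_connection(120.0)   # idle 90 s: replaced
print(client.reconnects)  # 2
```

The design choice is to refresh proactively based on idle time, so the client never depends on the LB telling it that a connection has been dropped.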
Or something in the network in between?
If I were a network person, I’d blame spanning tree protocol, because that’s the go-to excuse for anything unexplainable.
I’d hate to imply that the OP didn’t know what he was on about, particularly given a good, clear error message showing that some troubleshooting had been done. No, blame a technology no one understands.
FWIW, I checked back through my logs, and I was also getting straight 404s during the following periods (all UTC); otherwise everything was fine:
Mar 12 21:42:02 to Mar 12 23:04:03
Mar 13 15:31:02 to Mar 13 17:43:02
Mar 13 19:34:02 to Mar 13 21:35:04
Mar 14 11:28:02 to Mar 14 11:37:03
Mar 16 19:38:03 to Mar 16 20:49:05
Mar 21 16:35:02 to Mar 21 19:16:03
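The outage windows above can be totalled with a few lines of Python. The post gives no year, so `ASSUMED_YEAR` is a placeholder (it doesn't change durations for dates within March), and the parsing format is inferred from the log lines as posted.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Placeholder year: the log lines omit it; durations within March are unaffected.
ASSUMED_YEAR = 2024
FMT = "%Y %b %d %H:%M:%S"

# The 404 windows as posted, all UTC.
WINDOWS = [
    "Mar 12 21:42:02 to Mar 12 23:04:03",
    "Mar 13 15:31:02 to Mar 13 17:43:02",
    "Mar 13 19:34:02 to Mar 13 21:35:04",
    "Mar 14 11:28:02 to Mar 14 11:37:03",
    "Mar 16 19:38:03 to Mar 16 20:49:05",
    "Mar 21 16:35:02 to Mar 21 19:16:03",
]


def parse_window(line: str, year: int = ASSUMED_YEAR) -> Tuple[datetime, datetime]:
    """Split 'start to end' and parse both timestamps with an assumed year."""
    start, end = (part.strip() for part in line.split(" to "))
    return (
        datetime.strptime(f"{year} {start}", FMT),
        datetime.strptime(f"{year} {end}", FMT),
    )


durations = [end - start for start, end in map(parse_window, WINDOWS)]
for line, duration in zip(WINDOWS, durations):
    print(f"{line}: {duration}")
print("total:", sum(durations, timedelta()))
```

With the six windows listed, the durations range from 0:09:01 to 2:41:01 and total 9:36:07.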
I also run concurrent pings to all my data sources. Since March 1, the max latency to api2.sota.org.uk has been 141 ms with no dropped packets. My server for all this is clearskyinstitute.com, and my hosting provider is godaddy.com, which is not the best, for sure.
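A minimal sketch of summarising ping results into max latency and packet loss, assuming each probe is recorded as a round-trip time in milliseconds, or `None` when no reply came back. `summarize_pings` and the sample values are hypothetical, not the poster's actual tooling or data. Note that ICMP reachability only shows the network path is up; it says nothing about which status code the HTTP layer returns.

```python
from statistics import median
from typing import Optional, Sequence


def summarize_pings(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarise ping results; None marks a dropped packet (no reply)."""
    replies = [rtt for rtt in rtts_ms if rtt is not None]
    sent = len(rtts_ms)
    lost = sent - len(replies)
    return {
        "sent": sent,
        "lost": lost,
        "loss_pct": 100.0 * lost / sent if sent else 0.0,
        "max_ms": max(replies) if replies else None,
        "median_ms": median(replies) if replies else None,
    }


# Hypothetical samples for illustration only; not real measurements.
samples = [38.2, 41.0, 95.3, 39.5, None, 40.1]
print(summarize_pings(samples))
```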
I have another theory: one of the backend servers had a glitch and was pulled out of the load-balancer pool, but something in your requests was trying to persist to that same server through the LB, either with a cookie or by some other means, and once that backend was pulled from the LB, the requests had nowhere to go. Some clients do this automatically, so I’m not suggesting it’s anything intentional. It’s plausible, except that it doesn’t explain why there are periods where everything is OK, which would imply the cookie or client state is being reset at some point.
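The session-affinity theory can be modelled with a toy load balancer in Python. Everything here is hypothetical: `StickyLoadBalancer`, the backend names, using the backend name as the cookie value, and the `strict` flag that makes a request fail with 404 when its pinned backend is gone. Real load balancers differ, and many can be configured to re-pin the client to a healthy backend instead.

```python
import random
from typing import List, Optional, Tuple


class StickyLoadBalancer:
    """Toy LB that pins each client to a backend via an affinity cookie."""

    def __init__(self, backends: List[str], strict: bool = True, seed: int = 0):
        self.backends = list(backends)
        # strict=True: never reroute a client whose pinned backend has gone.
        self.strict = strict
        self._rng = random.Random(seed)  # seeded so runs are repeatable

    def remove(self, backend: str) -> None:
        """Pull a backend out of the pool, e.g. after a failed health check."""
        self.backends.remove(backend)

    def handle(self, cookie: Optional[str]) -> Tuple[int, str]:
        """Return (status, cookie the client should send on its next request)."""
        if cookie is not None and cookie in self.backends:
            return 200, cookie  # affinity honoured
        if cookie is not None and self.strict:
            return 404, cookie  # pinned backend gone; client keeps the dead cookie
        backend = self._rng.choice(self.backends)  # no usable cookie: pick one
        return 200, backend


lb = StickyLoadBalancer(["app1", "app2"])
status, cookie = lb.handle(None)                # new client gets pinned
lb.remove(cookie)                               # that backend glitches and is pulled
stuck_status, stuck_cookie = lb.handle(cookie)  # still sending the old cookie
reset_status, new_cookie = lb.handle(None)      # client state reset
print(status, stuck_status, reset_status)       # 200 404 200
```

With `strict=True` the client keeps getting 404 until it drops the cookie, which matches the "nowhere to go" idea and also the catch noted above: the errors only stop once the cookie or client is reset.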