SOTAwatch problems 22-Jan-2021

We are aware of the SOTAwatch issues. Andrew, Josh and I are having a three-way, around-the-world chat now whilst we work on the problem.

7 Likes

Thanks team! Y’all are the best!

1 Like

I thought I would mention that SOTAwatch is awesome!

I’m amazed at how reliable it is.

Having it go down briefly just reminds us how awesome it is.

Thanks for all you do!

1 Like

:flushed:

6 Likes

Thanks for all you guys do!!!

KC3GOP

1 Like

This happened in the middle of my activation! I was fortunate that my first spot for 40m got through and I had a big pileup. After that…crickets ;-(. I even tried spotting via SMS and nothing. I figured SOTAwatch was completely down. I didn’t know how long it would take to get this resolved and I had chores to do at home :-(. I’m sure it was back up when I was driving home.

Many thanks to the team supporting both SOTAwatch and RBNHole. The system availability has been quite good and I’ve come to really rely on those tools! I’ve become addicted to big pileups and I wouldn’t have them without those tools. I’d have to just call CQ like the old days and make only a few contacts ;-).

73, Brad
WA6MM

1 Like

It’s so reliable I had to consult my book of spells to remember how to log in to view the status :blush:

Here is a graph of the load balancer traffic showing the outage.

[Graph: load balancer traffic during the outage]

The outage started at 1825Z and it was back by 2041Z.

Your first failed SMS spot was at 1834Z so you just caught the start of the outage. A further 16 SMS spots failed.
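For anyone wanting the numbers behind those times, here is a minimal sketch of the arithmetic. It assumes all three times fall on the same UTC day; the helper name `zulu` is just made up for this illustration.

```python
from datetime import datetime

def zulu(hhmm: str) -> datetime:
    """Parse an HHMM UTC ("Z") time; assumes every time is on the same day."""
    return datetime.strptime(hhmm, "%H%M")

# Times quoted in the post above
outage_start = zulu("1825")
outage_end = zulu("2041")
first_failed_sms = zulu("1834")

# Whole minutes between the quoted times
outage_minutes = int((outage_end - outage_start).total_seconds() // 60)
sms_delay_minutes = int((first_failed_sms - outage_start).total_seconds() // 60)

print(f"Outage lasted {outage_minutes // 60} h {outage_minutes % 60} min")
print(f"First failed SMS spot came {sms_delay_minutes} min after the start")
```

So the outage ran for a little over two and a quarter hours, and the first failed SMS spot landed within the first ten minutes of it.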

It was working, but running excessively slowly: 1 SMS spot was successfully posted in the middle of the period. However, it’s unlikely anybody would have seen it :frowning:

We’re still considering what caused it to get all upset and faulty.

1 Like

The system is stable again. We’re unlikely to get to a complete root cause, but timeouts visible in the API server logs let us track it down to communications between the API and the backend spot database. This database runs in a shared cloud environment and showed a couple of load spikes around the time the outage started. Our best guess is that something - either a query we ran, or some background load on other hosts in that environment - briefly caused a bottleneck, which then caused further queries to bank up and consume database capacity. We increased the available compute (at twice the price :frowning: ) and then restarted the API servers to reset any outstanding connections. Once everything that was waiting on the API servers had cleared out, the API was back.
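To illustrate why a brief bottleneck can leave queries banked up long after it has passed, here is a toy sketch. It is not the SOTAwatch code, and every number in it is hypothetical: a steady stream of queries meets a database whose capacity dips for a few ticks, and the function `simulate_backlog` (a name invented for this example) tracks how many queries are left waiting.

```python
def simulate_backlog(arrivals_per_tick, capacity_per_tick):
    """Return the number of queries still waiting after each tick."""
    backlog = 0
    history = []
    for arrivals, capacity in zip(arrivals_per_tick, capacity_per_tick):
        # Work not served in this tick carries over to the next one.
        backlog = max(0, backlog + arrivals - capacity)
        history.append(backlog)
    return history

# Hypothetical load: 10 queries arrive every tick.
arrivals = [10] * 8

# Capacity dips to 4 for three ticks, then returns to its normal 12.
normal_recovery = simulate_backlog(arrivals, [12, 12, 4, 4, 4, 12, 12, 12])

# Same dip, but with extra (hypothetical) capacity added afterwards.
boosted_recovery = simulate_backlog(arrivals, [12, 12, 4, 4, 4, 24, 24, 24])

print(normal_recovery)   # backlog drains slowly at normal capacity
print(boosted_recovery)  # backlog clears quickly with more capacity
```

With only a small margin of spare capacity the backlog lingers well after the dip ends, while extra capacity clears it within a couple of ticks. The sketch leaves out the API server restart, which in the real incident was also needed to reset outstanding connections.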

3 Likes

“…you don’t know what you’ve got til it’s gone.”
I, too, was mid activation when I couldn’t send out another self spot for a band change.
The 1st spot on 20 did the trick, though, and then I tried 2 meters for a few more Qs.
After driving to my first summit only to find low 40s and active rain I headed to a local dry summit.
The system allowed me to update my status.
My thanks, too, to the group keeping this system up and running.

73,
David N6AN

1 Like

All good. Half the time it’s my internet connection that only works at about half pace as well. What with summer and Covid, not a lot of SOTA in VK for a month or so.
Thanks to everyone who keeps it all going.
vk5cz …

1 Like

W7UM and I were on sunny W0C/FR-062, Mount Bailey, at 2770M, and things seemed to be getting really slow around 1845Z - no SOTA stations audible. We had both done 40-30-20M CW, so we had already got most of what we went up there for.

I tried calling CQ on 18.093 from 1851Z to 1856Z, just to get a few more chasers, but after 5 minutes nobody had called. This was not normal… the RBN Hole almost always spots me and gets some chasers on a new band that’s open.

Mark checked his phone for the spots on SOTAWatch, and he told me:

  1. First that the spots had not changed in a while
  2. Then that there were no spots or alerts listed on SOTAWatch!

After talking about it, and hunting for people calling CQ, we found none, so we decided to pack up and head down the snowy trail.

I think this is the first time that SOTAWatch has gone down for a significant time, during one of my activations, since I started SOTA in 2013.

Thanks for getting it back and for all your incredible work to make this work so well!

73
George
KX0R

2 Likes

I was just getting set up when I noticed that I didn’t have enough of a cell signal to log into SOTAwatch3, which is not unusual for AT&T users on Utah peaks. I was able to get a text to Bruce, WY7N, and we QSY’d to 60 meters. Good old ham radio, it always gets through. He couldn’t log in, so I resorted to calling CQ on popular SOTA frequencies. Being old school, not a problem, plenty of contacts.

It reminds me too that the SOTA infrastructure of reflector, spots, maps and alerts is incredibly robust. I can’t imagine doing SOTA without it.

Thanks for all that you do and 73.

Greg, W7GA

We were surprised there wasn’t a topic on the reflector before Andy posted one but it seems like that was because everyone was out activating.

3 Likes

Hmm, wondered what was wrong. I was freezing to death for nearly two hours trying to get contacts.

1 Like