SOTA reflector outage 26th October

Well most of you have noticed the reflector is now back and running. So this is what happened…

This reflector is run on a hosted service. We could run our own server and run the reflector software on that, but we took the decision to use a hosted service for one simple reason: someone else has to do the hard work of maintaining the server, installing updates, installing security fixes, running backups etc. It costs a little more but everyone on the MT has a lot to do, so offloading these tasks to someone else makes great sense. Jon and I both have full-time jobs and we don’t have the time to maintain the hardware and system software; we only have time to design, write and administer some of the apps that run on these machines.

The hosting company decided that it was time to do some maintenance work which should have resulted in a few minutes downtime. However, when they tried to bring our server back up they had some real problems. SOTA online services only have to be a bit iffy for a few minutes before we all start to notice and we had a fault ticket raised quickly. At that point the fix was expected shortly. As the day wore on it was obvious that some big problems were happening for the hosting company. To be fair we did get updates from them and they were warning us they may have to use a backup to restore our server. Jon got a message up on SOTAwatch quickly so hopefully everybody was able to see the problem was being worked on.

In the end it turned out the disks in our server were damaged beyond repair and a backup had to be restored. That takes time because the server is shared by many customers, so there can be substantial amounts of data to restore. If you work on the figure that a slow disk may write out 50MB/sec, it will take 20sec to write 1GB and 20000sec to write 1TB. That’s about 5 1/2 hours to restore the files. This explains the long delay.
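
As a rough check of that arithmetic, here is a minimal Python sketch. The 50MB/sec write rate is the figure quoted above; the 1TB size is only the illustrative amount used there, not a known figure for the server, and the function name restore_seconds is made up for this example.

```python
# Rough restore-time estimate: time = amount of data / sustained write rate.
# Decimal units are assumed (1 MB = 10**6 bytes), matching the round numbers above.

MB = 10**6
GB = 10**9
TB = 10**12

WRITE_RATE = 50 * MB  # bytes per second, the "slow disk" figure from the post


def restore_seconds(data_bytes, write_rate=WRITE_RATE):
    """Return the seconds needed to write data_bytes at write_rate bytes/sec."""
    return data_bytes / write_rate


secs_per_gb = restore_seconds(GB)   # 20 seconds for 1GB
secs_per_tb = restore_seconds(TB)   # 20000 seconds for 1TB
hours_per_tb = secs_per_tb / 3600   # roughly 5 1/2 hours

print(f"1GB: {secs_per_gb:.0f} sec")
print(f"1TB: {secs_per_tb:.0f} sec = {hours_per_tb:.2f} hours")
```

A real restore also has to read the backup, move it over a network and verify it, so treat this as a lower bound rather than a prediction.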

As I said earlier, we pay for a hosted service so we don’t have to do anything, and the hosting company ended up restoring a backup they made. We could have backups made hourly but the cost is considerably greater than the backup level we do have, which is daily backups. The backup that has been restored was somewhere between 14 and 17 hours old. It’s easiest to assume that posts made after 0000Z 26-Oct-2015 but before the server came back up may be missing, but everything before then has been recovered.

I’m glad we pay for hosting because otherwise someone would have had to come home from a day at work and start debugging hardware, then rebuilding hardware and then restoring backup images. Instead all we had to do, once we knew what was happening, was to wait. The hosting company has apologised for the length of the outage and is giving us some free hosting as compensation. I think that’s not a bad deal. We now know their backups work and that they value our custom enough to compensate us for loss of service.

So everything should be OK now on the reflector.

Andy, MM0FMF
obo SOTA MT

Very good Andy - thank you for the information.

73 Phil

A similar thing happened with the QRZ website, nightmares unimaginable took place.

Thanks to everyone for their efforts.

Regards, Nick

Hi Andy,
Does SOTA cover the hosting costs etc from sales of promotional material or does the MT have to play the stock market to cover SOTA’s costs?

73
Ron
VK3AFW

The costs are covered from sales of awards and from donations. We try to set the price of awards so that the profits earned are sufficient to cover the costs of running the program for the foreseeable future. Many people purchasing awards round the price up to the next multiple of ten and tell us to keep the change. On top of that we have received many donations from people. Some are public, such as the annual donation by the HB9-SOTA group, and some are anonymous.

Hi Andy

Thank you for all your hard work. Well done!

73

Pedro, CT1DBS/CU3HF