Well, most of you will have noticed the reflector is now back up and running. So this is what happened…
This reflector is run on a hosted service. We could run our own server and run the reflector software on that, but we took the decision to use a hosted service for one simple reason: someone else has to do the hard work of maintaining the server, installing updates, installing security fixes, running backups etc. It costs a little more, but everyone on the MT has a lot to do, so offloading these tasks to someone else makes great sense. Jon and I both have full-time jobs and we don’t have the time to maintain the hardware and system software; we only have time to design, write and administer some of the apps that run on these machines.
The hosting company decided that it was time to do some maintenance work, which should have resulted in a few minutes’ downtime. However, when they tried to bring our server back up they had some real problems. SOTA online services only have to be a bit iffy for a few minutes before we all start to notice, and we had a fault ticket raised quickly. At that point the fix was expected shortly. As the day wore on it was obvious that some big problems were happening for the hosting company. To be fair, we did get updates from them and they were warning us they might have to use a backup to restore our server. Jon got a message up on SOTAwatch quickly, so hopefully everybody was able to see the problem was being worked on.
In the end it turned out the disks in our server were damaged beyond repair and a backup had to be restored. That takes time because the server is shared by many customers and that can mean substantial amounts of data being restored. If you work on the figure that a slow disk may write out 50MB/sec, it will take 20sec to write 1GB and 20000sec to write 1TB. That’s about 5 1/2 hours to restore the files. This explains the long delay.
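For anyone who wants to play with the numbers, here is a quick Python sketch of that sum. It assumes decimal units (1GB = 1000MB) and a steady 50MB/sec write rate with no overheads. The function name and constants are made up for illustration, and we don't know how much data the hosting company actually had to restore, so treat it as a rough estimate only.

```python
# Decimal units, as used in the back-of-envelope figures above (an assumption).
MB = 10**6
GB = 10**9
TB = 10**12


def restore_seconds(size_bytes, rate_bytes_per_sec=50 * MB):
    """Time to write size_bytes at a steady rate, ignoring all overheads."""
    return size_bytes / rate_bytes_per_sec


for label, size in (("1GB", GB), ("1TB", TB)):
    secs = restore_seconds(size)
    print(f"{label}: {secs:.0f} sec ({secs / 3600:.2f} hours)")
```

Change the rate or the size to see how quickly the restore time grows once you are dealing with a disk shared by many customers.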
As I said earlier, we pay for a hosted service so we don’t have to do anything, and the hosting company ended up restoring a backup they made. We could have backups made hourly, but the cost is considerably greater than for the backup level we do have, which is daily backups. The backup that has been restored was somewhere between 14 and 17 hours old. It’s easiest to assume that posts made after 0000Z 26-Oct-2015 but before the server came back up may be missing, but everything before then has been recovered.
I’m glad we pay for hosting because otherwise someone would have had to come home from a day at work and start debugging hardware, then rebuilding hardware and then restoring backup images. Instead all we had to do, once we knew what was happening, was to wait. The hosting company has apologised for the length of the outage and is giving us some free hosting as compensation. I think that’s not a bad deal. We now know their backups work and that they value our custom enough to compensate us for loss of service.
So everything should be OK now on the reflector.
obo SOTA MT