Database Log Entries Lost

M1MAJ · 29 January 2018 09:18

The root cause of the DNS problems is that “they” (for some value of “they”) have changed not only the entry giving the address for sotadata.org.uk, but also the identity of the name servers for the domain.

The address record has a TTL of 3600, so you would expect it to propagate within an hour - perfectly reasonable.

But the NS record is being served from the parent zone org.uk with a TTL of 172800, which is 2 days. The old nameservers are still serving the old address records (or at least one of them is - the other appears to be down). It is quite legitimate to carry on using these servers until the TTL of the NS record obtained from the parent expires.

I would therefore expect this to take up to 2 days plus 1 hour from the time of the original change to sort itself out. In the meantime, some people will get the old address, some will get the new. (Not counting any additional client-side caching outwith the DNS protocol proper).
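To make the arithmetic explicit, here is a minimal Python sketch of the worst case. It assumes a resolver cached the old NS record just before the change, and then fetched the old address record from an old nameserver just before that NS record expired. The two TTLs are the ones quoted above; `change_time` is a made-up placeholder, not the actual time of the change.

```python
from datetime import datetime, timedelta

NS_TTL_SECONDS = 172800  # TTL of the NS record served by the parent zone org.uk
A_TTL_SECONDS = 3600     # TTL of the address record for sotadata.org.uk


def worst_case_window(ns_ttl_seconds, a_ttl_seconds):
    """Longest time a resolver could keep handing out the old address."""
    # The old NS record can be used for its full TTL, and an old address
    # record fetched at the very end of that period lives for its own TTL.
    return timedelta(seconds=ns_ttl_seconds + a_ttl_seconds)


def last_possible_stale_answer(change_time,
                               ns_ttl_seconds=NS_TTL_SECONDS,
                               a_ttl_seconds=A_TTL_SECONDS):
    """Latest moment at which a well-behaved cache could still be stale."""
    return change_time + worst_case_window(ns_ttl_seconds, a_ttl_seconds)


# Hypothetical change time, for illustration only.
change_time = datetime(2018, 1, 27, 12, 0)

print(worst_case_window(NS_TTL_SECONDS, A_TTL_SECONDS))  # 2 days, 1:00:00
print(last_possible_stale_answer(change_time))
```

This only covers caching inside the DNS protocol; any extra client-side caching would add to it.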

Doing a DNS change by changing the identity of the name servers is not a clever way to do it. Changing address records is routine; changing the nameservers themselves is not something to do lightly.

Martyn M1MAJ

VK3ARR · 29 January 2018 09:31

Which gives you a sense of the scale of the issue they faced!

M1MAJ · 29 January 2018 09:40

Indeed, but you might expect a service provider to be prepared for it. In my work life we never publish the real IP addresses of the machines offering DNS service. It is always a secondary address used for no other purpose. If we have a catastrophic failure of a server, we would just move the secondary address to a different machine and not have to trouble our parent with a change of nameserver.

HB9DQM · 29 January 2018 10:01

And of the new nameservers, only one answers authoritatively for sotadata.org.uk, while the other (ns10.aspnethosting.co.uk) does not seem to know about the zone.

M1MAJ · 29 January 2018 10:02

If you start from the main SOTA web site http://www.sota.org.uk/ you are led directly to the database in one hop, but there is no direct reference to this reflector at all. You can only get here in two hops via SOTAwatch. How can anybody wanting to do something on the database be reasonably expected to come here first on the off chance that there is an important notice telling them not to?

If I want to use the database, I go straight there. I don’t read the reflector first just in case. I won’t typically see a reflector posting until it comes in the mail digest the next day. Sometimes I have a backlog. In the event I wasn’t caught out, but I could have been quite easily.

The database front end web site was up and running throughout. Could it not have a “maintenance mode” that you could trigger to block user activity when you don’t want it?

I hope this doesn’t sound too grumpy. I know you must have put a huge amount of effort into sorting this problem out, and along with everybody else I am most grateful for that. But please don’t blame users for using something that appears to be working!

Martyn M1MAJ

M1MAJ · 29 January 2018 10:06

Erk - yes! That’s awful.

Do you think these guys need a DNS consultant?

SP9MA · 29 January 2018 10:51

Is it safe now to upload the lost data again?
I have some missing S2S from Jan 27.

VK3ARR · 29 January 2018 10:52

What really annoyed me about this outage is I’m in the middle of a bunch of work to separate out the presentation and app layers and because it was down, I was stuck twiddling my thumbs. If this had happened two or three weeks from now, then yep, we’d have been sweet.

But it didn’t, so we weren’t.

G0CQK · 29 January 2018 11:38

I guess anyone still waiting for the DNS to fully distribute could add a line

87.117.226.153 sotadata.org.uk

to their HOSTS file in C:\Windows\System32\drivers\etc
as a short term fix
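For anyone who does add such a line, a rough Python sketch like the one below could be used to find it again later so it can be removed once the DNS has settled. It works on the text of a HOSTS file rather than opening the file itself; the function name `find_hosts_entries` and the sample text are made up for illustration, and the parsing assumes the usual "address, then one or more names, optional # comment" layout.

```python
def find_hosts_entries(hosts_text, hostname):
    """Return (line_number, address) pairs for lines that map `hostname`."""
    wanted = hostname.lower()
    matches = []
    for number, line in enumerate(hosts_text.splitlines(), start=1):
        # Drop anything after '#', which starts a comment in a HOSTS file.
        content = line.split("#", 1)[0].strip()
        if not content:
            continue
        fields = content.split()
        address, names = fields[0], fields[1:]
        if wanted in (name.lower() for name in names):
            matches.append((number, address))
    return matches


# Made-up sample; a real file would be read from
# C:\Windows\System32\drivers\etc\hosts on Windows.
sample = """# sample hosts file
127.0.0.1 localhost
87.117.226.153 sotadata.org.uk  # temporary override
"""

print(find_hosts_entries(sample, "sotadata.org.uk"))
```

An empty result means there is no manual override left for that name.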

K3TCU · 29 January 2018 14:29

Yep, same here, Rich. I keep a daily paper log, so I just went back and re-entered those dates.

Gary K3TCU

DD5LP · 29 January 2018 15:13

NO!! Please don’t, as I will guarantee that will get left in there and forgotten!

When any new combined services for SOTA come along IP addresses will have to change, DNS entries will have to change, even URLs might have to change - all of that happening with users with manual DNS records in a HOSTS file is going to cause real confusion!

Please wait everyone until Andy says everything is ready for use again - I’m not part of the MT but I have worked in IT for over 40 years and while I understand people’s wish to get using the system again, doing so before it’s 100% released by the MT will only cause problems!

Ed.

G0CQK · 29 January 2018 15:20

Several years ago when 123REG had a major outage many SOTA users used a similar insertion and to the best of my knowledge no-one forgot to remove it once the problem was fixed.

DD5LP · 29 January 2018 15:25

I have had other experiences (in business situations) where local IT “gurus” have taken it upon themselves to set up or change HOST files - perhaps it wouldn’t happen with more technical SOTA users, but then again …

G0CQK · 29 January 2018 15:33

My HOSTS file contains 30600 lines mainly because I use the HOSTS data from http://winhelp2002.mvps.org/hosts.htm with the additions that are added by Spybot https://www.safer-networking.org/. Just another layer of security.

M1MAJ · 29 January 2018 18:50

Have you noticed that the one that does answer is the database server itself?

HB9DQM · 29 January 2018 19:37

Indeed… Let’s hope that is because they are using some kind of firewall/NAT/load balancer in front of the servers, and are trying to save on IP addresses.

BTW, here is an explanation of the outage: Windows VPS UK hosting in London Announcements

GM4LLD · 29 January 2018 21:05

I think the last setup was the same, with 2 (different) nameservers but only 1 serving names! I’m happy to admit I’m a DNS user… I know the things I have to do to use someone else’s DNS and I understand the broad principles. The only one I’ve ever set up is in a Raspberry Pi and that consisted of “apt-get install dnsmasq” and edit the conf file to set the IP base and range handed out. So whether this setup makes sense or not I leave to those who know.

I read the crash as someone applied one of these quality Intel Meltdown/Spectre microcode updates and it goosed the server. MS withdrew one today because Intel’s microcode patch caused more issues than it fixed and Red Hat did likewise a few days back. I think both MS and Red Hat were very sensible in washing their hands of Intel’s awesome failure.

M1MAJ · 30 January 2018 09:17

This badness even has a name: it’s called a lame delegation. If you tell the service provider that sotadata.org.uk has a lame delegation to ns10.aspnethosting.co.uk they should be embarrassed and say they’ll fix it right away. If they say “what’s that?”, you know they are clueless.
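As a small illustration of the definition, the Python sketch below flags lame servers given the list of nameservers the parent delegates to and a record of which of them answered authoritatively for the zone. It does no DNS queries itself; the answers would have to be gathered separately, for example by asking each server directly for the zone's SOA record and checking for the authoritative-answer flag. The name `ns-working.example` is a placeholder for the nameserver that does answer, and the function name is made up.

```python
def lame_servers(delegated, answered_authoritatively):
    """Return the delegated nameservers that are not authoritative for the zone.

    `delegated` is the NS set published by the parent zone.
    `answered_authoritatively` maps a nameserver name to True if it gave an
    authoritative answer for the zone; servers missing from the mapping
    (for example because they were down) are treated as lame too.
    """
    return [ns for ns in delegated if not answered_authoritatively.get(ns, False)]


# Illustrative data only: "ns-working.example" is a hypothetical name.
delegated = ["ns-working.example", "ns10.aspnethosting.co.uk"]
answers = {
    "ns-working.example": True,
    "ns10.aspnethosting.co.uk": False,
}

print(lame_servers(delegated, answers))  # ['ns10.aspnethosting.co.uk']
```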

DNS is incredibly resilient, but as with so many things, it works best if you follow the rules. If you don’t, you tend to get apparently random iffiness which gets characterised as “oh, just another DNS glitch”.

Martyn