
Andy's Cluster


#21

It’s not very well at present. There appears to be an issue at the data centre in Reykjavik, compounded by the fact that I’m 3000 km further away in Fuerteventura! There could be a DDoS attack running against something at the data centre, because at times access is very slow…

I’ll keep restarting it when it falls over, as I’m already checking it whenever I can. A bit of a busman’s holiday, fixing servers while away!

You must be keen, Esther… the bit of the cluster still responding said you’d tried 249 times to log on. You only get about 10 attempts per hour or you get blacklisted… a primitive way of blocking script kiddies, but it sort of works.
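The blacklisting described above is a form of per-address rate limiting. Here is a minimal Python sketch of the idea, assuming a sliding one-hour window, a limit of 10 attempts and a one-hour block once the limit is exceeded. The class name, the block length and the choice to key on the client address are all hypothetical; this is not how the cluster actually implements it.

```python
import time
from collections import defaultdict, deque


class LoginLimiter:
    """Hypothetical per-address login limiter: about `max_attempts` tries per
    sliding `window` seconds, after which the address is refused
    ("blacklisted") for `block_for` seconds."""

    def __init__(self, max_attempts=10, window=3600.0, block_for=3600.0,
                 clock=time.monotonic):
        self.max_attempts = max_attempts
        self.window = window
        self.block_for = block_for
        self.clock = clock                  # injectable so it can be tested
        self.attempts = defaultdict(deque)  # address -> recent attempt times
        self.blocked_until = {}             # address -> when the block ends

    def allow(self, address):
        """Record one login attempt and return True if it may proceed."""
        now = self.clock()
        if self.blocked_until.get(address, 0.0) > now:
            return False  # still blacklisted
        recent = self.attempts[address]
        # Forget attempts that have slid out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        recent.append(now)
        if len(recent) > self.max_attempts:
            # Too many tries in the window: block this address for a while.
            self.blocked_until[address] = now + self.block_for
            recent.clear()
            return False
        return True
```

Injecting the clock lets the limiter be exercised without waiting an hour. A server would call `allow()` once per login attempt and drop the connection whenever it returns `False`.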

Restarted now; I’ll check it again before we go out for dinner. If any of the experts who like telling me how badly I have implemented something would like to offer some ill-informed opinions, please be my guest. I may even read them!


#22

Thanks Andy.

I’ve also been blacklisted myself trying to get in over the last few days, though not 249 times like Esther!
Sorry I missed your activation today, and others. I was out of the shack for most of the day, with only one SOTA QSO to show for it.

Now back to monitoring FT8 on 160m during the hours of darkness for anything interesting popping up…

Hoping for a GX0OOO top band SOTA operation or two this winter which I can chase with my latest antenna project…

73 Phil


#23

Hey Pat @KI4SVM, you were slow to reconnect today… it took you 18 secs after the cluster started before you were in again :slight_smile:


#24

MacLogger keeps trying automatically! Crabdance is back.


#25

And I thought you were furiously typing at the prompt and swearing because it wouldn’t work! :rofl:

I’m glad that people find the thing useful and miss it when it breaks. Makes writing it worthwhile.


#26

Just trying to be patient… :grinning: It was also the auto connect in MLDX and not me, but I was missing the cluster immensely this morning. Thanks for giving it a kick and I do appreciate having the cluster.

73, pat - KI4SVM


#27

So I found telnet on “run” and typed in elgur.dtdns.net:7300 and it replied “Invalid Command.” Please give a little more detail. Thanks! Scott


#28

Using my powers of ESP, I’d guess that’s Windows, though you didn’t say so. On Windows you need to install a telnet program or enable the (rubbish) built-in one.

Then you need to use the correct address, elgur.crabdance.com. My mobile broadband dongle service is offline, so I can’t check whether the port goes after a space (elgur.crabdance.com 7300) or after a colon (elgur.crabdance.com:7300).
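Command-line telnet clients, including the one that ships with Windows, take the port as a separate argument after a space (`telnet elgur.crabdance.com 7300`). Underneath, a telnet session to the cluster is just a TCP connection to a host and a port. The minimal Python sketch below keeps those two values separate and reads whatever the server sends first, assuming it sends a prompt on connect. The function name and the 10-second timeout are hypothetical.

```python
import socket


def read_cluster_banner(host, port=7300, timeout=10.0, max_bytes=4096):
    """Open a plain TCP connection (what a telnet client does underneath)
    and return the first data the server sends, e.g. a login prompt.

    Host and port are passed separately; "host:port" is not a hostname."""
    # create_connection resolves the name, connects, and applies the timeout
    # to the connect and to later socket operations such as recv().
    with socket.create_connection((host, port), timeout=timeout) as sock:
        data = sock.recv(max_bytes)  # may be a partial prompt on slow links
    return data.decode("ascii", errors="replace")
```

For example, `read_cluster_banner("elgur.crabdance.com", 7300)` would return the cluster's greeting, assuming the server is reachable from where it runs.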


#29

Scott, your personal help line has been opened. :grinning: Give me a call when you wake up.

73, pat - KI4SVM


#30

Pat & Scott, hope you can get the telnet working, Scott. It’s most effective when used as part of a logging program, such as Logger 32 etc.
However, Andy’s cluster was working fine until about an hour ago. I think he needs to give it another kick when he can, as it’s stopped shipping the spots out again. It’s normally very reliable.

73 Phil


#31

It knows I’m on vacation!

To compound that, my mobile broadband provider has an issue. Internet access works on my phone but not on the PC; it should be back on this afternoon.


#32

Cheers Andy, weather’s awful here today, hope yours is good.

73 Phil


#33


#34

Obviously too hot!


#35

The Oompa-Loompas have been busy. Three had a large roaming internet access problem for PAYG customers. That explained why my phone was fine (contract) and the PAYG dongle was not. Anyway, things look fixed.

There is something broken at the data centre: there are periods when the cluster server is very slow to respond. This looks like a networking issue, possibly a DDoS attack on the host or on another machine at the host, or possibly an intermittent switch or router fault. When I do connect to my server it is working; it just cannot fetch the spots from SOTAwatch within the allocated time window. There’s nothing I can do from here other than restart the cluster when I see it is broken. I can’t even report the network issues, as I don’t have the contact / account info with me :frowning: Sorry, but we’ll have to grin and bear it till I return from the paradise of warm sun and blue skies.


#36

I’m hoping I have got to the bottom of the problem with the cluster. The random network slowdowns are still affecting my server, and they have been very useful in bringing a random failure mode into focus. The cluster has had a random “failure to proceed” issue for the last 2-3 years. It has proved almost impossible to pin down because it was totally random: the cluster would run continuously for weeks or months with no issue and then just die. The various attempts at instrumenting the failure were not really helping me.

All the network code is protected by exception handlers, which are meant to gracefully capture network faults, report the issue to me on my cluster console and then continue. All the network transactions are also protected by timeouts, which should catch problems too. Well, I thought they were protected… it turns out there was one with a timeout of 0, which means “wait indefinitely”, and that was exactly what was happening. A failure to get the latest spots from SOTAwatch caused a beautiful deadlock… the code would wait indefinitely for a response that was never going to come, and you can guess the rest. Simply setting the timeout to 10 secs has fixed the problem. Despite the network connection suffering all sorts of slowdowns, the cluster has gracefully handled its loss of connectivity and continued running. I’m happy that it’s fixed, and I can now report the network problem, as it is no longer helping me debug my own code.
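The bug above comes down to what a timeout value means, and the convention differs between APIs. In .NET, a `Socket.ReceiveTimeout` of 0 means an infinite wait. In Python, a socket timeout of `None` blocks indefinitely and `0` means non-blocking. The cluster's own language isn't stated, so what follows is only a Python sketch of the fix, assuming the spots are fetched over HTTP. `fetch_spots`, the constant name and any URL passed in are hypothetical. Every fetch gets a positive timeout, and any network fault is logged and reported as `None`, so the caller carries on instead of hanging.

```python
import logging
import urllib.request

log = logging.getLogger("cluster")

# Hypothetical setting: a positive number of seconds, never 0 or None, since
# in many APIs those mean "wait indefinitely" (or something other than a timeout).
SPOT_FETCH_TIMEOUT = 10.0


def fetch_spots(url, timeout=SPOT_FETCH_TIMEOUT):
    """Fetch the latest spots, or return None if the network misbehaves.

    Rejects a zero/None timeout so a silent server can't stall the caller."""
    if timeout is None or timeout <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except OSError as exc:
        # URLError, connection resets and socket timeouts are all OSError
        # subclasses: report the fault on the console and let the caller go on.
        log.warning("spot fetch from %s failed: %s", url, exc)
        return None
```

A polling loop that calls `fetch_spots()` every cycle just skips that cycle when it gets `None`. That mirrors the behaviour described once the 10-second timeout was in place: lost connectivity is reported, and the loop keeps running.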

Of course this had to happen when I was on vacation; it’s almost as if computers know when to go wrong to cause the most disturbance. :slight_smile: