ZL Hunting (Part 1)

GM4LLD · 27 January 2025 08:55

Works with Brave using my UK IP but 403 for Firefox using the Zurich based VPN.

Snap. Maybe you need an IP that geolocates to ZL?

GM5ALX · 27 January 2025 08:58

That’s what I figured. Checks VPN options

GM4EVS · 27 January 2025 09:05

My IP usually geolocates to somewhere in the Central Belt … according to the local service ads that pop up?

VK3ARR · 27 January 2025 09:16

Cloudflare often blocks VPN exit IPs, so it may simply be because you’re on a VPN, rather than whether or not you’re using a ZL or non-ZL IP.

GM4LLD · 27 January 2025 16:30

Well one last post to finish things off.

Thanks to Tim for the initial post and starting an interesting thread off.
Thanks to Tim and Alex for various programming comments.
Thanks to others for algorithm suggestions and overall improvements.
Thanks to Dave for some nostalgia comments.

My original program in Python using geopy had the kind of performance issues you’d expect from doing many iterations in a language like Python. It was taking about 35 sec to make one complete pass through 177166 summits. That would suggest it would take about 65.5 days to complete the task. This is on a low-power tiny Linux desktop (17cm x 17cm x 4cm, quad core i5 4560, 8GB/2TB). It took about 30 mins of development time from “let’s write this” to getting something that was outputting meaningful numbers. That’s what I’d expect for a simple-ish task like this using Python and the excellent standard library.

Yesterday I thought that as I’ve been paid cash money to write C++ since 1996 I should rewrite this in a fast language. You get spoilt using Python (and other similar languages) as you never seem to have to faff about; the standard library contains code that nearly always does what you want. So despite 29 years of paid C++ development it took me nearly 2 bl**dy hours to parse the input CSV correctly. I’d even written a quick prog in Python to output just the data I wanted. And after 3 hours I had something that was actually working properly. This was compiled with -O3 on gcc 12.2.0. On the same computer it took about 45 mins to process 155472 summits, the currently valid summits.

I ran the same code on an i7 8500 laptop, 31mins. But I was using the standard great circle calculation with a single value for Earth’s radius not using anything that models the bulge or height.
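For anyone following along, a minimal sketch of that kind of single-radius great circle calculation is below. It uses the haversine form and a mean radius of 6371 km; both are assumptions, as the post doesn't say which formula variant or radius value the C++ version used, and the function name is made up for illustration.

```python
import math

# Assumed mean Earth radius; the post does not state which value was used.
EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=EARTH_RADIUS_KM):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # min() guards against rounding pushing h fractionally above 1 near the antipode
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, h)))


# Two points on the equator, half a world apart: half the circumference.
print(round(great_circle_km(0.0, 0.0, 0.0, 180.0), 1))
```

Because it treats the Earth as a sphere, it ignores both the bulge and the summit heights.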

Both of those times are single thread time. The obvious challenge was to go run it on some “big iron”. The machine running idle at work was an Intel Xeon Gold 6148 dual CPU. That’s a total of 40 CPU cores supporting 80 concurrent threads, 384GB RAM, 6TB SSD. It’s sat in a data centre in Eindhoven waiting to be used. In fact it’s considered obsolete by my employers. I needed a server grade machine with a proper server remote management facility and it was cheaper to take this out of the scrap pile than buy a new small server for the job it will run.

I modded the C++ program so you could tell it how many summits to process and where in the list to start. So you would start one at line 1 in summitslist.csv and run for 10000 summits and another copy at line 10000 and run for 10000 summits etc. I ran 40 copies, one per real CPU core, each doing 3800 summits. This task parallelises perfectly: you can split it into as many chunks as you want and no chunk delays or impacts the others.
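The start/count split described above can be sketched roughly like this. It is only an illustration of the idea in Python, not the actual C++ change; `chunk_ranges` is a made-up name and the starts are 0-based offsets into the list of valid summits.

```python
def chunk_ranges(total, workers):
    """Split `total` items into `workers` contiguous (start, count) ranges."""
    base, extra = divmod(total, workers)
    ranges = []
    start = 0
    for i in range(workers):
        # The first `extra` workers take one more item so nothing is left over.
        count = base + (1 if i < extra else 0)
        ranges.append((start, count))
        start += count
    return ranges


# One range per real core on the 40-core machine.
for start, count in chunk_ranges(155472, 40)[:3]:
    print(f"start at summit {start}, process {count}")
```

Each (start, count) pair would then be handed to its own copy of the program. Since no range depends on another, the copies never have to talk to each other.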

It was quite a bit quicker. In fact I thought it wasn’t working because it seemed to stop before it had started. But no, all the files were there with data.

You want to know how long it took to run in total? The first time, as it wasn’t cached etc., was under 10 secs. Repeat runs take around 6 sec for all 40 copies to load, calculate and finish. That’s to load 40 copies of the program, 40 copies of summitslist.csv with 155472 lines and process 3800 summits each. Oh, and this is a 2017 vintage processor, so not nearly as fast as a modern one!

Dave’s 3081 was a dual CPU machine with 5MIPS per CPU so it should have been nippy but if you have hardware floating point then you are going to win even if your 80286 was probably only around 1.7MIPS. You also may not have been getting all the processor time. There were probably lots of batch jobs running at the same time. Don’t forget IBM big machines of that time used 3270 style terminals so there would have been many multiplexors and concentrators between you and the actual computer to make sure it wasn’t bothered by people pressing keys. Because of that, 500 users is not a lot.

Also the discussions and comments have exposed a wee issue with some summit data in the main database, a fix for that is in hand. Probably would have been done if I hadn’t been playing parallel processing!

I wasn’t sure what the error would be between the “accurate” answer using Earth’s bulge and the simple great circle. It is sufficient that for EA9/CE-001 mine picks ZL1/NL-101 not ZL1/NL-084; the error is about 16km at this distance, which is enough to pick a different summit. @GM5ALX @G5OLD, send me the algorithm you used and I’ll make this code use the same for the same answers.

This has been such a change from the normal computery things I do.

GM5ALX · 27 January 2025 17:19

UPDATE: I’d used the wrong formula, and think mine was calculating the distance ignoring the curvature of the earth… but I’ve had enough of this now

I was thinking a useful output, digestible by all (and not a 300+GB file), would be the summitslist.csv with some extra columns for the top 10 furthest summits and distance. So people could look up their summits and see what’s far away…if such thoughts occur to them…or if nothing else then at least something to show for all the brain and computing processing!
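A rough sketch of how the "top 10 furthest" lookup for one summit might go, using heapq so the full sorted list is never needed. The summit codes and coordinates here are invented placeholders, the distance is a plain spherical haversine with an assumed 6371 km radius, and `furthest` is a hypothetical helper, not anything from the actual scripts.

```python
import heapq
import math


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Spherical haversine distance in km (assumed mean radius)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    h = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2)
         * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, h)))


def furthest(ref, summits, n=10):
    """Return the n (distance_km, code) pairs furthest from summit `ref`."""
    lat, lon = summits[ref]
    pairs = ((great_circle_km(lat, lon, la, lo), code)
             for code, (la, lo) in summits.items() if code != ref)
    return heapq.nlargest(n, pairs)


# Placeholder data: code -> (latitude, longitude) in degrees.
summits = {
    "XX/AA-001": (0.0, 0.0),
    "XX/AA-002": (0.0, 90.0),
    "XX/AA-003": (0.0, 180.0),
}
for dist, code in furthest("XX/AA-001", summits, n=2):
    print(code, round(dist))
```

The extra CSV columns would then just be the codes and distances from that list, written out per row.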

G5OLD · 27 January 2025 18:22

@GM5ALX’s Python looks just the business and is based on the formula below:

The beautiful thing about this, it will look at the distance from operator to operator.

Having looked at the top 1000 antipodean summits calculated by Alex @GM5ALX, both the altitude and bulge of the earth make a difference to almost all summits!

Very interesting discussion about computer times. I do wonder how much of this is related to how the software is written to use a) storage with poor/naff write times and b) speeds that multi-core/threading offer. We are talking about very very big data sets, where this matters!

ZL1THH · 27 January 2025 19:51

Well I’m up for this end any time someone can go up Cabras. Anyone keen?

ZL1THH · 27 January 2025 20:12

GM5ALX:

def haversine_with_altitude(lat1, lon1, alt1, lat2, lon2, alt2):
    R1 = geocentric_radius(np.degrees(lat1)) + alt1
...
    return np.sqrt(R1**2 + R2**2 - 2 * R1 * R2 * np.cos(c))

This is slow as molasses in Python.

You should be doing this vectorised, not point by point in a subroutine, so that the libraries (BLAS, LAPACK) take care of using SSE instructions, multiprocessing etc. (i.e. NumPy).
Not sure how one approaches vectorising an iterative algorithm like Vincenty’s.
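As a rough idea of what vectorised could look like with NumPy broadcasting, here is a sketch that computes a whole block of distances in one go. It is the plain spherical haversine with an assumed 6371 km radius, not Vincenty and not the altitude-aware function quoted above, and `haversine_block_km` is a made-up name.

```python
import numpy as np


def haversine_block_km(lat_a, lon_a, lat_b, lon_b, radius_km=6371.0):
    """Distances (km) from every point in set A to every point in set B.

    Inputs are 1-D sequences of degrees; the result has shape (len(A), len(B)).
    """
    phi_a = np.radians(np.asarray(lat_a, dtype=float))[:, None]
    lam_a = np.radians(np.asarray(lon_a, dtype=float))[:, None]
    phi_b = np.radians(np.asarray(lat_b, dtype=float))[None, :]
    lam_b = np.radians(np.asarray(lon_b, dtype=float))[None, :]
    h = (np.sin((phi_b - phi_a) / 2) ** 2
         + np.cos(phi_a) * np.cos(phi_b) * np.sin((lam_b - lam_a) / 2) ** 2)
    # clip() keeps rounding error from pushing h outside arcsin's domain
    return 2 * radius_km * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0)))


lat = np.array([0.0, 0.0, 0.0])
lon = np.array([0.0, 90.0, 180.0])
d = haversine_block_km(lat, lon, lat, lon)
furthest_index = d.argmax(axis=1)  # index of the furthest point for each row
print(furthest_index[0], furthest_index[2])
```

A full 155472 x 155472 matrix of float64 would be roughly 193 GB, so in practice set A would be a modest block of rows at a time, keeping only the per-row maximum.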

I don’t understand how Python is a “major” language and lacks JIT compilation. I was just astonished how slow it was when I benchmarked it, back when it was still Google’s favourite language du jour. I was thinking about changing from Matlab, but gave that idea up.

GM5ALX · 27 January 2025 20:27

Yes, absolutely there are better ways. (I’m also comparing it to Tim’s screenshot of the formula and I’m not sure it’s right…)

I think Python is a major (in top 5 by everyone) language because of how clean and easy it is to read, how easy to get started and how universal it is across all domains.

If you know what you’re doing, there are lots of ways to speed it up. I believe YouTube uses Python and handles 1 million requests per second; Dropbox, Instagram and many major, mainstream apps all run on Python. A huge percentage of data science is Python, and major AI libraries are Python.

GM4LLD · 27 January 2025 20:30

Because its users know of the deficiencies. Ease of development and availability everywhere make it a go-to language for the correct problems. As I said, despite many years of C and C++ experience, I wasted hours getting my CSV parser/loader to work when in Python it was simply

import csv

with open('summitslist.csv', newline='') as inputfile:
    summitreader = csv.DictReader(inputfile)
    for row in summitreader:
        # skip summits that are no longer valid
        if row['ValidTo'] != '31/12/2099':
            continue
        # more code here

and as they say “Hello, my name’s Robert and I am your father’s brother”

Paid Python programmers know when to stop using it and move to the next tool in the toolbox.

GM5ALX · 27 January 2025 20:49

Let’s try again…

import math


def calculate_arc_length(lat1, lon1, alt1, lat2, lon2, alt2):
    """
    Calculate the circumference segment (arc length) between two mountain summits.

    Parameters:
        lat1, lon1: Latitude and Longitude of the first summit in degrees.
        alt1: Altitude of the first summit in meters.
        lat2, lon2: Latitude and Longitude of the second summit in degrees.
        alt2: Altitude of the second summit in meters.

    Returns:
        Arc length in meters.
    """
    # Earth's equatorial and polar radii in meters
    a = 6378137.0
    b = 6356752.3 

    # Convert latitudes and longitudes from degrees to radians
    lat1_rad = math.radians(lat1)
    lon1_rad = math.radians(lon1)
    lat2_rad = math.radians(lat2)
    lon2_rad = math.radians(lon2)

    # Calculate the geocentric radius of the Earth at a given latitude (in radians) for an ellipsoid
    def earth_radius_at_latitude(lat):
        cos_lat = math.cos(lat)
        sin_lat = math.sin(lat)
        numerator = (a**2 * cos_lat)**2 + (b**2 * sin_lat)**2
        denominator = (a * cos_lat)**2 + (b * sin_lat)**2
        return math.sqrt(numerator / denominator)

    # Calculate the radii at the two latitudes
    radius1 = earth_radius_at_latitude(lat1_rad) + alt1
    radius2 = earth_radius_at_latitude(lat2_rad) + alt2

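    # Calculate the central angle (clamped so rounding error cannot push acos out of its domain)
    delta_lon = lon2_rad - lon1_rad
    cos_angle = math.sin(lat1_rad) * math.sin(lat2_rad) + math.cos(lat1_rad) * math.cos(lat2_rad) * math.cos(delta_lon)
    central_angle = math.acos(max(-1.0, min(1.0, cos_angle)))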

    # Calculate the arc length
    arc_length = ((radius1 + radius2) / 2) * central_angle

    return arc_length
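A quick way to sanity check any ellipsoid radius helper is to confirm it gives the equatorial radius at latitude 0 and the polar radius at latitude 90, with something in between elsewhere. The sketch below does that using the standard geocentric radius formula and the same two radii as above; `geocentric_radius` is just an illustrative standalone name.

```python
import math

A = 6378137.0  # equatorial radius in metres
B = 6356752.3  # polar radius in metres (rounded, as in the post)


def geocentric_radius(lat_deg):
    """Distance in metres from the Earth's centre to the ellipsoid surface."""
    lat = math.radians(lat_deg)
    c, s = math.cos(lat), math.sin(lat)
    numerator = (A * A * c) ** 2 + (B * B * s) ** 2
    denominator = (A * c) ** 2 + (B * s) ** 2
    return math.sqrt(numerator / denominator)


for lat_deg in (0.0, 45.0, 90.0):
    print(lat_deg, round(geocentric_radius(lat_deg), 1))
```

If a helper returns the same value at every latitude, the bulge isn't actually being applied.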

Also

Nice optimisation

GM4EVS · 27 January 2025 21:09

As you know, for many (most?) tasks, Python is quick enough, effective, and has loads of useful libraries.

What is more, Python / SciPy are now widely accepted within the academic & scientific communities. It seems Nature magazine etc are happy to accept the results of your latest work on say Gravitational Waves, charts and all, generated by Python.

You put your data and code on GitHub so it’s available for peer review. In particular, this helps reviewers from less well-off countries who can’t afford the likes of MATLAB licenses.

That said, though I have been retired for a while, I believe major engineering projects, like building a 2-mile suspension bridge over an estuary, will typically require all the calculations to be done (and submitted) in a recognised Math package such as MATLAB.

It would seem the days of having two ‘interns’ write 30,000 lines of VB in 2 months to design your bridge are long gone.

Of course, bridges are about safe, accurate and reliable. But for others, where time really is money, flat-out performance becomes the order of the day - see link below.

73 Dave

M1MAJ · 28 January 2025 10:05

It certainly is, but Julia is gaining in popularity as an alternative to MATLAB. In my former department, one of our greatest MATLAB enthusiasts came across Julia and pronounced “I think I’m in love…”

Martyn M1MAJ

GM4EVS · 28 January 2025 11:20

As you know, our software world is built out of both new and old.

Newer languages like Julia, Rust, and so forth continue to gain traction. These are places where Comp Sci folk are probably keen to build their careers.

Equally, but perhaps less-visible, are the older dependencies, for example:

1 Large tracts of the UK banking system still continue to use COBOL.

2 WSJT, which includes FT8, has some of its core error-correction routines written in FORTRAN (older F77, not newer F90) - who knew?

Check for yourself, the full source is available from the web-site. It’s rumoured that K1JT is the only FORTRAN support person?

3 The US Lawrence Livermore lab struggles to get folk to support its substantial legacy FORTRAN codebase. Not an enticing career path.

73 Dave

G4TGJ · 28 January 2025 17:12

I’ve used GNU Octave to run some MATLAB code for generating DSP filters. I don’t know if it is generally a good alternative, apart from the infinite reduction in cost.

I thought everyone knew that.

ZL1THH · 9 February 2025 03:32

It has become way, way more complete than it was a decade ago. The ML-style handle graphics seem to work properly. It starts faster than ML’s IDE does these days, though I’m sure the run speed is not so good as I don’t think it has ML’s JIT compilation.
I’m running a new system on Octave in production, and that would have been unthinkable a decade ago.

I tried Julia, I also heard great things about it, but every startup required a whole looong compile of libs you included. Slow startup grinds my gears.
It’s probably the choice for really big datasets. Throw a few 500MB arrays at Octave and I’ll bet it doesn’t run so well.

ZL1THH · 9 February 2025 03:33

Heading up for a sunny evening on AK023 tonight with Russ VA3RR, hoping for some antipodean action.

ZL1THH · 15 February 2025 19:26

Anyone in Christchurch, Kaiapoi:

EA2GM is almost antipodal to you, on now, only 50 km away…
EA1/AT-208

ZL4NVW · 30 March 2025 19:40

Apologies to any EAs who were after an S2S distance record. I’ll have to withdraw my promise to activate The Nelson Tops as the wx forecast for the remainder of my annual leave is too poor along the main divide.

Another year perhaps …

I might still be able to manage the antipode of EA1/LE-267 Las Colinas (ZL3/CB-293 Snowflake), but only on request as it doesn’t fit in as part of a longer SOTA tour. If anyone is keen and can give a commitment around the 5th April, let me know.