S2S CSV upload format

MM0FMF · 14 April 2013 21:40

I’ve been putting this off for a while but now is the time to get the discussion on this started. I had a very useful discussion with Martyn M1MAJ a few weeks back so it’s time to see what others think.

The current CSV file is quite slack in the exact specification. It doesn’t require that any embedded commas are wrapped in quotation marks. i.e. either of these lines would be valid:

MM0FMF, 14/04/2013, 14:15, GM/SS-001, 144MHZ, FM, 2M0NCM/P, Neil on Ben More, doing a fine job

or

MM0FMF, 14/04/2013, 14:15, GM/SS-001, 144MHZ, FM, 2M0NCM/P, “Neil on Ben More, doing a fine job”

This slackness means I’m limited in how I can automatically determine if the line is for an S2S or normal chase. I don’t want to change anything that affects any existing tools used to generate CSV files as that would annoy the pants off a lot of you. I mean I could just say from some date old format files will not work. But all that means is my email inbox will fill with complaints and queries and I don’t want that.

So I’m proposing the following in that an special identification field is added which indicates that this line of data is an S2S QSO followed by the activator call, activator summit and then the rest of the data as before.

i.e.
ID_S2S, MM0FMF/P, GM/SI-100, 14/04/2013, 14:15, GM/SS-001, 144MHZ, FM, 2M0NCM, Neil on Ben More, doing a fine job

This means that data for S2S and non-S2S chaser QSOs can be intermixed into one single file. Existing software will still generate valid chaser files and new software will be able to generate data for the new format.

I’d welcome comments on what potential software authors think or just users in general. Maybe you can improve on this but remember you can’t change anything that breaks existing utilities.

Comments?

Andy, MM0FMF
Database Manager

G8TMV · 14 April 2013 21:57

In reply to MM0FMF:

Instead of adding an Id and extra fields, how about annotating the callsign if it was on a summit?

Maybe something like (following your example above):

MM0FMF/P(GM/SI-100), 14/04/2013, 14:15, GM/SS-001, 144MHZ, FM, 2M0NCM/P, Neil on Ben More, doing a fine job

i.e. if the chaser is on a summit then they append the summit ref to their call. This should be easy to parse, just look for a (
It means there are the same number of fields to parse so should mean minimum changes.

Colin G8TMV

M1EYP · 14 April 2013 23:55

The possible snag that springs to mind with your suggestion is that many users will be starting with a log in Excel type format, or Excel generated from a logger progam. So you would have ID_S2S (or neater with just S2S perhaps?) in column A, which wuld be blank if not S2S, and your own callsign in column B.

Perhaps when the spreadsheet is saved as a CSV file, the empty cells in column A will just be ignored, I’m not sure. I could have a play to see what happens, but I’m busy…

Tom

MM0FMF · 15 April 2013 00:04

In reply to M1EYP:

You can’t have S2S as it’s a potentially valid Bangladeshi callsign, hence ID_S2S as it can’t be confused with a callsign. You can’t start putting in blank columns as that breaks backwards compatibility.

The problem is we allow unquoted commas in the comment fields. If we didn’t then we’d know 7 columns = chaser and 9 columns = S2S and it would be easy. But we can have >7 columns and less than 300chars as a valid line.

Unless we have separate import S2S and import chaser functions we have to a way of distinguishing between new and old data lines. Whatever is done has to continue to accept anything which is valid now.

Andy
MM0FMF

M0JLA · 15 April 2013 00:09

In reply to MM0FMF:

Andy,

Tricky one.

As the summit references are well structured and listed uniquely might beginning the record with a summit ID be sufficient to identify the record as a S2S record.

If so, then a batch of mixed chase and s2s records would have the first field blank, a callsign or a valid summit id. The blank would be skipped and the record treated as any other chaser record. The valid summit id would generate a chase record and the relevant s2s record.

Allowing a blank at first field would make chasers’ Excel sheets easier to scan for errors when mixed records are submitted; no s2s included then no blanks needed for padding.

My brain is still seriously fogged up so I my well have missed some obvious flaws here.

73,
Rod

G0LGS · 15 April 2013 03:05

In reply to MM0FMF:

Whilst the current database code for upload of CSV allows the comment part to have a comma without any quoting, unless something has changed since I last looked at it this: when downloading a CSV that has such entries importing those into other software causes some odd things to happen (because the web site download code does not put in the quotes).

When I wrote my CSV Log editor I chose to not allow comma’s to be entered in the comments field, simply to avoid such issues.

I would like to suggest the following approach to fix things:

Update the CSV download code to correctly add quotes around comments that already have comma’s in them (this will fix things for programs that correctly handle CSV formats such as Excel).
Add the S2S field to the CSV after the comment so it becomes an extra column at the end - this means any code using the older format remains largely (if not completely) compatible. (The CSV upload handling could even work out from the data if it is part of a comment or a valid S2S reference).

Whilst I know that this approach is not quite what Andy has said he would like it should mean that most existing programs that people are using for SOTA can either already handle the extra information (i.e Excel) or would require only small changes compared to what might be otherwise required to deal with 2 totally different line formats.

As software authors (such as myself) will need to update their code to handle whatever form the new S2S format takes I do not see changes to tighten up on the CSV requirements as a big problem.

Stewart G0LGS

VK3WAM · 15 April 2013 04:38

In reply to MM0FMF:
This is my idea:

Use the first field to switch between the existing line which has a callsign in the first field and has no s2s support, and a marker, rather than “ID_S2S”, make it “V2.0” “V2.0” cannot be a callsign. This could be present with or without quotation marks.

If the server finds “V2.0” in the first field, then this is a version 2 line.

So then if we take the line:
V2.0, MM0FMF/P, GM/SI-100, 14/04/2013, 14:15, GM/SS-001, 144MHZ, FM, 2M0NCM/P, Neil on Ben More, doing a fine job

This is a s2s between GM/SS-001 and GM/SI-100.

If we take the line:
V2.0, MM0FMF/P, 14/04/2013, 14:15, GM/SS-001, 144MHZ, FM, 2M0NCM/P, Neil on Ben More, doing a fine job

Then this is not a s2s, but it would be easier for software (and excel export sheets - which is what I do) to handle this, because the s2s chased is left blank.

Software authors can easily update to v2.0 which provides s2s support. Legacy software remains supported with the old format, but there is no s2s support on the old format.

It’s also future proof, as if something is needed in the future, the first field could be changed to something else as needed (eg V3.0) and then the server would look for the new fields, whatever they happen to be.

73 de Wayne VK3WAM

MM0FMF · 21 April 2013 19:50

In reply to VK3WAM:

I’d like to thank everyone for their input to this.

I have decided we would be best served by adding a new more tightly specified format for both chaser and S2S entries. In the chaser case it is essentially what Wayne suggested but that comments are not allowed unquoted in the comment field. This will make every line a fixed number of fields long. At the same I think having a similarly tightened format for activator entries is also a sound idea. The feature in the uploader that causes an activation to stop at 0000z will be removed. (There has been a feature to disable this for the lasy year or so.) The original format CSV formats will continue to be supported for a long time to come.

The upload procedure will not change, you tell the database which CSV to upload and it will determine from the data whether you are submitting old style or new style data. As such there will be no change to the procedure. I’m not able at present to allow a single file to contain both activation and chase records however. That would be nice in that those that export their log from another program could just export 1 file for upload. It’s too much of a change to implement that now.

At the same time as these changes, I intend to update the CSV download formats slightly such that the comment field will be quoted if it contains embedded commas. Also the parser has a bug in handling some modes so that unrecognised modes get accepted. This will be fixed.

Finally, once the new format is in place and we have the bugs squashed then I will switch the download log/activations facilities to produce new format CSV files.

I think that captures all the major requirements but if I’ve missed off some must have feature then now is the time to comment.

Andy
MM0FMF

W4ZV · 21 April 2013 22:22

In reply to MM0FMF:

Andy I hope TSV will also be supported. My old copy of Excel 2000 generates TSV but not CSV.

73, Bill W4ZV

G0LGS · 22 April 2013 00:48

In reply to MM0FMF:

I have decided we would be best served by adding a new more tightly
specified format for both chaser and S2S entries. In the chaser case
it is essentially what Wayne suggested but that comments are not
allowed unquoted in the comment field.

Do you mean the Comment Field will always have quotes around it ?

I like the idea of tightening up on the specified formats.

If you are going to use some sort of version ID in the CSV that can be achieved by making the first line of the CSV do that for the whole of the one upload/download (much like the downloadable Summitslist.csv has a date in the first line).

As the author of a program (SOTA CSV Log Editor) that works with both Chaser and Activator CSV formats, I would like:

Chaser and Activator formats to remain essentially the same for the first 8 fields of the CSV Format (Call Used, Date, Time, Full Reference, Band, Mode, Station Worked, Notes)

or/and:

The first line to contain something that can be used to determine which format is being used.

Stewart G0LGS

MM0FMF · 22 April 2013 12:30

In reply to G0LGS:

Do you mean the Comment Field will always have quotes around it ?

I was only going to apply them around comments containing one or more tabs or commas. It’s no biggy to always quote the comment field.

I like the idea of tightening up on the specified formats.

It’s probably overdue to be honest.

If you are going to use some sort of version ID in the CSV that can be
achieved by making the first line of the CSV do that for the whole of
the one upload/download (much like the downloadable Summitslist.csv
has a date in the first line).

It’s a bit awkward as the code is written in a sort of stateless way. i.e. the parsing and processing code just gets fed a line and it doesn’t know if there are going to be more lines or not. I’ll think more about it but I was going to have a fixed ident at the beginning of each line if it was the new format.

Andy
MM0FMF

M1MAJ · 22 April 2013 16:39

In reply to G0LGS:

The first line to contain something that can be used to determine
which format is being used.

The big problem with a “magic first line” (other than a header line containing field names) is that it completely breaks the relational database model. Such files can neither be imported into nor exported from standard database tools such as MS Access, but require bespoke applications or pre/post processing. It’s not difficult to work around, but it creates an additional step in the workflow.

The downloadable summits list won’t import into Access because of that line at the beginning giving the date. As it happens, this file needs additional fixes too because of quoting problems, but even without that I’d need to strip the first line before importing.

G8TMV · 22 April 2013 18:13

In reply to M1MAJ:

The big problem with a “magic first line” (other than a header line
containing field names) is that it completely breaks the relational
database model.

Agreed, each line should be both self contained and self identifying.

Colin

VK1DA · 23 April 2013 02:27

In reply to MM0FMF:

Andy, I know I’m pointing out the obvious, but this problem arose because the summit field had two meanings. Sorting that out would probably help everyone.

The general case is that there can be a summit associated with each callsign. If the first call has a summit, that means it is an activation. If the second callsign has a summit, it is a chase. If both have summits, it is S2S. If neither have summits, it is an error.

Your dilemma is whether to create a new format by adding a new column and thereby break other software. With notice, that should not be a problem for software developers. Some may require 3 months notice, you would know better than I.

The longer term solution may well be a selected subset of ADIF. With explicit tags,… you know the rest. In fact any XML-like implementation is better than CSV, but that would cause much larger ripples in the pond.

I liked your idea of a doctype-like declaration at the top of the file because it creates great freedom to change formats in the future, but agree that it does foul the nest in terms of one line/one record. Allowing commas to have two meanings is just as undesirable.

Another thought: allow either activation or chase records to be edited online to add the “other” summit code manually and thereby convert an existing record to be changed into an S2S record. Editing existing records can be ugly, yes, but this might buy time to develop the next version of file upload format.

regards

Andrew
VK1DA/vk2uh

DB7MM · 23 April 2013 22:39

In reply to VK1DA:

The general case is that there can be a summit associated with each
callsign. If the first call has a summit, that means it is an
activation. If the second callsign has a summit, it is a chase. If
both have summits, it is S2S. If neither have summits, it is an
error.

I think, this is a very good idea. Even if the database import does not support both activator/chaser data in one file, the same file can be used for both activator and chaser import separately. This is a big advantage for all exporting their data from some logbook program. Even our Excel users only need a single file, as unused columns will be no problem. In future this format could be seamlessly used for activator/chaser data in one file.
As you intend to have the old format still be supported and the new one tagged with the V2.0 you could even put the summit references to more obvious positions.
This is a chaser QSO:
V2.0, MM0FMF/P, 14/04/2013, 14:15, 144MHZ, FM, 2M0NCM/P, GM/SS-001, Neil on Ben More, doing a fine job

This is an activator QSO:
V2.0, MM0FMF/P, GM/SS-002, 14/04/2013, 14:15, 144MHZ, FM, 2M0NCM/P, Neil at home, doing a fine job

This is a summit to summit QSO:
V2.0, MM0FMF/P, GM/SS-002, 14/04/2013, 14:15, 144MHZ, FM, 2M0NCM/P, GM/SS-001, Neil on Ben More, doing a fine job

This is an invalid QSO, which will be rejected:
V2.0, MM0FMF/P, 14/04/2013, 14:15, 144MHZ, FM, 2M0NCM/P, Neil and me at home, doing a fine job

This would also automatically prevent users from inadvertently uploading their activator files as chaser data and vice versa. I did this once. It was a pain to delete each of the 30 false chaser QSO manually.

Another thought: allow either activation or chase records to be edited
online to add the “other” summit code manually and thereby convert an
existing record to be changed into an S2S record. Editing existing
records can be ugly, yes, but this might buy time to develop the next
version of file upload format.

This would make things very convenient for all using the legacy CSV/TSV-import.

73 de Michael, DB7MM

G0LGS · 24 April 2013 04:09

I was thinking that even just adding the S2S Reference after the existing comment (with or without any commas in the comment) with careful coding the database upload page could still determine if the old format is being used or the new format on a line by line basis (it will presumably be checking that the S2S info is valid anyway).

Otherwise how about keeping the existing upload method for old format CSV and adding a new one for the new format, with proper handling of quoted comments (where necessary) - this would do away with any need of headers or type identifiers in the file. if the last field exists then Validate it as a summit reference if not then you don’t need to bother.

Stewart G0LGS

F6ENO · 28 April 2013 00:25

In reply to MM0FMF:

Hello Andy
This thread is very interesting.

Just 2 questions for now :
Why do you want to upload some S2S QSO files ? is there a new S2S award ?

Is the actual field number of a line important for your system ?

73 Alain F6ENO

G0LGS · 28 April 2013 01:03

Alain,

The details of the S2S Award are at http://www.sota-shop.co.uk/faq.php

Stewart G0LGS

F6ENO · 28 April 2013 10:40

In reply to G0LGS:

Thanks Stewart

Of course, I missed some reflector information;
I just went on Summits on the Air and saw nothing about S2S award.

Best 73
Alain F6ENO

MM0FMF · 14 May 2013 23:38

In reply to DB7MM:

After careful thought I have decided that Michael’s suggestion is the best way forward.

So the format is

,

Chaser:
V2,MM0FMF/P,14/04/2013,14:15,144MHZ,FM,2M0NCM/P,GM/SS-001,“Neil on Ben More”

Activator:
V2,MM0FMF/P,GM/SS-002,14/04/2013,14:15,144MHZ,FM,2M0NCM/P, “Neil at home”

S2S:
V2,MM0FMF/P,GM/SS-002,14/04/2013,14:15,144MHZ,FM,2M0NCM/P,GM/SS-001,“Neil&Me”

You will be able to mix activator, chaser & S2S records in the same file. This should make exporting from existing software easier. However, due to the way the existing code is structured you have to import the file twice. Once to collect any activation records using Import Activator CSV and again to import chaser and S2S records using Import Chaser CSV. One day there will be unified importing but it’s a massive job and not worth doing today.

Old format data should would just as before. I have tried to minimise any changes in the original code but some work was needed so I may have broken something. If this is the case it will be changed so that old format files will still work. It’s imperative that this is so.

For activation data, the old parser would consider a change in date as the end of an activation. This caused no end of issues for non-EU activations. There was a crude hack to get around it but it was rough. The new parser uses the end of activation records in the file or a change in summit to mark the end of an activation.

As I said you can mix the records so you could have a file that contained

and the parser(s) will find the data. The only proviso is that the entries must be sorted in date/time order. i.e. oldest at the top.

I have most of everything working just fine. But it’s not ready for release yet. I need to do a lot more boundary testing and I needed to enforce the fixed number of fields and quotes delimited comments. I wanted to give everyone due notice though.

I will make a formal announcement when I release the new uploader.

Andy
MM0FMF