Missing characters

OK7OK · 29 April 2012 12:01

A note about missing characters in Czech Summit names

Two people asked me yesterday about the name of one of the summits I was on, because some of the characters in the summit’s name were displayed as question marks rather than the correct Czech letters with diacritics. I understand this can be confusing. I am not sure what causes it; there could be a few reasons, and maybe the character encoding needs looking at, but I am sure someone more qualified than me can tell us.

Explanation:

Yesterday, 28/04/2012 at 13:07, I posted a spot for summit OK/US-002. If you move your mouse cursor over the summit reference, a little pop-up box appears showing “Velk? ?pic?k”. If you click on the reference, it takes you to the info page, which shows the same. It should, however, be displayed as Velký Špičák.

If you want to see the name of the summit displayed without the confusing question marks, go to the “database” page, then “Summits”, then “Find Summits”, and enter the summit’s reference there. You will then see the summit’s name as “Velký Špicák”, which is not 100% correct but much more legible.

73

Darius OK7OK

OK7OK · 29 April 2012 12:04

In reply to OK7OK:

OK, so as you can see here too, it seems that the system does not like the letter “c” with the diacritic. It has rendered it here as “č”; this should be a small “c” with a little “v” above it. So the best bet, as I said, is to look it up in the database.

73 Darius OK7OK

G0RQL · 29 April 2012 12:37

Thanks for the explanation, Darius. I had found that the best bet was going to the database, but wondered why it showed more ? marks than average.
See you on your next. 73, Don.

G8ADD · 29 April 2012 12:42

In reply to OK7OK:

Hi, Darius, you will probably get a more authoritative reply from a more computer-literate member of the MT later, but I can tell you that this is a problem we have known about for some time but have not been able to spare the time to correct. The database is at present unable to handle diacritic marks or non-Roman characters as it has a limited character set. We hope that this can be dealt with in the not too distant future and must ask you to bear with us for the present.

73

Brian G8ADD

OK7OK · 29 April 2012 12:47

In reply to G8ADD:

Hi Brian, no problem. I myself and all Czechs know what to replace the question marks with, as we recognise the words. I just want to help those who wonder what they are supposed to replace them with.

73

Darius

OK7OK · 29 April 2012 12:55

In reply to G0RQL:

Hi Don, well, the 2 characters yesterday were easy to explain:
s with diacritic (hook) is pronounced as SH in English
c with diacritic (hook) is pronounced as CH in English

Glad it was not

r with diacritic (hook), as it has no equivalent in English and most non-native Czech speakers struggle to master this one; it took over two to get it right.

Cheers and catch you from the next one. I have a small 2-day trip planned for next Tuesday/Wednesday, so hopefully the weather will stay good. Will post info.

73
Darius

OH9FZU · 29 April 2012 14:02

In reply to OK7OK:
I think it would be good to use only normal US keyboard characters with some summits.
It could get very messy if you try to use OH/MU-023 Suoppâjävruáivi, which is a similar summit… Making sure that all those special characters work right on all operating systems and browsers, including smartphones, could be very, very hard.

Jani OH9FZU

OK7OK · 29 April 2012 16:12

In reply to OH9FZU:

Good point. Thinking back to the early days of the internet, many countries just used the standard US keyboard, as many websites and email programs would not support their characters, but they could at least communicate. All locals know the correct characters in words in their own language anyway, and a letter/character is much better than just a symbol or question mark.

Darius OK7OK

F5VGL · 29 April 2012 18:15

In reply to OH9FZU:

Dearvva Jani,

Sami is really a foreign language to us Finns though some words are similar, which can also lead to confusion.

Seven-bit ASCII has been extended to eight bits by the various ISO 8859 standards, for example ISO 8859-1 for West European languages, 8859-4 for Scandinavian/Baltic languages and 8859-10 for Lappish/Nordic/Eskimo languages. Then there is also Unicode UTF-8. Supporting these character sets should not be difficult in web browsers. There is something like Character Encoding in the menu, where you can select which character set is used. It may be necessary to have the correct fonts installed, though. For example, there is a separate package for the Japanese language on Linux systems.
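
As a rough illustration of the character sets mentioned above, here is a small Python sketch that checks which encodings can represent a given summit name. The codec names are Python's built-in ones, the sample names are taken from this thread, and the helper `encodable` is made up for the example.

```python
# Which character sets can represent a given name?
# `encodable` is a hypothetical helper written only for this illustration.

def encodable(text, encoding):
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

names = ["Velký Špičák", "Suoppâjävruáivi"]
encodings = ["ascii", "iso-8859-1", "iso-8859-2", "utf-8"]

for name in names:
    for enc in encodings:
        print(f"{name!r:20} {enc:12} {encodable(name, enc)}")
```

UTF-8 can represent any Unicode character, whereas each eight-bit ISO 8859 set covers only some languages. The Czech name, for instance, does not fit in ISO 8859-1.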

For the summit activation we give only the international reference number in Morse. This can then be used to retrieve the full name of the summit.

73, Jaakko OH7BF/F5VGL

F5VGL · 29 April 2012 18:38

In reply to F5VGL:

Looking at the page source on SOTAwatch, it looks like the “?” was put there instead of “ä” or “ö” when the HTML page was generated. In the Finnish pages there is usually a line like

content="text/html; charset=iso-8859-1"

in the beginning. This gives the “ä” and “ö” correctly on the page. Probably the easiest fix for SOTAwatch is to copy and paste the correct characters into the HTML.
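
One way a literal “?” can end up in a generated page is when text is converted to a character set that lacks some of its characters, with replacement switched on. This is only an assumption about the mechanism, not something confirmed for SOTAwatch. A minimal Python sketch, using the name as the database page is reported to show it earlier in this thread:

```python
# Converting to a limited character set with errors="replace"
# substitutes "?" for every character the target set does not contain.
stored = "Velký Špicák"  # form reported from the database page in this thread

as_ascii = stored.encode("ascii", errors="replace").decode("ascii")
as_latin1 = stored.encode("iso-8859-1", errors="replace").decode("iso-8859-1")

print(as_ascii)   # Velk? ?pic?k
print(as_latin1)  # Velký ?picák
```

The ASCII result happens to match the string seen in the pop-up box, but that alone does not prove this is what the site does.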

73, Jaakko OH7BF/F5VGL

OH9FZU · 29 April 2012 19:17

In reply to F5VGL:
Yes, but you still often see pages that are not showing the characters right.
And the problem is all over the internet; sometimes it’s just easier to use US characters.
Btw sometimes sotaoh yahoo group messages are full of question marks and other weird characters…
And of course it’s not just the internet: some people use special characters in PSK31 macros etc., and if you shop online you are 100% guaranteed to see question marks etc. in your address.

Jani OH9FZU

F5VGL · 29 April 2012 19:41

In reply to OH9FZU:

Btw sometimes sotaoh yahoo group messages are full of question marks
and other weird characters…

That should not happen, since ‘ä’ and ‘ö’ are in the Western 8859-1 set and, to my understanding, also in UTF-8. To see the simple text files that I have submitted there correctly, you may need to try different character encodings. Most likely UTF-8 will work.

73, Jaakko OH7BF/F5VGL

OH9FZU · 29 April 2012 20:04

In reply to F5VGL:
Yes, I know it should not happen, but it still does, and it’s the only Yahoo group where I have that problem.

Jani OH9FZU

F5VGL · 29 April 2012 20:08

In reply to OH9FZU:

Maybe that was a copy-and-paste problem on my side? Also, mixing two encodings in the same file will probably result in something not very readable.
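
To illustrate the point about mixed encodings, here is a small Python sketch with made-up sample text (not the actual group messages). If one line of a file is written as ISO 8859-1 and another as UTF-8, no single decoding renders both correctly.

```python
# The same text written twice into one byte stream, in two different encodings.
line = "ää öö\n"
mixed = line.encode("iso-8859-1") + line.encode("utf-8")

# Read everything as ISO 8859-1: the UTF-8 half turns into two-character garbage.
as_latin1 = mixed.decode("iso-8859-1")

# Read everything as UTF-8: the ISO 8859-1 half is invalid, so each bad byte
# is replaced by U+FFFD (the replacement character).
as_utf8 = mixed.decode("utf-8", errors="replace")

print(as_latin1)
print(as_utf8)
```

Whichever encoding the reader picks, half of the file comes out wrong.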

73, Jaakko OH7BF/F5VGL

M1MAJ · 29 April 2012 23:22

In reply to G8ADD:

The database is at
present unable to handle diacritic marks or none-roman characters as
it has a limited character set.

Actually the database proper is fine. The CSV download in particular has correct UTF-8 encoding.

SOTAwatch has problems though. In particular, the web pages declare themselves to be encoded in UTF-8 but are actually encoded in Windows-1252, or something like that. I need a few workarounds for that in my Twitter feed. But mostly, the characters are there correctly in the database. (The main exception is that the Ukrainian names were broken in the database last time I looked.)
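
A minimal sketch of the general kind of workaround a mislabelled page calls for, assuming the bytes really are Windows-1252. The function name `decode_page` and the sample bytes are invented for illustration, and this is not the actual Twitter feed code. It tries the declared UTF-8 first and falls back to Windows-1252 when the bytes are not valid UTF-8.

```python
def decode_page(raw: bytes) -> str:
    """Decode as UTF-8 if valid, otherwise assume Windows-1252 (an assumption)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # errors="replace" guards against the few byte values cp1252 leaves undefined.
        return raw.decode("cp1252", errors="replace")

# Sample bytes: the same name as a mislabelled cp1252 page and as genuine UTF-8.
mislabelled = "Velký Špicák".encode("cp1252")
genuine = "Velký Špicák".encode("utf-8")

print(decode_page(mislabelled))
print(decode_page(genuine))
```

The fallback relies on the fact that text containing accented Windows-1252 bytes is only rarely valid UTF-8 by accident, so a failed UTF-8 decode is a reasonable hint that the declaration is wrong.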

You should find that the summit names are correct in the Twitter feed, because I derive those names from the database rather than the SOTAwatch page for just this reason.