Conversion of accented characters in webpaths


(Douglas (@finnatic at @waikato)) #1

New Zealand has three official languages, one of which, Maori, features the use of the macron accent to indicate long vowel sounds in the written form.

 

One of our systems built in Matrix takes a user-provided title and uses it as the webpath, and something is dropping the accented a from that title: "Māori ..." ends up as a webpath beginning "mori...".

 

My guess is that Matrix is misidentifying the accented a as an invalid character for a URL and stripping it out rather than substituting an appropriate character.

 

Happy to post a squizmap idea for this if that's warranted.


(Luke Wright) #2

"Happy to post a squizmap idea for this if that's warranted."

 

Already done so for you: Squizmap #9386.

 

We do handle some accented characters, but I'm pretty sure the macron vowels aren't among them. They also don't have a named HTML entity, which would otherwise have seen them replaced too.

 

We keep a "character map" of these and it should be pretty simple to add the macron characters from Te Reo Māori to that. Thanks Douglas!


(Luke Wright) #3

Actually, they're already in the character map, but the character map needs to be enabled.

 

In System Configuration, under the "Internationalisation Settings" section, check that "Replace Accented Characters in Web Paths?" is set to Yes. (Alternatively, go to main.inc and enable SQ_CONF_LANG_USE_CHAR_MAP.) It's not enabled by default for whatever reason.

 

Without this, it will drop the character because it doesn't have a named HTML entity (i.e. you can't write macron characters the way you can write &aacute; for "á", for instance; you have to use a numeric reference such as &#257; for "ā"). With that replacement setting off, it will only keep characters with a named entity.

 

I'm not sure why it's not enabled by default (presumably most of our NZ users would need it enabled for instance), so I've left the Squizmap issue open for that purpose.
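
For anyone trying to picture the behaviour described above, here is a minimal Python sketch of it. It is not Matrix's code (Matrix is written in PHP and its real character map is not shown in this thread). The names to_webpath, CHAR_MAP and ACCENT_SUFFIXES are made up for illustration, and the lowercasing and space-to-hyphen rules are assumptions. It uses Python's html.entities.codepoint2name table, which covers the HTML 4.01 named entities only; HTML5 later added a named entity for ā (&amacr;), but the older set has none, which matches the dropping seen here.

```python
import unicodedata
from html.entities import codepoint2name  # HTML 4.01 named entities only

# Entity-name endings that mark an accented Latin letter, e.g. "aacute" = "a" + "acute".
ACCENT_SUFFIXES = {"acute", "grave", "circ", "uml", "tilde", "ring", "cedil", "slash"}

# Hypothetical character map covering the Māori macron vowels.
CHAR_MAP = {
    "ā": "a", "ē": "e", "ī": "i", "ō": "o", "ū": "u",
    "Ā": "A", "Ē": "E", "Ī": "I", "Ō": "O", "Ū": "U",
}


def base_letter_from_entity(ch):
    """Return the base letter if ch has a named entity such as 'aacute', else None."""
    name = codepoint2name.get(ord(ch))
    if name and name[1:] in ACCENT_SUFFIXES:
        return name[0]
    return None


def to_webpath(title, use_char_map=False):
    """Build a web path from a title (assumed rules, for illustration only)."""
    out = []
    # Normalise first so "a" + combining macron becomes the single character "ā".
    for ch in unicodedata.normalize("NFC", title):
        if ch.isascii():
            if ch.isalnum() or ch in "-_":
                out.append(ch)
            elif ch.isspace():
                out.append("-")
            # Any other ASCII punctuation is dropped.
        elif use_char_map and ch in CHAR_MAP:
            out.append(CHAR_MAP[ch])
        else:
            base = base_letter_from_entity(ch)
            if base:
                out.append(base)
            # No map entry and no named entity: the character is dropped.
    return "".join(out).lower()


print(to_webpath("Māori language"))                     # mori-language
print(to_webpath("Māori language", use_char_map=True))  # maori-language
print(to_webpath("Águila"))                             # aguila
```

With the map switched off, "á" survives because it has a named entity to fall back on, while "ā" has nothing to fall back on and disappears.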


(Douglas (@finnatic at @waikato)) #4

It seems a bit odd that it doesn't default to Yes; I can't see why you wouldn't want that by default.

If you want to adapt the Squizmap idea so that the setting defaults to Yes, I'd vote for it (if I had a vote to spare).

 

If having it default to Yes creates a problem, could the Squizmap idea instead be modified so that macron-accented characters are treated the same as the accented characters used in European languages (which don't require an internationalisation setting change)?
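
One way to treat macrons and European accents identically, sketched in Python purely as an illustration of the idea (this is an assumption about how it could be done, not how Matrix does it, and strip_diacritics is a made-up name): decompose each character into its base letter plus combining marks using Unicode normalisation, then discard the marks.

```python
import unicodedata


def strip_diacritics(text):
    """Remove combining marks (acute, macron, umlaut, ...) after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns a non-zero class for combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("Te Reo Māori"))  # Te Reo Maori
print(strip_diacritics("Águila"))        # Aguila
```

Letters that have no Unicode decomposition, such as "ø" or "ß", pass through unchanged with this approach, so a character map would still be needed for those.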