I was wondering how someone would go about using Unicode characters (specifically characters with macrons) for page names in MySource Matrix.
For example, if I include the character Ā (A with a macron, character reference &#256;), both the front and back ends (with some exceptions) display &#256; rather than the character I've specified.
The odd thing is that content with the very same character is displayed correctly!
I'm not sure why it works in content, but for page titles (and other attributes) this is a known issue, and it is partly a browser issue: Firefox does what you describe above, while Internet Explorer usually substitutes a different character (returning 'A' with a different accent instead of A-macron). The root cause is that the default character set is ISO-8859-1 rather than Unicode (UTF-8), and UTF-8 is required for this to work.
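To make the ISO-8859-1 vs UTF-8 point concrete: ISO-8859-1 only covers code points up to U+00FF, so Ā (U+0100, decimal 256) has no byte in that encoding. When a form is submitted from a page in an encoding that cannot represent a character, browsers fall back to a numeric character reference, which is where the literal &#256; comes from. The Python sketch below approximates that fallback with Python's built-in `xmlcharrefreplace` error handler; `encode_for_form` is a hypothetical helper for illustration, not anything in Matrix.

```python
# Ā is U+0100 (decimal 256). ISO-8859-1 only covers U+0000..U+00FF,
# so it has no byte for Ā, while UTF-8 encodes it as two bytes.
A_MACRON = "\u0100"  # Ā


def encode_for_form(text: str, charset: str) -> bytes:
    """Hypothetical helper approximating browser form submission:
    characters the page charset cannot represent become &#NNN; references."""
    return text.encode(charset, errors="xmlcharrefreplace")


if __name__ == "__main__":
    print(encode_for_form(A_MACRON, "iso-8859-1"))  # b'&#256;'
    print(encode_for_form(A_MACRON, "utf-8"))       # b'\xc4\x80'
```

With UTF-8 as the page encoding, Ā survives as the two-byte sequence C4 80 instead of being turned into a reference, which is why switching the default character set fixes the problem.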
Full UTF-8 support in the back end is included from the 3.6 series of MySource Matrix onwards; you should be able to change the 'Default Character Set' system configuration option to "Unicode (UTF-8)", after which those characters should work.
Naturally this works best on a new system (preferably by opening the main.inc config file after step_01 and changing the 'SQ_CONF_DEFAULT_CHARACTER_SET' setting to 'utf-8'). However, if you have only used ASCII characters in your system so far, you can most likely switch over safely in the System Configuration screen without affecting the rest of your content.
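The reason an ASCII-only system is safe to switch is that every 7-bit ASCII character has the same byte value in ISO-8859-1 and in UTF-8; only values containing other characters could be affected. A minimal sketch of that check, assuming you have already exported attribute values (page names and so on) into a list of strings by some means not covered here; `non_ascii_values` and the sample data are hypothetical:

```python
def non_ascii_values(values):
    """Return the values that contain any character outside 7-bit ASCII.

    ASCII text is byte-for-byte identical in ISO-8859-1 and UTF-8, so only
    these values need attention before changing the default character set.
    """
    return [v for v in values if not v.isascii()]


if __name__ == "__main__":
    # Hypothetical sample data standing in for exported page names.
    names = ["Home", "About us", "Caf\u00e9 menu"]
    print(non_ascii_values(names))  # ['Café menu']
```

An empty result suggests the switch in the System Configuration screen is low-risk; anything listed would need checking by hand afterwards.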
I'm having a similar problem at the moment…
Matrix is set up for UTF-8, and so is Apache ('AddDefaultCharset UTF-8').
The (Oracle) database is set up as: NLS_CHARACTERSET AL32UTF8
This was all displaying fine on our old hardware (Solaris) and is now broken on our Red Hat setup.
The only difference I can really see is that the old DB is: NLS_CHARACTERSET US7ASCII
Are there any known issues with the hardware and OS change that could be affecting it?
Sounds like you might have had a problem during the migration. What are you seeing on the front end and back end? It is not going to be exactly the same issue as the one above.
At the moment, the best I can do is describe the behaviour:
All the Apache headers say the page is UTF-8.
- If I create a new page named 'ßßtestßß', it displays in the asset map and on the front end as '??test??'. In the database it also appears as '??test??'.
System log for creating this page:
[2008-10-16 15:20:06][91217:andrew][1024:mysource notice][ ] [asset.attributes.fulllog.scalar - assetid:100511;] (Asset Attribute Changed) - The value of attribute "name" for asset "Page Standard #100511" has been changed from "" to "ÃÃtestÃÃ"
[2008-10-16 15:20:07][91217:andrew][1024:mysource notice][ ] [asset.attributes.scalar - assetid:100511;] (Asset Attribute Changed) - The value of attribute "name" for asset "Page Standard #100511" has been changed from "" to "ÃÃtestÃÃ"
And then, e.g., removing permissions on the page afterwards:
[2008-10-17 08:48:04][91217:andrew][1024:mysource notice][ ] [asset.permissions.delete - assetid:100511;] (Asset Permission Deleted) - Admin permission has been deleted on asset "??test??" for "Unknown User"
Another odd thing: when I commit changes to the content of any page containing Unicode characters ('ßßßßßßßß'), it is fine on the front end, but when I go back in to edit the contents afterwards, opening the WYSIWYG editor converts them to '????????'.
Thanks
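Both symptoms can be reproduced outside Matrix, which helps narrow down where the conversion happens. The sketch below is a guess at the mechanism, assuming (1) whatever displays the log line treats UTF-8 bytes as ISO-8859-1, and (2) the Oracle client is converting to a character set with no ß (such as 7-bit ASCII), where unconvertible characters become '?'. It uses only Python's built-in codecs; nothing here is Matrix or Oracle code.

```python
NAME = "\u00df\u00dftest\u00df\u00df"  # 'ßßtestßß'

# Symptom 1 (system log): ß is C3 9F in UTF-8. Decoded as ISO-8859-1 that is
# 'Ã' followed by the invisible control character U+009F, so the log line
# appears to read "ÃÃtestÃÃ".
as_logged = NAME.encode("utf-8").decode("iso-8859-1")

# Symptom 2 (asset map / database): converting into a character set that has
# no ß replaces each unconvertible character with '?'.
as_stored = NAME.encode("ascii", errors="replace").decode("ascii")

if __name__ == "__main__":
    print(repr(as_logged))  # 'Ã\x9fÃ\x9ftestÃ\x9fÃ\x9f'
    print(as_stored)        # ??test??
```

If the '?' characters are already in the stored data, the damage is happening on the way into the database rather than on display, which points at the client-side character set rather than the design or the Apache headers.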
A few things:
1. Does your design have the right character encoding? Even though the headers should override this, it's worth checking (I don't know the URL of your site to check).
2. It sounds to me like the database is not set to the right encoding, though it is hard to tell. If the characters are coming back as ??, then it looks like they are not being preserved in the DB. But if you had US7ASCII before, this could not have been working either.
3. Is NLS_LANG set correctly in your Oracle startup scripts? Which version of Oracle are you using (and which version of Matrix)?
- <meta http-equiv="content-type" content="text/html; charset=utf-8" /> - looks ok
- Oracle 10.2.0.3; the only NLS parameter we have modified, also as per Squiz recommendations, is NLS_DATE_FORMAT.
It's quite puzzling…
There is a recent patch which adds a charset parameter to the oci_connect() call. It will be in the next release, or you could apply the patch and see if it helps. It may be that your Oracle version needs the explicit parameter on connect.
Oracle doesn't need the explicit parameter on connect, though if you have the latest oci8 driver from PECL, specifying it can significantly improve connection speed when used with oci_pconnect(). Be aware that Matrix doesn't use oci_pconnect() by default. If you make this change, make sure you switch your Oracle Database to use SHARED servers (instead of DEDICATED, which is the default). You'll also want as many shared server processes as you have Apache processes, which is tough because they can use a LOT of RAM. If you really want to switch to oci_pconnect(), you should upgrade to Oracle Database 11g and enable Database Resident Connection Pooling (DRCP), which gives you much better performance without the RAM penalty on the database server.
However, to resolve this issue you will probably also need to set the NLS_LANG environment variable for Apache. You can find out more in the Globalization chapter of the linked document. I would imagine that setting NLS_LANG=AMERICAN_AMERICA.AL32UTF8 would work fine.
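NLS_LANG has the form LANGUAGE_TERRITORY.CHARSET, and it is the CHARSET part that tells the Oracle client which encoding the application (here PHP running under Apache) is using. A small sketch for sanity-checking an environment; `client_charset` is a hypothetical helper, and the US7ASCII fallback for an unset NLS_LANG reflects Oracle's documented default on Unix clients, which is worth confirming for your client version. Note that it is Apache's environment that matters, not your login shell's.

```python
import os

# Assumption: Oracle's documented default on Unix when NLS_LANG is unset.
DEFAULT_NLS_LANG = "AMERICAN_AMERICA.US7ASCII"


def client_charset(env=None):
    """Return the CHARSET part of NLS_LANG (LANGUAGE_TERRITORY.CHARSET).

    Falls back to DEFAULT_NLS_LANG when NLS_LANG is unset or empty, and
    returns None when the value has no '.CHARSET' component at all.
    """
    env = os.environ if env is None else env
    value = env.get("NLS_LANG") or DEFAULT_NLS_LANG
    _lang_territory, sep, charset = value.rpartition(".")
    return charset.upper() if sep else None


if __name__ == "__main__":
    print(client_charset({"NLS_LANG": "AMERICAN_AMERICA.AL32UTF8"}))  # AL32UTF8
    print(client_charset({}))                                          # US7ASCII
```

Run in a process that inherits the same environment as Apache, `client_charset()` should report AL32UTF8 once the variable is exported; US7ASCII would be consistent with the '?' substitutions described earlier in the thread.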
Avi,
Thanks for the link to that document, very helpful.
I didn't set up Apache myself, but it looks like ORACLE_HOME is definitely not in there:
TNS_ADMIN=/var/opt/oracle; export TNS_ADMIN
LD_LIBRARY_PATH=/usr/lib/oracle/10.2.0.3/client64/lib; export LD_LIBRARY_PATH
I've added:
NLS_LANG=AMERICAN_AMERICA.AL32UTF8; export NLS_LANG
and everything seems fine.
Thanks again.
Setting ORACLE_HOME and ORACLE_SID will reduce the time it takes the OCI8 driver to parse the TNS name, though the saving may be negligible. The biggest saving comes from specifying the character set in the oci_pconnect() call, as this avoids a costly environment lookup. This is particularly noticeable on Matrix systems under load.