I am having some trouble copying from MS Word into the WYSIWYG for a News Item. When I use quotes ("") in Word and copy them into the WYSIWYG, they are converted to: ’ after submitting the news item, which is very strange. Using the Replace Text tool does not help. Is this something that Tidy should be taking care of? Or am I missing something?
Right now I am just testing this, but I know this will be a problem for editors using our News asset builders when we launch our site.
Here is a screenshot of the two different kinds of quotes that I am getting. The first is the one that gets pasted in from Word; the second was just typed into the WYSIWYG.
[attachment=313:Picture_3.png]
Edited to add: it looks like none of the special characters are being converted.
Double posting to say that I am thinking this has to do with the character set. What character set does Matrix use? I see that matrix.squiz.net is using:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
Should I be adding this to my site as well?
The character set is configurable on the System Configuration screen. I believe Matrix ships with ISO-8859-1 as default, but it can be switched to UTF8 to support multibyte languages.
Thanks Avi. This "does" solve my problem, but since we did not start the site out using UTF8, changing to it screws up all special characters, including " and '. So I have to leave it as ISO-8859-1 for now until I can figure something out.
It seems the problem here is Word's "curly quotes", as talked about here: http://www.zacvineyard.com/blog/?p=7
Does anyone know if Tidy is configured to convert "smart quotes" from Word?
And do news items even use Tidy?
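For reference, here is a minimal Python sketch of what seems to be going on with the pasted quotes. It assumes the curly quote ends up as UTF-8 bytes that are later read back with a single-byte charset; that is a guess about this setup, not something confirmed for Matrix.

```python
# Word's right single "curly" quote is U+2019; it has no slot in ISO-8859-1.
curly = "\u2019"

try:
    curly.encode("iso-8859-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False

# In UTF-8 the same character takes three bytes.
utf8_bytes = curly.encode("utf-8")

# Reading those three bytes with a single-byte charset (Windows-1252 here,
# which browsers commonly use when a page declares ISO-8859-1) produces
# three junk characters instead of one quote.
garbled = utf8_bytes.decode("cp1252")

print(fits_latin1, utf8_bytes, garbled)
```

If that assumption holds, it would explain why one pasted quote shows up as several odd characters while a quote typed straight into the WYSIWYG looks fine.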
I don't believe it is, but I'd need confirmation on that.
[quote]And do news items even use Tidy?[/quote]
As a WYSIWYG attribute, my immediate guess is "yes", but I'm not 100% sure.
One more comment about this: are there any guidelines for switching from one charset to another? I take it that because all our data was submitted to the database as ISO-8859-1, changing to UTF8 and then trying to display pages is why it is getting screwed up. Any suggestions are welcome before I contact support.
I suspect it is better to set the character set as early as possible. Your database (if you're using PostgreSQL) should be in SQL_ASCII mode which just stores higher byte characters uninterpreted. This is capable of storing UTF8 characters, but Matrix does all the conversions.
I'm surprised that switching your character set stuffs up your quotes (" and ') as these should be lower order characters. I would contact Squiz Support anyway to see if we've ever had this issue before.
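To illustrate the point about lower order characters, here is a small Python sketch. The sample string is made up, and this only models the byte handling in general; it says nothing about how Matrix itself does its conversions.

```python
# Hypothetical sample content: one high-byte character plus plain quotes.
text = 'caf\u00e9 "quoted" \'single\''

# What would be stored if the content was submitted as ISO-8859-1.
latin1_bytes = text.encode("iso-8859-1")

# Plain " and ' are ASCII, so they are the same single byte in both charsets.
same_double = '"'.encode("iso-8859-1") == '"'.encode("utf-8")
same_single = "'".encode("iso-8859-1") == "'".encode("utf-8")

# Reading the stored bytes as UTF-8 only breaks the high-byte character;
# the invalid byte is shown as the U+FFFD replacement character.
misread = latin1_bytes.decode("utf-8", errors="replace")

print(same_double, same_single, misread)
```

So if straight quotes are breaking after the switch, they are probably not plain " and ' in the stored data, which would fit with Word's curly quotes being the real culprit.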
Thanks, I will contact support.
[quote]I suspect it is better to set the character set as early as possible. Your database (if you’re using PostgreSQL) should be in SQL_ASCII mode which just stores higher byte characters uninterpreted. This is capable of storing UTF8 characters, but Matrix does all the conversions.
I’m surprised that switching your character set stuffs up your quotes (" and ') as these should be lower order characters. I would contact Squiz Support anyway to see if we’ve ever had this issue before.[/quote]
Just a note: I changed the search manager to accept 1 character searches (just for now), and used the Search and Replace tool to look for MS Word “smart quotes”, and it is actually doing a great job at finding them and replacing them with normal quotes. Not sure why I didn’t think of this before…
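For anyone who would rather clean content before it goes in, the same replacement can be sketched in Python. The function name `straighten_quotes` and the mapping are illustrative only, not part of Matrix or its Search and Replace tool.

```python
# Map Word's four common "smart quotes" to their plain ASCII equivalents.
SMART_QUOTES = {
    "\u2018": "'",   # left single quote
    "\u2019": "'",   # right single quote / apostrophe
    "\u201c": '"',   # left double quote
    "\u201d": '"',   # right double quote
}
QUOTE_TABLE = str.maketrans(SMART_QUOTES)


def straighten_quotes(text: str) -> str:
    """Return text with curly quotes replaced by straight ones."""
    return text.translate(QUOTE_TABLE)


cleaned = straighten_quotes("\u201cHello\u201d it\u2019s")
print(cleaned)
```

This only covers the four quote characters; other Word characters such as dashes or ellipses would need their own entries in the mapping.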
Crossing my fingers that it will solve all my UTF8 problems. 