Curly apostrophes being deleted

We didn't previously have this problem; I'm not sure if it's due to upgrading to 3.18.11 or to declaring charset=iso-8859-1 in my doctype (which is what Squiz set us up as; I don't believe we specified anything).


I'd be very happy if it was turning the ‘ into ' but it isn't, it's just deleting them altogether. As a grammar nazi I am finding this stressful!



Any ideas on what could be causing it and what can be done?
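For anyone who would settle for straight apostrophes in the meantime, here is a minimal sketch of the two obvious stopgaps: swap curly quotes for straight ones, or keep them as numeric HTML entities so they survive an iso-8859-1 page. The helper names are hypothetical and none of this is Matrix or HTML Tidy code; it only illustrates the idea in Python.

```python
# Hypothetical helpers for illustration; not part of Matrix or HTML Tidy.
STRAIGHT_QUOTES = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark (curly apostrophe)
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def straighten_quotes(text: str) -> str:
    """Replace curly quotes and apostrophes with their plain ASCII forms."""
    return text.translate(STRAIGHT_QUOTES)


def to_numeric_entities(text: str, charset: str = "iso-8859-1") -> str:
    """Keep characters the charset can store; turn the rest into &#NNNN; entities."""
    return text.encode(charset, errors="xmlcharrefreplace").decode(charset)
```

With these, `straighten_quotes("We\u2019re")` gives `We're`, and `to_numeric_entities("We\u2019re")` gives `We&#8217;re`, which a browser still renders as a curly apostrophe.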

Did you change the charset after content was added?

What version did you upgrade from? I am working for a client who has upgraded from 3.16.12 to 3.18.9 and there seems to be some similar voodoo in their system.

Can you show us some screens of how the content looks in the wysiwyg and then in the browser, plus source?

Interestingly, we upgraded from 3.16.12 to 3.18.11…


I did not change the charset and other non-related problems make it clear that we are indeed using iso-8859-1. I did start declaring it in my design but nothing was actually changed system-wise.



The code of the WYSIWYG editor prior to committing. Text was pasted from Word and then cleaned using the replace text tool (one that looks like an eraser).

[attachment=394:beforesave.gif]



The code of the WYSIWYG editor after committing. Curly quotes and apostrophes are gone.

[attachment=392:aftersave.gif]



On the front end.

[attachment=395:onpage.gif]



The page source code on the front end is the same as it appeared in the backend.

[codebox]<p>We are discovering what our website is like for you to use. Were interested in what you think, good or bad. You cannot make a mistake because there are no wrong answers; we are testing the site, not you! Id just like to hear your opinions as you go.</p>[/codebox]



EDIT: removed unneeded screengrab.
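For reference, the output in the codebox is what you would expect if something in the save path converts the text to iso-8859-1 and silently drops anything it cannot represent. The curly apostrophe (U+2019) has no code point in iso-8859-1; Word text usually carries it as byte 0x92 from Windows-1252. The sketch below only reproduces the symptom in Python. It is an assumption about the kind of conversion involved, not the actual Matrix or Tidy code.

```python
# Illustrative sketch only: reproduces the symptom, not the Matrix code path.
SAMPLE = "We\u2019re interested in what you think. I\u2019d just like to hear your opinions."


def to_latin1(text: str, errors: str) -> str:
    """Round-trip text through ISO-8859-1 using the given error policy."""
    return text.encode("iso-8859-1", errors=errors).decode("iso-8859-1")


dropped = to_latin1(SAMPLE, "ignore")    # unrepresentable characters vanish
replaced = to_latin1(SAMPLE, "replace")  # unrepresentable characters become "?"

# In Windows-1252 the curly apostrophe is the single byte 0x92.
cp1252_byte = "\u2019".encode("cp1252")
# Read as ISO-8859-1, that same byte is an invisible C1 control character.
misread = cp1252_byte.decode("iso-8859-1")

print(dropped)
print(replaced)
```

The "ignore" policy yields the "Were" and "Id" seen on the page, while "replace" yields the question marks reported on other systems.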

Hmmm…Last one to the bug tracker is a rotten egg!

lol Shane.


I have a feeling it's HTML Tidy related. There are certainly some character encoding issues with it in 3.18 and later.



We've inserted an extra clean-up module in front of Tidy to deal with a host of character encoding and PEBCAK issues…

Bug lodged, Bug 3696.

Further to this…


The system I am working in converts pre-existing apostrophes to upside-down question marks, and it affects asset attributes as well, like page name and page short name…


There were some fixes for an issue that sounds like this one. They only affected some systems and were to do with how HTML Tidy was called (it apparently depended on the system locale setup), so they were hard to isolate. However, the new code only went into 3.22.0.

Are you sure that the system character set setting and the charset declared in the page match? A mismatch can also cause issues.

The patch was committed on 10 February; you could ask Squiz to apply it and see if it fixes the issue.

Update of /home/cvs/fudge/wysiwyg/plugins/html_tidy
In directory delta.squiz.net:/tmp/cvs-serv12629

Modified Files:
html_tidy.inc
Log Message:
Work with Tidy using a proc_open/close() and pipes handled in PHP, rather than piping the document in from a command line. Minimises potential underlying system differences.

File: html_tidy.inc
Log: Bug 3696
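To show what the log message means by handling the pipes yourself rather than piping the document in from a command line, here is a rough Python analogue of the approach. The real fix is PHP using proc_open; this is not that code. The function name is hypothetical, and the stand-in command below just echoes its input so the sketch runs without Tidy installed.

```python
import subprocess
import sys
from typing import Sequence


def run_through_filter(cmd: Sequence[str], document: str, encoding: str = "utf-8") -> str:
    """Send the document to cmd's stdin over a pipe and return its stdout as text."""
    result = subprocess.run(
        list(cmd),                        # argument list, so no shell is involved
        input=document.encode(encoding),  # bytes written straight to the child's stdin
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=False,                      # tidy exits non-zero when it only has warnings
    )
    return result.stdout.decode(encoding)


# Stand-in filter (an assumption for the demo): copies stdin to stdout unchanged.
ECHO_CMD = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())",
]

round_tripped = run_through_filter(ECHO_CMD, "<p>We\u2019re interested</p>")
```

With Tidy installed, the command would be something like `["tidy", "-q", "-utf8"]`. Because the bytes go directly over the pipe with an explicit encoding, the shell and the system locale never get a chance to reinterpret them.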

So SLA requests are the go for 3.18 then? There is definitely voodoo with HTML Tidy and the accessibility checker in this system. On first save the curly brackets remain intact; it is the next time you return to the WYSIWYG that they are changed to question marks. It appears that Tidy isn't working at all, even though it is set up exactly the same as in previously working versions. This client has hundreds of authors working on thousands of pages in just over a hundred sites. It took nearly a year to get the upgrade from 3.16 to 3.18 finalised; I don't think there is the scope to upgrade to 3.22 overnight. Amazing that this hasn't come up in 3.18 before and been fixed.

Thanks for that Justin; I've had to log a support request as we don't have the skills here to investigate further. When Squiz support suss it out I will post here what the problem was for future reference. :slight_smile:

The problem was, if I recall correctly, that they moved to the bundled PHP version of Tidy with PHP5, but this broke things on some installs compared with the old standalone version.


If they are changing to question marks, that *sounds* like a different bug. What database are you using, and what is the database character encoding set to? I have seen the ? issue when Oracle is set to the wrong encoding in the database, which has nothing to do with Tidy at all. What happens in a raw HTML div?
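One way to narrow this down is to check which characters in the content cannot be stored in the configured character set at all, since those are the ones likely to be dropped or turned into ? somewhere along the way. A minimal diagnostic sketch, assuming you can get the content out as text; the function name is hypothetical.

```python
from typing import List, Tuple


def find_unencodable(text: str, charset: str = "iso-8859-1") -> List[Tuple[int, str, str]]:
    """Return (offset, character, code point) for each character the charset cannot store."""
    problems = []
    for offset, char in enumerate(text):
        try:
            char.encode(charset)
        except UnicodeEncodeError:
            problems.append((offset, char, f"U+{ord(char):04X}"))
    return problems
```

For example, `find_unencodable("We\u2019re")` reports the curly apostrophe at offset 2 as U+2019, while the same call with `charset="utf-8"` returns an empty list.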

Without access to the system it is quite hard to diagnose these issues. Actually with access they are quite obscure sometimes!

[quote]Further to this…


The system I am working in converts pre-existing apostrophes to upside-down question marks, and it affects asset attributes as well, like page name and page short name…[/quote]



Can you give a URL of a page with this on?

I could, except that it was in a very prominent spot and had to be fixed straight away. It was in the metadata fields of redirect assets that were being listed by an asset listing that was nested into other asset listings, which were nested into an asset listing used by a nested asset design area on a customisation, all with dynamic root replacements. It took me longer than the upgrade (I didn't do the upgrade) to actually find what was feeding the offending character, and longer again to fry the thousands of caches that seemed to exist… never boring.


Anyway, long story not as long. It seems the upside-down question mark was a one-off, or maybe the problem was my eyes after banging my head for so long trying to find it. The replacement of the curly apostrophe, and about 30 other characters, with a stock standard here-I-am-you-bugger question mark exists in every other asset. I could give you about 200 URLs today and maybe 300 tomorrow, but Squiz would get cranky with me for spamming out a client's site with issues to all the punters out there.



The HTML container is a good one to raise because it just happens to be something I thought of. While looking into some safe edit voodoo I got the bright idea to check it out, so I changed the WYSIWYG to raw HTML and played around for a while. I couldn't find anything and got bored, so I cancelled the safe edit and was met with some nasty-as errors. With a newfound interest I kept pushing and found that the original content, which is supposed to be safe, had been overwritten with the content I had generated in safe edit.



Anyway, again, I will now halt the longer than not so long story and leave it for the SLA tug of war. I found a stack of issues, some of which are consistent in other systems I have tested them in. Database? Maybe, but all pre-upgrade content is sitting there with a nice green good-to-go HTML Tidy tag, and any post-upgrade save or new content gets a nasty bugger-off-I'm-not-getting-out-of-bed HTML Tidy tag. But then again, I haven't yet seen a 3.18 system that doesn't have Tidy issues, and I thought the character issue was just more system-specific voodoo until Rachel's post.



I know Matrix well and could fill the forums (or write a book) with the issues I have seen in this one system alone, but it is time to get back to constructive stuff.

Squiz support upgraded our HTMLtidy and put in a patch and all seems well now. :slight_smile:

I just upgraded from 3.22.2 to 3.26.2.


It seems that HTML Tidy (or something) was run on all divs in the system (including raw HTML, I think) and converted quotes and apostrophes to ? and upside-down ?



The pages rendered fine in 3.22.2 and had not been touched, so I'm pretty sure the upgrade caused it. If I look at the pages in rollback they look ok. Could it maybe have something to do with contexts?







Edit: When I acquire locks and commit the divs they get fixed. It's only a problem with characters which were inserted using the 'Insert Special Character' tool.

Are you sure that you did not change your character encoding?

Yep, nothing changed except the Matrix version.