Are there any issues with just changing the Default Character Set to UTF-8? Currently our system is set to Western European (ISO) and we don't want to flick the switch if this is going to cause problems with existing sites within the system. Or are we just overreacting and nothing noticeable will happen?
Change Default Character Set
Short answer. Don't just flick the switch.
You'll encounter a few issues, most notably with characters that were processed and committed to the database before the Matrix setting was changed: content stored under the Western European encoding is not necessarily valid UTF-8. The first thing you might notice is some characters showing up differently (probably as question marks) when previewing pages in a browser. Microsoft smart quotes are the usual offenders here, and I'm sure there would be a few after years of copy/pasting into the admin editor.
Also, changing the Matrix config setting is just one piece of the puzzle. The good news is that it's not too difficult to pull the puzzle together. I suggest engaging with your local Squiz office to advise/assist on the best course of action. The process would go a little bit like this…
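To illustrate why previously saved content misbehaves, here is a minimal Python sketch. It is not Matrix code; the sample text and the assumption that the old content was stored as Windows-1252 bytes are mine, purely for illustration.

```python
# Assumed sample: smart quotes pasted from Word and stored as Windows-1252 bytes.
stored = "\u201cBurger & Beer\u201d deal".encode("cp1252")

# A strict UTF-8 reader rejects those bytes outright.
try:
    stored.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# A lenient reader substitutes U+FFFD for each bad byte, which is what
# usually ends up rendered as question marks in the browser.
as_seen = stored.decode("utf-8", errors="replace")

# Decoding with the encoding the data was written in, then re-encoding,
# produces valid UTF-8.
repaired = stored.decode("cp1252").encode("utf-8")

print(strict_ok)
print(as_seen)
print(repaired.decode("utf-8"))
```

The point of the sketch is that the bytes already in the database do not change when the setting does, so they have to be converted separately.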
[list=1]
Thanks Scott - thought there might be a few additional steps.
So after following the process you outlined, there would still need to be a content review of sites to update the invalid characters?
Steps 3 and 4 would identify and convert/clean up the majority of common offending characters, so the review/sign-off should probably focus on the hot spots/high traffic areas within your system. In most cases this just means a quick check of those areas from the front end/browser view to make sure content appears as expected. Anything that might have slipped the net won't be a show stopper and could be dealt with on a case-by-case basis.
We did this and it was very successful. We did, as Scott said, have some issues with Smart Quotes, but we were able to clean those up using the Search and Replace tool. That got about 90% of them; the rest we had to find manually.
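For anyone doing a similar clean-up outside Matrix (for example on exported content), a rough Python sketch of the same idea follows. It is not the Search and Replace tool; the character map, function names and sample text are hypothetical, and whether you swap smart quotes for plain ones or keep them is your call.

```python
# Illustrative, non-exhaustive map of Word-style punctuation to plain ASCII.
SMART_PUNCTUATION = str.maketrans({
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2026": "...",  # ellipsis
})


def clean_smart_punctuation(text):
    """Replace the mapped smart punctuation with ASCII equivalents."""
    return text.translate(SMART_PUNCTUATION)


def remaining_non_ascii(text):
    """List the distinct non-ASCII characters left, as candidates for manual review."""
    return sorted({ch for ch in text if ord(ch) > 127})


sample = "\u201cBurger & Beer\u201d deal \u2013 only \u00a310\u2026"
cleaned = clean_smart_punctuation(sample)
print(cleaned)
print(remaining_non_ascii(cleaned))
```

The second function is there for the leftovers: anything it reports is not necessarily wrong (a pound sign is perfectly legitimate), it just tells you where to look by hand.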
Through our SLA, Squiz UK just tried to upgrade our system from version 4.18.6 to 5.1.10.0 and, in the process, change the Default Character Set from Western European (ISO) to UTF-8. We found serious issues.
Some of these include:
- Encoding issues with most extended characters
- Non-Latin characters displaying as numbers
- Forms no longer submitting if any extended character is used
- Unable to edit and save existing assets
- Getting some strange Fatal errors like this:
Fatal error: Uncaught exception 'Exception' with message: 'Unable to update attribute values for asset "Burger & Beer deal" (#397252) due to database error: SQLSTATE[22021]: Character not in repertoire: 7 ERROR: invalid byte sequence for encoding "UTF8": 0xa3' in [SYSTEM_ROOT]/core/include/asset.inc(3381): saveAttributes(false)
#1 [SYSTEM_ROOT]/core/include/asset_edit_interface.inc(2173): saveAttributes()
#2 [SYSTEM_ROOT]/core/include/asset_edit_interface.inc(1891): processInline(Object(Calendar_Event_Recurring), Object(Limbo_Outputter), false, 'details')
#3 [SYSTEM_ROOT]/core/include/asset.inc(5443): process(Object(Calendar_Event_Recurring), Object(Limbo_Outputter), false)
#4 [SYSTEM_ROOT]/core/include/asset_manager_edit_fns.inc(205): processBackend(Object(Limbo_Outputter), Array)
#5 [SYSTEM_ROOT]/core/include/asset_manager.inc(9222): paintBackend()
#6 [SYSTEM_ROOT]/core/include/backend.inc(1725): paintBackend(Object(Backend))
#7 [SYSTEM_ROOT]/core/include/backend.inc(209): _printMain()
#8 [SYSTEM_ROOT]/core/include/mysource.inc(590): paint()
#9 [SYSTEM_ROOT]/core/web/index.php(30): start()
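For what it's worth, 0xa3 is the pound sign (£) in ISO-8859-1, and on its own it is not a valid UTF-8 sequence, which matches what the database is complaining about. A small Python sketch of how one might locate such bytes in a value; the helper name and sample text are my own assumptions, not anything from Matrix:

```python
def first_invalid_utf8_byte(data):
    """Return (offset, byte value) of the first byte that breaks UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, data[err.start]
    return None


# Assumed sample: a pound sign saved under ISO-8859-1 is the single byte 0xa3.
legacy = "Burger & Beer deal \u00a35".encode("iso-8859-1")
problem = first_invalid_utf8_byte(legacy)

# Transcoding turns the pound sign into the two-byte UTF-8 sequence 0xc2 0xa3.
converted = legacy.decode("iso-8859-1").encode("utf-8")
after = first_invalid_utf8_byte(converted)

print(problem)
print(after)
```

If a check like this still finds bytes after the conversion step, that content was most likely never transcoded.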
According to Squiz UK, it seems to be related to a PHP issue, and the bug has to be fixed by Squiz AU (Labs).
Hopefully I will hear something back by the end of next week and will share the outcome.
Tuguldur
I wish Scott Hall's procedure were that simple. It might be, if Squiz UK knew what it was doing or were able to follow the instructions set out by the Labs team.
It should be compulsory for the whole Squiz UK support team to participate in the forums.
Update:
We have managed to move to Squiz Matrix 5!!! and are now on UTF-8 AT LAST!
The process was painful and exhausting. It took almost 2 months to complete the upgrade and at times it felt like it was never going to happen.
If anyone is still on the Western European format for the back end, stop whatever you're doing and make the effort to convert to UTF-8 now!!
The longer you leave it, the more problematic it is going to be in the future.