Outputting UTF-8 characters from the REST resource


(J Stephen) #1

When using the REST resource to process some JSON containing Unicode escape sequences such as \u00fc (u umlaut), I was getting dodgy characters back.


I set about debugging this and, having spent a day or two on it, have narrowed it down to two things:



Firstly, the installed version of SpiderMonkey does not appear to interpret these characters correctly, so I tried the latest version of Rhino, which does. Here's SpiderMonkey on the command line:


    
    ~> js
    js> '\u00fc'
    
    js>


... and Rhino:

    
    Rhino 1.7 release 2 2009 03 22
    js> '\u00fc'
    ü
    js>
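
A terminal can mangle a character that the interpreter has in fact parsed correctly, so it may help to check the string without relying on how it is displayed. The following is a minimal sketch using only standard JavaScript built-ins; the variable names are illustrative, and the final printing line assumes either a `console` object or a shell-style `print` function is available.

```javascript
// Engine-agnostic check: was the \u00fc escape parsed correctly,
// regardless of how the terminal renders the character?
var s = '\u00fc';

// A correctly parsed escape yields a single UTF-16 code unit with value 0xFC.
var len = s.length;
var codeUnit = s.charCodeAt(0).toString(16);

// encodeURIComponent percent-encodes the UTF-8 bytes of the string,
// so the UTF-8 encoding of U+00FC becomes visible in plain ASCII.
var utf8Bytes = encodeURIComponent(s);

var report = [len, codeUnit, utf8Bytes].join(' ');

// Use console.log where it exists, otherwise fall back to a shell-style print().
if (typeof console !== 'undefined') {
    console.log(report); // 1 fc %C3%BC
} else if (typeof print === 'function') {
    print(report);
}
```

If an interpreter reports `1 fc %C3%BC` but still shows a wrong character, the escape was parsed correctly and the problem is more likely in the output encoding than in the parser.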


So I modified data/private/conf/tools.inc to temporarily point at the jar of Rhino:

    
    #define('SQ_TOOL_JS_PATH', '/usr/bin/js');
    define('SQ_TOOL_JS_PATH', '/usr/bin/java -jar /home/me/rhino/js-14.jar');


Secondly, the js process requires the LANG environment variable to be set. The only way I could find to do this was to pass it directly from the PHP in packages/web_services/rest/page_templates/page_rest_resource_js/page_rest_resource_js.inc, like so:

    
    #$process = proc_open(SQ_TOOL_JS_PATH . ' - ', $descriptorspec, $pipes);
    $process = proc_open(SQ_TOOL_JS_PATH . ' - ', $descriptorspec, $pipes, NULL, Array('LANG=en_GB.utf8'));
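
One thing to be aware of with this approach: as far as the PHP documentation describes it, an explicit environment array passed to proc_open replaces the child's environment instead of adding to it, so variables such as PATH are no longer inherited. The sketch below illustrates the same mechanism in Node.js, not in PHP; it assumes a Unix-like system, and `DEMO_PARENT_ONLY` and `childScript` are hypothetical names made up for the demonstration.

```javascript
// Node.js analogy (not the PHP from the post): a child process started with
// an explicit env sees only the variables that were passed to it.
const { spawnSync } = require('child_process');

// Hypothetical child: a second Node process that reports two variables.
const childScript =
    'process.stdout.write(String(process.env.LANG) + " " + String(process.env.DEMO_PARENT_ONLY))';

// Set a variable in the parent only, to show that it is not inherited.
process.env.DEMO_PARENT_ONLY = 'set-in-parent';

const result = spawnSync(process.execPath, ['-e', childScript], {
    env: { LANG: 'en_GB.utf8' }, // the child's entire environment
    encoding: 'utf8',
});

const childView = result.stdout;
console.log(childView); // en_GB.utf8 undefined
```

If the spawned interpreter relies on other variables, they would need to be listed alongside LANG.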


With these changes the REST resource correctly interprets the UTF-8 characters.

I would prefer not to have to modify the PHP, so my next step should be to compile the latest version of Rhino so that it is available at /usr/bin/js, and to set LANG somewhere else. I have already tried setting it with Apache SetEnv and adding it to the default profile, but neither works. Before I embark on this, does anyone have any experience of this? Am I on a massive wild goose chase? Is there a simple solution?

(Nic Hubbard) #2

I think that you should submit a bug report for this.