The URL of assets with non A-Z chars


(Daniel Kolvik) #1

Hi,


When I create assets whose names include characters like å, ä or ö, the web path (URL) is saved with those characters simply cut out.



For example:



An asset created with the name "Öst" is given the URL "st".



I want each character that is not allowed in regular URLs to be saved as its closest plain equivalent instead, like this:



Å and Ä are saved as A

Ö is saved as O



Where can I change this? I guess there's some function that creates the URLs today; where is it?


(Nathan de Vries) #2

In core/include/general_occasional.inc, there's a function called make_valid_web_paths(), which gets passed an asset name. I'm pretty sure the replacement below will work, but note that it's a hack. Proper development time would need to be spent to correctly convert Unicode characters with umlauts, circumflexes, diaereses, graves and other diacritics.


You will need to replace the existing make_valid_web_paths() function with this:


    /**
    * Returns an array of the passed web paths made valid
    *
    * @param array	$paths	array of web paths to check
    *
    * @return array(string)
    * @access public
    */
    function make_valid_web_paths($paths)
    {
        $valid_paths = Array();
        foreach ($paths as $path) {
            $path = trim($path);

            // get rid of paths that are the same as the backend suffix
            if ($path == SQ_CONF_BACKEND_SUFFIX) {
                trigger_localised_error('SYS0114', E_USER_WARNING, $path);
                continue;
            }
            // get rid of paths that start with two underscores
            if (preg_match('/^__/', $path)) {
                trigger_localised_error('SYS0115', E_USER_WARNING);
                continue;
            }

            // transliterate accented characters before the strict filter below strips them
            $replacements = Array(
                'ä' => 'ae',
                'ö' => 'oe',
                'ü' => 'ue',
                'Ä' => 'Ae',
                'Ö' => 'Oe',
                'Ü' => 'Ue',
                'ë' => 'e',
                'ï' => 'i',
                'ÿ' => 'y',
                'Ë' => 'E',
                'Ï' => 'I',
                'Ÿ' => 'Y',
            );

            $path = str_replace(array_keys($replacements), array_values($replacements), $path);

            // no ampersands in web paths
            $path = str_replace('&', '_and_', $path);

            // no spaces in web paths
            $path = preg_replace('/\\s+/', SQ_CONF_WEB_PATH_SEPARATOR, $path);

            // no parentheses or square brackets
            $path = preg_replace('/[\(\)\[\]]/', '', $path);

            // taken (in part) from info here -> http://www.w3.org/Addressing/URL/5_URI_BNF.html
            $path = preg_replace('/[^a-zA-Z0-9\-$_@.!*~(),]/', '', $path);

            // ignore blanks and duplicates
            if ($path !== '' && !in_array($path, $valid_paths)) {
                $valid_paths[] = $path;
            }
        }
        return $valid_paths;
    
    }//end make_valid_web_paths()


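Change, remove, or add characters in the $replacements array as you see fit. As written, it has no entries for å or Å, and it maps ä, ö and ü to "ae", "oe" and "ue" rather than to single letters.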
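
For the mapping asked for in the first post (Å and Ä to A, Ö to O), the array could look roughly like the sketch below. The helper name transliterate_nordic() is made up for illustration and is not part of the codebase; in practice you would put these entries into the $replacements array inside make_valid_web_paths(). It also assumes the PHP file and the incoming asset names use the same character encoding, since the replacement works on raw bytes.

```php
<?php
// Hypothetical standalone helper, for illustration only.
// The mapping is an assumption based on the request in post #1.
function transliterate_nordic($path)
{
    $replacements = array(
        'å' => 'a',
        'ä' => 'a',
        'ö' => 'o',
        'Å' => 'A',
        'Ä' => 'A',
        'Ö' => 'O',
    );

    // strtr() with an array replaces every occurrence of each key
    // with its value and leaves all other characters untouched.
    return strtr($path, $replacements);
}

echo transliterate_nordic('Öst'), "\n"; // prints "Ost"
```

Characters that are not listed still fall through to the final preg_replace() filter and get stripped, so any other accented letters you care about need their own entries.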

(Daniel Kolvik) #3

Great!


Now I know where to find it…



I'll have a look at it.