Regular Expression Asset not working


#1

Matrix Version: 5.2

Hi everyone,

I am trying to build a simple regex that will split the file name when it encounters a hyphen.

For example, the file name is

a326-bachelor-of-international-studies.pdf

I would like the regex to break the file and give me

  • a326
  • bachelor of international studies

As a first step, I have set up a Regex asset and specified this regex in it:

/[^-]*/ (I have tested this and it gives me a326)

I have an asset listing, where I am trying to print it (a326) like this

%asset_name^preg_match_result:846841^array_slice:1:1%

as the manual says:

preg_match_result: returns regular expression matches on the returned keyword value as an array. This is the same functionality as the preg_match_all() PHP function, and should be used in conjunction with the array keyword modifiers.

All it's printing is an empty array, i.e. []

Slight variation %asset_name^preg_match_result:846841^array_slice:0:1% is printing

[["a326","","bachelor","","of","","international","","studies.pdf",""]]

I am not sure why it's returning a nested array instead of a simple array.
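For readers puzzled by the same nesting, here is a minimal Python sketch of what appears to be happening. It assumes the modifier follows the default layout of PHP's preg_match_all(), where index 0 of the result holds the list of full-pattern matches, and that array_slice takes an offset and a length as PHP's array_slice() does. The variable names are illustrative only.

```python
import re

name = "a326-bachelor-of-international-studies.pdf"

# Every non-overlapping match of [^-]*, including the empty matches
# found at each hyphen and at the end of the string (Python 3.7+).
full_matches = re.findall(r"[^-]*", name)

# Assumption: mimic preg_match_all()'s default layout, where index 0
# holds the list of full-pattern matches.
result = [full_matches]

print(result[0:1])   # like ^array_slice:0:1, the whole nested list
print(result[1:2])   # like ^array_slice:1:1, nothing lives at index 1
print(result[0][0])  # the value actually wanted
```

Under these assumptions the outer array has a single element, so slicing from offset 1 yields an empty array, and the empty strings come from `*` being allowed to match zero characters at each hyphen.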

I have tried escaping a couple of different characters (^ and -) in the regex, but it's not making any difference.

Also, can someone please explain how to define a regex “replacement” and how to use it?


(Bart Banda) #2

You might not need to use the regex asset, I can do this (in latest 5.3 at least) with just keyword modifiers:

%asset_name^replace:-.*:% %asset_name^replace:\^[\^-]*-\s*:%

Will print: a326 bachelor-of-international-studies.txt

Then if you want to remove the rest of the dashes, just do:

%asset_name^replace:-.*:% %asset_name^replace:\^[\^-]*-\s*:^replace:-: %
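As a rough check of what these patterns do outside Matrix, the same three substitutions can be sketched in Python. This assumes the ^replace modifier behaves like an ordinary regex substitution; the `\^` escapes are specific to the keyword syntax and are written as plain `^` here.

```python
import re

name = "a326-bachelor-of-international-studies.pdf"

# ^replace:-.*:  drops everything from the first hyphen onwards
code = re.sub(r"-.*", "", name)

# ^replace:\^[\^-]*-\s*:  drops the leading code and its hyphen
title = re.sub(r"^[^-]*-\s*", "", name)

# ^replace:-:   turns the remaining hyphens into spaces
title = title.replace("-", " ")

print(code)
print(title)
```

Note that the file extension is still part of the second value in this sketch.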

Would that work?


#3

Thanks Bart. This will work, but as I mentioned, I need to print two pieces of info, something like this:

  • a326
  • bachelor of international studies

from a326-bachelor-of-international-studies.pdf

So far I have tried

%asset_name^replace:-.*:% -> This returns a326. All good.
%asset_name^replace:.+?[?=-]:% -> This returns studies.pdf. But when I test the expression .+?[?=-] on https://regex101.com/ it matches a326-, so Squiz should return bachelor-of-international-studies.pdf rather than studies.pdf.

So confusing …


(Bart Banda) #4

What happens if you try what I had?

%asset_name^replace:\^[\^-]*-\s*:^replace:-: %


#5

It gives me this: a326 bachelor of international studies.pdf

What I want is this: bachelor of international studies


(Bart Banda) #6

Hmm, that’s strange, I get bachelor of international studies.txt

Actually, I think it may be because escaping the ^ character is not supported in your version. That was only added in 5.3.3.0.

Your regex version correctly returns studies.pdf because that’s what it matches on.
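A small Python sketch of that difference, assuming ^replace removes every match in the string (as PHP's preg_replace() does by default), while a tester that reports only the first match would suggest otherwise:

```python
import re

name = "a326-bachelor-of-international-studies.pdf"

# [?=-] is a character class ('?', '=' or '-'), not a lookahead.
pattern = r".+?[?=-]"

# All the chunks the lazy pattern matches, one per hyphen.
chunks = re.findall(pattern, name)

# Removing only the first match versus removing every match.
first_only = re.sub(pattern, "", name, count=1)
every_match = re.sub(pattern, "", name)

print(chunks)
print(first_only)
print(every_match)
```

Each hyphen-terminated chunk is a separate match, so removing all of them leaves only the last segment.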

Maybe try sticking:
^[^-]-\s
in a regex asset and use that?

When I try that on regex101 it works: https://regex101.com/r/CeT8iP/1

Then you can just use a ^replace:-: % modifier after that to replace the remaining dashes with spaces.


#7

Thanks Bart, when I try to add this ^[^-]-\s to a Regex asset, I get this PHP warning:

Regular expression “^[^-]-\s” is not valid. It will be ignored when applying the keyword modifier

Squiz accepts /^[^-]-\s/, but it gives me a blank array as output, i.e. []

Any other ideas?


(Bart Banda) #8

Ah, the stars got stripped by markdown, should be:

^[^-]*-\s*

So this works for me:

Note that the /-/ replacement is getting replaced with a blank space

Then use: %asset_name^preg_replace:10503%
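To see why the stars matter, here is a hedged Python sketch. The `pairs` list is a hypothetical stand-in for the Regex asset's pattern and replacement settings, which are not shown in this thread; it assumes the pairs are applied in order.

```python
import re

name = "a326-bachelor-of-international-studies.pdf"

# Stars stripped: exactly one non-hyphen character, a hyphen,
# then one whitespace character. The file name never fits that shape.
broken = re.search(r"^[^-]-\s", name)

# Hypothetical pattern/replacement pairs standing in for the asset's settings.
pairs = [
    (r"^[^-]*-\s*", ""),  # strip the leading code and its hyphen
    (r"-", " "),          # replace remaining hyphens with a space
]

cleaned = name
for pattern, replacement in pairs:
    cleaned = re.sub(pattern, replacement, cleaned)

print(broken)
print(cleaned)
```

The star-less pattern finds nothing in this file name, which is consistent with the blank array reported earlier.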


#9

Great, many thanks Bart. All working now.

Can’t believe I spent over 4 hours trying to make it work … :sweat: