Keyword modifier to return first sentence only


(Klye) #1

Is there a keyword modifier that returns the text up to the first instance of a character? In this case, I want to show only the first sentence of a metadata field, i.e. everything up to the first full stop. I'm guessing a regex is needed, but I can't find the correct format for it. I have been using the maxwords modifier [ %asset_metadata_<fieldname>^maxwords:20% ] to guesstimate the length of a sentence, but it's hit and miss.

 

Please and thank you

 

(Peter McLeod) #2
 
Hi 
 
A regex could work. Alternatively, a combination of the 'explode' and 'array_slice' keyword modifiers could be used.
 
Just looking for a full stop might be too simple, though.
 
It would be reasonable to say that a terminal punctuation character, optionally followed by a quote mark (e.g. . ! ? or ." !" ?" or .' !' ?'), followed by whitespace marks the end of a sentence.
 
This would be easy to implement, but it would break on words abbreviated with a full stop, on a three-full-stop ellipsis in the middle of a sentence, or depending on how quoted text is added within a sentence, etc. Truly identifying a 'sentence' gets into the area of natural language processing. Even so, when it does break, the result wouldn't be much worse than limiting the string to 20 words, as you are doing now.
 
Create a regex asset and add:
/(?<=\.\s|!\s|\?\s|\."\s|!"\s|\?"\s|\.'\s|!'\s|\?'\s)(.|\s)*/
Leave the replacement empty. The regex matches everything after the first terminator (optionally followed by a quote mark) that is followed by whitespace. The terminator and the whitespace are not part of the match, so replacing the match with nothing leaves just the first sentence.
 
To implement it, you will probably want to strip HTML, line breaks and repeated spaces first so things work as expected:
%asset_metadata_[METADATA FIELD]^striphtml^replace:\s+: ^preg_replace:[REGEX ASSET ID]%
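To try the same chain outside Matrix, here is a rough Python sketch. It is an approximation under a few assumptions: the tag-deleting regex is only a crude stand-in for ^striphtml, and because Python's re module requires each lookbehind to be fixed width, the terminator alternatives are split into two lookbehinds (terminator + whitespace, and terminator + closing quote + whitespace). The helper name first_sentence is made up for this example.

```python
import re

# Crude stand-in for ^striphtml (assumption: tags are simply deleted).
TAG = re.compile(r"<[^>]+>")
# Same pattern as ^replace:\s+: (collapse runs of whitespace to one space).
WHITESPACE = re.compile(r"\s+")
# Equivalent of the regex asset: starting right after the first . ! or ?
# (optionally followed by " or ') plus one whitespace character, match the
# rest of the string. [\s\S]* matches anything, like (.|\s)* in the original.
REST_AFTER_FIRST_SENTENCE = re.compile(
    r"(?:(?<=[.!?]\s)|(?<=[.!?][\"']\s))[\s\S]*"
)


def first_sentence(html: str) -> str:
    """Hypothetical helper: strip tags, collapse whitespace, drop the rest."""
    text = TAG.sub("", html)
    text = WHITESPACE.sub(" ", text)
    # Empty replacement, as in the regex asset; the lookbehind keeps the
    # terminator and its trailing space out of the match.
    return REST_AFTER_FIRST_SENTENCE.sub("", text, count=1)


print(repr(first_sentence("<p>First line\nstill first. Second one.</p>")))
# 'First line still first. '
print(repr(first_sentence('He said "Stop!" Then he left.')))
# 'He said "Stop!" '
print(repr(first_sentence("It costs approx. 20 dollars. More text.")))
# 'It costs approx. '
```

The kept text ends with a space, so trim it if that matters. The third call shows the abbreviation problem mentioned above. In this sketch, adjacent block tags such as `</p><p>` leave no space between sentences once removed, so `<p>One.</p><p>Two.</p>` comes back as `One.Two.`; whether Matrix's ^striphtml behaves the same way is not confirmed here.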

Thanks

Peter

(Mahearnpad) #3

Could you use explode with the full stop as the delimiter? Then take the first array element (index 0), which should be the first sentence.
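A quick Python sketch of the explode idea, with str.split standing in for explode and index 0 for the first array element (the function name is made up). Two things to note: the delimiter is dropped, so the full stop has to be added back, and any full stop inside the sentence, such as a decimal point, ends it early.

```python
def first_sentence_by_explode(text: str) -> str:
    pieces = text.split(".")   # like explode with "." as the delimiter
    first = pieces[0]          # like taking array index 0
    # split drops the delimiter, so put the full stop back if there was one.
    return first + "." if len(pieces) > 1 else first


print(first_sentence_by_explode("First sentence. Second sentence."))
# First sentence.
print(first_sentence_by_explode("It weighs 3.5 kg. Next."))
# It weighs 3.
```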


(Bart Banda) #4

Could you not just do this?
%globals_asset_metadata_fieldname^replace:\..*:.%

 

That replaces the first full stop and everything after it with a single full stop, so you keep everything up to and including the first full stop. You might also need to apply the ^striphtml modifier first if your field is a WYSIWYG one.
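To see what that pattern and replacement do, here they are run through Python's re.sub. This assumes ^replace behaves like a global PCRE replace without the s (dotall) flag, which is not confirmed here for Matrix. Under that assumption `.` stops at line breaks, so a multi-line value is cut at the first full stop of each line rather than only the first one overall; collapsing whitespace first, as in Peter's chain, avoids that. The function name is hypothetical.

```python
import re


def keep_up_to_first_full_stop(text: str) -> str:
    # Same pattern and replacement as ^replace:\..*:.
    # "\..*" matches a full stop plus the rest of that line; the match becomes ".".
    return re.sub(r"\..*", ".", text)


print(keep_up_to_first_full_stop("First sentence. Second sentence."))
# First sentence.
print(repr(keep_up_to_first_full_stop("One. Two.\nThree. Four.")))
# 'One.\nThree.'
# Collapsing whitespace first turns the value into a single line:
print(repr(keep_up_to_first_full_stop(re.sub(r"\s+", " ", "One. Two.\nThree. Four."))))
# 'One.'
```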


(Klye) #5

Thanks all, Bart's suggestion seems to be working as I need. It's a raw HTML Paint Layout, so the ^striphtml modifier wasn't necessary. I did have to edit a couple of the sources to re-enter the full stop, as it wasn't being picked up at first; that may have had something to do with how it was originally entered (likely a copy and paste). Working well now, thank you.