Monday 10 December 2018

bash - sed - Non-greedy regex can't seem to work in sed




When I run a regex pattern in an online RegEx testing tool against the text below, it works fine. However, it does not work when I use it in sed on Unix.



Text:



001Transaction Successful2016-07-01-12:05:40.383N2016-07-01-12:05:44.171



RegEx:



<DtTm>(.*?)<\/DtTm>


Usage in sed (I'm looking to remove anything between <DtTm> and </DtTm>):



sed 's/<DtTm>(.*?)<\/DtTm>//g'



Expected Output:



001Transaction SuccessfulN

Answer



GNU sed has two regex modes, basic and extended. Neither of these, nor the single basic mode of less advanced sed implementations, permits non-greedy quantifiers. As the info sed documentation puts it:




Note that the regular expression matcher is greedy, i.e., matches are attempted from left to right and, if two or more matches are possible starting at the same character, it selects the longest.
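To see that greedy behavior in practice, here is a small shell sketch. The sample input is an assumption: the question's markup was lost, so apart from <DtTm> the element names (<Code>, <Msg>, <Flag>) are hypothetical. It also shows that in sed's basic mode the (, ? and ) of the question's pattern are plain literal characters, so that pattern simply never matches.

```shell
#!/bin/sh
# Hypothetical sample input: only the <DtTm> element name comes from the question.
input='<Code>001</Code><Msg>Transaction Successful</Msg><DtTm>2016-07-01-12:05:40.383</DtTm><Flag>N</Flag><DtTm>2016-07-01-12:05:44.171</DtTm>'

# Greedy: .* runs from the FIRST <DtTm> to the LAST </DtTm>,
# so the <Flag>N</Flag> element in between is swallowed as well.
greedy=$(printf '%s\n' "$input" | sed 's/<DtTm>.*<\/DtTm>//g')

# In a basic regex, ( ? ) are literal characters, so this pattern looks for
# "<DtTm>(" ... "?)</DtTm>" and, finding none, leaves the line unchanged.
unchanged=$(printf '%s\n' "$input" | sed 's/<DtTm>(.*?)<\/DtTm>//g')

printf '%s\n' "$greedy"
printf '%s\n' "$unchanged"
```

The first line printed is <Code>001</Code><Msg>Transaction Successful</Msg>: the N is lost because the match is the longest one possible.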





So, if you need non-greedy matching, you will have to choose another tool, such as Perl (or something else supporting PCRE), which is probably what the online testing tool you mentioned is using.



The good thing is, the Perl substitute command is so stunningly similar to the sed one that you can often just change the program name (and possibly use a different delimiter character in complex REs so you don't end up with sawtooths like \/\/\/\/\/):



perl -pe 's|<DtTm>.*?</DtTm>||g'
