Thursday 19 September 2019

Converting PCRE recursive regex pattern to .NET balancing groups definition

PCRE supports recursive patterns, which can be used to match nested subgroups. For example, consider the following "grammar":



Q -> \w | '[' A ';' Q* ','? Q* ']' | '<' A '>'
A -> (Q | ',')*
// to match ^A$.



This can be matched in PCRE with the pattern:



^((?:,|(\w|\[(?1);(?2)*,?(?2)*\]|<(?1)>))*)$


(Example test case: http://www.ideone.com/L4lHE)



Should match:




abcdefg abc,def,ghi abc,,,def ,,,,,, [abc;] [a,bc;] sss[abc;d] as[abc;d,e] [abc;d,e][fgh;j,k]
[b;,] <,,,> <> <><> <>,<> a<<<<>>>> <<<<<>>>><><<<>>>>
[a;b] [[;];] [,;,] [;[;]] [<[;]>;<[;][;,<[;,]>]>]



Should not match:



bc> [a;d,e] [a] <<<<<>>>><><<<>>>>> <<<<<>>>><><<<>>> [abc;def;] [[;],] [;,,] [abc;d,e,f]
[<[;]>;<[;][;,<[;,]>]]> ]
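As a cross-check that does not depend on any regex engine, the grammar can also be written as a small recursive-descent recognizer. The following is only a sketch in Python and is not part of the original question: the function name `matches_a` and its helpers are made up for illustration, and `\w` is approximated with `str.isalnum()` plus the underscore, which is broader than PCRE's default ASCII-only `\w`. It relies on the observation that every alternative in the grammar starts with a distinct character, so no backtracking is needed.

```python
def matches_a(text: str) -> bool:
    """Return True if the whole of `text` can be derived from A (hypothetical helper)."""
    pos = 0

    def peek() -> str:
        # Current character, or "" at the end of the input.
        return text[pos] if pos < len(text) else ""

    def is_word(c: str) -> bool:
        # Approximation of \w: a letter, a digit, or an underscore.
        return c == "_" or c.isalnum()

    def starts_q(c: str) -> bool:
        # Q can only begin with '[', '<', or a word character.
        return c in ("[", "<") or is_word(c)

    def expect(c: str) -> bool:
        # Consume `c` if it is next; report whether it was there.
        nonlocal pos
        if peek() == c:
            pos += 1
            return True
        return False

    def q_star() -> bool:
        # Q*: keep parsing Q while the next character can start one.
        while starts_q(peek()):
            if not q():
                return False
        return True

    def a() -> bool:
        # A -> (Q | ',')*
        nonlocal pos
        while True:
            if peek() == ",":
                pos += 1
            elif starts_q(peek()):
                if not q():
                    return False
            else:
                return True

    def q() -> bool:
        # Only called when starts_q(peek()) is true.
        nonlocal pos
        c = peek()
        pos += 1
        if c == "[":
            # '[' A ';' Q* ','? Q* ']'
            if not (a() and expect(";") and q_star()):
                return False
            expect(",")  # the comma is optional
            return q_star() and expect("]")
        if c == "<":
            # '<' A '>'
            return a() and expect(">")
        return is_word(c)

    # ^A$: A must consume the entire input.
    return a() and pos == len(text)
```

For instance, `matches_a("[abc;d,e][fgh;j,k]")` accepts, while `matches_a("[abc;d,e,f]")` rejects because a second comma follows the `;`.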




.NET has no recursive patterns. Instead, it provides balancing groups, which push and pop captures on a per-group stack and can be used to match simple nested patterns.
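To make the stack idea concrete, here is a rough Python sketch of the bookkeeping a balancing group performs for the simplest case, nested `<` and `>` only. It is an illustration under assumptions, not a translation of the PCRE pattern above: the function name is invented, and the .NET constructs named in the comments follow the commonly cited idiom `^(?:[^<>]|(?<open><)|(?<-open>>))*(?(open)(?!))$`, where the group name `open` is arbitrary.

```python
def angle_brackets_balanced(text: str) -> bool:
    """Check that '<' and '>' nest properly (hypothetical helper)."""
    stack = []  # plays the role of the capture stack of the group "open"
    for index, ch in enumerate(text):
        if ch == "<":
            # Like (?<open><): push a capture onto the stack.
            stack.append(index)
        elif ch == ">":
            # Like (?<-open>>): pop a capture; this fails on an empty stack.
            if not stack:
                return False
            stack.pop()
        # Any other character is skipped, like [^<>].
    # Like (?(open)(?!)): fail if any capture is left on the stack.
    return not stack
```

This only counts nesting depth. The grammar in question also needs to know which kind of bracket is open and whether the `;` and the optional `,` have already been seen, which is what makes the conversion non-trivial.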



Is it possible to convert the above PCRE pattern into .NET Regex style?



(Yes, I know it's better not to use regex for this. It's just a theoretical question.)


