Saturday, 4 January 2020

string - Regex how to match an optional character



I have a regex that I thought was working correctly until now. I need to match an optional character: it may or may not be present.




Here are two strings. The top string is matched while the lower is not. The absence of a single letter in the lower string is what is making it fail.



I'd like to get the single letter after the starting 5 digits if it's there and if not, continue getting the rest of the string. This letter can be A-Z.



If I remove ([A-Z]{1}) +.*? + from the regex, it matches everything I need except the letter, but I do need the letter.



20000      K               Q511195DREWBT            E00078748521
30000 K601220PLOPOH Z00054878524



Here is the regex I'm using.



/^([0-9]{5})+.*? ([A-Z]{1}) +.*? +([A-Z]{1})([0-9]{3})([0-9]{3})([A-Z]{3})([A-Z]{3}) +([A-Z])[0-9]{3}([0-9]{4})([0-9]{2})([0-9]{2})/

Answer



Use



[A-Z]?



to make the letter optional. {1} is redundant, because a token already matches exactly once by default. (You could also write [A-Z]{0,1}, which means the same thing, but that is what the ? is there for.)
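As a minimal sketch of the difference, here is a cut-down pattern in Python (the question does not say which language is in use, so Python's re module is an assumption, and the short test strings are made up for illustration). It shows that an optional group still participates in the match and simply captures an empty string when the letter is missing:

```python
import re

# Hypothetical mini-patterns: five digits followed by a letter.
optional = re.compile(r"^(\d{5})([A-Z]?)")    # letter may be absent
required = re.compile(r"^(\d{5})([A-Z]{1})")  # letter must be present

with_letter = optional.match("20000K")
without_letter = optional.match("20000")
required_miss = required.match("20000")

print(with_letter.groups())     # ('20000', 'K')
print(without_letter.groups())  # ('20000', '')
print(required_miss)            # None
```

Note that the optional group yields an empty string rather than no match at all, so code that reads the group should treat an empty value as "no letter".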



You could improve your regex to



^([0-9]{5})+\s+([A-Z]?)\s+([A-Z])([0-9]{3})([0-9]{3})([A-Z]{3})([A-Z]{3})\s+([A-Z])[0-9]{3}([0-9]{4})([0-9]{2})([0-9]{2})


And, since in many regex dialects \d is the same as [0-9] (Unicode-aware engines may also let \d match digits from other scripts):



^(\d{5})+\s+([A-Z]?)\s+([A-Z])(\d{3})(\d{3})([A-Z]{3})([A-Z]{3})\s+([A-Z])\d{3}(\d{4})(\d{2})(\d{2})
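To check the pattern against the two sample records, a rough Python sketch follows. Python is an assumption, as is the spacing of the second record: it is padded with runs of spaces here, like the first one. That matters because \s+([A-Z]?)\s+ needs at least two whitespace characters when the letter is absent, so the second sample as printed above with single spaces would not match. The TOLERANT variant is not part of the answer; it is a hypothetical alternative that moves the letter and its trailing whitespace into one optional non-capturing group.

```python
import re

# The answer's pattern, split across two string literals for readability.
PATTERN = re.compile(
    r"^(\d{5})+\s+([A-Z]?)\s+([A-Z])(\d{3})(\d{3})([A-Z]{3})([A-Z]{3})"
    r"\s+([A-Z])\d{3}(\d{4})(\d{2})(\d{2})"
)

# Hypothetical variant: the letter plus its trailing spaces are optional
# as a unit, so a single space between the fields is enough.
TOLERANT = re.compile(
    r"^(\d{5})\s+(?:([A-Z])\s+)?([A-Z])(\d{3})(\d{3})([A-Z]{3})([A-Z]{3})"
    r"\s+([A-Z])\d{3}(\d{4})(\d{2})(\d{2})"
)

with_letter = "20000      K               Q511195DREWBT            E00078748521"
# Assumption: fixed-width padding where the letter would have been.
padded = "30000                      K601220PLOPOH            Z00054878524"
single_spaced = "30000 K601220PLOPOH Z00054878524"

m_with = PATTERN.match(with_letter)
m_padded = PATTERN.match(padded)
m_single = PATTERN.match(single_spaced)
m_tolerant = TOLERANT.match(single_spaced)

print(m_with.groups())
print(m_padded.groups())
print(m_single)             # no match: only one space between the fields
print(m_tolerant.group(2))  # None, since the optional group did not take part
```

With the tolerant variant the missing letter shows up as None instead of an empty string, so calling code would need to handle whichever convention it picks.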



But: do you really need 11 separate capturing groups? And if so, why don't you capture the fourth-to-last group of digits?

