Non greedy (reluctant) regex matching in sed?

Friday, 10 November 2017

I'm trying to use sed to clean up lines of URLs to extract just the domain.

So from:

http://www.suepearson.co.uk/product/174/71/3816/

I want:

http://www.suepearson.co.uk/

(either with or without the trailing slash, it doesn't matter)

I have tried:

    sed 's|\(http:\/\/.*?\/\).*|\1|'

and (escaping the non-greedy quantifier):

    sed 's|\(http:\/\/.*\?\/\).*|\1|'

but I cannot seem to get the non-greedy quantifier (?) to work, so it always ends up matching the whole string.

Answer

Neither basic nor extended POSIX/GNU regex syntax recognizes the non-greedy quantifier; you need a later regex dialect. Fortunately, a Perl regex for this context is pretty easy to get:

    perl -pe 's|(http://.*?/).*|$1|'
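
If Perl is not an option, sed can usually get the same result with a negated character class instead of a non-greedy quantifier. The following is a minimal sketch, assuming every URL starts with http:// and has at least one slash after the host; the function name extract_domain is hypothetical and only there for illustration.

```shell
# Hypothetical helper (not from the original post): keep only scheme and host.
# [^/]* matches a run of non-slash characters, so the match stops at the
# first "/" after "http://" without needing a non-greedy quantifier.
extract_domain() {
    sed 's|\(http://[^/]*/\).*|\1|'
}

echo 'http://www.suepearson.co.uk/product/174/71/3816/' | extract_domain
# prints: http://www.suepearson.co.uk/
```

Because `[^/]*` can never cross a slash, the greedy `*` has nothing extra to consume, which is why this works in plain basic regex syntax.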
