I'm trying to use sed to clean up lines of URLs to extract just the domain. So from:

    http://www.suepearson.co.uk/product/174/71/3816/

I want:

    http://www.suepearson.co.uk/

(either with or without the trailing slash, it doesn't matter)
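Because the trailing slash is optional, the goal can also be reached without any regular expression. The sketch below assumes every line starts with a scheme and `//`. It splits on `/` and keeps the first three fields (`http:`, the empty field between the two slashes, and the host), which `cut` rejoins with `/`. The function name `host_url` is hypothetical and exists only for illustration.

```shell
#!/bin/sh
# host_url is a hypothetical helper name used only for this sketch.
# Assumes the input looks like scheme://host/path...
# Fields split on "/": 1 = "http:", 2 = "" (between the slashes), 3 = host.
# cut -f1-3 rejoins them with "/", giving "http://host" with no trailing slash.
host_url() {
  printf '%s\n' "$1" | cut -d/ -f1-3
}

host_url 'http://www.suepearson.co.uk/product/174/71/3816/'
```

For a whole file of URLs, the same idea is simply `cut -d/ -f1-3 urls.txt`, where `urls.txt` is a placeholder file name.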
I have tried:

    sed 's|\(http:\/\/.*?\/\).*|\1|'

and (escaping the non-greedy quantifier)

    sed 's|\(http:\/\/.*\?\/\).*|\1|'
but I cannot seem to get the non-greedy quantifier (?) to work, so it always ends up matching the whole string.
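A likely explanation: `.*?` is a Perl/PCRE feature, and the POSIX basic and extended regular expressions that sed uses have no lazy quantifier. In a basic regex a bare `?` is a literal character. The first attempt therefore looks for a literal `?`, finds none, and leaves the line unchanged. In GNU sed, `\?` means "zero or one", so `.*\?` is still greedy and runs to the last `/`. A common workaround is to replace `.*` with the negated character class `[^/]*`, which cannot run past a slash. The sketch below assumes plain `http://` URLs as in the question. The function name `domain_of` is hypothetical.

```shell
#!/bin/sh
# domain_of is a hypothetical helper name used only for this sketch.
# [^/]* matches a run of non-slash characters, so the capture group stops
# at the first "/" after "http://" (what the lazy .*? was meant to do).
# With "|" as the delimiter, the slashes need no escaping.
domain_of() {
  printf '%s\n' "$1" | sed 's|\(http://[^/]*/\).*|\1|'
}

domain_of 'http://www.suepearson.co.uk/product/174/71/3816/'
```

A URL with no path after the host (for example `http://example.com`) does not match the pattern, so sed passes it through unchanged, which is already the desired output.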