Friday 31 May 2019

.net - Regular expression for parsing links from a webpage?



I'm looking for a .NET regular expression to extract all the URLs from a webpage, but I haven't found one comprehensive enough to cover all the different ways you can specify a link.



And a side question:



Is there one regex to rule them all? Or am I better off using a series of less complicated regular expressions and making multiple passes against the raw HTML? (Speed vs. maintainability)



Answer



((mailto\:|(news|(ht|f)tp(s?))\://){1}\S+)


I took this from regexlib.com.



[editor's note: the {1} has no real function in this regex, since a group already matches exactly once when no quantifier is given; see this post]
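
As a rough illustration of how the pattern above behaves on raw HTML, here is a minimal sketch. It is written in Python rather than .NET purely for demonstration; the pattern is copied verbatim and uses only constructs (groups, alternation, \S) that the .NET Regex class also supports. The function name extract_urls and the sample markup are assumptions made for this example, not part of the original answer.

```python
import re
from typing import List

# Pattern copied verbatim from the answer (including the redundant {1}).
URL_PATTERN = re.compile(r"((mailto\:|(news|(ht|f)tp(s?))\://){1}\S+)")


def extract_urls(html: str) -> List[str]:
    """Return every substring matched by the pattern, in document order."""
    # finditer + group(0) yields the whole match; findall would return
    # a tuple of the capturing groups for each match instead.
    return [m.group(0) for m in URL_PATTERN.finditer(html)]


# Hypothetical sample input, not taken from the original post.
sample = (
    '<a href="http://example.com/page">home</a> '
    "<a href='mailto:someone@example.com'>mail</a> "
    "plain ftp://files.example.com/readme.txt text "
    '<a href="/about">relative link</a>'
)

for url in extract_urls(sample):
    print(url)
```

Two limitations show up in this sketch. Because \S+ keeps consuming until the next whitespace, a URL inside an attribute is returned together with the closing quote and the markup that follows it (the first match here is http://example.com/page">home</a>), so some trimming would still be needed. Links without a scheme, such as the relative href="/about", are not matched at all.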

