Herr Bischoff


How to Extract All URLs From a Text File

While looking for a simple way to extract all URLs from an HTML file, I came across this gem.

grep -o -E "https?://[][[:alnum:]._~:/?#@!\$&'()*+,;=%-]+"

RFC 3986 permits only a limited set of characters in URLs, and the bracket expression after `https?://` is meant to match exactly those. I haven’t observed any obvious issues with it so far.
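To try the expression on some input, here is a minimal sketch that wraps it in a shell function. The function name `extract_urls` and the sample lines are made up for illustration. The bracket expression here includes `=`, which RFC 3986 lists among its sub-delimiters and which query strings rely on. Deduplicating with `sort -u` is my own addition, not part of the original one-liner.

```shell
# Hypothetical helper: reads text on stdin, prints each distinct URL once.
extract_urls() {
  # -o prints only the matched part of each line.
  # -E enables extended regular expressions (needed for ? and +).
  # LC_ALL=C keeps the sort order byte-wise and independent of the locale.
  grep -o -E "https?://[][[:alnum:]._~:/?#@!\$&'()*+,;=%-]+" | LC_ALL=C sort -u
}

# Made-up sample input with one duplicate link.
printf '%s\n' \
  '<a href="https://example.com/page?id=1&lang=en">one</a>' \
  '<a href="http://example.org/a_b/c~d#top">two</a>' \
  '<a href="https://example.com/page?id=1&lang=en">one again</a>' \
  | extract_urls
```

For a real file, `extract_urls < file.html` should work the same way. One caveat: characters such as `,` and `)` are valid in URLs, so punctuation that directly follows a URL in running text ends up in the match.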