We all know that it’s impossible to perfectly match URLs from human input, but I spent a little time compiling various solutions from the boards and ended up with this (note: this is largely pieced together from other sources; I’m not claiming original work):
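A minimal sketch of a pattern along these lines, assuming Python’s `re` module — this is my reconstruction of the approach described below (any alphabetic scheme, no cap on TLD length, trailing sentence punctuation excluded), not the original pattern:

```python
import re

# Reconstruction sketch, not the original pattern:
# - any run of alpha characters as the scheme (http, geo, maps, ...)
# - host of letters/digits/dots/hyphens, TLD length uncapped
# - whitelisted path characters, including parens and brackets
# - a negative lookbehind so a sentence-final dot isn't swallowed
URL_RE = re.compile(
    r"[a-zA-Z]+://"                            # scheme: any alphabetic protocol
    r"[\w.-]+"                                 # host (and any-length TLD)
    r"(?:/[\w\-.~!$&'()*+,;=:@%/?#\[\]]*)?"    # optional path/query whitelist
    r"(?<![.,!?;:])"                           # don't end on sentence punctuation
)
```

With this, `URL_RE.search("This is a link to http://google.com.")` matches `http://google.com` without the period, and `maps://foo.bar/baz` or `http://upshots.local` match in full.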

The more difficult part was deciding what “should” end a URL rather than the sentence around it. Dots are obviously allowed and appear in most traditional URIs before the TLD, but what if a sentence ends with a dot? In “This is a link to http://google.com.” we obviously need to catch the URL without the trailing period. I also tried blacklisting characters first, but was convinced by others who have tackled this that a whitelist is the way to go. Oddly, URLs support virtually any character, even brackets and parens, which are used by big sites like Wikipedia.

One place I deviated was the protocol and TLD. Most patterns I found stuff in every known protocol with ORs and limit the TLD to a certain number of characters. Instead, I allow any series of alpha characters as a protocol (so it’ll pick up geo:// or maps:// or whatever), and I don’t limit the TLD length: even .info is common enough to break many of the patterns I found that capped it at 3 characters, and I don’t want to break if a new TLD is introduced someday. It’s highly doubtful one will ever be longer than, say, 5 characters, but I don’t think a limit serves a real purpose, especially considering how common local server installs are, whose vhosts might use naming conventions like http://upshots.local.