Tuesday, 10 April 2012


I frequently found myself wanting to get all the files linked to from a particular page that matched a particular regular expression (all linked PDFs, for example). Rather than doing the sensible thing and trying to find something that would do this for me, I threw together a Python script and started using it.

If this is a particular itch of yours too, or you wish to contribute to the rather simple codebase, it can be found at Bitbucket. I'd rather like to stop using BeautifulSoup and start getting links with just a regex, so as to remove the library requirement.
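
For anyone curious what the regex-only approach might look like, here is a rough sketch. It is not the code in the repository: the names `HREF_RE` and `matching_links` are made up for illustration, it assumes Python 3, and the `href` pattern is deliberately naive (it only handles quoted attributes, and it will also pick up `href` values on tags other than `<a>`).

```python
import re
from urllib.parse import urljoin

# Naive pattern for quoted href attributes. This is an assumption made for
# the sketch, not a real HTML parser: unquoted attributes are ignored.
HREF_RE = re.compile(r'href\s*=\s*(["\'])(.*?)\1', re.IGNORECASE | re.DOTALL)


def matching_links(html, base_url, pattern):
    """Return absolute URLs of links in `html` that match `pattern`.

    Relative links are resolved against `base_url`; duplicates are dropped
    while keeping the order in which links first appear.
    """
    wanted = re.compile(pattern)
    links = []
    for _quote, href in HREF_RE.findall(html):
        url = urljoin(base_url, href.strip())
        if wanted.search(url) and url not in links:
            links.append(url)
    return links


if __name__ == "__main__":
    page = '<a href="docs/a.pdf">A</a> <a href="index.html">Home</a>'
    for link in matching_links(page, "http://example.com/page/", r"\.pdf$"):
        print(link)
```

Each URL returned could then be fetched with something like `urllib.request.urlretrieve`. Whether a regex is robust enough for real-world pages is an open question; the standard library's `html.parser` would be another way to drop the dependency.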

