Pooya Karimian
Blog Archives: HTML::TreeBuilder
March 07, 2004
HTML::TreeBuilder
I found HTML::TreeBuilder to be a useful, easy-to-use Perl module for filtering unwanted tags out of HTML. Here is an example:
I was browsing http://www.hottest-lyrics.com and found it a good place to download lyrics by my favorite singers. The lyrics are organized by album, and they can be fetched with recursive wget calls:
$ wget -r http://www.hottest-lyrics.com/s.css
$ wget -k -E -r -np -b -t 0 -l 50 -U \
  "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" \
  http://www.hottest-lyrics.com/l/loreena-mckennitt-lyrics-2376.html
After downloading the lyrics, I noticed a lot of JavaScript and images in the HTML source, so I used the following Perl code to clean them out:
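A minimal sketch of this kind of cleanup, not necessarily the exact script used here. It assumes the unwanted elements are script and img, and that each cleaned page should be written next to the original with a ".clean" suffix; the clean_html helper and the suffix are names made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: parse an HTML string, drop every element whose
# tag is listed in @unwanted, and return the remaining document as HTML.
sub clean_html {
    my ($html, @unwanted) = @_;

    my $tree = HTML::TreeBuilder->new;
    $tree->parse($html);
    $tree->eof;

    # look_down returns every element matching the criteria; delete
    # detaches the element (and its children) from the tree.
    for my $tag (@unwanted) {
        $_->delete for $tree->look_down(_tag => $tag);
    }

    my $out = $tree->as_HTML;
    $tree->delete;    # release the parse tree
    return $out;
}

# Assumption: file names come from the command line, and the result is
# written to "<file>.clean" so the downloaded original is left untouched.
for my $file (@ARGV) {
    open my $in, '<', $file or die "Cannot read $file: $!";
    my $html = do { local $/; <$in> };    # slurp the whole file
    close $in;

    open my $out, '>', "$file.clean" or die "Cannot write $file.clean: $!";
    print {$out} clean_html($html, qw(script img));
    close $out;
}
```

Saved as, say, clean.pl (a hypothetical name), it could be run over the downloaded pages with something like: perl clean.pl www.hottest-lyrics.com/l/*.html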
Posted to Programming by pooya at March 7, 2004 07:18 AM