Pooya Karimian

Blog Archives: HTML::TreeBuilder

HTML::TreeBuilder

I found HTML::TreeBuilder to be a useful, easy-to-use Perl module for filtering out unwanted tags. Here is an example:
I was browsing http://www.hottest-lyrics.com and found it a good place to download the lyrics of my favorite singers. The lyrics are organized by album, and they can be downloaded with a recursive wget:


$ wget -r http://www.hottest-lyrics.com/s.css
$ wget -k -E -r -np -b -t 0 -l 50 -U \
    "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)" \
    http://www.hottest-lyrics.com/l/loreena-mckennitt-lyrics-2376.html

After downloading the lyrics, I noticed a lot of JavaScript and images in the HTML sources, so I used the following Perl code to clean them out:
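
A cleanup of this kind can be sketched roughly as below. This is a minimal sketch, not necessarily the code used at the time: it assumes the goal is simply to delete every script and img element, and the helper name clean_html, the tag list, and the .clean output suffix are all hypothetical choices made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Tags to strip. This list is an assumption based on the post's mention of
# JavaScript and images; adjust it to taste.
my @unwanted_tags = qw(script img);

# Hypothetical helper: parse an HTML string, delete every unwanted
# element, and return the cleaned document as a string.
sub clean_html {
    my ($html) = @_;

    my $tree = HTML::TreeBuilder->new_from_content($html);

    for my $tag (@unwanted_tags) {
        # look_down returns every element in the tree matching the criteria.
        $_->delete for $tree->look_down(_tag => $tag);
    }

    # Escape only <, > and &; indent with two spaces; emit all end tags.
    my $cleaned = $tree->as_HTML('<>&', '  ', {});
    $tree->delete;    # free the tree explicitly
    return $cleaned;
}

# Clean each file named on the command line, writing FILE.clean next to it
# (the output naming is an assumption; the originals are left untouched).
for my $file (@ARGV) {
    open my $in, '<', $file or die "Cannot read $file: $!";
    my $html = do { local $/; <$in> };
    close $in;

    open my $out, '>', "$file.clean" or die "Cannot write $file.clean: $!";
    print {$out} clean_html($html);
    close $out;
}
```

Saved under a hypothetical name such as clean.pl, it could be run over the downloaded pages with something like `find www.hottest-lyrics.com -name '*.html' -exec perl clean.pl {} +`. Writing to a separate file rather than overwriting in place makes it easy to compare the result with the original before discarding anything.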



Posted to Programming by pooya at March 7, 2004 07:18 AM

[Friday 2024-04-26] [Updated Friday 2014-07-18]