Writing link extraction with Web::Scraper turns out to be quite elegant. http://e8y.net/mag/013-web-scraper/ covers it in detail.
#!/opt/local/bin/perl
use strict;
use warnings;
use Web::Scraper;
use URI;

my $uri = URI->new("http://developer.apple.com/jp/documentation/japanese.html");

# Collect the href attribute of every <a> element into an array
my $scraper = scraper {
    process 'a', 'url[]' => '@href';
};

my $result = $scraper->scrape($uri);
for my $l (@{ $result->{url} }) {
    print "$l\n";
}
For Ruby, there is a similar example at http://d.hatena.ne.jp/secondlife/20060922/1158923779.
require 'rubygems'
require 'scrapi'
require 'open-uri'

$KCODE = 'u'

my_url = URI.parse('http://developer.apple.com/jp/documentation/japanese.html')

# Collect the href attribute of every <a> element that has one
links = Scraper.define {
  process "a[href]", "urls[]" => "@href"
  result :urls
}

links.scrape(my_url).each do |path|
  ret = my_url + path  # resolve relative paths against the base URL
  puts ret.to_s
end
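As an aside, the `my_url + path` step in the Ruby snippet leans on `URI#+` from the standard library, which resolves a relative reference against a base URL (per the RFC 3986 merge rules), while absolute URLs pass through unchanged. A minimal stdlib-only sketch, using the same base URL as above:

```ruby
require 'uri'

base = URI.parse('http://developer.apple.com/jp/documentation/japanese.html')

# A relative path is merged against the base: the trailing
# "japanese.html" is dropped, then "../" steps up one directory.
abs = base + '../index.html'
puts abs.to_s  # => "http://developer.apple.com/jp/index.html"

# An already-absolute URL replaces the base entirely.
puts (base + 'http://example.com/a').to_s  # => "http://example.com/a"
```

This is why the Ruby version prints absolute URLs even when the page's `href` attributes are relative.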