In the last meeting of Purdue Perl Mongers, Joe Kline mentioned Sawyer X's YAPC::NA talk on Modern Web Scraping, where he talked about Web::Query, which uses CSS selectors, compared to the XPath selectors he uses for his own web scraping.
I had just written and posted code where I used Mojo::DOM to scrape YouTube, so I decided to do a head-to-head parsing of the same corpus.
I found that, except for wq($file) and Mojo::DOM->new($file), the code is identical. Seriously, the only other difference is a small string that says whether it's using Web::Query or Mojo::DOM.
When run, Mojo::DOM is a little bit faster, though.
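One way to put a number on "a little bit faster" is the core Benchmark module. The following is only a sketch: the generated HTML is a hypothetical stand-in for the saved YouTube page (same class names, 60 items), and the helper names links_wq and links_md are made up for illustration. Results will vary with module versions and with the real page's size.

```perl
#!/usr/bin/env perl
use strict ;
use warnings ;
use Benchmark qw{ cmpthese } ;
use Mojo::DOM ;
use Web::Query ;

# Hypothetical stand-in for the saved page: 60 items with the same class names
my $items = join "\n", map {
    qq{<div class="channels-content-item"><div class="yt-lockup-content">}
        . qq{<a href="/watch?v=$_">Talk $_</a></div></div>}
} 1 .. 60 ;
my $html = "<html><body>\n$items\n</body></html>" ;

# Each sub parses the page from scratch and returns the list of hrefs
sub links_wq {
    my @links ;
    wq($html)->find('.channels-content-item')
        ->each( sub { push @links, scalar $_->find('a')->first->attr('href') } ) ;
    return @links ;
}

sub links_md {
    my @links ;
    Mojo::DOM->new($html)->find('.channels-content-item')
        ->each( sub { push @links, scalar $_->find('a')->first->attr('href') } ) ;
    return @links ;
}

# Run each parser for at least one CPU second and print a comparison table
cmpthese( -1, { 'Web::Query' => \&links_wq, 'Mojo::DOM' => \&links_md } ) ;
```

Parsing inside the timed sub means the table reflects parse time plus selector time, which matches how the script in this post uses both modules.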
#!/usr/bin/env perl
use feature qw{ say state unicode_eval unicode_strings } ;
use strict ;
use warnings ;
use utf8 ;
use Data::Dumper ;
use Mojo::DOM ;
use Web::Query ;

my $base = 'https://www.youtube.com' ;
my $file = join '', (<DATA>) ;
$file =~ s/\p{FORMAT}//g ; # find and replace Unicode formatting chars - http://www.perlmonks.org/?node_id=1020973

wq($file)->find('.channels-content-item')->each(
    sub {
        state $c = 1 ;
        my $e = $_ ;
        my $content = $e->find('.yt-lockup-content')->first ;
        my $anchor = $content->find('a')->first ;
        my $title = $anchor->text ;
        my $link = $base . $anchor->attr('href') ;
        say join ' : ', ( sprintf '%02d', $c++ ), 'wq', $title, $link ;
    }
) ;

Mojo::DOM->new($file)->find('.channels-content-item')->each(
    sub {
        state $c = 1 ;
        my $e = $_ ;
        my $content = $e->find('.yt-lockup-content')->first ;
        my $anchor = $content->find('a')->first ;
        my $title = $anchor->text ;
        my $link = $base . $anchor->attr('href') ;
        say join ' : ', ( sprintf '%02d', $c++ ), 'md', $title, $link ;
    }
) ;

exit ;

__DATA__
... not appropriate to include several thousands of lines of HTML here...
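Because both each() methods set $_ to the current element, the two callbacks above can be folded into one. The sketch below is an illustration, not the code that produced the output in this post: make_collector is a hypothetical helper, and the two-item HTML is a made-up stand-in for the saved channel page.

```perl
#!/usr/bin/env perl
use strict ;
use warnings ;
use feature qw{ say } ;
use Mojo::DOM ;
use Web::Query ;

my $base = 'https://www.youtube.com' ;

# Hypothetical two-item stand-in for the saved channel page
my $html = <<'HTML' ;
<html><body>
<div class="channels-content-item"><div class="yt-lockup-content"><a href="/watch?v=aaa">First talk</a></div></div>
<div class="channels-content-item"><div class="yt-lockup-content"><a href="/watch?v=bbb">Second talk</a></div></div>
</body></html>
HTML

# Build a callback that relies only on $_ , which both each() methods set.
# Each closure keeps its own counter and pushes formatted lines onto $out.
sub make_collector {
    my ( $label, $out ) = @_ ;
    my $c = 1 ;
    return sub {
        my $anchor = $_->find('.yt-lockup-content')->first->find('a')->first ;
        my $title  = $anchor->text ;
        my $link   = $base . $anchor->attr('href') ;
        push @$out, join ' : ', ( sprintf '%02d', $c++ ), $label, $title, $link ;
    } ;
}

my ( @wq, @md ) ;
wq($html)->find('.channels-content-item')->each( make_collector( 'wq', \@wq ) ) ;
Mojo::DOM->new($html)->find('.channels-content-item')->each( make_collector( 'md', \@md ) ) ;
say for @wq, @md ;
```

The callback ignores its positional arguments on purpose: Web::Query passes the index first and the element second, while Mojo::Collection passes the element first, so $_ is the part the two interfaces share.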
01 : wq : Lightning Talks, Phil Windley, and YAPC 2015 Closing : https://www.youtube.com/watch?v=3xLMG9ELcPI
02 : wq : John McDonald HPCI manage cluster cloud computing : https://www.youtube.com/watch?v=fb7XZj__Pqg
03 : wq : bulk 88 Writing XS in plain C : https://www.youtube.com/watch?v=Iu6RV2wKQwo
04 : wq : Brian Gottreau If you can't remember history rewrite it so you can : https://www.youtube.com/watch?v=6ByzqrG2Nsc
05 : wq : Brad Lhotsky Lessons from High Velocity Logging : https://www.youtube.com/watch?v=6gXxBgGEv_I
06 : wq : Andrew Grangaard Effective Git : https://www.youtube.com/watch?v=oS-mMKnAAL0
07 : wq : Ivan Kohler How Perl helped us make a million dollars : https://www.youtube.com/watch?v=D9fzN18F8iQ
08 : wq : Walt Mankowski Making movies for fun and science : https://www.youtube.com/watch?v=xf2UHZu9NJA
09 : wq : Shawn Moore Lifting Moose : https://www.youtube.com/watch?v=w9HHHNVrmOs
10 : wq : Jason McIntosh The True Story of Plerd : https://www.youtube.com/watch?v=5X4VaeoCSe8
11 : wq : Dana Jacobsen BigNums When 64 bits just is not enough : https://www.youtube.com/watch?v=Dhl4_Chvm_g
12 : wq : Joseph Hall and A Series of Unfortunate Requests : https://www.youtube.com/watch?v=wbaH_jxcA7g
13 : wq : Neil Mansilla Building Smarter Microservices with Scale Oriented Architecture : https://www.youtube.com/watch?v=USXSnfilG4g
14 : wq : Jonathan Taylor Moose in Production A Two year Retrospective : https://www.youtube.com/watch?v=tD1oRoaVn2M
15 : wq : David Golden Juggling Chainsaws Perl and MongoDB : https://www.youtube.com/watch?v=Nf3e6cPU9B0
16 : wq : Michael Conrad DeLorean Digital Dashboard : https://www.youtube.com/watch?v=SERH3_gZOTo
17 : wq : Graham Ollis Practical FFI with Platypus : https://www.youtube.com/watch?v=XjvpxfVJLNg
18 : wq : Ricardo Signes (rjbs) - Perl 5.22 and You : https://www.youtube.com/watch?v=I8VVtqVh9y0
19 : wq : Rafael Almeria - Live Perl : https://www.youtube.com/watch?v=nZHWVAPm9IA
20 : wq : Daisuke Maki -YAPC::Asian Tokyo Behind The Scenes : How We Organize A Conference for 2000 Attendees : https://www.youtube.com/watch?v=VcwsR1yVuII
21 : wq : John Whitney - Perl via Paper Ink Metal and Oil : https://www.youtube.com/watch?v=INSn6cYK19U
22 : wq : Stevan Little (stevan) - Perl's Syntactic Legacy: Using the future to improve the past : https://www.youtube.com/watch?v=sJC725e8ysM
23 : wq : Joe Kline (gizmo) - My Ordnung : https://www.youtube.com/watch?v=vBiKxw1JMZM
24 : wq : Tim Bunce - Life: Enhancing your frame of reference : https://www.youtube.com/watch?v=Y24QnadqqJ4
25 : wq : VM Brasseur (vmbrasseur) - Failure: Why it happens & How to benefit from it : https://www.youtube.com/watch?v=DLn4fZsZsKM
26 : wq : Nick Patch (patch) - Hello, my name is _______. : https://www.youtube.com/watch?v=SKbqCB2NPXw
27 : wq : Andrew Hewus Fresh (AFresh1) - Perl in OpenBSD : https://www.youtube.com/watch?v=GwrnOpYXimE
28 : wq : D Ruth Bavousett (druthb) - Scrum for One : https://www.youtube.com/watch?v=Zh7dXvQY-hg
29 : wq : Q&A With Larry Wall : https://www.youtube.com/watch?v=PK9UnAmrxsA
30 : wq : Seth Johnson - Keynote: Seth Johnson - What Perl Taught Me About Life : https://www.youtube.com/watch?v=afaKtWp0JKM
31 : wq : Curtis Poe (Ovid) - Perl 6 for Mere Mortals : https://www.youtube.com/watch?v=S0OGsFmPW2M
32 : wq : Florian Ragwitz (rafl) - Ansible for Programmers : https://www.youtube.com/watch?v=x3ZbYQSGkBY
33 : wq : Bruce Gray (Util) - Stop Panicking! Perl 6 is just like Perl 5 (where it counts). : https://www.youtube.com/watch?v=KSWp9B-s-Sg
34 : wq : Steven Lembark - Mongering in a Box: Building Perl application containers with Dockers : https://www.youtube.com/watch?v=NuRClr-xREc
35 : wq : DrForr - Everything Old is New Again: Quaternion in Perl6 : https://www.youtube.com/watch?v=fKksZBUDMEo
36 : wq : Jordan Adler (jmadler) Mobile Apps... in Perl?! : https://www.youtube.com/watch?v=7mRHapWZ-AI
37 : wq : Logan Bell - Give Catalyst Some Swag : https://www.youtube.com/watch?v=mHmdrgnMCps
38 : wq : Logan Bell - Perl to Go : https://www.youtube.com/watch?v=y573MDoLraY
39 : wq : Henry Van Styn (vanstyn) - RapidApp by example - database web apps on steroids : https://www.youtube.com/watch?v=9HMHD1u9uc4
40 : wq : James E Keenan (kid51) - A Simple Development Tool for Refactoring & Benchmarking : https://www.youtube.com/watch?v=vSNdp1QkCyE
41 : wq : WHATEVER YOU DO DON'T VIEW THIS : https://www.youtube.com/watch?v=-AJo_RVDoF0
42 : wq : Mark Prather (Trg404) - From bartending to nerdtending : https://www.youtube.com/watch?v=uvETUUMZo9E
43 : wq : William Stevenson (wds) - Dude, where's my data analyst? A quick guide to machine learning : https://www.youtube.com/watch?v=p53qpU78LxI
44 : wq : Chad Granum (Exodist) - Perl Testing, whats new with Test:: More and beyond : https://www.youtube.com/watch?v=uFzr6wu5Pq4
45 : wq : Sawyer X - Modern web scraping : https://www.youtube.com/watch?v=wcXmCMGwZQo
46 : wq : Joel Berger (jberger) - Test Your App's Javascript using Test:: Mojo::Role::Phantom : https://www.youtube.com/watch?v=CKbzBNz4Ksg
47 : wq : Sean Quinlan (spq_easy) - Leave the system alone! : https://www.youtube.com/watch?v=mph-9hqJQ98
48 : wq : Upasana Shukla (upsasana) How to Bring Newbies to Perl : https://www.youtube.com/watch?v=yewFM9XEmlQ
49 : wq : Matt S. Trout (mst) Build management with a dash of prolog : https://www.youtube.com/watch?v=C2RJfykfVcM
50 : wq : Prairie Nyx - CoderDojo and Perl Evangelism : https://www.youtube.com/watch?v=kkD4pCRvwK4
51 : wq : Karen Pauley - Working with Volunteers: Learning from My Mistakes : https://www.youtube.com/watch?v=ek4fmzyXGwM
52 : wq : Stephen Scaffidi (hercynium) - In the desert without a camel : https://www.youtube.com/watch?v=OK1ZY_bw660
53 : wq : R Geoffrey Avery (eGeoffrey) Lightning Talks Day 1 : https://www.youtube.com/watch?v=mQVUvAz3zhQ
54 : wq : Welcome to YAPC & States of the Velociraptors : The Perl5 community lightning talks : https://www.youtube.com/watch?v=88K1h1XhEeo
55 : wq : YAPC::NA::2014 Highlights : https://www.youtube.com/watch?v=GLqtHab06dM
56 : wq : Matt S Trout (mst) - Devops Logique : https://www.youtube.com/watch?v=RQwY28DItLI
57 : wq : John Anderson (genehack) - Yet Another Keynote Speech : https://www.youtube.com/watch?v=MU6IFUZZBuQ
58 : wq : Sawyer X - The Joy in What We Do : https://www.youtube.com/watch?v=CjOQZf0Ad74
59 : wq : R Geoffrey Avery (rGeoffrey) - Lightning Talks Day 3 : https://www.youtube.com/watch?v=m-6o2dBc1qE
60 : wq : Peter Martini - Sub Signatures: Next Steps : https://www.youtube.com/watch?v=ot5yOrMJogA
01 : md : Lightning Talks, Phil Windley, and YAPC 2015 Closing : https://www.youtube.com/watch?v=3xLMG9ELcPI
02 : md : John McDonald HPCI manage cluster cloud computing : https://www.youtube.com/watch?v=fb7XZj__Pqg
03 : md : bulk 88 Writing XS in plain C : https://www.youtube.com/watch?v=Iu6RV2wKQwo
04 : md : Brian Gottreau If you can't remember history rewrite it so you can : https://www.youtube.com/watch?v=6ByzqrG2Nsc
05 : md : Brad Lhotsky Lessons from High Velocity Logging : https://www.youtube.com/watch?v=6gXxBgGEv_I
06 : md : Andrew Grangaard Effective Git : https://www.youtube.com/watch?v=oS-mMKnAAL0
07 : md : Ivan Kohler How Perl helped us make a million dollars : https://www.youtube.com/watch?v=D9fzN18F8iQ
08 : md : Walt Mankowski Making movies for fun and science : https://www.youtube.com/watch?v=xf2UHZu9NJA
09 : md : Shawn Moore Lifting Moose : https://www.youtube.com/watch?v=w9HHHNVrmOs
10 : md : Jason McIntosh The True Story of Plerd : https://www.youtube.com/watch?v=5X4VaeoCSe8
11 : md : Dana Jacobsen BigNums When 64 bits just is not enough : https://www.youtube.com/watch?v=Dhl4_Chvm_g
12 : md : Joseph Hall and A Series of Unfortunate Requests : https://www.youtube.com/watch?v=wbaH_jxcA7g
13 : md : Neil Mansilla Building Smarter Microservices with Scale Oriented Architecture : https://www.youtube.com/watch?v=USXSnfilG4g
14 : md : Jonathan Taylor Moose in Production A Two year Retrospective : https://www.youtube.com/watch?v=tD1oRoaVn2M
15 : md : David Golden Juggling Chainsaws Perl and MongoDB : https://www.youtube.com/watch?v=Nf3e6cPU9B0
16 : md : Michael Conrad DeLorean Digital Dashboard : https://www.youtube.com/watch?v=SERH3_gZOTo
17 : md : Graham Ollis Practical FFI with Platypus : https://www.youtube.com/watch?v=XjvpxfVJLNg
18 : md : Ricardo Signes (rjbs) - Perl 5.22 and You : https://www.youtube.com/watch?v=I8VVtqVh9y0
19 : md : Rafael Almeria - Live Perl : https://www.youtube.com/watch?v=nZHWVAPm9IA
20 : md : Daisuke Maki -YAPC::Asian Tokyo Behind The Scenes : How We Organize A Conference for 2000 Attendees : https://www.youtube.com/watch?v=VcwsR1yVuII
21 : md : John Whitney - Perl via Paper Ink Metal and Oil : https://www.youtube.com/watch?v=INSn6cYK19U
22 : md : Stevan Little (stevan) - Perl's Syntactic Legacy: Using the future to improve the past : https://www.youtube.com/watch?v=sJC725e8ysM
23 : md : Joe Kline (gizmo) - My Ordnung : https://www.youtube.com/watch?v=vBiKxw1JMZM
24 : md : Tim Bunce - Life: Enhancing your frame of reference : https://www.youtube.com/watch?v=Y24QnadqqJ4
25 : md : VM Brasseur (vmbrasseur) - Failure: Why it happens & How to benefit from it : https://www.youtube.com/watch?v=DLn4fZsZsKM
26 : md : Nick Patch (patch) - Hello, my name is _______. : https://www.youtube.com/watch?v=SKbqCB2NPXw
27 : md : Andrew Hewus Fresh (AFresh1) - Perl in OpenBSD : https://www.youtube.com/watch?v=GwrnOpYXimE
28 : md : D Ruth Bavousett (druthb) - Scrum for One : https://www.youtube.com/watch?v=Zh7dXvQY-hg
29 : md : Q&A With Larry Wall : https://www.youtube.com/watch?v=PK9UnAmrxsA
30 : md : Seth Johnson - Keynote: Seth Johnson - What Perl Taught Me About Life : https://www.youtube.com/watch?v=afaKtWp0JKM
31 : md : Curtis Poe (Ovid) - Perl 6 for Mere Mortals : https://www.youtube.com/watch?v=S0OGsFmPW2M
32 : md : Florian Ragwitz (rafl) - Ansible for Programmers : https://www.youtube.com/watch?v=x3ZbYQSGkBY
33 : md : Bruce Gray (Util) - Stop Panicking! Perl 6 is just like Perl 5 (where it counts). : https://www.youtube.com/watch?v=KSWp9B-s-Sg
34 : md : Steven Lembark - Mongering in a Box: Building Perl application containers with Dockers : https://www.youtube.com/watch?v=NuRClr-xREc
35 : md : DrForr - Everything Old is New Again: Quaternion in Perl6 : https://www.youtube.com/watch?v=fKksZBUDMEo
36 : md : Jordan Adler (jmadler) Mobile Apps... in Perl?! : https://www.youtube.com/watch?v=7mRHapWZ-AI
37 : md : Logan Bell - Give Catalyst Some Swag : https://www.youtube.com/watch?v=mHmdrgnMCps
38 : md : Logan Bell - Perl to Go : https://www.youtube.com/watch?v=y573MDoLraY
39 : md : Henry Van Styn (vanstyn) - RapidApp by example - database web apps on steroids : https://www.youtube.com/watch?v=9HMHD1u9uc4
40 : md : James E Keenan (kid51) - A Simple Development Tool for Refactoring & Benchmarking : https://www.youtube.com/watch?v=vSNdp1QkCyE
41 : md : WHATEVER YOU DO DON'T VIEW THIS : https://www.youtube.com/watch?v=-AJo_RVDoF0
42 : md : Mark Prather (Trg404) - From bartending to nerdtending : https://www.youtube.com/watch?v=uvETUUMZo9E
43 : md : William Stevenson (wds) - Dude, where's my data analyst? A quick guide to machine learning : https://www.youtube.com/watch?v=p53qpU78LxI
44 : md : Chad Granum (Exodist) - Perl Testing, whats new with Test:: More and beyond : https://www.youtube.com/watch?v=uFzr6wu5Pq4
45 : md : Sawyer X - Modern web scraping : https://www.youtube.com/watch?v=wcXmCMGwZQo
46 : md : Joel Berger (jberger) - Test Your App's Javascript using Test:: Mojo::Role::Phantom : https://www.youtube.com/watch?v=CKbzBNz4Ksg
47 : md : Sean Quinlan (spq_easy) - Leave the system alone! : https://www.youtube.com/watch?v=mph-9hqJQ98
48 : md : Upasana Shukla (upsasana) How to Bring Newbies to Perl : https://www.youtube.com/watch?v=yewFM9XEmlQ
49 : md : Matt S. Trout (mst) Build management with a dash of prolog : https://www.youtube.com/watch?v=C2RJfykfVcM
50 : md : Prairie Nyx - CoderDojo and Perl Evangelism : https://www.youtube.com/watch?v=kkD4pCRvwK4
51 : md : Karen Pauley - Working with Volunteers: Learning from My Mistakes : https://www.youtube.com/watch?v=ek4fmzyXGwM
52 : md : Stephen Scaffidi (hercynium) - In the desert without a camel : https://www.youtube.com/watch?v=OK1ZY_bw660
53 : md : R Geoffrey Avery (eGeoffrey) Lightning Talks Day 1 : https://www.youtube.com/watch?v=mQVUvAz3zhQ
54 : md : Welcome to YAPC & States of the Velociraptors : The Perl5 community lightning talks : https://www.youtube.com/watch?v=88K1h1XhEeo
55 : md : YAPC::NA::2014 Highlights : https://www.youtube.com/watch?v=GLqtHab06dM
56 : md : Matt S Trout (mst) - Devops Logique : https://www.youtube.com/watch?v=RQwY28DItLI
57 : md : John Anderson (genehack) - Yet Another Keynote Speech : https://www.youtube.com/watch?v=MU6IFUZZBuQ
58 : md : Sawyer X - The Joy in What We Do : https://www.youtube.com/watch?v=CjOQZf0Ad74
59 : md : R Geoffrey Avery (rGeoffrey) - Lightning Talks Day 3 : https://www.youtube.com/watch?v=m-6o2dBc1qE
60 : md : Peter Martini - Sub Signatures: Next Steps : https://www.youtube.com/watch?v=ot5yOrMJogA