/var/log/rant: June 2015

2015/06/29

Fixing an old logic issue

I am not especially proud of the code below.

	#!/usr/bin/env perl
	use 5.010 ;
	use strict ;
	use warnings ;
	use CGI ;
	use DBI ;
	use Data::Dumper ;
	use JSON ;
	use lib '/path/to/my/lib' ;
	use SecGenTools::Accession ':all' ;

	my $cgi = new CGI ;
	my $json = new JSON ;
	my $hash ;
	my $param ;
	map { $param->{$_} = $cgi->param($_) ; } $cgi->param() ;

	my $request_id = $param->{request_id} ;

	say "content-type: text/plain\n" ;

	my @resp ;
	$hash->{info} = $param ;

	if ( defined $request_id && length $request_id > 1 ) {
	my @json_accessions = grep {/^\d/} keys %$param ;
	my @db_accessions = get_accession_list($request_id) ;
	my $db_accessions = join '\|', @db_accessions ;
	my @accessions = sort grep {/$db_accessions/} @json_accessions ;
	my $response = 0 ;

	$hash->{def_req} = defined $request_id ;
	$hash->{len_id} = length $request_id > 1 ;
	$hash->{json_accessions} = \@json_accessions ;
	$hash->{db_accessions} = \@db_accessions ;
	$hash->{accessions} = \@accessions ;

	for my $acc (@accessions) {
	my $data ;
	my $lib = $param->{$acc} ;
	$data->{library_name} = $lib ;
	$data->{accession_id} = sprintf '%06d' , $acc ;
	my $r = update_accession_library_name($data) ;
	$hash->{debug} .= qq{$acc : $lib : $r : } ;
	$response++ if $r != '0A0' ;
	push @resp, $r ;
	}
	$hash->{param} = join "\n", @accessions ;
	$hash->{response} = "$response Accessions Changed" ;
	}
	else {
	$hash->{response} = 'Debug' ;
	}


	say JSON::to_json($hash) ;

	exit ;

view raw rename_libraries.cgi hosted with ❤ by GitHub

It does it's job. Give it a request and a number of accessions and the names you want them to go by, and it changes them in the database.

Except...

Accessions are defined as zero-padded six-digit numbers, so instead of 99999, you'd have 099999. If you're strict, everything's fine.

But user's are not always strict. Sometimes they just put in 99999, expecting it to just work.

Oh, if only it were that easy.

I have requests here for the purpose of ensuring that for request 09999, you can only change accessions associated with that request. This is what lines 27-29 are for, to get the set of accessions that are entered by the user and one of the given request's accessions.

Yes, requests are defined as zero-padded five-digit numbers.

If I don't zero-pad the accessions, I get nothing in @accessions.

But if I do zero pad, I get no library name from $param->{ $acc }.

There is a fix for it. I could go back to the source and ensure that this sees no un-padded numbers. I could run through the $param hashref again, But clearly, this is something I should've built in at first.

2015/06/22

"Well, That Was Strange": Hunting Gremlins in SQL and Perl

The query base is 90 lines.

Depending on what it's used for, one specific entry or the whole lot, it has different endings, but the main body is 90 lines. There are 20 left joins in it.

It is an ugly thing.

So ugly, in fact, that am loath to include it here.

So ugly that I felt it necessary to comment and explain my use of joins.

This is where the trouble started.

I noticed it when I was running tests, getting the following error.

	DBD::mysql::st execute failed: called with 1 bind variables when 0 are needed [for Statement " SELECT p.id
	...
	WHERE p.id = ?
	"] at /group/gcore/apps/lib/DB.pm line 185.

view raw my_error.txt hosted with ❤ by GitHub

Clearly, it needed a bind variable, but something along the line blocked it.

I had this problem on Friday morning on our newer server, then it stopped. Honestly, it was such a fire-fighting day that I lost track of what was happening with it.

Then the module was put on the old server and the problem rose up again.

Whether that makes me Shatner or Lithgow
remains an exercise for the reader.

I said "my code has gremlins" and went home at 5pm.

When I got back to the lab this morning, I made different test scripts, each identical except for the hashbang. I set one for system Perl, which is 5.10, one for the one we hardcode into most of our web and cron uses, which is 5.16, and the one we have as env perl, currently 5.20.

The cooler solution would've been to have several versions of Perl installed with Perlbrew, then running perlbrew exec perl myprogram.pl instead, but I don't have Perlbrew installed on that system.

The error occurs with 5.10. It does not with 5.16 or 5.20.

And when I run it against a version without the comments in the query, it works everywhere.

I don't have any clue if the issue is with Perl 5.10 or with the version of DBI currently installed with 5.10, and I don't expect to. The old system is a Sun machine that was off support before I was hired in, and the admin for it reminds us each time we talk to him that it's only a matter of time before it falls and can no longer get up. I haven't worked off that machine for around two years, and this query's move to the old server is part of the move of certain services to the new machine.

And, as everything is fine with Perls 5.16 or higher, I must regard this as a solved problem except with legacy installs.

I know that MySQL accepts # as the comment character, but Sublime Text prefers to make -- mean SQL comments, so when I commented the query, I used the double-dash, and our solution is to remove the comments when deploying to the old server. It's a temporary solution, to be sure, but deploying to the old server is only temporary, too.

It's a sad and strange situation where the solution is to uncomment code, but here, that seems to be it.

Update: Matt S. Trout pushed me to check into the DBD::mysql versions, to see which versions corresponded to the error. The offending 5.10 perl used DBD::mysql v. 4.013, and looking at the DBD::mysql change log, I see bug #30033: Fixed handling of comments to allow comments that contain characters that might otherwise cause placeholder detection to not work properly. Matt suggests adding "use DBD::mysql 4.014;", which is more than reasonable.

2015/06/17

Head-to-Head Web Scraping with Perl: Mojo::DOM vs Web::Query

In the last meeting of Purdue Perl Mongers, Joe Kline mentioned Sawyer X's YAPC::NA talk on Modern Web Scraping, where he talked about Web::Query, which uses CSS selectors, compared to the XPath selectors he uses for his own web scraping.

I had just written and posted code where I used Mojo::DOM to scrape YouTube. So decided to do a head-to-head parsing of the same corpus.

And found that, except for wq($file) and Mojo::DOM->new($file), the code is identical.

Seriously, only a small string that says it's using Web::Query or Mojo::DOM that's different.

In running, Mojo::DOM is a little bit faster, though.

	#!/usr/bin/env perl

	use feature qw{ say state unicode_eval unicode_strings } ;
	use strict ;
	use warnings ;
	use utf8 ;

	use Data::Dumper ;
	use Mojo::DOM ;
	use Web::Query ;

	my $base = 'https://www.youtube.com' ;
	my $file = join '', (<DATA>) ;
	$file =~ s/\p{FORMAT}//g ; # find and replace Unicode formatting chars - http://www.perlmonks.org/?node_id=1020973

	wq($file)->find('.channels-content-item')->each(
	sub {
	state $c = 1 ;
	my $e = $_ ;
	my $content = $e->find('.yt-lockup-content')->first ;
	my $anchor = $content->find('a')->first ;
	my $title = $anchor->text ;
	my $link = $base . $anchor->attr('href') ;
	say join ' : ', ( sprintf '%02d', $c++ ), 'wq', $title, $link ;
	}
	) ;
	Mojo::DOM->new($file)->find('.channels-content-item')->each(
	sub {
	state $c = 1 ;
	my $e = $_ ;
	my $content = $e->find('.yt-lockup-content')->first ;
	my $anchor = $content->find('a')->first ;
	my $title = $anchor->text ;
	my $link = $base . $anchor->attr('href') ;
	say join ' : ', ( sprintf '%02d', $c++ ), 'md', $title, $link ;
	}
	) ;
	exit ;

	__DATA__
	... not appropriate to include several thousands of lines of HTML here...

view raw yapcna2015.wq.pl hosted with ❤ by GitHub

	01 : wq : Lightning Talks, Phil Windley, and YAPC 2015 Closing : https://www.youtube.com/watch?v=3xLMG9ELcPI
	02 : wq : John McDonald HPCI manage cluster cloud computing : https://www.youtube.com/watch?v=fb7XZj__Pqg
	03 : wq : bulk 88 Writing XS in plain C : https://www.youtube.com/watch?v=Iu6RV2wKQwo
	04 : wq : Brian Gottreau If you can't remember history rewrite it so you can : https://www.youtube.com/watch?v=6ByzqrG2Nsc
	05 : wq : Brad Lhotsky Lessons from High Velocity Logging : https://www.youtube.com/watch?v=6gXxBgGEv_I
	06 : wq : Andrew Grangaard Effective Git : https://www.youtube.com/watch?v=oS-mMKnAAL0
	07 : wq : Ivan Kohler How Perl helped us make a million dollars : https://www.youtube.com/watch?v=D9fzN18F8iQ
	08 : wq : Walt Mankowski Making movies for fun and science : https://www.youtube.com/watch?v=xf2UHZu9NJA
	09 : wq : Shawn Moore Lifting Moose : https://www.youtube.com/watch?v=w9HHHNVrmOs
	10 : wq : Jason McIntosh The True Story of Plerd : https://www.youtube.com/watch?v=5X4VaeoCSe8
	11 : wq : Dana Jacobsen BigNums When 64 bits just is not enough : https://www.youtube.com/watch?v=Dhl4_Chvm_g
	12 : wq : Joseph Hall and A Series of Unfortunate Requests : https://www.youtube.com/watch?v=wbaH_jxcA7g
	13 : wq : Neil Mansilla Building Smarter Microservices with Scale Oriented Architecture : https://www.youtube.com/watch?v=USXSnfilG4g
	14 : wq : Jonathan Taylor Moose in Production A Two year Retrospective : https://www.youtube.com/watch?v=tD1oRoaVn2M
	15 : wq : David Golden Juggling Chainsaws Perl and MongoDB : https://www.youtube.com/watch?v=Nf3e6cPU9B0
	16 : wq : Michael Conrad DeLorean Digital Dashboard : https://www.youtube.com/watch?v=SERH3_gZOTo
	17 : wq : Graham Ollis Practical FFI with Platypus : https://www.youtube.com/watch?v=XjvpxfVJLNg
	18 : wq : Ricardo Signes (rjbs) - Perl 5.22 and You : https://www.youtube.com/watch?v=I8VVtqVh9y0
	19 : wq : Rafael Almeria - Live Perl : https://www.youtube.com/watch?v=nZHWVAPm9IA
	20 : wq : Daisuke Maki -YAPC::Asian Tokyo Behind The Scenes : How We Organize A Conference for 2000 Attendees : https://www.youtube.com/watch?v=VcwsR1yVuII
	21 : wq : John Whitney - Perl via Paper Ink Metal and Oil : https://www.youtube.com/watch?v=INSn6cYK19U
	22 : wq : Stevan Little (stevan) - Perl's Syntactic Legacy: Using the future to improve the past : https://www.youtube.com/watch?v=sJC725e8ysM
	23 : wq : Joe Kline (gizmo) - My Ordnung : https://www.youtube.com/watch?v=vBiKxw1JMZM
	24 : wq : Tim Bunce - Life: Enhancing your frame of reference : https://www.youtube.com/watch?v=Y24QnadqqJ4
	25 : wq : VM Brasseur (vmbrasseur) - Failure: Why it happens & How to benefit from it : https://www.youtube.com/watch?v=DLn4fZsZsKM
	26 : wq : Nick Patch (patch) - Hello, my name is _______. : https://www.youtube.com/watch?v=SKbqCB2NPXw
	27 : wq : Andrew Hewus Fresh (AFresh1) - Perl in OpenBSD : https://www.youtube.com/watch?v=GwrnOpYXimE
	28 : wq : D Ruth Bavousett (druthb) - Scrum for One : https://www.youtube.com/watch?v=Zh7dXvQY-hg
	29 : wq : Q&A With Larry Wall : https://www.youtube.com/watch?v=PK9UnAmrxsA
	30 : wq : Seth Johnson - Keynote: Seth Johnson - What Perl Taught Me About Life : https://www.youtube.com/watch?v=afaKtWp0JKM
	31 : wq : Curtis Poe (Ovid) - Perl 6 for Mere Mortals : https://www.youtube.com/watch?v=S0OGsFmPW2M
	32 : wq : Florian Ragwitz (rafl) - Ansible for Programmers : https://www.youtube.com/watch?v=x3ZbYQSGkBY
	33 : wq : Bruce Gray (Util) - Stop Panicking! Perl 6 is just like Perl 5 (where it counts). : https://www.youtube.com/watch?v=KSWp9B-s-Sg
	34 : wq : Steven Lembark - Mongering in a Box: Building Perl application containers with Dockers : https://www.youtube.com/watch?v=NuRClr-xREc
	35 : wq : DrForr - Everything Old is New Again: Quaternion in Perl6 : https://www.youtube.com/watch?v=fKksZBUDMEo
	36 : wq : Jordan Adler (jmadler) Mobile Apps... in Perl?! : https://www.youtube.com/watch?v=7mRHapWZ-AI
	37 : wq : Logan Bell - Give Catalyst Some Swag : https://www.youtube.com/watch?v=mHmdrgnMCps
	38 : wq : Logan Bell - Perl to Go : https://www.youtube.com/watch?v=y573MDoLraY
	39 : wq : Henry Van Styn (vanstyn) - RapidApp by example - database web apps on steroids : https://www.youtube.com/watch?v=9HMHD1u9uc4
	40 : wq : James E Keenan (kid51) - A Simple Development Tool for Refactoring & Benchmarking : https://www.youtube.com/watch?v=vSNdp1QkCyE
	41 : wq : WHATEVER YOU DO DON'T VIEW THIS : https://www.youtube.com/watch?v=-AJo_RVDoF0
	42 : wq : Mark Prather (Trg404) - From bartending to nerdtending : https://www.youtube.com/watch?v=uvETUUMZo9E
	43 : wq : William Stevenson (wds) - Dude, where's my data analyst? A quick guide to machine learning : https://www.youtube.com/watch?v=p53qpU78LxI
	44 : wq : Chad Granum (Exodist) - Perl Testing, whats new with Test:: More and beyond : https://www.youtube.com/watch?v=uFzr6wu5Pq4
	45 : wq : Sawyer X - Modern web scraping : https://www.youtube.com/watch?v=wcXmCMGwZQo
	46 : wq : Joel Berger (jberger) - Test Your App's Javascript using Test:: Mojo::Role::Phantom : https://www.youtube.com/watch?v=CKbzBNz4Ksg
	47 : wq : Sean Quinlan (spq_easy) - Leave the system alone! : https://www.youtube.com/watch?v=mph-9hqJQ98
	48 : wq : Upasana Shukla (upsasana) How to Bring Newbies to Perl : https://www.youtube.com/watch?v=yewFM9XEmlQ
	49 : wq : Matt S. Trout (mst) Build management with a dash of prolog : https://www.youtube.com/watch?v=C2RJfykfVcM
	50 : wq : Prairie Nyx - CoderDojo and Perl Evangelism : https://www.youtube.com/watch?v=kkD4pCRvwK4
	51 : wq : Karen Pauley - Working with Volunteers: Learning from My Mistakes : https://www.youtube.com/watch?v=ek4fmzyXGwM
	52 : wq : Stephen Scaffidi (hercynium) - In the desert without a camel : https://www.youtube.com/watch?v=OK1ZY_bw660
	53 : wq : R Geoffrey Avery (eGeoffrey) Lightning Talks Day 1 : https://www.youtube.com/watch?v=mQVUvAz3zhQ
	54 : wq : Welcome to YAPC & States of the Velociraptors : The Perl5 community lightning talks : https://www.youtube.com/watch?v=88K1h1XhEeo
	55 : wq : YAPC::NA::2014 Highlights : https://www.youtube.com/watch?v=GLqtHab06dM
	56 : wq : Matt S Trout (mst) - Devops Logique : https://www.youtube.com/watch?v=RQwY28DItLI
	57 : wq : John Anderson (genehack) - Yet Another Keynote Speech : https://www.youtube.com/watch?v=MU6IFUZZBuQ
	58 : wq : Sawyer X - The Joy in What We Do : https://www.youtube.com/watch?v=CjOQZf0Ad74
	59 : wq : R Geoffrey Avery (rGeoffrey) - Lightning Talks Day 3 : https://www.youtube.com/watch?v=m-6o2dBc1qE
	60 : wq : Peter Martini - Sub Signatures: Next Steps : https://www.youtube.com/watch?v=ot5yOrMJogA
	01 : md : Lightning Talks, Phil Windley, and YAPC 2015 Closing : https://www.youtube.com/watch?v=3xLMG9ELcPI
	02 : md : John McDonald HPCI manage cluster cloud computing : https://www.youtube.com/watch?v=fb7XZj__Pqg
	03 : md : bulk 88 Writing XS in plain C : https://www.youtube.com/watch?v=Iu6RV2wKQwo
	04 : md : Brian Gottreau If you can't remember history rewrite it so you can : https://www.youtube.com/watch?v=6ByzqrG2Nsc
	05 : md : Brad Lhotsky Lessons from High Velocity Logging : https://www.youtube.com/watch?v=6gXxBgGEv_I
	06 : md : Andrew Grangaard Effective Git : https://www.youtube.com/watch?v=oS-mMKnAAL0
	07 : md : Ivan Kohler How Perl helped us make a million dollars : https://www.youtube.com/watch?v=D9fzN18F8iQ
	08 : md : Walt Mankowski Making movies for fun and science : https://www.youtube.com/watch?v=xf2UHZu9NJA
	09 : md : Shawn Moore Lifting Moose : https://www.youtube.com/watch?v=w9HHHNVrmOs
	10 : md : Jason McIntosh The True Story of Plerd : https://www.youtube.com/watch?v=5X4VaeoCSe8
	11 : md : Dana Jacobsen BigNums When 64 bits just is not enough : https://www.youtube.com/watch?v=Dhl4_Chvm_g
	12 : md : Joseph Hall and A Series of Unfortunate Requests : https://www.youtube.com/watch?v=wbaH_jxcA7g
	13 : md : Neil Mansilla Building Smarter Microservices with Scale Oriented Architecture : https://www.youtube.com/watch?v=USXSnfilG4g
	14 : md : Jonathan Taylor Moose in Production A Two year Retrospective : https://www.youtube.com/watch?v=tD1oRoaVn2M
	15 : md : David Golden Juggling Chainsaws Perl and MongoDB : https://www.youtube.com/watch?v=Nf3e6cPU9B0
	16 : md : Michael Conrad DeLorean Digital Dashboard : https://www.youtube.com/watch?v=SERH3_gZOTo
	17 : md : Graham Ollis Practical FFI with Platypus : https://www.youtube.com/watch?v=XjvpxfVJLNg
	18 : md : Ricardo Signes (rjbs) - Perl 5.22 and You : https://www.youtube.com/watch?v=I8VVtqVh9y0
	19 : md : Rafael Almeria - Live Perl : https://www.youtube.com/watch?v=nZHWVAPm9IA
	20 : md : Daisuke Maki -YAPC::Asian Tokyo Behind The Scenes : How We Organize A Conference for 2000 Attendees : https://www.youtube.com/watch?v=VcwsR1yVuII
	21 : md : John Whitney - Perl via Paper Ink Metal and Oil : https://www.youtube.com/watch?v=INSn6cYK19U
	22 : md : Stevan Little (stevan) - Perl's Syntactic Legacy: Using the future to improve the past : https://www.youtube.com/watch?v=sJC725e8ysM
	23 : md : Joe Kline (gizmo) - My Ordnung : https://www.youtube.com/watch?v=vBiKxw1JMZM
	24 : md : Tim Bunce - Life: Enhancing your frame of reference : https://www.youtube.com/watch?v=Y24QnadqqJ4
	25 : md : VM Brasseur (vmbrasseur) - Failure: Why it happens & How to benefit from it : https://www.youtube.com/watch?v=DLn4fZsZsKM
	26 : md : Nick Patch (patch) - Hello, my name is _______. : https://www.youtube.com/watch?v=SKbqCB2NPXw
	27 : md : Andrew Hewus Fresh (AFresh1) - Perl in OpenBSD : https://www.youtube.com/watch?v=GwrnOpYXimE
	28 : md : D Ruth Bavousett (druthb) - Scrum for One : https://www.youtube.com/watch?v=Zh7dXvQY-hg
	29 : md : Q&A With Larry Wall : https://www.youtube.com/watch?v=PK9UnAmrxsA
	30 : md : Seth Johnson - Keynote: Seth Johnson - What Perl Taught Me About Life : https://www.youtube.com/watch?v=afaKtWp0JKM
	31 : md : Curtis Poe (Ovid) - Perl 6 for Mere Mortals : https://www.youtube.com/watch?v=S0OGsFmPW2M
	32 : md : Florian Ragwitz (rafl) - Ansible for Programmers : https://www.youtube.com/watch?v=x3ZbYQSGkBY
	33 : md : Bruce Gray (Util) - Stop Panicking! Perl 6 is just like Perl 5 (where it counts). : https://www.youtube.com/watch?v=KSWp9B-s-Sg
	34 : md : Steven Lembark - Mongering in a Box: Building Perl application containers with Dockers : https://www.youtube.com/watch?v=NuRClr-xREc
	35 : md : DrForr - Everything Old is New Again: Quaternion in Perl6 : https://www.youtube.com/watch?v=fKksZBUDMEo
	36 : md : Jordan Adler (jmadler) Mobile Apps... in Perl?! : https://www.youtube.com/watch?v=7mRHapWZ-AI
	37 : md : Logan Bell - Give Catalyst Some Swag : https://www.youtube.com/watch?v=mHmdrgnMCps
	38 : md : Logan Bell - Perl to Go : https://www.youtube.com/watch?v=y573MDoLraY
	39 : md : Henry Van Styn (vanstyn) - RapidApp by example - database web apps on steroids : https://www.youtube.com/watch?v=9HMHD1u9uc4
	40 : md : James E Keenan (kid51) - A Simple Development Tool for Refactoring & Benchmarking : https://www.youtube.com/watch?v=vSNdp1QkCyE
	41 : md : WHATEVER YOU DO DON'T VIEW THIS : https://www.youtube.com/watch?v=-AJo_RVDoF0
	42 : md : Mark Prather (Trg404) - From bartending to nerdtending : https://www.youtube.com/watch?v=uvETUUMZo9E
	43 : md : William Stevenson (wds) - Dude, where's my data analyst? A quick guide to machine learning : https://www.youtube.com/watch?v=p53qpU78LxI
	44 : md : Chad Granum (Exodist) - Perl Testing, whats new with Test:: More and beyond : https://www.youtube.com/watch?v=uFzr6wu5Pq4
	45 : md : Sawyer X - Modern web scraping : https://www.youtube.com/watch?v=wcXmCMGwZQo
	46 : md : Joel Berger (jberger) - Test Your App's Javascript using Test:: Mojo::Role::Phantom : https://www.youtube.com/watch?v=CKbzBNz4Ksg
	47 : md : Sean Quinlan (spq_easy) - Leave the system alone! : https://www.youtube.com/watch?v=mph-9hqJQ98
	48 : md : Upasana Shukla (upsasana) How to Bring Newbies to Perl : https://www.youtube.com/watch?v=yewFM9XEmlQ
	49 : md : Matt S. Trout (mst) Build management with a dash of prolog : https://www.youtube.com/watch?v=C2RJfykfVcM
	50 : md : Prairie Nyx - CoderDojo and Perl Evangelism : https://www.youtube.com/watch?v=kkD4pCRvwK4
	51 : md : Karen Pauley - Working with Volunteers: Learning from My Mistakes : https://www.youtube.com/watch?v=ek4fmzyXGwM
	52 : md : Stephen Scaffidi (hercynium) - In the desert without a camel : https://www.youtube.com/watch?v=OK1ZY_bw660
	53 : md : R Geoffrey Avery (eGeoffrey) Lightning Talks Day 1 : https://www.youtube.com/watch?v=mQVUvAz3zhQ
	54 : md : Welcome to YAPC & States of the Velociraptors : The Perl5 community lightning talks : https://www.youtube.com/watch?v=88K1h1XhEeo
	55 : md : YAPC::NA::2014 Highlights : https://www.youtube.com/watch?v=GLqtHab06dM
	56 : md : Matt S Trout (mst) - Devops Logique : https://www.youtube.com/watch?v=RQwY28DItLI
	57 : md : John Anderson (genehack) - Yet Another Keynote Speech : https://www.youtube.com/watch?v=MU6IFUZZBuQ
	58 : md : Sawyer X - The Joy in What We Do : https://www.youtube.com/watch?v=CjOQZf0Ad74
	59 : md : R Geoffrey Avery (rGeoffrey) - Lightning Talks Day 3 : https://www.youtube.com/watch?v=m-6o2dBc1qE
	60 : md : Peter Martini - Sub Signatures: Next Steps : https://www.youtube.com/watch?v=ot5yOrMJogA

view raw zzz.yapcna2015.wq.output.txt hosted with ❤ by GitHub

2015/06/15

Making a list of YAPC::NA 2015 Videos using Mojo::DOM

Last week, Perl devs from all over North America, and some from other continents, met in Salt Lake City, Utah, for YAPC::NA 2015.

Last week, I took vacation. But I spent the time with family, going to Ohio.

So, of course, I wanted to get a list of all the talks that they were able to record and put on YouTube, to list for my local Mongers group, Purdue.pm, and to allow me to go back and watch at my leisure.

I could've parsed the HTML with regular expressions, but that isn't protocol, so I used this as an excuse to work with Mojo::DOM. I generally prefer finding code examples to reading documentation, so here's my code.

To get the HTML, I opened https://www.youtube.com/user/yapcna/videos in Chrome, clicked load more to get all of this years' videos ( and some of last year's), then grabbed the HTML from Chrome Dev Tools and pasted it into the __DATA__ section of the program. Grabbing the HTML with LWP or the like wouldn't have grabbed it all.

Enjoy!

	#!/usr/bin/env perl

	use feature qw{ say } ;
	use strict ;
	use warnings ;

	use Data::Dumper ;
	use Mojo::DOM ;

	# urls start at doc root, so we need the base
	my $base = 'https://www.youtube.com' ;

	my $file = join '', (<DATA>) ;
	my $dom = Mojo::DOM->new($file) ;

	# there are DIVS and LIs and SPANS and H3s and As galore. This gets just
	# the right LIs to start looking in
	for my $e ( $dom->find('.channels-content-item')->reverse->each ) {

	# content contains the title and link
	# $e->find() returns a Mojo::Collection object, essentially an array
	# we want to get the first/only Mojo::DOM object, so ->first
	my $content = $e->find('.yt-lockup-content')->first ;
	my $anchor = $content->find('a')->first ;
	my $title = $anchor->text ;
	my $link = $base . $anchor->attr('href') ;

	# meta contains when it was released, so we can distinguish
	# this year from last
	my $meta = $e->find('.yt-lockup-meta-info')->first ;
	my $days = $meta->find('li')->last->text ;
	# we only want the new stuff, so drop if the meta info
	# contains months
	my $bool = $days =~ m{months} ? 0 : 1 ;
	next unless $bool ;

	say qq{
	$c: $title
	$link
	} ;
	}

	exit ;

	__DATA__
	... paste the html in here ...

view raw yapcna2015.pl hosted with ❤ by GitHub

2015/06/03

Testing AJAX APIs with Perl

In my lab, we have an AJAX-laden web tool which loads a certain JSON API on page load. It was judged that what we had was too slow, so I created a program that wrote that JSON to a static file on regular intervals. Problem with that, of course, is that changes to the data would not show up in the static file until the next scheduled update.

So, we created a third version, which checks the database for checksum, and if it changes, it regenerates the file and sends the data. Otherwise, it opens the file and sends the data.

I tested with Chrome Dev Tools, which told a bit of the story, but at the scale where it's closer to anecdotes than data. I wanted to go into the hundreds of hits, not just one. I pulled out Benchmark, which told a story, but wasn't quite what I wanted. It started the clock, ran it n times, then stopped the clock, while I wanted to get clock data on each GET.

I also realized I needed to test to be sure that the data I was getting was the same, so I used Test::Most to compare the object I pulled out of the JSON. That was useful, but most useful was the program I wrote using Time::HiRes to more accurately grab the times, then use Statistics::Basic and List::Util to take the collected arrays of sub-second response times and show me how much faster it is to cache.

And it is fairly significant. The best and worst performance were comparable, but the average case has the cached version being about twice as fast, and using the static file being about 7 times faster. With, of course, the same problems.

If I wasn't about to take time out of the office, I'd start looking into other methods to get things faster. Good to know, though, that I have the means to test and benchmark it once I get back next week.

	#!/usr/bin/env perl

	# my modern perl boilerplate
	use feature qw( say ) ;
	use strict ;
	use warnings ;

	# modules used
	use LWP::UserAgent ;
	use Benchmark qw{ :all } ;

	my $agent = LWP::UserAgent->new() ;
	my $count = 20 ;
	my $base = 'https://example.edu/AJAX/endpoints' ;

	my @apis ;
	push @apis, '/the_caching_one.cgi' ;
	push @apis, '/the_dynamic_one.cgi' ;
	push @apis, '/the_static_file.json' ;

	timethese( $count , {
	'api' => sub { $agent->get( $base . $apis[0] ) } ,
	'cache' => sub { $agent->get( $base . $apis[1] ) } ,
	'static' => sub { $agent->get( $base . $apis[2] ) } ,
	} ) ;

	exit ;
	__DATA__
	Benchmark: timing 20 iterations of api, cache, static...
	api: 11 wallclock secs ( 1.14 usr + 0.06 sys = 1.20 CPU) @ 16.67/s (n=20)
	cache: 7 wallclock secs ( 1.05 usr + 0.03 sys = 1.08 CPU) @ 18.52/s (n=20)
	static: 2 wallclock secs ( 1.20 usr + 0.02 sys = 1.22 CPU) @ 16.39/s (n=20)

view raw b2.pl hosted with ❤ by GitHub

	#!/usr/bin/env perl

	# my modern perl boilerplate
	use feature qw( say ) ;
	use strict ;
	use warnings ;

	# modules used
	use List::Util qw{ min max sum } ;
	use LWP::UserAgent ;
	use Statistics::Basic qw(:all nofill) ;
	use Time::HiRes qw( gettimeofday tv_interval ) ;

	my $agent = LWP::UserAgent->new() ;
	my $count = 20 ;
	my $base = 'https://example.edu/AJAX/endpoints' ;

	my @apis ;
	push @apis, '/the_caching_one.cgi' ;
	push @apis, '/the_dynamic_one.cgi' ;
	push @apis, '/the_static_file.json' ;

	my $times ;

	# for each API endpoint being tested, run $count
	# times and collect the elapsed time it takes to get said URL
	# ensuring that the data is correct is another issue
	for my $c ( 1 .. $count ) {
	for my $api (@apis) {
	my $end = ( split m{/}, $api )[-1] ;
	my $url = $base . $api ;
	my $t0 = [gettimeofday] ;
	$agent->get($url) ;
	my $t1 = [gettimeofday] ;
	my $elapsed = tv_interval( $t0, $t1 ) ;
	push @{ $times->{$end} }, $elapsed * 1000 ;
	}
	}

	say join "\t", qw{ name iter min max mean median } ;
	say '-' x 55 ;

	for my $api ( sort keys %$times ) {
	my @times = @{ $times->{$api} } ;
	my $size = scalar @times ;
	my $max = max @times ;
	my $min = min @times ;
	my $omean = mean(@times) ;
	my $mean = 0 + $omean->query ;
	my $omedian = median(@times) ;
	my $median = 0 + $omedian->query ;

	say join "\t", $api,
	map { sprintf '%4d', $_ } $size, $min, $max, $mean, $median ;
	}

	say '' ;
	say 'All times in milliseconds. Smaller is better' ;
	say '' ;

	__DATA__

	name iter min max mean median
	-------------------------------------------------------
	pi 20 378 894 610 583
	pi.cgi 20 217 886 356 334
	pi.json 20 49 171 83 74

	All times in milliseconds. Smaller is better

view raw benchmark_api.pl hosted with ❤ by GitHub

	#!/usr/bin/env perl

	# this program compares three versions of the PI api for new submissions
	# to see if they have the same data. If they don't have the same data
	# their benchmarks are not comparable

	# my modern perl boilerplate
	use feature qw( say ) ;
	use strict ;
	use warnings ;

	# modules used
	use LWP::UserAgent ;
	use Test::Most ;
	use JSON ;

	my $agent = LWP::UserAgent->new() ;
	my $base = 'https://example.edu/AJAX/endpoints' ;

	my @apis ;
	push @apis, '/the_caching_one.cgi' ;
	push @apis, '/the_dynamic_one.cgi' ;
	push @apis, '/the_static_file.json' ;

	my $data ;

	# for each API endpoint being tested, get the responses
	# and store in $data
	for my $api (@apis) {
	my $end = ( split m{/}, $api )[-1] ;
	my $url = $base . $api ;
	my $r = $agent->get($url) ;
	if ( $r->is_success ) {
	my $content = $r->content ;
	$data->{$end} = decode_json($content) ;
	}
	else { say 'ERROR', $end }
	}

	# compare each endpoint's output with the others, using $done
	# to avoid duplication
	my $done ;
	for my $k1 ( sort keys %$data ) {
	for my $k2 ( sort keys %$data ) {
	next if $k1 eq $k2 ;
	my $k = join ' ', sort $k1 , $k2 ;
	next if $done->{$k}++ ;
	my $j1 = $data->{ $k1 }{ data } ;
	my $j2 = $data->{ $k2 }{ data } ;
	cmp_deeply( $j1 , $j2 , 'are equal: ' . $k ) ;
	}
	}

	done_testing() ;
	exit ;

view raw test_api.pl hosted with ❤ by GitHub

Cookie Notice