
2015/06/15

Making a list of YAPC::NA 2015 Videos using Mojo::DOM

Last week, Perl devs from all over North America, and some from other continents, met in Salt Lake City, Utah, for YAPC::NA 2015.

Last week, I took vacation. But I spent the time with family, going to Ohio.

So, of course, I wanted to get a list of all the talks that they were able to record and put on YouTube, to list for my local Mongers group, Purdue.pm, and to allow me to go back and watch at my leisure.

I could've parsed the HTML with regular expressions, but that isn't protocol, so I used this as an excuse to work with Mojo::DOM. I generally prefer finding code examples to reading documentation, so here's my code.

To get the HTML, I opened https://www.youtube.com/user/yapcna/videos in Chrome, clicked "Load more" until I had all of this year's videos (and some of last year's), then copied the HTML from Chrome Dev Tools and pasted it into the __DATA__ section of the program. Fetching the page with LWP or the like wouldn't have grabbed it all, because the older videos only show up after clicking "Load more".

Enjoy!



#!/usr/bin/env perl
use feature qw{ say } ;
use strict ;
use warnings ;
use Mojo::DOM ;

# urls start at doc root, so we need the base
my $base = 'https://www.youtube.com' ;
my $file = join '', (<DATA>) ;
my $dom  = Mojo::DOM->new($file) ;

# running count of the talks we keep, used to number the output
my $c = 0 ;

# there are DIVS and LIs and SPANS and H3s and As galore. This gets just
# the right LIs to start looking in. The page lists newest first, so
# reverse to get oldest first
for my $e ( $dom->find('.channels-content-item')->reverse->each ) {

    # content contains the title and link
    # $e->find() returns a Mojo::Collection object, essentially an array
    # we want to get the first/only Mojo::DOM object, so ->first
    my $content = $e->find('.yt-lockup-content')->first ;
    my $anchor  = $content->find('a')->first ;
    my $title   = $anchor->text ;
    my $link    = $base . $anchor->attr('href') ;

    # meta contains when it was released, so we can distinguish
    # this year from last
    my $meta = $e->find('.yt-lockup-meta-info')->first ;
    my $days = $meta->find('li')->last->text ;

    # we only want the new stuff, so drop if the meta info
    # contains months
    my $bool = $days =~ m{months} ? 0 : 1 ;
    next unless $bool ;

    $c++ ;
    say qq{
$c: $title
$link
} ;
}
exit ;
__DATA__
... paste the html in here ...
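If you want to try the selectors without pasting in a whole YouTube page, here is a minimal sketch. The HTML in it is made-up sample markup that only mimics the class names the script looks for (the titles and video ids are placeholders), and it swaps in Mojo::DOM's at() with a combined selector in place of find()->first. Treat it as an illustration under those assumptions, not as what the real page looks like.

```perl
#!/usr/bin/env perl
use feature qw{ say } ;
use strict ;
use warnings ;
use Mojo::DOM ;

# hypothetical markup: only the class names match the script above,
# the titles and video ids are placeholders
my $html = <<'HTML' ;
<ul>
<li class="channels-content-item">
<div class="yt-lockup-content"><h3><a href="/watch?v=OLD">Last Year's Talk</a></h3></div>
<ul class="yt-lockup-meta-info"><li>100 views</li><li>11 months ago</li></ul>
</li>
<li class="channels-content-item">
<div class="yt-lockup-content"><h3><a href="/watch?v=NEW">This Year's Talk</a></h3></div>
<ul class="yt-lockup-meta-info"><li>42 views</li><li>3 days ago</li></ul>
</li>
</ul>
HTML

my $base = 'https://www.youtube.com' ;
my $dom  = Mojo::DOM->new($html) ;
my @recent ;

for my $e ( $dom->find('.channels-content-item')->each ) {

    # at() returns the first matching element, or undef if there is none,
    # so skip any item that lacks the pieces we need
    my $anchor = $e->at('.yt-lockup-content a')                or next ;
    my $age    = $e->at('.yt-lockup-meta-info li:last-child') or next ;

    # same filter as above: anything measured in months is last year's
    next if $age->text =~ m{months} ;

    push @recent,
        { title => $anchor->text, link => $base . $anchor->attr('href') } ;
}

say "$_->{title}\n$_->{link}" for @recent ;
```

On that sample markup, only the second item should survive the filter, since the first one is "11 months ago". The "or next" guards are the main reason to reach for at() here: find()->first also gives undef on a miss, but then the very next method call dies.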
