Last week, Perl devs from all over North America, and some from other continents, met in Salt Lake City, Utah, for YAPC::NA 2015.
Last week, I took vacation, but I spent the time with family, going to Ohio.
So, of course, I wanted to get a list of all the talks that they were able to record and put on YouTube, to list for my local Mongers group, Purdue.pm, and to allow me to go back and watch at my leisure.
I could've parsed the HTML with regular expressions, but that isn't protocol, so I used this as an excuse to work with Mojo::DOM. I generally prefer finding code examples to reading documentation, so here's my code.
To get the HTML, I opened https://www.youtube.com/user/yapcna/videos in Chrome, clicked "Load more" to get all of this year's videos (and some of last year's), then grabbed the HTML from Chrome Dev Tools and pasted it into the __DATA__ section of the program. Grabbing the HTML with LWP or the like wouldn't have grabbed it all, since the rest of the list only shows up after clicking "Load more".
Enjoy!
#!/usr/bin/env perl

use feature qw{ say } ;
use strict ;
use warnings ;

use Data::Dumper ;    # not used below, but handy for inspecting $e
use Mojo::DOM ;

# urls start at doc root, so we need the base
my $base = 'https://www.youtube.com' ;
my $file = join '', (<DATA>) ;
my $dom  = Mojo::DOM->new($file) ;

# running count, used to number the talks in the output
my $c = 0 ;

# there are DIVS and LIs and SPANS and H3s and As galore. This gets just
# the right LIs to start looking in
for my $e ( $dom->find('.channels-content-item')->reverse->each ) {

    # content contains the title and link
    # $e->find() returns a Mojo::Collection object, essentially an array
    # we want to get the first/only Mojo::DOM object, so ->first
    my $content = $e->find('.yt-lockup-content')->first ;
    my $anchor  = $content->find('a')->first ;
    my $title   = $anchor->text ;
    my $link    = $base . $anchor->attr('href') ;

    # meta contains when it was released, so we can distinguish
    # this year from last
    my $meta = $e->find('.yt-lockup-meta-info')->first ;
    my $days = $meta->find('li')->last->text ;

    # we only want the new stuff, so drop if the meta info
    # contains months
    next if $days =~ m{months} ;

    $c++ ;
    say qq{
    $c: $title
    $link
    } ;
    }
exit ;
__DATA__
... paste the html in here ...
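If you want to try the selectors before pasting in a whole page, here is a minimal sketch that runs the same logic against a tiny inline fragment. The fragment is an assumption on my part: it only mimics the class names used above, and the titles and video ids are made up. The extract_talks sub is likewise a hypothetical helper, not part of the original program, and it uses Mojo::DOM's at() as a shortcut for find(...)->first.

```perl
#!/usr/bin/env perl
use strict ;
use warnings ;
use feature qw{ say } ;
use Mojo::DOM ;

my $base = 'https://www.youtube.com' ;

# Hypothetical stand-in for the pasted page, newest video first.
# Titles and video ids are invented for illustration.
my $sample_html = <<'HTML' ;
<ul>
  <li class="channels-content-item">
    <div class="yt-lockup-content">
      <h3><a href="/watch?v=EXAMPLE1">Newest Talk</a></h3>
      <ul class="yt-lockup-meta-info"><li>10 views</li><li>2 days ago</li></ul>
    </div>
  </li>
  <li class="channels-content-item">
    <div class="yt-lockup-content">
      <h3><a href="/watch?v=EXAMPLE2">Older Talk</a></h3>
      <ul class="yt-lockup-meta-info"><li>25 views</li><li>6 days ago</li></ul>
    </div>
  </li>
  <li class="channels-content-item">
    <div class="yt-lockup-content">
      <h3><a href="/watch?v=EXAMPLE3">Last Year's Talk</a></h3>
      <ul class="yt-lockup-meta-info"><li>900 views</li><li>11 months ago</li></ul>
    </div>
  </li>
</ul>
HTML

# Returns a list of { title, link } hashes, oldest first, skipping
# any item whose age is given in months.
sub extract_talks {
    my ($html) = @_ ;
    my $dom = Mojo::DOM->new($html) ;
    my @talks ;
    for my $e ( $dom->find('.channels-content-item')->reverse->each ) {

        # at() returns the first match or undef, so skip odd items
        my $anchor = $e->at('.yt-lockup-content a') or next ;
        my $age = $e->find('.yt-lockup-meta-info li')->last or next ;
        next if $age->text =~ m{months} ;
        push @talks, {
            title => $anchor->text,
            link  => $base . $anchor->attr('href'),
            } ;
        }
    return @talks ;
    }

my $c = 0 ;
for my $talk ( extract_talks($sample_html) ) {
    $c++ ;
    say "$c: $talk->{title}" ;
    say "   $talk->{link}" ;
    }
```

With this fragment, the item marked "11 months ago" should be dropped and the other two listed oldest first. The real page markup may well differ from this guess, so check the class names in Dev Tools before relying on it.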