Cookie Notice

As far as I know, and as far as I remember, nothing on this page does anything with cookies.


Graphs are not that Scary!

As with most things I blog about, this starts with Twitter. I follow a lot of people on Twitter, and I use Lists. I want to be able to group people more-or-less on community, because there's the community where they talk about programming, for example, and the community where they talk about music, or the town I live in.

I can begin to break things up myself, but curation is a hard thing, so I wanted to do it automatically. And I spent a long time not knowing what to do. I imagined myself traversing trees in what looks like linked lists reimagined by Cthulhu, and that doesn't sound like much fun at all.

Eventually, I decided to search on "graphs and Perl". Of course, I probably should've done it earlier, but oh well. I found Graph. I had used GD::Graph before, which is a plotting library. (There has to be some index of how overloaded words are.) And once I installed it, I figured it out: As a programmer, all you're dealing with are arrays and hashes. Nothing scary.
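That realization is easy to show without any modules at all. Here's a hypothetical follower graph (the names are invented), sketched as a hash of arrays, which is all an adjacency list is:

```perl
#!/usr/bin/env perl
use strict ;
use warnings ;
use feature qw{say} ;

# an adjacency list: each node maps to an array of its neighbors
my %graph = (
    alice => [qw{ bob carol }],
    bob   => [qw{ alice }],
    carol => [qw{ alice dave }],
    dave  => [qw{ carol }],
    ) ;

# "who does alice know?" is an array lookup
say join ', ', @{ $graph{alice} } ;    # bob, carol

# "how connected is everyone?" is a loop over a hash
for my $node ( sort keys %graph ) {
    say "$node has " . scalar @{ $graph{$node} } . ' edges' ;
}
```

The Graph module wraps this same idea in a friendlier interface; nothing scary underneath.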

Word Ladder

We'll take a problem invented by Lewis Carroll called a "word ladder", where you find your way from one word (for example, "cold") to another ("warm") by changing one letter at a time:

    COLD → CORD → CARD → WARD → WARM

Clearly, this can be, and often is, done by hand, but if you're looking to automate it, there are three basic problems: what are the available words, how do you determine when two words are one change apart, and how do you find the provably shortest path?

First, years ago, I went to CERIAS and downloaded word lists. Computer-security researchers use them because real words make bad passwords, so lists of real words can be used to create rainbow tables and the like. My lists are years old, so there may be newer words I don't account for, but unlike Lewis Carroll, I can get from APE to MAN in five words, not six.


Not sure that Lewis Carroll would've accepted AAS, but there you go

There is a term for the number of changes it takes to go from one word to another: the Levenshtein distance. I first learned about it from perlbrew, which is how, if you type "perlbrew isntall", it guesses that you meant "perlbrew install". The function is hardcoded there because perlbrew can't assume you have anything but perl and core modules. I borrow perlbrew's function instead of using Text::Levenshtein, but that module is worth looking into.

And the final answer is "Put it into a graph and use Dijkstra's Algorithm!"

Perhaps not with the exclamation point.

Showing Code

Here's making a graph of it:

#!/usr/bin/env perl

use feature qw{say} ;
use strict ;
use warnings ;

use Data::Dumper ;
use Graph ;
use List::Util qw{min} ;
use Storable ;

for my $l ( 3 .. 16 ) {
    create_word_graph($l) ;
}
exit ;

# -------------------------------------------------------------------
# we're creating a word graph of all words that are of length $length
# where the nodes are all words and the edges are unweighted, because
# they're all weighted 1. No connection between "foo" and "bar" because 
# the distance is "3".

sub create_word_graph {
    my $length = shift ;
    my %dict = get_words($length) ;
    my @dict = sort keys %dict ; # sorting probably is unnecessary
    my $g    = Graph->new() ;

    # compare each word to each word. If the distance is 1, put it
    # into the graph. This implementation is O(N**2) but probably
    # could be redone as O(NlogN), but I didn't care to.

    for my $i ( @dict ) {
        for my $j ( @dict ) {
            my $dist = editdist( $i, $j ) ;
            if ( $dist == 1 ) {
                $g->add_edge( $i, $j ) ;
            }
        }
    }

    # Because I'm using Storable to store the Graph object for use
    # later, I only use this once. But, I found there's an endian
    # issue if you try to open Linux-generated Storable files in
    # Strawberry Perl.

    store $g , "/home/jacoby/.word_$" ;
}

# -------------------------------------------------------------------
# this is where we get the words and only get words of the correct
# length. I have a number of dictionary files, and I put them in
# a hash to de-duplicate them.

sub get_words {
    my $length = shift ;
    my %output ;
    for my $d ( glob( '/home/jacoby/bin/Toys/Dict/*' ) ) {
        if ( open my $fh, '<', $d ) {
            for my $l ( <$fh> ) {
                chomp $l ;
                $l =~ s/\s//g ;
                next if length $l != $length ;
                next if $l =~ /\W/ ;
                next if $l =~ /\d/ ;
                $output{ uc $l }++ ;
            }
        }
    }
    return %output ;
}

# -------------------------------------------------------------------
# straight copy of Wikipedia's "Levenshtein Distance", straight taken
# from perlbrew. If I didn't have this, I'd probably use 
# Text::Levenshtein.

sub editdist {
    my ( $f, $g ) = @_ ;
    my @a = split //, $f ;
    my @b = split //, $g ;

    # There is an extra row and column in the matrix. This is the
    # distance from the empty string to a substring of the target.
    my @d ;
    $d[ $_ ][ 0 ] = $_ for ( 0 .. @a ) ;
    $d[ 0 ][ $_ ] = $_ for ( 0 .. @b ) ;

    for my $i ( 1 .. @a ) {
        for my $j ( 1 .. @b ) {
            $d[ $i ][ $j ] = (
                  $a[ $i - 1 ] eq $b[ $j - 1 ]
                ? $d[ $i - 1 ][ $j - 1 ]
                : 1 + min( $d[ $i - 1 ][ $j ], $d[ $i ][ $j - 1 ], $d[ $i - 1 ][ $j - 1 ] )
                ) ;
        }
    }

    return $d[ @a ][ @b ] ;
}

Here's what my word lists can do. Something tells me that, by the time we get to 16-letter words, it's less a graph than a bunch of disconnected nodes.

1718 3-letter words
6404 4-letter words
13409 5-letter words
20490 6-letter words
24483 7-letter words
24295 8-letter words
19594 9-letter words
13781 10-letter words
8792 11-letter words
5622 12-letter words
3349 13-letter words
1851 14-letter words
999 15-letter words
514 16-letter words

My solver isn't perfect, and the first thing I'd want to add is ensuring that both the starting and ending words are actually in the word list. Without that, your code goes on forever.
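A minimal guard might look like this; %dict stands in for the hash get_words() returns, here seeded with a tiny made-up word list:

```perl
use strict ;
use warnings ;

# die early instead of searching forever; in the real program
# %dict would come from get_words(), not this stand-in
my %dict = map { $_ => 1 } qw{ COLD CORD WARM } ;

sub check_endpoints {
    my ( $dict, $start, $target ) = @_ ;
    for my $word ( $start, $target ) {
        die "'$word' is not in the word list\n"
            if !exists $dict->{ uc $word } ;
    }
    return 1 ;
}

check_endpoints( \%dict, 'cold', 'warm' ) ;     # fine
# check_endpoints( \%dict, 'cold', 'zzzz' ) ;   # would die
```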

So, I won't show off the whole program below, but it does use Storable, Graph and feature qw{say}.

dijkstra( $graph , 'foo' , 'bar' ) ;

# -------------------------------------------------------------------
# context-specific perl implementation of Dijkstra's Algorithm for
# shortest-path

sub dijkstra {
    my ( $graph, $source, $target, ) = @_ ;

    # the graph pre-exists and is passed in 
    # $source is 'foo', the word we're starting from
    # $target is 'bar', the word we're trying to get to

    my @q ; # will be the list of all words
    my %dist ; # distance from source. $dist{$source} will be zero 
    my %prev ; # for each word, the previous word on the shortest
               # path found so far; the tree we pull from the graph

    # we set the distance for every node to basically infinite, then
    # for the starting point to zero

    for my $v ( $graph->unique_vertices ) {
        $dist{$v} = 1_000_000_000 ;    # per Wikipedia, infinity
        push @q, $v ;
    }
    $dist{$source} = 0 ;

    LOOP: while (@q) {

        # resort, putting words with short distances first
        # first pass being $source , LONG WAY AWAY

        @q = sort { $dist{$a} <=> $dist{$b} } @q ;
        my $u = shift @q ;

        # say STDERR join "\t", $u, $dist{$u} ;

        # here, we end the first time we see the target.
        # we COULD get a list of every path that's the shortest length,
        # but that's not what we're doing here

        last LOOP if $u eq $target ;

        # this is a complex and unreadable way of ensuring that
        # we're only getting edges that contain $u, which is the 
        # word we're working on right now

        for my $e (
            grep {
                my @a = @$_ ;
                grep {/^${u}$/} @a
            } $graph->unique_edges
            ) {

            # $v is the word on the other end of the edge
            # $w is the distance, which is 1 because of the problem
            # $alt is the new distance between $source and $v, 
            # replacing the absurdly high number set before

            my ($v) = grep { $_ ne $u } @$e ;
            my $w   = 1 ;
            my $alt = $dist{$u} + $w ;
            if ( $alt < $dist{$v} ) {
                $dist{$v} = $alt ;
                $prev{$v} = $u ;
            }
        }
    }

    my @nodes = $graph->unique_vertices ;
    my @edges = $graph->unique_edges ;
    return {
        distances => \%dist,
        previous  => \%prev,
        nodes     => \@nodes,
        edges     => \@edges,
        } ;
}

I return lots of stuff, but the part that's really necessary is %prev, because that, $source and $target are everything you need. Assuming we're trying to go from FOR to FAR, several words could have been recorded as the predecessor in $prev{FAR}, but FOR is the one we want. In the expanded case of FOO to BAR, $prev{BAR} is 'FAR', $prev{FAR} is 'FOR', and $prev{FOR} is 'FOO'.
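Given %prev, walking the path back out takes only a few lines. This sketch hard-codes the FOO-to-BAR answer from above instead of running the real solver:

```perl
use strict ;
use warnings ;
use feature qw{say} ;

# %prev as Dijkstra would have left it for FOO -> BAR
my %prev = ( BAR => 'FAR', FAR => 'FOR', FOR => 'FOO' ) ;

# start at the target and unshift predecessors until we hit the source
sub walk_back {
    my ( $prev, $source, $target ) = @_ ;
    my @path = ($target) ;
    while ( $path[0] ne $source ) {
        unshift @path, $prev->{ $path[0] } ;
    }
    return @path ;
}

say join ' -> ', walk_back( \%prev, 'FOO', 'BAR' ) ;    # FOO -> FOR -> FAR -> BAR
```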

And nothing in there is complex. It's all really hashes or arrays or values. Nothing a programmer should have any problem with.

CPAN has a number of other modules of use: Graph::Dijkstra has that algorithm already written, and Graph::D3 allows you to create a graph in such a way that you can use it in D3.js. Plus, there are a number of modules in Algorithm::* that do good and useful things. So go in, start playing with it. It's deep, there are weeds, but it isn't scary.


Modern Perl but not Modern::Perl

This started while driving to work. If I get mail from coworkers, I get Pushover notifications, and halfway from home, I got a bunch of notifications.

We don't know the cause of the issue, but I do know the result

We have env set up on our web server so that perl resolves to /our/own/private/bin/perl and not /usr/bin/perl. This runs in a highly-networked and highly-clustered environment, mostly RedHat 6.8 with 5.10.1 as the system perl, so if we want a consistent version and consistent modules, we need our own. This allows us to have #!/usr/bin/env perl as our shebang.
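The mechanics are worth spelling out: env walks $PATH and runs the first perl it finds, so whichever directory comes first wins. A quick shell sketch; the private path is the one from this post, everything else is generic:

```shell
# env resolves `perl` the way the shell does: first match in $PATH.
# Prepending the private build's directory is what makes
# `#!/usr/bin/env perl` pick it up instead of /usr/bin/perl.
PATH="/our/own/private/bin:$PATH"
command -v perl    # prints the path of the perl that env would run
```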

And this morning, for reasons I don't know, it stopped working. Whatever perl was being called, it wasn't /our/own/private/bin/perl. And this broke things.

One of the things that broke is this: Whatever perl is /usr/bin/env perl, it doesn't have Modern::Perl.

I'm for Modern Perl. My personal take is that chromatic and Modern Perl kept Perl 5 alive while Larry Wall and the other language developers worked on Perl 6, and I am grateful it exists. But while playing with it, I found a problem: Modern::Perl is not in Core, so you cannot rely on it being there. A script might run on a perl newer than 5.8.8 that could give you everything you need, which for me is normally use strict, use warnings and use feature qw{say}; but if you ask Modern::Perl for it and the module isn't installed, the script fails, and because you don't know which Modern thing you wanted, you don't know how to fix it.

This is part of my persistent hatred of magic. If it works and you don't understand how, you can't fix it if it stops working. I got to the heavy magic parts of Ruby and Rails and that, as well as "Life Happens", are why I stopped playing with it. And, I think, this is a contributing factor with this morning's brokenness.


Net::Twitter Cookbook: Favorites and Followers


Also known as "Likes", they're an indication in Twitter that you approve of a status update. Most of the time, they're paired with retweets as signs by the audience to the author that the post is agreeable. Like digital applause.

This is all well and good, but it could be used for so much more, if you had more access and control over them.

So I did.

The first step is to collect them. There's an API to get them, and collecting them in bulk is easy. A problem is avoiding grabbing the same tweet twice.
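One way to avoid that, sketched with a plain hash keyed on tweet ID; store_tweet and the tweets themselves are stand-ins here:

```perl
use strict ;
use warnings ;

my %seen ;    # tweet IDs we've already stored

# stand-in for the database insert in the real program
my @stored ;
sub store_tweet { push @stored, $_[0] }

for my $fav (
    { id => 101, text => 'first' },
    { id => 102, text => 'second' },
    { id => 101, text => 'first, again' },    # pages can overlap
    )
{
    next if $seen{ $fav->{id} }++ ;
    store_tweet($fav) ;
}

# @stored now holds two tweets, not three
```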

# as before, the "boilerplate" can be found elsewhere in my blog.
use IO::Interactive qw{ interactive } ;
my $config ;
$config->{start} = 0 ;
$config->{end}   = 200 ;

for ( my $page = $config->{start}; $page <= $config->{end}; ++$page ) {
    say {interactive} qq{\tPAGE $page} ;
    my $r = $twit->favorites( {
        page  => $page,
        count => 200,
        } ) ;
    last unless @$r ;

    # push @favs , @$r ;
    for my $fav (@$r) {
        if ( $config->{verbose} ) {
            say {interactive} handle_date( $fav->{created_at} ) ;
        }
        store_tweet( $config->{user}, $fav ) ;
    }
    sleep 60 * 3 ;    # three minutes
}

Once I had a list of my tweets, one of the first things I did was use them to do "Follow Friday". If you know who you favorited over the last week, it's an easy thing to get a list of the usernames, count them and add them until you have reached the end of the list or 140 characters.
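That loop might be sketched like so, with a made-up list of screen names standing in for the week's favorites:

```perl
use strict ;
use warnings ;
use feature qw{say} ;

# screen names from the week's favorites; in the real program
# these would come from the stored tweets, not a literal list
my @names = qw{ alice bob carol dave } ;

# add "@name" entries until the next one would push past 140
sub follow_friday {
    my @names = @_ ;
    my $tweet = '#FF' ;
    for my $name (@names) {
        my $addition = ' @' . $name ;
        last if length( $tweet . $addition ) > 140 ;
        $tweet .= $addition ;
    }
    return $tweet ;
}

say follow_friday(@names) ;    # #FF @alice @bob @carol @dave
```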

Then, as I started playing with APIs and wanted to write my own, I created one to find favorites containing a substring, like json or pizza or sleep. This way, I could begin to use a "favorite" as a bookmark.

(I won't show demo code, because I'm not happy with or proud of the code, which lives in a trailing-edge environment, and because it's more database-related than Twitter-focused.)

As an aside, I do not follow back. There are people who follow me who I have no interest in reading, and there are people I follow who care nothing about my output. In general, I treat Twitter as something between a large IRC client and an RSS reader, and I never expected nor wanted RSS feeds to track me.

But this can be a thing worth tracking, which you can do, without any storage, with the help of a list. Start with getting a list of those following you, those you follow, and the list of accounts (I almost wrote "people", but that isn't guaranteed) in your follows-me list. If they follow you and aren't in your list, add them. If they're in the list and you have started following them, take them out. If they're on the list and aren't following you, drop them. As long as you're not big-time (Twitter limits lists to 500 accounts), that should be enough to keep a Twitter list of accounts you're not following.

use List::Compare ;

    my $list = 'id num of your Twitter list';

    my $followers = $twit->followers_ids() ;
    my @followers = @{ $followers->{ids} } ;

    my $friends = $twit->friends_ids() ;
    my @friends = @{ $friends->{ids} } ;

    my @list = get_list_members( $twit, $list ) ;
    my %list = map { $_ => 1 } @list ;

    my $lc1 = List::Compare->new( \@friends,   \@followers ) ;
    my $lc2 = List::Compare->new( \@friends,   \@list ) ;
    my $lc3 = List::Compare->new( \@followers, \@list ) ;

    # if follows me and I don't follow, put in the list
    say {interactive} 'FOLLOWING ME' ;
    for my $id ( $lc1->get_complement ) {
        next if $list{$id} ;
        add_to_list( $twit, $list, $id ) ;
    }

    # if I follow, take off the list
    say {interactive} 'I FOLLOW' ;
    for my $id ( $lc2->get_intersection ) {
        drop_from_list( $twit, $list, $id ) ;
    }

    # if no longer following me, take off the list
    say {interactive} 'NOT FOLLOWING' ;
    for my $id ( $lc3->get_complement ) {
        drop_from_list( $twit, $list, $id ) ;
    }

#========= ========= ========= ========= ========= ========= =========
sub add_to_list {
    my ( $twit, $list, $id ) = @_ ;
    say STDERR qq{ADDING $id} ;
    eval { $twit->add_list_member(
            { list_id => $list, user_id => $id, } ) ; } ;
    if ($@) {
        warn $@->error ;
    }
}

#========= ========= ========= ========= ========= ========= =========
sub drop_from_list {
    my ( $twit, $list, $id ) = @_ ;
    say STDERR qq{REMOVING $id} ;
    eval {
        $twit->delete_list_member( { list_id => $list, user_id => $id, } ) ;
        } ;
    if ($@) {
        warn $@->error ;
    }
}

But are there any you should follow? Are there any posts in the feed that you might "like"? What do you "like", anyway?

There's a way to get an idea of what you would like: your past likes. First, we must get, for comparison, a collection of what your Twitter feed looks like normally. (I grab 200 posts an hour and store them. This looks and works exactly like my "grab favorites" code, except I don't loop it.)

    my $timeline = $twit->home_timeline( { count => 200 } ) ;

    for my $tweet (@$timeline) {
        my $id          = $tweet->{id} ;                          # twitter_id
        my $text        = $tweet->{text} ;                        # text
        my $created     = handle_date( $tweet->{created_at} ) ;   # created
        my $screen_name = $tweet->{user}->{screen_name} ;         # user id
        if ( $config->{verbose} ) {
            say {interactive} handle_date( $tweet->{created_at} ) ;
            say {interactive} $text ;
            say {interactive} $created ;
            say {interactive} $screen_name ;
            say {interactive} '' ;
        }
        store_tweet( $config->{user}, $tweet ) ;
        # exit ;
    }

So, we have a body of tweets that you like, and a body of tweets that are a representative sample of what Twitter looks like to you. On to Algorithm::NaiveBayes!

use Algorithm::NaiveBayes ;
use IO::Interactive qw{ interactive } ;
use String::Tokenizer ;

my $list   = 'ID of your list';
my $nb     = train() ;
my @top    = read_list( $config, $nb , $list ) ;

say join ' ' , (scalar @top ), 'tweets' ;

for my $tweet (
    sort { $a->{analysis}->{fave} <=> $b->{analysis}->{fave} } @top ) {
    my $fav = int( $tweet->{analysis}->{fave} * 100 ) ;
    say $tweet->{text} ;
    say $tweet->{user}->{screen_name} ;
    say $tweet->{gen_url} ;
    say $fav ;
    say '' ;
}

exit ;

#========= ========= ========= ========= ========= ========= =========
# gets the first page of your Twitter timeline.
# avoids checking a tweet if it's 1) from you (you like yourself;
#   we get it) and 2) if it doesn't give enough tokens to make a
#   prediction.
sub read_list {
    my $config = shift ;
    my $nb     = shift ;
    my $list   = shift ;


    my @favorites ;
    my $timeline = $twit->list_statuses( { list_id => $list } ) ;

    for my $tweet (@$timeline) {
        my $id          = $tweet->{id} ;                          # twitter_id
        my $text        = $tweet->{text} ;                        # text
        my $created     = handle_date( $tweet->{created_at} ) ;   # created
        my $screen_name = $tweet->{user}->{screen_name} ;         # user id
        my $check       = toke( lc $text ) ;
        next if lc $screen_name eq lc $config->{user} ;
        next if !scalar keys %{ $check->{attributes} } ;
        my $r = $nb->predict( attributes => $check->{attributes} ) ;
        my $fav = int( $r->{fave} * 100 ) ;
        next if $fav < $config->{limit} ;
        my $url = join '/', 'http:', '', 'twitter.com', $screen_name,
            'status', $id ;
        $tweet->{analysis} = $r ;
        $tweet->{gen_url}  = $url ;
        push @favorites, $tweet ;
    }

    return @favorites ;
}

#========= ========= ========= ========= ========= ========= =========
sub train {

    my $nb = Algorithm::NaiveBayes->new( purge => 1 ) ;
    my $path = '/home/jacoby/.nb_twitter' ;

    # adapted on suggestion from Ken to

    # gets all tweets in your baseline table
    my $baseline = get_all() ;
    for my $entry (@$baseline) {
        my ( $tweet, $month, $year ) = (@$entry) ;
        my $label = join '', $year, ( sprintf '%02d', $month ) ;
        my $ham = toke( lc $tweet ) ;
        next unless scalar keys %$ham ;
        $nb->add_instance(
            attributes => $ham->{attributes},
            label      => ['base'],
            ) ;
    }

    # gets all tweets in your favorites table
    my $favorites = get_favorites() ;
    for my $entry (@$favorites) {
        my ( $tweet, $month, $year ) = (@$entry) ;
        my $label = join '', $year, ( sprintf '%02d', $month ) ;
        my $ham = toke( lc $tweet ) ;
        next unless scalar keys %$ham ;
        $nb->add_instance(
            attributes => $ham->{attributes},
            label      => ['fave'],
            ) ;
    }

    $nb->train() ;
    return $nb ;
}

#========= ========= ========= ========= ========= ========= =========
# tokenizes a tweet by breaking it into characters, removing URLs
# and short words
sub toke {
    my $tweet = shift ;
    my $ham ;
    my $tokenizer = String::Tokenizer->new() ;
    $tweet =~ s{https?://\S+}{}g ;
    $tokenizer->tokenize($tweet) ;

    for my $t ( $tokenizer->getTokens() ) {
        $t =~ s{\W}{}g ;
        next if length $t < 4 ;
        next if $t !~ /\D/ ;
        my @x = $tweet =~ m{($t)}gmix ;
        $ham->{attributes}{$t} = scalar @x ;
    }
    return $ham ;
}

Honestly, String::Tokenizer is probably overkill for this, but I'll go with it for now. It might be better to get a list of the 100 or 500 most common words and exclude those from the tweets, instead of limiting by length. As is, strings like ada and sql are excluded. But it's good for now.
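A stop-word filter in place of the length check might look like this; the stop list here is a token handful, not a real top-100 list:

```perl
use strict ;
use warnings ;

# a stand-in stop list; a real one would be the 100 or 500
# most common English words
my %stop = map { $_ => 1 } qw{ the and that with this from have } ;

sub keep_token {
    my $t = lc shift ;
    $t =~ s/\W//g ;
    return if !length $t ;
    return if $stop{$t} ;
    return if $t !~ /\D/ ;    # still skip all-digit tokens
    return $t ;
}

# now "ada" and "sql" survive, but "the" does not
my @kept = grep { defined }
    map { keep_token($_) } qw{ the ada sql 2016 with } ;
```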

We get a list of tweets, each with a number between 0 and 1 representing the likelihood, by Bayes, that I would like the tweet; in the end, it's turned into an integer between 0 and 100. You can also run this against your normal timeline to pull out tweets you would've liked but missed. I often do this.

I run the follows_me version on occasion. So far, it is clear to me that the people I don't follow, I don't follow for a reason, and that remains valid.

If you use this and find value in it, please tell me below. Thanks and good coding.


Using the Symbol Table: "Help"?

I've been looking at command-line code for both fun and work. I know I can have one module handle just the interface, and have the modules where the functionality actually lives supply that functionality.

#!/usr/bin/env perl

use feature qw'say state' ;
use strict ;
use warnings ;
use utf8 ;

my $w = Wit->new( @ARGV ) ;
$w->run() ;

package Wit ;
use lib '/home/jacoby/lib' ;
use base 'Witter' ;
use Witter::Twitter ;


package Witter ;

# highly adapted from perlbrew.

use feature qw{ say } ;
use strict ;
use warnings ;

sub new {
    my ( $class, @argv ) = @_ ;
    my $self ;
    $self->{foo}  = 'bar' ;
    $self->{args} = [] ;
    if (@argv) {
        $self->{args} = \@argv ;
    }
    return bless $self, $class ;
}

sub run {
    my ($self) = @_ ;
    $self->run_command( $self->{args} ) ;
}

sub run_command {
    my ( $self, $args ) = @_ ;

    if (   scalar @$args == 0
        || lc $args->[0] eq 'help'
        || $self->{help} ) {
        $self->help(@$args) ;
        exit ;
    }

    if ( lc $args->[0] eq 'commands' ) {
        say join "\n\t", '', $self->commands() ;
        exit ;
    }

    my $command = $args->[0] ;

    my $s = $self->can("twitter_$command") ;
    unless ($s) {
        $command =~ y/-/_/ ;
        $s = $self->can("twitter_$command") ;
    }

    unless ($s) {

        my @commands = $self->find_similar_commands($command) ;
        if ( @commands > 1 ) {
            @commands = map { '    ' . $_ } @commands ;
            die
                "Unknown command: `$command`. Did you mean one of the following?\n"
                . join( "\n", @commands )
                . "\n" ;
        }
        elsif ( @commands == 1 ) {
            die "Unknown command: `$command`. Did you mean `$commands[0]`?\n" ;
        }
        else {
            die "Unknown command: `$command`. Typo?\n" ;
        }
    }

    unless ( 'CODE' eq ref $s ) { say 'Not a valid command' ; exit ; }

    $self->$s(@$args) ;
}

sub help {
    my ($self,$me,@args) = @_ ;
    say 'HELP!' ;
    say join "\t", @args ;
}

sub commands {
    my ($self) = @_ ;
    my @commands ;

    my $package = ref $self ? ref $self : $self ;
    my $symtable = do {
        no strict 'refs' ;
        \%{ $package . '::' } ;
        } ;

    foreach my $sym ( sort keys %$symtable ) {
        if ( $sym =~ /^twitter_/ ) {
            my $glob = $symtable->{$sym} ;
            if ( defined *$glob{CODE} ) {
                $sym =~ s/^twitter_// ;
                $sym =~ s/_/-/g ;
                push @commands, $sym ;
            }
        }
    }

    return @commands ;
}

# Some functions removed for sake of brevity

package Witter::Twitter ;

use strict ;
use feature qw{ say state } ;
use warnings FATAL => 'all' ;

use Exporter qw{import} ;
use Net::Twitter ;
use JSON::XS ;

our $VERSION = 0.1 ;

our @EXPORT ;
for my $entry ( keys %Witter::Twitter:: ) {
    next if $entry !~ /^twitter_/mxs ;
    push @EXPORT, $entry ;
}

sub twitter_foo {
    my ( $self, @args ) = @_ ;
    say "foo" ;
    say join '|', @args ;
}

1 ;

And the above works when called as below.
jacoby@oz 13:49 60°F 51.24,-112.49 ~ 
$ ./witter 

jacoby@oz 13:52 60°F 51.25,-94.51 ~ 
$ ./witter help 

jacoby@oz 13:53 60°F 50.59,-88.64 ~ 
$ ./witter commands


jacoby@oz 13:53 60°F 50.59,-88.64 ~ 
$ ./witter help foo

jacoby@oz 13:53 60°F 50.59,-88.64 ~ 
$ ./witter foo

jacoby@oz 13:53 60°F 50.59,-88.64 ~ 
$ ./witter moo
Unknown command: `moo`. Did you mean `foo`?

In the above example, I'm just doing the one add-on module, Witter::Twitter, and one function, Witter::Twitter::twitter_foo, but clearly, I would want it open-ended, so that if someone wanted to add Witter::Facebook, all the information about the Facebook functions would be in that module.

Then, of course, I would have to use another prefix than twitter_, but we'll leave that, and ensuring that modules don't step on each others' names, to another day.

The part that concerns me is help. Especially help foo. It should be part of the module it's in; if Witter::Twitter is the module with foo(), only it should be expected to know about foo().

But how to communicate it? I'm flashing on our %docs and $docs{foo} = 'This is foo, silly', but the point of the whole thing is to allow the addition of modules that the initial module doesn't know about, and that would require knowing to look for %Witter::Twitter::docs.

I suppose I could add a docs_ function that looks like this:

sub docs_foo {
    return q{
    This explains the use of the 'foo' command
    } ;
}

I'm diving into this in part because I have code that uses basically this code, and I need to add functionality to it, and while I'm in there, I might as well make user documentation better. Or even possible.

I'm also parallel-inspired by looking at a Perl project built on and using old Perl ("require 5.005") and recent blog posts about Linus Torvalds and "Good Taste". There's something tasteful about being able to add use Your::Module and nothing else to code, but if the best I can do is knowledge that there's a foo command, with no sense of what it does, that seems like the kind of thing that Linus would rightfully curse me for.

Is there a better way of doing things like this? I recall there being interesting things in Net::Tumblr that would require me to step up and learn Moose or the like. This is yet another important step toward me becoming a better and more tasteful programmer, but not germane to today's ranting.


Gender and Wearables?

The first I heard about modern wearables was at Indiana Linuxfest, where a speaker went on about the coming wave of microcontrollers and posited a belt buckle that knew when it was pointing toward magnetic north and triggered a haptic feedback device, so that, for the wearer, true sense of direction would eventually become just another sense.

I'm sure I could find a sensor that could tell me that, one that would ship from China and cost less than a gumball. I'm sure I could easily get a buzzer, and that I could control it all with a small Arduino board like a Trinket or Flora or Nano. And Jimmy DiResta has already taught me how to make a belt buckle. I actually kinda want one. But I haven't made it.

In part it's because my available resources to push toward projects like this are small at the moment. In part, though, it's because, once I put on my watch, my tablet, my glasses and the Leatherman on my belt, I'm accessorized out.

I think most American men are about the same. 


Perl on Ubuntu on Windows: A Solution

I suppose I should've just been more patient.

After a while of waiting and watching and trying to think of a better bug report, one that might get a response, and failing, I got a response.

You can't install the module because File::Find can not recurse directories 
on the file system that is in use by Ubuntu on Windows.

The solution is to edit

    sudo vi /usr/lib/perl/5.18.2/

Set dont_use_nlink to 'define':

    dont_use_nlink => 'define',

Now it's possible to install all modules you want!
(this is a duplicate of #186)

I haven't made this change yet. I am loath to change core modules like that, although I have done so in the past; because I have done so in the past, I know it ends up stupid. But I will likely do it.

I had mentally blamed the kernel, but I was wrong, which is interesting. It makes me think that, rather than running Ubuntu on Windows, doing something with Vagrant might be the better plan.


Net::Twitter Cookbook: Tweetstorm!

I had planned to get into following, lists and blocking, but a wild hair came over me and I decided I wanted to make a tweetstorm app.

What's a tweetstorm?

And this is the inner loop of it.
    my @tweetstorm_text ;    # previously generated. URL shortening and numbering are assumed
    my $screen_name ;        # your screen name as text
    my $status_id ;          # declared but not set, so the first tweet won't be a reply-to

    for my $status (@tweetstorm_text) {

        my $tweet ;    # the hashref containing the tweet
        $tweet->{status} = $status ;
        $tweet->{in_reply_to_status_id} = $status_id if $status_id ;

        if ( $twit->update($tweet) ) {    # if the tweet is successful

            # we get your profile, which contains your most recent tweet.
            # $status_id gets set to the ID of that tweet, which then
            # gets used for the next tweet.

            my $profiles
                = $twit->lookup_users( { screen_name => $config->{user} } ) ;
            my $profile = shift @$profiles ;
            $status_id = $profile->{status}->{id} ;
        }
    }

So, that's it. I've thrown the whole thing on GitHub if you want to play with it yourself.


Thoughts on ML Techniques to better handle Twitter

Thinking things through afk and thus not as polished as my normal posts.

Been thinking about grouping my friends (those I follow) strictly by relationship mapping. In part, I haven't done this because I can't read the math in the paper describing it, and in part because there are points where I serve as connecting node between two clusters and they have started interacting independently. I know a joy of Twitter is that it allows people to connect by interest and personality, not geography, but when a programmer in Washington and an activist in Indiana talk food and cats with each other, it makes my "programmer" cluster and my "Indiana" cluster less distinct.

So, what to do?

Topic Modelling.

I know about this via the Talking Machines podcast. Without mathematical notation: if you take a body of text as a collection of words, the words it contains will vary by subject. If the topic is "politics", the text might contain "vote" and "veto" and "election" and "impeach". If the topic is "football", we'd see "lateral", "quarterback", "tackle" and "touchdown".

Rather than separating Twitter followers into groups simply by interactions, I could start with certain lists I have curated (and yes, there are both "local tweeters" and "programmer" lists) and with hashtags (because if you hashtag your tweet #perl, you are likely talking about Perl) to identify which words are more likely to come up when discussing certain subjects, then start adding accounts to those lists automatically.
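A zeroth step toward that, with no real math at all, is per-label word counts: tweets from a curated list become bags of words, and the counts hint at which words mark which topic. A toy sketch with invented tweets:

```perl
use strict ;
use warnings ;

# toy labeled tweets; the real data would come from lists and hashtags
my %tweets = (
    programmer => [ 'fixed the vote parser in perl', 'perl testing again' ],
    indiana    => [ 'snow in lafayette again', 'lafayette traffic is bad' ],
    ) ;

# count how often each word shows up under each label
my %count ;
for my $label ( keys %tweets ) {
    for my $tweet ( @{ $tweets{$label} } ) {
        $count{$label}{$_}++ for split /\s+/, lc $tweet ;
    }
}

# "perl" leans programmer, "lafayette" leans indiana
```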

If I can work this out 140 characters at a time.


Perl on Ubuntu on Windows: Finding The Right Test Case

I'm still hung up on getting CPAN working for Ubuntu on Windows. In the comments, Chas. Owens gave me great advice for proceeding with this problem:
Write a minimal test case, run the test case with strace on Ubuntu on Windows and Ubuntu on Linux. The outputs are supposed to be identical, if they aren't (and we expect them not to be because of the error), then you have a good shot at identifying the misbehaving syscall (which is the only difference between Ubuntu on Windows and Ubuntu on Linux).

Once you see the difference, look into what the syscall does and try to determine which system is implementing it correctly (probably Linux). If it is Windows post to the github tracker with the test case and the identified syscall. If it is Linux, then report it to Ubuntu.
My first thought was to do this within cpanm, but I thought sudo su then strace -o platform -ff cpanm YAML::XS was a bit much, in part because when platform was Linux, it generated one file, and on Windows, hundreds.
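The comparison step Chas. describes can be sketched in Perl: walk two strace logs in parallel and report the first line where they diverge. The sample log lines below are invented stand-ins, and masking hex addresses is my own assumption about what run-to-run noise to ignore:

```perl
#!/usr/bin/env perl
# Sketch: find the first line where two strace logs diverge.
# Sample "log lines" are invented; real logs come from strace -ff.
use strict ;
use warnings ;
use feature qw{ say } ;

# compare two arrays of strace lines; return the 1-based line number
# of the first divergence, or 0 if they match. Pointer values are
# masked, since those legitimately differ between runs.
sub first_divergence {
    my ( $linux, $windows ) = @_ ;
    my $max = @$linux > @$windows ? @$linux : @$windows ;
    for my $i ( 0 .. $max - 1 ) {
        my ( $x, $y ) = ( $linux->[$i], $windows->[$i] ) ;
        for ( $x, $y ) { defined and s/0x[0-9a-f]+/0xADDR/g }
        return $i + 1 if !defined $x || !defined $y || $x ne $y ;
        }
    return 0 ;
    }

# invented sample lines, standing in for real strace output
my @linux   = ( 'open("/etc/passwd", O_RDONLY) = 3', 'read(3, "root"..., 4096) = 1024' ) ;
my @windows = ( 'open("/etc/passwd", O_RDONLY) = 3', 'read(3, "root"..., 4096) = -1 EINVAL' ) ;
say first_divergence( \@linux, \@windows ) ;    # 2
```

From the reported line you can read off the syscall name and go hunting for which platform implements it wrong.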

Then it struck me that instead, I should just focus on the tests themselves. I cloned Ingy döt Net's YAML repo and tried to run prove test/ in both places. It went fine with Linux and failed with Windows, but I realized after a second that it succeeded while using my personal perl, not system perl. /usr/bin/prove test/ failed on Ubuntu. apt-get install libtest-base-perl on both systems helped a lot, but now it wants Test::More. (I know because I searched for what modules the tests are asking for.)

For all I know, there's a package that provides Test::More, but it isn't libtest-more-perl, and digging further into that seems like a waste.

So I'm thinking it through again, looking at a failing test in YAML::XS:

use t::TestYAMLTests tests => 2;
use utf8;

is Dump("1234567890\n1234567890\n1234567890\n"),
    "--- |\n1234567890\n1234567890\n1234567890\n", 'Literal Scalar';

is Dump("A\nB\nC\n"), q{--- "A\nB\nC\n"} . "\n", 'Double Quoted Scalar';

By "failing test" I mean it works in native Linux but not Ubuntu on Windows. And it's striking me: I need to find Dump. Where is Dump? In the C. It is an XS module, is it not? So, it's striking me that the solution is in C.

Which means I have to write C.

More later.

I think there's only been one time when I coded C on the clock, and only one time when my C knowledge was required on the clock.

The former was at a former workplace, where I wrote and compiled some C to compare UltraEdit with another editor, so I could help decide which we were going to buy a site license for. As I can only remember UltraEdit, I can say that's the one I liked better. The code itself was scarcely better than Hello World.

The latter was at another former workplace, where there was a tool that allowed mechanical engineers to drag together components like traces, and then first turned those traces into C code and compiled them. There was an issue where it wouldn't work, and I found the error logs and worked back.

I'm looking at perl_libyaml.c. I'm looking at perl_libyaml.h. I don't have nearly enough C skills to start here.
/*
 * This is the main Dump function.
 * Take zero or more Perl objects and return a YAML stream (as a string)
 */
Dump(SV *dummy, ...)
{
    perl_yaml_dumper_t dumper;
    yaml_event_t event_stream_start;
    yaml_event_t event_stream_end;
    int i;
    SV *yaml = sv_2mortal(newSVpvn("", 0));
    sp = mark;

    /* Set up the emitter object and begin emitting */
    yaml_emitter_set_unicode(&dumper.emitter, 1);
    yaml_emitter_set_width(&dumper.emitter, 2);
    /* ... (output setup pointing the emitter at the yaml SV elided) ... */
    yaml_emitter_emit(&dumper.emitter, &event_stream_start);

    dumper.anchors = newHV();
    dumper.shadows = newHV();

    sv_2mortal((SV *)dumper.anchors);
    sv_2mortal((SV *)dumper.shadows);

    for (i = 0; i < items; i++) {
        dumper.anchor = 0;

        dump_prewalk(&dumper, ST(i));
        dump_document(&dumper, ST(i));
    }

    /* End emitting and destroy the emitter object */
    yaml_emitter_emit(&dumper.emitter, &event_stream_end);

    /* Put the YAML stream scalar on the XS output stack */
    if (yaml) {
        /* ... */
    }
}

Hrm. Maybe back to the Perl.

Looking at it makes it seem like you could make a master class on how modules go together. Previously, when I've dived in to figure out issues, yeah, there's lots of confusing parts, but I could find my way to the broken thing. perl_libyaml.c is looking much more approachable now.


Overkill III: Permutations of Overkill (Perl6)

Long-time readers remember my previous code with the Magic Box. In short, given the numbers [ 3, 4, 5, 6, 7, 8, 9, 10, 11 ] , arrange them in a 3x3 square such that every row, column and diagonal adds up to 21.

My first pass, on a very old computer, had the C version taking a second and a half, the Perl5 version taking a minute and a half, and the Perl6 version taking an hour and a half. But I no longer use that computer, and am now using a much newer one, so I took the opportunity to try again.

I have Nvidia cards in my box, so I could eventually write Overkill IV about that, but I don't have the knowledge or time right now. I did install rakudobrew and the newest version of Perl6, but otherwise, everything was unchanged.

C dropped to 0.6 seconds. Perl5 now takes 16 seconds. Perl6 now takes 10 minutes. So, the newer computer brought substantial speed gains for each, but the most dramatic is with Perl6, which tells us that the Perl6 developers have not only worked on correctness and completeness, but speed. Good for them! Good for us!

But wait! There's more!

@loltimo saw the post and suggested that I use permutations.

"What are permutations?", you ask? Generating them is basically what happens in recursive_magic_box(), where all the possible orderings are created. Given A, B and C, there are six possible orderings: ABC ACB BAC BCA CAB CBA. If I were given all possible orderings of an array, rather than (mis)handling it on my own and having to check that I got it right, would that make things faster?
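For the curious, here's what a permutation generator looks like in Perl 5. This is my own sketch, not the Perl 6 built-in and not recursive_magic_box(); with A, B and C it produces exactly the six orderings named above.

```perl
#!/usr/bin/env perl
# Recursive permutation generator: for each element, pick it first,
# then permute the rest.
use strict ;
use warnings ;
use feature qw{ say } ;

sub permutations {
    my @list = @_ ;
    return ( [] ) unless @list ;    # one ordering of nothing
    my @perms ;
    for my $i ( 0 .. $#list ) {
        my @rest = @list ;
        my ($pick) = splice @rest, $i, 1 ;
        push @perms, [ $pick, @$_ ] for permutations(@rest) ;
        }
    return @perms ;
    }

say join ' ', map { join '', @$_ } permutations(qw{ A B C }) ;
# ABC ACB BAC BCA CAB CBA
```

With nine numbers that's 9! = 362,880 orderings, which is why handing the job to a well-tested library routine beats hand-rolling it.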

As it turns out, yes. 47 seconds, rather than 10 minutes. More than 10x faster. But still, not as fast as Perl5. I now want to implement permutations for Perl5 and see if it makes that one faster.

And, it makes me feel much better about Perl6.

Code Samples Below


Perl and cpanm on Bash on Ubuntu on Windows 10: Who do I report this to?

I promised a longer post on Bash on Ubuntu on Windows 10, explaining why I use it, when I don't and why I even use Windows 10, but that isn't today.

Today is me trying to figure out who to report a bug to, and how to effectively report it.

The system (called "Ubuntu on Windows" for the rest of this blog post) is Ubuntu 14.04 LTS running kinda with Windows as the kernel instead of Linux. A good thing is that, if you can apt-get install a package, you can put it on Ubuntu on Windows. I don't think you can use it if 1) it connects to the kernel (because the kernel is Windows) or 2) it pops up a window. I've heard about people using XMing to get around that, but I haven't tried it yet.

The problem is that you get what you have available with dpkg, and the perl that comes with 14.04 is 5.18, where current is 5.24. Some modules haven't been packaged, and there's usually a gap between the package version and the cpan version, so I tend to use perlbrew and use my own perl instead of system perl.

And cpan and cpanm can't build on Ubuntu on Windows. I demonstrated this by trying to install YAML::XS with cpanm. I won't include the whole build log here, but I will link to it. The problem comes at lines 146, 162 and 207:
#   Failed test 'Literal Scalar'
#   at t/dump-heuristics.t line 4.
#          got: '--- '1234567890
#   1234567890
#   1234567890
# '
# '
#     expected: '--- |
#   1234567890
#   1234567890
#   1234567890
# '

#   Failed test 'Double Quoted Scalar'
#   at t/dump-heuristics.t line 10.
#          got: '--- 'A
#   B
#   C
# '
# '
#     expected: '--- "A\nB\nC\n"
# '
# Looks like you failed 2 tests of 2.
t/dump-heuristics.t ...... 


#   Failed test 'Dumping Chinese hash works'
#   at t/utf8.t line 31.
#          got: '---
# Email:
# 地址: 新竹市 300 石坊街 37-8 號
# 店名: OpCafé
# 時間: 11:01~23:59
# 電話: '03-5277806
#   0991-100087
# '
# '
#     expected: '---
# Email:
# 地址: 新竹市 300 石坊街 37-8 號
# 店名: OpCafé
# 時間: 11:01~23:59
# 電話: "03-5277806\n0991-100087\n"
# '
# Looks like you failed 1 test of 8.
t/utf8.t ................. 
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/8 subtests 

The problem involves incorrectly handling newlines, but all this works and YAML::XS installs perfectly on my personal Perl, which I demonstrate below.
$ which perl ; perl -v | head -n 5 ; perl -MYAML::XS -e "print 'Yet Another Perl Hacker'"


This is perl 5, version 24, subversion 0 (v5.24.0) built for i686-linux
(with 1 registered patch, see perl -V for more detail)

Copyright 1987-2016, Larry Wall
Yet Another Perl Hacker

So, if it's Perl on Ubuntu on Windows, and the Perl works perfectly other places (including Strawberry Perl on the same Windows 10 machine (but only in PowerShell) ) and the Ubuntu works perfectly on other places, it must be the Windows, right?

That's what I thought, too. There's a GitHub repo for Ubuntu on Windows. Well, really, it's an issue tracker for Ubuntu on Windows, because there's nothing else in there. You can't clone and build your own. But, it does put the issue tracker in a place where the natural audience for Ubuntu on Windows would go. I put in an issue, but I think I might have asked wrong. I have cleaned up my issue some, but I see no sign that anyone else has seen it.

So, what is my next step? It isn't just YAML::XS; that's just a module I wanted at the time. It isn't just cpanm; I tried with cpan, too. It is just with Windows 10. Thoughts and suggestions from the greater Perl community?


On Not YET Contributing to an Open Source Project


I had an idea, and the shortest path between here and a working implementation is through CPAN, so I found a module and tried to install it.

No go. Failed tests.

So I find the GitHub repo and make an issue.

That doesn't get me any closer to a working implementation of my idea, nor does it involve me coding or running code. We can't have that, so I took the next step, which was forking, cloning and branching the code, then playing with it to find out what's going on and why.

I won't tell you which repo; that's really beside the point. I've dealt with Perl enough to know that, when the module was last updated, everything was tested and everything worked. There is therefore no condemnation for those who are in CPAN.

Simply speaking, at some point, the API the module is interacting with changed how it works, returning an image instead of JSON that would contain the URL of said image. Not an unreasonable thing to do, I think. That saves you a step, and for my purposes, I'd never have need to get either the image or the location of the image. I feel free to believe that force-installing the module would get me an acceptable outcome.

But if I know a thing is broken and I know how to fix it, and I don't fix it, I'm simply leaving the world in a broken state. That's hardly a responsible response.

The question becomes "Just how do I fix this?", and I see three choices:
  • FOLLOW THE TEST! The test wants an object which is converted from the JSON the API returns. I could easily skip the API call and just return {whatever}/image/. This will even be a valid URL, but when the API changes again, which it will, the URL will be inaccurate. This seems brittle to me.
  • CHANGE THE TEST! The API wants to pass back an image. I can pass that on through and rewrite the test. But the module isn't brand new, which means that someone out there is using the failing function to get the URL of the image, and this change will break that existing code. It's probably already broken; this module won't install, as established, but this is a significant change in the API, which shouldn't be made without consideration. 
  • DROP IT ALL! Remove the function that causes the problem. Remove the tests. Leave only a stub saying "This functionality has been removed". In some ways, this is the coward's way out, but it would be the simplest thing. Easier to remove functionality than to add it.
After I thought about it, I was leaning toward an All-Three approach: Making three branches, implementing each idea, then submitting three parallel pull requests, leaving it to the maintainer to decide the proper choice.

Yeah, that's not a smart plan. 2/3 of that work is going to be not used, by definition. After consultation with my advisors (read: anyone who would listen on Twitter, IRC or Hangouts), I decided to ask before coding anything, so I added details to my (admittedly short and incomplete, submitted before understanding) issue to say "I see three alternatives; what do you want me to do?"

But that is the proper way to do it. It isn't like I went "I want a feature, and here it is." I just want a working module to do the thing I'm thinking of (which might not be as cool as I thought it was), and the things I could do to make it work again could bug the people actually using it. The community of users.

every change breaks someone's workflow
As always, this is a point where xkcd understands and explains all.
So, I have been thinking about this, asking about this and writing about this, rather than implementing the initial idea, or even fixing the problem.

Which frustrates me to no end.


A Little Fun with OpenCV

I have a webcam that I have set to take pictures at work. Is this a long-term art project? Is it a classic Twitter over-share? An example of Quantified Self gone horribly, horribly wrong? I honestly don't know.

I do know that it's brought me into interesting places before. Getting my camera to work with V4L. Tweeting images. Determining if the screen is locked, both on Unity and Gnome. I've come to the point where I could set @hourly ~/bin/ in my crontab, but I don't; rather, 0 9-18 * * 1-5 ~/bin/, because if it's before 9am or on a weekend, I won't be here, so don't even try.

But if I'm pulled away, there are photos like the above. Who needs to see my space without me?

Yeah, who needs to see my space with me? We'll pass on that.

One of the members of Purdue Perl Mongers formerly worked with OpenCV, and I asked him to present on that topic this month. He even created a GitHub repo with his example code. It's Python, not Perl, which will be a digression for this blog, but not an unprecedented one. This gave me enough confidence in OpenCV and how it worked to create a program that uses it to tell if anyone's in the frame.

(The demo code suggested installing python-opencv for Ubuntu, to which I must add that the crucial opencv-data does not come with it. Just a warning.)


# usage:
# when run, prints the number of upper bodies, and thus people,
# it currently sees

import cv2
# pass either device id (0 usually for webcam) or path to a video file
cam = cv2.VideoCapture(0)

# face cascade bundled with OpenCV
# classifier = cv2.CascadeClassifier('/usr/share/opencv/haarcascades/haarcascade_frontalface_default.xml')
classifier = cv2.CascadeClassifier('/usr/share/opencv/haarcascades/haarcascade_upperbody.xml')
# I found that haarcascade_upperbody did the best for identifying me.

# read in frame
retval, frame = cam.read()

# convert to grayscale
frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# run classifier
# let's assume face will be close/big if in front of webcam
# (the minSize value here is my guess at the original argument)
results = classifier.detectMultiScale(frame_gray, minSize = (100, 100))

# number of found upper bodies in webcam
print len(results)

And here it is in context.
use 5.010 ;
use strict ;
use warnings ;
use IO::Interactive qw{ interactive } ;
use Net::Twitter ;
use YAML::XS qw{ DumpFile LoadFile } ;
use Getopt::Long ;

use lib '/home/jacoby/lib' ;
use Locked qw{ is_locked there };

my $override = 0 ;
GetOptions(
    'override' => \$override ,
    ) ;

my $config_file = $ENV{ HOME } . '/.twitter.cnf' ;
my $config      = LoadFile( $config_file ) ;

my $user = 'screen_name' ;
my $status = 'Just took a picture' ;

if ( is_locked() == 0 && there() == 1 || $override ) {
    say { interactive } 'unlocked' ;
    # highest supported resolution of my old-school webcam 
    # Twitter DOES have a 5MB limit on file size
    my $cam = q{/usr/bin/fswebcam -q -r 640x480 --jpeg 85 -D 1 --no-banner} ;
    my $time = time ;
    my $image = "/home/jacoby/Pictures/Self/shot-${time}.jpeg" ;
    qx{$cam $image} ;
    tweet_pic( $status , $image ) ;
    say { interactive } $time ;
    }

sub tweet_pic {
    my ( $status , $pic ) = @_ ;
    my ( $access_token, $access_token_secret ) ;
    ( $access_token, $access_token_secret ) = restore_tokens( $user ) ;
    my $twit = Net::Twitter->new(
        traits              => [qw/API::RESTv1_1/],
        consumer_key    => $config->{ consumer_key },
        consumer_secret => $config->{ consumer_secret },
        ssl => 1 ,
        ) ;

    if ( $access_token && $access_token_secret ) {
        $twit->access_token( $access_token ) ;
        $twit->access_token_secret( $access_token_secret ) ;
        }
    unless ( $twit->authorized ) {
        say qq{UNAUTHORIZED} ; exit ;
        # I really should handle this better
        }

    my $img_ref ;
    my $file = $pic ;
    my ( $filename ) = reverse split m{/} , $pic ;
    push @$img_ref , $file ;
    push @$img_ref , $filename ;

    if ( $twit->update_with_media( $status , $img_ref  ) ) {
        no warnings ;
        say { interactive } $status ;
        }
    else {
        say { interactive } 'FAIL' ;
        }
    }

# there() is in Locked, but placed here for simplicity
# there are issues with recognizing my webcam, which 
# we quash by sending STDERR to /dev/null
sub there {
    my $checker = '/home/jacoby/bin/' ;
    return -1 if ! -f $checker ;
    my $o = qx{$checker 2> /dev/null} ;
    chomp $o ;
    return $o ;
    }

sub restore_tokens {
    my ( $user ) = @_ ;
    my ( $access_token, $access_token_secret ) ;
    if ( $config->{ tokens }{ $user } ) {
        $access_token = $config->{ tokens }{ $user }{ access_token } ;
        $access_token_secret =
            $config->{ tokens }{ $user }{ access_token_secret } ;
        }
    return $access_token, $access_token_secret ;
    }

sub save_tokens {
    my ( $user, $access_token, $access_token_secret ) = @_ ;
    $config->{ tokens }{ $user }{ access_token }        = $access_token ;
    $config->{ tokens }{ $user }{ access_token_secret } = $access_token_secret ;
    DumpFile( $config_file, $config ) ;
    return 1 ;
    }

Clearly, this is the start, it's currently very cargo-cult code, and it has several uses. A coming use is to go through my archive of pictures and identify the ones where I'm gone. That'll require me getting up to speed with either Image::ObjectDetect or Python, but it's a thing I'm willing to do.

An interesting side-issue: I've found that the existing Haar cascades (the XML files that define a thing OpenCV can identify) do not like my glasses, and thus cannot identify my eyes or face with them on, thus, me using the upperbody cascade. I think I should train my own Haar classifier; I know I have enough pics of me for it.


Net::Twitter Cookbook: Images and Profiles

I've covered how you handle keys and authenticating with Twitter previously, so look into those for more information as we go ahead with sending tweets!

There was a time when Twitter was just text, so you could just send a link to an image. There were sites like TwitPix that hosted images for you, but eventually Twitter developed the ability to host media.

    my $media ;
    push @$media, $config->{ file } ;
    push @$media, 'icon.' . $suffix ;
    if ( $twit->update_with_media( $status , $media ) ) {
        say 'OK' ;
        }

There are four ways you could define the media:

    $media = [ 'path to media' ] ;
    $media = [ 'path to media', 'replacement filename' ] ;
    $media = [ 'path to media', 'replacement filename',
            'Content-Type' => 'mime/type' ] ;
    $media = [ undef, 'replacement filename',
            'Content-Type' => 'mime/type', 'raw media data' ] ;

I use the second in order to specify a file name that isn't whatever long and random filename it has. The module has ways to guess the correct mime type, but you can specify it to avoid that. The last one, starting with undef, allows you to create or modify an image with Image::Magick or whatever you choose and tweet it without having to involve the filesystem at all.
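Here's a sketch of that last, filesystem-free form. The byte string below is a stand-in for real image data, and the (commented-out) posting step assumes an authorized $twit as in the earlier examples:

```perl
#!/usr/bin/env perl
# Sketch: build the media arrayref from raw bytes already in memory.
# The "image" is a stand-in string, not a real PNG.
use strict ;
use warnings ;
use feature qw{ say } ;

my $raw_bytes = "\x89PNG...pretend image data..." ;    # stand-in bytes
my $media = [ undef, 'generated.png', 'Content-Type' => 'image/png', $raw_bytes ] ;

# $twit->update_with_media( 'Made entirely in memory', $media ) ;
say scalar @$media ;    # 5
```

The undef in the first slot is what tells the module "no file path; the raw data is in the last element."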

A few words about what we mean by media. We mean images or video; there is no audio file option. By images, we mean PNG, JPG, GIF or WEBP. No SVG, and less than 5MB as of this writing. (Check for more info later.)

Video has other constraints, but I don't tweet just video often, if ever. I generally put it to YouTube and post the link.

There's a couple ways you can have more fun with images on Twitter. Those would be with your profile image, the square image that sits next to your tweet, and the profile banner, the landscape image that shows up at the top of your page.

They work like the above, handling media the same way.

    my $image ;
    push @$image, $config->{ file } ;
    push @$image, 'icon.' . $suffix ;
    if ( $twit->update_profile_image( $image ) ) {
        say 'OK' ;
        }

    my $banner ;
    push @$banner, $config->{ file } ;
    push @$banner, 'icon.' . $suffix ;
    if ( $twit->update_profile_banner( $banner ) ) {
        say 'OK' ;
        }

Profile images are square, and are converted to square if your initial image is not square. For both, JPG, GIF or PNG are allowed and are converted upon upload. If you try to upload an animated GIF, they will use the first frame. This can make your profile image less fun, but if you have a feed full of pictures that throb and spin, that can make you feel seasick.

And, since we're hitting profile-related issues, perhaps I should handle your profiles and we can get this behind us.

Changing your username, going from @jacobydave to something else, is doable but not via the API. It isn't a thing you can or should be doing with an API. You can change other details, however. You can change your name ("Dave Jacoby"), description ("Guy blogging about Perl's Net::Twitter library. Thought Leader. Foodie."), website ("") and location ("Dark Side of the Moon").

    use Getopt::Long ;

    my $config ;
    GetOptions(
        'description=s' => \$config->{ description },
        'location=s'    => \$config->{ location },
        'name=s'        => \$config->{ name },
        'web=s'         => \$config->{ url },
        ) ;

    my $params ;
    for my $p ( qw{ name url location description } ) {
        $params->{ $p } = $config->{ $p } if $config->{ $p } ;
        }
    $params->{ include_entities } = 0 ;
    $params->{ skip_status }      = 1 ;

    if ( $twit->update_profile( $params ) ) {
        say 'OK' ;
        }

I *DO* have fun with location, setting it with a number of strings; "My Happy Place" on Saturday, "Never got the hang..." for Thursdays. You can set this to any string, and Twitter does nearly nothing with it. I'd suggest you ignore it, or set it to something general or clever and forget it.

Now that we've covered how you tweet and handle your identity a little, next time we'll start to get into relationships, which are the things that make a social network social.


Net::Twitter Cookbook: How I tweet, plus

Previously, I wrote a post showing the basics on how to send a tweet using Perl and Net::Twitter. I showed the easiest you can do it.

Below is the code that I use most often. It is a command-line tool, where it's used something along the lines of jacobydave text of my tweet. Except, that's not how I type it, thanks to my .alias file. I have twitter aliased to '~/bin/named_twitter jacobydave ' and normally tweet like twitter this is how I tweet.

This isn't to say I never automate tweets; I certainly do. But it is a rare part of what I do with the Twitter API and Net::Twitter. I will dive deeper into issues with tweeting and direct messages, both in a technical and social matter, in a later post.

As I said before, you have the consumer key and secret, which identify you as a service, and the access token and secret, which identify you as a user of the service. In the above code sample, I use YAML to store this data, but you could use JSON, SQLite or anything else to store it. Among other benefits, you can put your code into GitHub, as I did above, without exposing anything.

As I proceed, I will assume that tokens are handled somehow and proceed directly to the cool part.

Again, you get the consumer key by going to Twitter's application site and creating a new app.

You log in with your Twitter account, click "Create New App" and fill in the details. When I created my three, you had to specifically choose "can send DMs", so if you're creating apps to follow along, I do suggest you allow yourself that option.


Log your machines and Check your logs

"Logs" by Aapo Haapanen is licensed under CC BY 2.0

Our VMs were having problems last fall. Their connections to the file system would falter, causing a large number of processes to sit around waiting to write. The symptom we found was that the load averages would then rise incredibly high.

Like four-digits high.

So, I wrote something that would log load average once an hour. It was a convergence of lab need and an excuse to learn Log::Log4Perl. I also used Pushover to tell me when load average was greater than 20, as if I could do anything about it.

Below, Mail is a wrapper around Email::Sender::Simple and Pushover around LWP::UserAgent that handle the formatting and authentication. Neither are necessary for the logging.

#!/usr/bin/env perl

# checks for high load average on a host using uptime and
# reports using Pushover
# logs uptime, high or low, via Log4perl

# Also sends result of ps to email to start to indicate what's
# actually doing something


use feature qw{ say } ;
use strict ;
use warnings ;
use utf8 ;

use Data::Dumper ;
use DateTime ;
use IO::Interactive qw{ interactive } ;
use Log::Log4perl ;

use lib '/home/djacoby/lib' ;
use Mail ;
use Pushover ;

# my $host = $ENV{HOSTNAME} ;
my $host = `/bin/hostname -s ` ;
chomp $host ;

Log::Log4perl::init( '/home/djacoby/.log4perl.conf') ;
my $logger = Log::Log4perl::get_logger( 'varlogrant.uptime' );
my @uptime = check_uptime() ;
$logger->trace( qq{$host : $uptime[0] $uptime[1] $uptime[2]});

if ( $uptime[0] > 20 ) {
    my $ps = process_table() ;
    my $message ;
    $message->{ message } = "High Load Average on $host: " . join ' ' , @uptime ;
    my $out = pushover( $message ) ;
    #send_table( join "\n\n" , ( join ' ' , @uptime ) , $ps ) ;
    }
exit ;

sub check_uptime {
    my $program = '/usr/bin/uptime' ;
    my $uptime = qx{$program} ;
    my @num = map {s/,//;$_ } ( split /\s+/ , $uptime )[-3,-2,-1] ;
    return @num ;
    }

sub process_table {
    my $out = qx{/bin/ps -U gcore -u gcore u } ;
    return $out ;
    }

sub send_table {
    my $body = shift ;
    my $date = DateTime->now()->iso8601() ;
    my $msg;
    $msg->{ identity } = 'example' ;
    $msg->{ subject } = qq{High Load on $host: $date} ;
    $msg->{ to } = '' ;
    $msg->{ body } = $body ;
    $msg->{ attachments } = [] ;
    send_mail($msg) ;
    }
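The parsing trick in check_uptime can be exercised on its own: split on whitespace, keep the last three fields, strip the commas. The sample line below is typical uptime output, not from my logs:

```perl
#!/usr/bin/env perl
# Demonstrate check_uptime's load-average parsing on a canned line.
use strict ;
use warnings ;
use feature qw{ say } ;

my $uptime = ' 16:21:05 up 40 days,  3:41,  9 users,  load average: 0.15, 0.10, 0.24' ;

# last three whitespace-separated fields, commas removed
my @num = map { s/,//; $_ } ( split /\s+/, $uptime )[ -3, -2, -1 ] ;
say "@num" ;    # 0.15 0.10 0.24
```

The first number is the one-minute load average, which is what the notification threshold tests against.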

Eventually, those issues worked out. The evidence of file system hinkiness is that, on occasion, when we try to save or open a file, it takes a few minutes (I have learned from experience that mkdir does not display atomicity), but we never see the high load averages and catastrophically long file access times of a few months ago.

But the logging never left my crontab.

I started looking at and playing with new things, and I wrote an API that allowed me to curl from several machines once an hour, and I would get Pushover notifications when machines were down.

(You can really thank Phil Sturgeon and his excellent book, Build APIs You Won't Hate, for that. I'm not quite there with my API, though. It'd probably make an interesting blog post, but it's built on pre-MVC technology.)

(And yes, I really like Pushover. In general, I turn off notifications for most apps and only pay attention to what I have Pushover tell me.)

Anyway, I'd get notifications telling me my web server was down, then pull out my phone and find the web server up and responsive. I put that into MySQL, and a query told me that, on some hours, I'd get five heartbeats, some four, and some three, so I was sure that the issue wasn't with the API.

I log and get Pushover notifications in the @reboot section of my crontab, and that hadn't warned me recently, so I knew the machines were up, but not responding.

Then I remembered that I never stopped monitoring load averages, and started looking at those logs.

#!/usr/bin/env perl

# reads and parses the uptime log

use feature qw{ say } ;
use strict ;
use warnings ;
use utf8 ;

use DateTime ;

my $file = q{/home/jacoby/mnt/rcac/.uptime.log} ;

if ( -f $file && open my $fh, '<', $file ) {
    my $data ;
    my $last ;
    my $obj ;
    while (<$fh>) {
        chomp ;
        my ( $date, $time, $server ) = split m{\s+} ;
        next unless $server ;
        # say $server ;
        my ( $year, $month, $day ) = split m{/}, $date ;
        my ( $hour, $minute, $second ) = split m{:}, $time ;
        my $latest = DateTime->new(
            year      => $year,
            month     => $month,
            day       => $day,
            hour      => $hour,
            minute    => $minute,
            second    => 0,
            time_zone => 'UTC',
            ) ;
        my $diff = 0 ;
        # next if $year != 2016 ;
        # next if $month < 7 ;
        my $ymd = $latest->ymd ;
        my $hms = $latest->hms ;
        next if $ymd !~ /^2016-07/ ;
        push @{ $obj->{$ymd}{$hms} }, $server ;
        }
    my @hosts = sort qw{ genomics genomics-test genomics-apps genomics-db } ;
    for my $y ( sort keys %$obj ) {
        my $day = $obj->{$y} ;
        for my $h ( sort keys %$day ) {
            my @list = @{ $obj->{$y}{$h} } ;
            my %list = map { $_ => 1 } @list ;
            my @down = grep { !$list{$_} } @hosts ;
            next if !scalar @down ;
            say join ' ', $y, $h, @down ;
            }
        }
    }

__DATA__
two days results:

2016-07-23 01:00:00 genomics-test
2016-07-23 02:00:00 genomics-test
2016-07-23 08:00:00 genomics genomics-db
2016-07-23 13:00:00 genomics-apps
2016-07-23 16:00:00 genomics-apps
2016-07-23 18:00:00 genomics-db
2016-07-23 19:00:00 genomics-test
2016-07-23 21:00:00 genomics-apps
2016-07-24 05:00:00 genomics genomics-apps
2016-07-24 07:00:00 genomics
2016-07-24 10:00:00 genomics-apps
2016-07-24 10:01:00 genomics genomics-db genomics-test
2016-07-24 11:00:00 genomics-db
2016-07-24 13:00:00 genomics-apps
2016-07-24 18:00:00 genomics genomics-apps
2016-07-24 23:00:00 genomics genomics-apps genomics-db

We see above that of the four VMs I monitor, all four fail to log multiple times, and many times, three of four VMs fail to run their crontabs. Since I had something more solid than "Hey, that's funny", I went to my admins about this. Looks like VMs are failing to authenticate with the LDAP server. My admins are taking it up the chain.

So, beyond how I make and parse logs, which might not be the best examples you can find, the message here is that it's hard to identify a problem unless you're tracking it, and even tracking something else might help you identify a problem.


Net::Twitter Cookbook: How to Tweet

The first line between Twitter Use and Twitter Obsession is TweetDeck. That's when the update-on-demand single-thread of the web page gives way to multiple constantly-updated streams of the stream-of-consciousness ramblings of the Internet.

That's the first line.

The second line between Twitter use and Twitter obsession is when you want to automate the work. If you're an R person, that's twitteR. If you work in Python, that's tweepy.

And, if you're like me, and you normally use Perl, we're talking Net::Twitter.

What follows is the simplest possible Net::Twitter program.

#!/usr/bin/env perl
use feature qw{ say } ;
use strict ;
use warnings ;
use Net::Twitter ;

# the consumer key and secret identify you as a service. 
# you register your service with Twitter
# and receive the key and secret

# you really don't want to have these written into your script

my $consumer_key    = 'ckckckckckckckckckckck' ;
my $consumer_secret = 'cscscscscscscscscscscscscscscscscscscscs' ;

my $twit = Net::Twitter->new(
    traits          => [qw/API::RESTv1_1/],
    consumer_key    => $consumer_key,
    consumer_secret => $consumer_secret,
    ssl             => 1,
    ) ;

# the access token and secret identify you as a user.
# the registration process takes place below.
# the first time you run this program, you will not be authorized,
# and the program will give you a URL to open in a browser where
# you are already logged into twitter.

# you really don't want to have these written into your script

my $access_token = '1111111111-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' ;
my $access_token_secret = 'zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz' ;

$twit->access_token($access_token) ;
$twit->access_token_secret($access_token_secret) ;

# everything interesting will occur inside this if statement
if ( $twit->authorized ) {
    if ( $twit->update('Hello World!') ) {
        say 'It worked!' ;
        }
    else {
        say 'Fail' ;
        }
    }
else {
    # You have no auth token
    # go to the auth website.
    # they'll ask you if you wanna do this, then give you a PIN
    # input it here and it'll register you.
    # then save your token vals.

    say "Authorize this app at ", $twit->get_authorization_url,
        ' and enter the PIN#' ;
    my $pin = <STDIN> ;    # wait for input
    chomp $pin ;
    my ( $access_token, $access_token_secret, $user_id, $screen_name ) =
        $twit->request_access_token( verifier => $pin ) ;

    say 'The following lines need to be copied and pasted above' ;
    say $access_token ;
    say $access_token_secret ;
    }
Again, this is about as simple as we can reasonably get without pulling the keys into a separate file, which, again, I strongly recommend you do. (I personally use YAML to store and restore data such as access tokens and consumer keys. I will demonstrate that in a later post.)
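To make the "separate file" idea concrete, here is a minimal sketch of the YAML approach. The file name ~/.twitter.yml and the key names are hypothetical, not the ones I actually use; YAML::XS does the loading and saving, just as in my other scripts.

```perl
#!/usr/bin/env perl
# Minimal sketch: keep Twitter credentials in a YAML file instead of
# hard-coding them. File name and key names here are hypothetical.
use strict ;
use warnings ;
use feature qw{ say } ;
use YAML::XS qw{ LoadFile DumpFile } ;

my $config_file = join '/', $ENV{HOME}, '.twitter.yml' ;
my $config      = LoadFile($config_file) ;

my $consumer_key        = $config->{consumer_key} ;
my $consumer_secret     = $config->{consumer_secret} ;
my $access_token        = $config->{access_token} ;
my $access_token_secret = $config->{access_token_secret} ;

# ... hand these to Net::Twitter->new() as shown earlier ...

# and after the PIN dance, write the fresh tokens back out
# so next run is already authorized:
$config->{access_token}        = $access_token ;
$config->{access_token_secret} = $access_token_secret ;
DumpFile( $config_file, $config ) ;
```

The nice part is that the same config hash round-trips: whatever you learn during the PIN dance gets dumped back to the file, and the script never needs editing again.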


Personal Programming Plans: Instagram2Background

I have this code which uses WebService::Instagram to grab my most recent picture from Instagram and set it as background image on my Ubuntu machines. I put it in my crontab and it just works.

#!/usr/bin/env perl

use feature qw'say' ;
use strict ;
use warnings ;

use Cwd 'abs_path' ;
use Data::Dumper ;
use HTTP::Request ;
use IO::Interactive qw{ interactive } ;
use LWP::UserAgent ;
use Try::Tiny ;
use YAML::XS qw{ LoadFile DumpFile } ;

use lib '/home/jacoby/lib' ;
use Instagram ;

my $config_file = join '/', $ENV{HOME}, '.i2b.yml' ;
my $config = LoadFile($config_file) ;

my $token    = $config->{user}{access_token} ;
my $id       = $config->{user}{id} ;
my $template = '' ;
my $ig       = connect($config) ;

$ig->set_access_token($token) ;
my $url = $template ;
$url =~ s/XX/$id/ ;
my $feed        = $ig->request($url) ;
my $data        = $feed->{data} ;
my @data        = grep { $_->{type} eq 'image' } @$data ;
my $most_recent = $data[0] ;
my $file        = '/home/jacoby/.i2b/i2b.jpg' ;

my $image_id    = $most_recent->{id} ;
my $image_url   = $most_recent->{images}->{standard_resolution}->{url} ;
my $image_text  = $most_recent->{caption}->{text} ;

if ( $config->{done} ne $image_id ) {
    my $image = imagegrab($image_url) ;
    imagewrite( $file, $image ) ;
    say {interactive} $image_id ;
    say {interactive} $image_text ;
    $config->{done} = $image_id ;
    }

imageset($file) ;

DumpFile( $config_file, $config ) ;

exit ;

# takes a URL, returns the raw content
sub imagegrab {
    my $url      = shift ;
    my $agent    = LWP::UserAgent->new ;
    my $request  = HTTP::Request->new( 'GET', $url ) ;
    my $response = $agent->request($request) ;
    if ( $response->is_success ) {
        return $response->content ;
        }
    return undef ;
    }

# takes a filename and an image, and writes the image to the filename
sub imagewrite {
    my $file  = shift ;
    my $image = shift ;
    if ( open my $fh, '>', $file ) {
        binmode $fh ;    # the JPEG is binary data
        print $fh $image ;
        close $fh ;
        return 1 ;
        }
    return 0 ;
    }

# takes a filename, sets it as the background image
sub imageset {
    my $img = shift ;

    return unless $img ;
    my $command = join ' ', qw{
        gsettings set
        } ;
    my $command2 = join ' ', qw{
        gsettings set
        } ;

    my $bg = '"file://' . abs_path($img) . '"' ;
    qx{$command $bg} ;
    qx{$command2 'zoom'} ;
    }
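The "put it in my crontab" part is just a scheduled run of the script. A sketch of the entry, with a hypothetical path and a half-hourly schedule:

```
# m    h  dom mon dow  command
*/30   *  *   *   *    /home/jacoby/bin/i2b.pl
```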

With IFTTT, it's even easier on Android.

But I don't spend all my time with just Android and Ubuntu. I spend a fair amount of time in Windows. I have a start on that: I can use C# to set an image as the background. That is the first step. I know, at least a little, about scheduling tasks in Windows, which is the last step.

So, the coming steps:

  • Using C# to get the Instagram JSON
  • Using C# to parse the Instagram JSON and pull URL of newest image
  • Using C# to download said image. Certainly related to getting the JSON.
  • Using C# to write the image file to the Windows equivalent of /tmp (because this will be released to others).
  • Knowing what the Windows equivalent of /tmp is.
  • Knowing where to hold my Instagram OAuth token data.

Seems like a small number of things to get done, once I sit down to do them. I just need to sit down, do them, and build the tool again.


Quantified Self: For What?

This is my daily step count since I first got a FitBit in 2012, in handy heatmap form.

It shows that 2014 was a pretty active year.

It shows that this year, I've really fallen off the game.

It shows that the main purpose of this process for me, of learning how to grab the data and plot it in different and hopefully useful ways, has succeeded.

It shows, really, that I'm much more about collecting the data than using it to change my life.

And I can only see that as a failure.

I've built other things on top of this. My daily steps pop up in my Twitter feed and bash prompt. If my battery gets low, I get notified on my tablet. If I go several days without a connection (if the battery dies without me noticing, or if I lose it, as I have done recently), I also get notified. I've made it very convenient for myself.

But I failed to make greater amounts of movement an important part of my life. I failed to develop an appreciation for running or walking, at least in comparison to everything else I do.

So, I need to start thinking about how I can change my behavior.

And I probably shouldn't get a replacement FitBit until I have a plan for that.


My Reason to hate Python might be GONE!

Let me show you some code.

#!/usr/bin/env python

mylist = [0,1,2,3]

for n in mylist:
 for  m in mylist:
  print m,n
    print m,n
 print n

Looks pretty normal, right? Just a loop, right? Just a loop within a loop.

Yes it is, but if you look closer, you'll notice two spaces in front of the second print statement.

This is exactly what happened to me the first time I tried Python, about 15 years ago. It was code that showed open machines in ENAD 302, and I ran it on an NT machine I had installed ActiveState Python on. I no longer have that job, thus no longer have that machine and that version of Python. I no longer can find the code, and the computer lab in ENAD 302 is gone.

As is ENAD.

All I have is the memory of having pages of error reports that didn't tell me that the problem was the indentation, halfway into a 200-or-so-line Python program. This has led me to set expandtab or the equivalent in every editor I've used since. Burn me once, shame on you, but burn me twice...

I admit that I disliked Python before that, but then it was more "Perl does this already, so why do I have to learn how to do the exact same thing in another dynamic language? What do I gain?" rather than "This language takes as a core feature a means to create undetectable errors."

But no. My hatred of Python stopped coming from a logical place. "Creates undetectable errors" is a logical argument, one that is no longer true, but I got taken to a place of negative emotion, like someone who was bitten by a dog as a child and is now overcome with fear or hate whenever one comes near.

(I tried it a few times since, and each time, my experience said "this is an objectively stupid way of doing things", until I bumped into things like Sikuli or my FitBit code where there was either no other way or this was the easiest way to get to "working".)

Then I found someone online who said "tabs are better than spaces". For outputting formatted text, I do agree, but in code, that leads to invisible bugs. So, when someone is wrong online, you correct them.

But then, I wrote the above code, expecting the same errors and received none.

(In this process, I learned some interesting things about Sublime Text, like the way you can set color_scheme and draw_white_space and translate_tabs_to_spaces at a language-specific level, which I did to allow me to see the white space when writing the above code. Sublime Text is neat.)

I've been saying this for a while, but I think this is the last thing I needed to find out before I lived it: the Python that's here today is not the Python that bit me 15 years ago, and I should get over my hangups and "pet the dog". 
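One concrete sign that the old bite is gone: Python 3 rejects ambiguous mixing of tabs and spaces outright, raising TabError at compile time instead of silently mis-nesting the code. A quick check I sketched to convince myself:

```python
# Python 3 refuses source whose indentation mixes tabs and spaces
# ambiguously: the first indented line uses a tab, the second uses
# eight spaces, and the tokenizer raises TabError at compile time.
mixed = "for n in [1, 2]:\n\tprint(n)\n        print(n)\n"

try:
    compile(mixed, "<mixed>", "exec")
    outcome = "compiled"
except TabError:
    outcome = "TabError"

print(outcome)  # → TabError
```

That is precisely the loud, early failure I wished for fifteen years ago.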


Let Them Fight: My Thoughts on #Googacle

It's wonderful to have the Oracle vs Google trial in San Francisco, so I can have the mental image of Google's Bugdroid and Java's Duke laying waste to the city like Godzilla and the MUTOs. Because, ultimately, that's what this is; two kaiju companies fighting it out at tremendous cost, and a man in a black robe taking the Ken Watanabe role saying "Let Them Fight".

Please, someone who can draw, put this up on DeviantArt. I need to see this.

I've said "I'm biased; I like every Google product I've worked with, and hate every Oracle product I've worked with." But this isn't true, because, on the one hand, VirtualBox and MySQL, and on the other, Google Wave.


But I admit my biases, and I do question them. Sun had the philosophy of "Software is free, hardware pays the bills", and licensed in accordance with that. This is why, after Oracle bought Sun, the Sun team in charge of MySQL could fork the GPL'd code, leave to form MariaDB after (as I understand it) little more than a name change, and leave Oracle barely maintaining a direct competitor to their core product.

Sun open-sourced Java. Soon after I started CS, it became the language with which programming is taught at the college level. I think this is stupid, because the main benefit of Java is "write once, run anywhere", which is a direct response to the Unix Wars, where companies would make small weird incompatible changes to differentiate their kit from their competitors. Linux won the Unix wars, and now, you make one ELF-formatted executable and package it in DEB or RPM and you basically have 97.3% of server rooms, or more, once you factor in virtual machines.

"Write once, run anywhere" is a dead concept.

Java is still a core language, though, and Google, moving into a new, untested environment with a new, untested operating system, wanted something that programmers would feel comfortable with, so they went with Java.

But Java runtimes, as they existed at that time, were not up to the task, and they chose to re-implement Java, or at least a small subset of Java, so it behaved like Java to the Java devs they wanted to be Android devs.

This is where the question is. Oracle says "You didn't license it". Google says "We did license it; it's called the GPL". Or, at least, that's my understanding of the arguments; a big lesson of this trial is that developers shouldn't talk like lawyers and lawyers shouldn't talk like developers; that way lies legal trouble.

The GPL is what makes Linux free, and so much else. There's a LAMP stack (Linux, Apache, MySQL, PHP*) that allowed so much of the changes in the last 20 years. Without LAMP, without GPL, there's no Amazon, no Google, no Facebook.

(Let's pretend, for the purposes of this rant, that this is all good, okay?)

This is a battle between kaiju. Google cares about me as little as Godzilla cares about Elizabeth Olsen. But we still want Godzilla to defeat the MUTOs and Mecha-Godzilla and whatever comes up, and, because it uses as tools the things I associate with freedom, I still want Godzilla ... I mean Google, to win.


Death of a Project

Years ago, I learned some R. When I was doing so, I had decided to move from just being a vi man to trying something a little more modern, so I was using ActiveState's KomodoEdit.

A problem was that KomodoEdit had syntax highlighting for many languages, but not R. So, I did some digging, found some code that did what I wanted that someone else abandoned, adapted it some and made it work. Then I made it a GitHub repo called RLangSyntax. I had an itch, I scratched it, I made the backscratcher available to others and I went on with my life, eventually moving to SublimeText.

Until ActiveState released Komodo(IDE|Edit) 9, and I started getting issues in my repo. Those issues were solved by incrementing maxVersion from 8 to 9, and then by remembering and documenting the build process and adding a release download of the resulting xpi. And, because I've gone on to other languages and editors, I left it.

Until ActiveState released KomodoIDE X for 10. I figured that, as with 9, I'd eventually get issues, and so I decided to jump ahead of it. I installed KomodoIDE, tried to install RLangSyntax to see what the errors were. I contacted the support team and asked them a few questions, like "can I just assume Komodo will keep the same engine for syntax highlighting, so I can set maxVersion as 20 or something and let it go?"

Here I wish to say that, for both the 8->9 and 9->X conversions, the Komodo support team was helpful, friendly and intense. A bit more intense than I like, but being helpful and friendly leads me to forgive a lot. I use Sublime Text and vi for my editing needs, but I certainly feel that Komodo could work for me. I like how KomodoIDE takes its cue from Sublime Text and Atom.

But I made the minimum changes to get it to install, started waiting for the automated check by their version of the Package Manager to take place, and got this comment:

"Hey guys, I do want to point out that as of Komodo 9.3 we have built-in support for R lang syntax."


I'm happy that happened. I really am. R is really not a thing I touch anymore. I look at ggplot2 and think "Hey, I could make really pretty plots with that", but I don't have reason to make new plots or change old plots right now. And, I'm very happy with my Sublime Text environment. This is an important change for both R and KomodoIDE, and I'm happy.

But, of all the toys I released on GitHub, this is the one that had the most interaction, which implies the most use. One measure of value as a programmer is the number of users your code has, and the change, while of use to those who code in R with KomodoIDE, makes me a programmer of lesser import. But, really, not so much, because beyond packaging and iterating maxVersion, I didn't add much.

So, that repo's documentation now reads: "You probably don't want to use this project. So long, and thanks for the fish."