Cookie Notice

As far as I know, and as far as I remember, nothing in this page does anything with Cookies.

2016/11/19

Graphs are not that Scary!



As with most things I blog about, this starts with Twitter. I follow a lot of people on Twitter, and I use Lists. I want to be able to group people more-or-less on community, because there's the community where they talk about programming, for example, and the community where they talk about music, or the town I live in.

I can begin to break things up myself, but curation is a hard thing, so I wanted to do it automatically. And I spent a long time not knowing what to do. I imagined myself traversing trees in what looks like linked lists reimagined by Cthulhu, and that doesn't sound like much fun at all.

Eventually, I decided to search on "graphs and Perl". Of course, I probably should've done it earlier, but oh well. I found Graph. I had used GD::Graph before, which is a plotting library. (There has to be some index of how overloaded words are.) And once I installed it, I figured it out: As a programmer, all you're dealing with are arrays and hashes. Nothing scary.

Word Ladder


We'll take a problem invented by Lewis Carroll called a "word ladder", where you find your way from one word (for example, "cold") to another ("warm") by changing one letter at a time:

    cold
    coRd
    cArd
    Ward
    warM

Clearly, this can and is often done by hand, but if you're looking to automate it, there are three basic problems: what are the available words, how do you determine when words are one change away, and how do you do this to get the provable shortest path?

First, I went to CERIAS years ago and downloaded word lists. Computer security researchers use them because real words are bad passwords, so, lists of real words can be used to create rainbow tables and the like. My lists are years old, so there may be new words I don't account for, but unlike Lewis Carroll, I can get from APE to MAN in five words, not six.

    ape
    apS
    aAs
    Mas
    maN

Not sure that Lewis Carroll would've accepted AAS, but there you go

There is a term for the number of changes it takes to go from one word to another, and it's called the Levenshtein Distance. I first learned about this from perlbrew, which is how, if you type "perlbrew isntall", it guesses that you meant to type "perlbrew install". It's hardcoded there because perlbrew can't assume you have anything but perl and core modules. I use the function from perlbrew instead of Text::Levenshtein but it is a module worth looking into.

And the final answer is "Put it into a graph and use Dijkstra's Algorithm!"

Perhaps not with the exclamation point.

Showing Code


Here's making a graph of it:

#!/usr/bin/env perl

use feature qw{say} ;
use strict ;
use warnings ;

use Data::Dumper ;
use Graph ;
use List::Util qw{min} ;
use Storable ;

for my $l ( 3 .. 16 ) {
    create_word_graph($l) ;
    }
exit ;

# -------------------------------------------------------------------
# we're creating a word graph of all words that are of length $length
# where the nodes are all words and the edges are unweighted, because
# they're all weighted 1. No connection between "foo" and "bar" because 
# the distance is "3".

sub create_word_graph {
    my $length = shift ;
    my %dict = get_words($length) ;
    my @dict = sort keys %dict ; # sorting probably is unnecessary
    my $g    = Graph->new() ;

    # compare each word to each word. If the distance is 1, put it
    # into the graph. This implementation is O(N**2) but probably
    # could be redone as O(NlogN), but I didn't care to.

    for my $i ( @dict ) {
        for my $j ( @dict ) {
            my $dist = editdist( $i, $j ) ;
            if ( $dist == 1 ) {
                $g->add_edge( $i, $j ) ;
                }
            }
        }

    # Because I'm using Storable to store the Graph object for use
    # later, I only use this once. But, I found there's an endian
    # issue if you try to open Linux-generated Storable files in
    # Strawberry Perl.

    store $g , "/home/jacoby/.word_$length.store" ;
    }

# -------------------------------------------------------------------
# this is where we get the words and only get words of the correct
# length. I have a number of dictionary files, and I put them in
# a hash to de-duplicate them.

sub get_words {
    my $length = shift ;
    my %output ;
    for my $d ( glob( '/home/jacoby/bin/Toys/Dict/*' ) ) {
        if ( open my $fh, '<', $d ) {
            for my $l ( <$fh> ) {
                chomp $l ;
                $l =~ s/\s//g ;
                next if length $l != $length ;
                next if $l =~ /\W/ ;
                next if $l =~ /\d/ ;
                $output{ uc $l }++ ;
                }
            }
        }
    return %output ;
    }

# -------------------------------------------------------------------
# straight copy of Wikipedia's "Levenshtein Distance", straight taken
# from perlbrew. If I didn't have this, I'd probably use 
# Text::Levenshtein.

sub editdist {
    my ( $f, $g ) = @_ ;
    my @a = split //, $f ;
    my @b = split //, $g ;

    # There is an extra row and column in the matrix. This is the
    # distance from the empty string to a substring of the target.
    my @d ;
    $d[ $_ ][ 0 ] = $_ for ( 0 .. @a ) ;
    $d[ 0 ][ $_ ] = $_ for ( 0 .. @b ) ;

    for my $i ( 1 .. @a ) {
        for my $j ( 1 .. @b ) {
            $d[ $i ][ $j ] = (
                  $a[ $i - 1 ] eq $b[ $j - 1 ]
                ? $d[ $i - 1 ][ $j - 1 ]
                : 1 + min( $d[ $i - 1 ][ $j ], $d[ $i ][ $j - 1 ], $d[ $i - 1 ][ $j - 1 ] )
                ) ;
            }
        }

    return $d[ @a ][ @b ] ;
    }

Following are what my wordlists can do. Something tells me that, when we get to 16-letter words, it's more a bunch of disconnected nodes than a graph.

1718 3-letter words
6404 4-letter words
13409 5-letter words
20490 6-letter words
24483 7-letter words
24295 8-letter words
19594 9-letter words
13781 10-letter words
8792 11-letter words
5622 12-letter words
3349 13-letter words
1851 14-letter words
999 15-letter words
514 16-letter words

My solver isn't perfect, and the first thing I'd want to add is ensuring that both the starting and ending words are actually in the word list. Without that, your code goes on forever.

So, I won't show off the whole program below, but it does use Storable, Graph and feature qw{say}.

dijkstra( $graph , 'foo' , 'bar' ) ;

# -------------------------------------------------------------------
# context-specific perl implementation of Dijkstra's Algorithm for
# shortest-path

sub dijkstra {
    my ( $graph, $source, $target, ) = @_ ;

    # the graph pre-exists and is passed in 
    # $source is 'foo', the word we're starting from
    # $target is 'bar', the word we're trying to get to

    my @q ; # will be the list of all words
    my %dist ; # distance from source. $dist{$source} will be zero 
    my %prev ; # this holds our work being every edge of the tree
               # we're pulling from the graph. 

    # we set the the distance for every node to basically infinite, then 
    # for the starting point to zero

    for my $v ( $graph->unique_vertices ) {
        $dist{$v} = 1_000_000_000 ;    # per Wikipeia, infinity
        push @q, $v ;
        }
    $dist{$source} = 0 ;

LOOP: while (@q) {

        # resort, putting words with short distances first
        # first pass being $source , LONG WAY AWAY

        @q = sort { $dist{$a} <=> $dist{$b} } @q ;
        my $u = shift @q ;

        # say STDERR join "\t", $u, $dist{$u} ;

        # here, we end the first time we see the target.
        # we COULD get a list of every path that's the shortest length,
        # but that's not what we're doing here

        last LOOP if $u eq $target ;

        # this is a complex and unreadable way of ensuring that
        # we're only getting edges that contain $u, which is the 
        # word we're working on right now

        for my $e (
            grep {
                my @a = @$_ ;
                grep {/^${u}$/} @a
            } $graph->unique_edges
            ) {

            # $v is the word on the other end of the edge
            # $w is the distance, which is 1 because of the problem
            # $alt is the new distance between $source and $v, 
            # replacing the absurdly high number set before

            my ($v) = grep { $_ ne $u } @$e ;
            my $w   = 1 ;
            my $alt = $dist{$u} + $w ;
            if ( $alt < $dist{$v} ) {
                $dist{$v} = $alt ;
                $prev{$v} = $u ;
                }
            }
        }

    my @nodes = $graph->unique_vertices ;
    my @edges = $graph->unique_edges ;
    return {
        distances => \%dist,
        previous  => \%prev,
        nodes     => \@nodes,
        edges     => \@edges,
        } ;
    }

I return lots of stuff, but the part really necessary is %prev, because that, $source and $target are everything you need. Assuming we're trying to go from FOR to FAR, a number of words will satisfy $prev{FOR}, but it's the one we're wanting. In the expanded case of FOO to BAR, $prev->{BAR} = 'FAR', $prev->{FAR} is 'FOR', and $prev->{FOR} is 'FOO'.

And nothing in there is complex. It's all really hashes or arrays or values. Nothing a programmer should have any problem with.

CPAN has a number of other modules of use: Graph::Dijkstra has that algorithm already written, and Graph::D3 allows you to create a graph in such a way that you can use it in D3.js. Plus, there are a number of modules in Algorithm::* that do good and useful things. So go in, start playing with it. It's deep, there are weeds, but it isn't scary.

2016/11/08

Modern Perl but not Modern::Perl

This started while driving to work. If I get mail from coworkers, I get Pushover notifications, and halfway from home, I got a bunch of notifications.

We don't know the cause of the issue, but I do know the result

We have env set on our web server set so that Perl is /our/own/private/bin/perl and not /usr/bin/perl, because this runs in a highly-networked and highly-clustered environment, mostly RedHat 6.8 and with 5.10.1 as system perl, if we want to have a consistent version and consistent modules, we need our own. This allows us to have #!/usr/bin/env perl as our shbang.

And this morning, for reasons I don't know, it stopped working. Whatever perl was being called, it wasn't /our/own/private/bin/perl. And this broke things.

One of the things that broke is this: Whatever perl is /usr/bin/env perl, it doesn't have Modern::Perl.

I'm for Modern Perl. My personal take is that chromatic and Modern Perl kept Perl alive in with Perl 5 while Larry Wall and the other language developers worked on Perl 6. Thus, I am grateful that it exists. But, while I was playing with it, I found a problem: Modern::Perl is not in Core, so you cannot rely on it being there, so a script might be running with a version greater than 5.8.8 and be able to give you everything you need, which to me is normally use strict, use warnings and use feature qw{say}, but if you're asking for Modern::Perl for it, it fails, and because you don't know which Modern thing you want, you don't know how to fix it.

This is part of my persistent hatred of magic. If it works and you don't understand how, you can't fix it if it stops working. I got to the heavy magic parts of Ruby and Rails and that, as well as "Life Happens", are why I stopped playing with it. And, I think, this is a contributing factor with this morning's brokenness.

2016/10/30

Net::Twitter Cookbook: Favorites and Followers

Favorites.

Also known as "Likes", they're an indication in Twitter that you approve of a status update. Most of the time, they're paired with retweets as signs by the audience to the author that the post is agreeable. Like digital applause.

This is all well and good, but it could be used for so much more, if you had more access and control over them.

So I did.

The first step is to collect them. There's an API to get them, and collecting them in bulk is easy. A problem is avoiding grabbing the same tweet twice.

# as before, the "boilerplate" can be found elsewhere in my blog.
use IO::Interactive qw{ interactive } ;
my $config ;
$config->{start} = 0 ;
$config->{end}   = 200 ;

for ( my $page = $config->{start}; $page <= $config->{end}; ++$page ) {
        say {interactive} qq{\tPAGE $page} ;
        my $r = $twit->favorites( { 
            page => $page ,
            count => 200 ,
            } ) ;
        last unless @$r ;

        # push @favs , @$r ;
        for my $fav (@$r) {
            if ( $config->{verbose} ) { 
                 say {interactive} handle_date( $fav->{created_at} ) 
                 }
            store_tweet( $config->{user}, $fav ) ;
            }
        sleep 60 * 3 ;    # five minutes
        }


Once I had a list of my tweets, one of the first things I did was use them to do "Follow Friday". If you know who you favorited over the last week, it's an easy thing to get a list of the usernames, count them and add them until you have reached the end of the list or 140 characters.

Then, as I started playing with APIs and wanted to write my own ones, I created an API to find ones containing a substring, like json or pizza or sleep. This way, I could begin to use a "favorite" as a bookmark.

(I won't show demo code, because I'm not happy or proud of the the code, which lives in a trailing-edge environment, and because it's more database-related than Twitter-focused.)

As an aside, I do not follow back. There are people who follow me who I have no interest in reading, and there are people I follow who care nothing about my output. In general, I treat Twitter as something between a large IRC client and an RSS reader, and I never expected nor wanted RSS feeds to track me.

But this can be a thing worth tracking, which you can do, without any storage, with the help of a list. Start with getting a list of those following you, those you follow, and the list of accounts (I almost wrote "people", but that isn't guaranteed) in your follows-me list. If they follow you and aren't in your list, add them. If they're in the list and you have started following them, take them out. If they're on the list and aren't following you, drop them. As long as you're not big-time (Twitter limits lists to 500 accounts), that should be enough to keep a Twitter list of accounts you're not following.

use List::Compare ;

    my $list = 'id num of your Twitter list';

    my $followers = $twit->followers_ids() ;
    my @followers = @{ $followers->{ids} } ;

    my $friends = $twit->friends_ids() ;
    my @friends = @{ $friends->{ids} } ;

    my @list = get_list_members( $twit, $list ) ;
    my %list = map { $_ => 1 } @list ;


    my $lc1 = List::Compare->new( \@friends,   \@followers ) ;
    my $lc2 = List::Compare->new( \@friends,   \@list ) ;
    my $lc3 = List::Compare->new( \@followers, \@list ) ;

    # if follows me and I don't follow, put in the list
    say {interactive} 'FOLLOWING ME' ;
    for my $id ( $lc1->get_complement ) {
        next if $list{$id} ;
        add_to_list( $twit, $list, $id ) ;
        }

    # if I follow, take off the list
    say {interactive} 'I FOLLOW' ;
    for my $id ( $lc2->get_intersection ) {
        drop_from_list( $twit, $list, $id ) ;
        }

    # if no longer following me, take off the list
    say {interactive} 'NOT FOLLOWING' ;
    for my $id ( $lc3->get_complement ) {
        drop_from_list( $twit, $list, $id ) ;
        }

#========= ========= ========= ========= ========= ========= =========
sub add_to_list {
    my ( $twit, $list, $id ) = @_ ;
    say STDERR qq{ADDING $id} ;
    eval { $twit->add_list_member(
            { list_id => $list, user_id => $id, } ) ; } ;
    if ($@) {
        warn $@->error ;
        }
    }

#========= ========= ========= ========= ========= ========= =========
sub drop_from_list {
    my ( $twit, $list, $id ) = @_ ;
    say STDERR qq{REMOVING $id} ;
    eval {
        $twit->delete_list_member( { list_id => $list, user_id => $id, } ) ;
        } ;
    if ($@) {
        warn $@->error ;
        }
    }



But are there any you should follow? Are there any posts in the the feed that you might "like"? What do you "like" anyway?

There's a way for us to get an idea of what you would like, which is your past likes. First, we must get, for comparison, a collection of what your Twitter feed is like normally. (I grab 200 posts an hour and store them. This looks and works exactly like my "grab favorites code", except I don't loop it.

    my $timeline = $twit->home_timeline( { count => 200 } ) ;

    for my $tweet (@$timeline) {
        my $id          = $tweet->{id} ;                          # twitter_id
        my $text        = $tweet->{text} ;                        # text
        my $created     = handle_date( $tweet->{created_at} ) ;   # created
        my $screen_name = $tweet->{user}->{screen_name} ;         # user id
        if ( $config->{verbose} ) {
            say {interactive} handle_date( $tweet->{created_at} );
            say {interactive} $text ;
            say {interactive} $created ;
            say {interactive} $screen_name ;
            say {interactive} '' ;
            }
        store_tweet( $config->{user}, $tweet ) ;
        # exit ;
        }


So, we have a body of tweets that you like, and a body of tweets that are a representative sample of what Twitter looks like to you. On to Algorithm::NaiveBayes!

use Algorithm::NaiveBayes ;
use IO::Interactive qw{ interactive } ;
use String::Tokenizer ;

my $list   = 'ID of your list';
my $nb     = train() ;
my @top    = read_list( $config, $nb , $list ) ;

say join ' ' , (scalar @top ), 'tweets' ;

for my $tweet (
    sort { $a->{analysis}->{fave} <=> $b->{analysis}->{fave} } @top ) {
    my $fav = int $tweet->{analysis}->{fave} * 100 ;
    say $tweet->{text} ;
    say $tweet->{user}->{screen_name} ;
    say $tweet->{gen_url} ;
    say $fav ;
    say '' ;
    }

exit ;

#========= ========= ========= ========= ========= ========= =========
# gets the first page of your Twitter timeline.
#
# avoids checking a tweet if it's 1) from you (you like yourself;
#   we get it) and 2) if it doesn't give enough tokens to make a
#   prediction.
sub read_list {
    my $config = shift ;
    my $nb     = shift ;
    my $list   = shift ;

    ...

    my @favorites ;
    my $timeline =     $twit->list_statuses({list_id => $list});

    for my $tweet (@$timeline) {
        my $id          = $tweet->{id} ;                          # twitter_id
        my $text        = $tweet->{text} ;                        # text
        my $created     = handle_date( $tweet->{created_at} ) ;   # created
        my $screen_name = $tweet->{user}->{screen_name} ;         # user id
        my $check       = toke( lc $text ) ;
        next if lc $screen_name eq lc $config->{user} ;
        next if !scalar keys %{ $check->{attributes} } ;
        my $r = $nb->predict( attributes => $check->{attributes} ) ;
        my $fav = int $r->{fave} * 100 ;
        next if $fav < $config->{limit} ;
        my $url = join '/', 'http:', '', 'twitter.com', $screen_name,
            'status', $id ;
        $tweet->{analysis} = $r ;
        $tweet->{gen_url}  = $url ;
        push @favorites, $tweet ;
        }

    return @favorites ;
    }

#========= ========= ========= ========= ========= ========= =========
sub train {

    my $nb = Algorithm::NaiveBayes->new( purge => 1 ) ;
    my $path = '/home/jacoby/.nb_twitter' ;

    # adapted on suggestion from Ken to

    # gets all tweets in your baseline table
    my $baseline = get_all() ;
    for my $entry (@$baseline) {
        my ( $tweet, $month, $year ) = (@$entry) ;
        my $label = join '', $year, ( sprintf '%02d', $month ) ;
        my $ham = toke(lc $tweet) ;
        next unless scalar keys %$ham ;
        $nb->add_instance(
            attributes => $ham->{attributes},
            label      => ['base'],
            ) ;
        }

    gets all tweets in your favorites table
    my $favorites = get_favorites() ;
    for my $entry (@$favorites) {
        my ( $tweet, $month, $year ) = (@$entry) ;
        my $label = join '', $year, ( sprintf '%02d', $month ) ;
        my $ham = toke(lc $tweet) ;
        next unless scalar keys %$ham ;
        $nb->add_instance(
            attributes => $ham->{attributes},
            label      => ['fave'],
            ) ;
        }

    $nb->train() ;
    return $nb ;
    }

#========= ========= ========= ========= ========= ========= =========
# tokenizes a tweet by breaking it into characters, removing URLs
# and short words
sub toke {
    my $tweet = shift ;
    my $ham ;
    my $tokenizer = String::Tokenizer->new() ;
    $tweet =~ s{https?://\S+}{}g ;
    $tokenizer->tokenize($tweet) ;

    for my $t ( $tokenizer->getTokens() ) {
        $t =~ s{\W}{}g ;
        next if length $t < 4 ;
        next if $t !~ /\D/ ;
        my @x = $tweet =~ m{($t)}gmix ;
        $ham->{attributes}{$t} = scalar @x ;
        }
    return $ham ;
    }


Honestly, String::Tokenizer is probably a bit too overkill for this, but I'll go with it for now. It might be better to get a list of the 100 or 500 most common words and exclude them from the tweets, instead of limiting by size. As is, strings like ada and sql would be excluded. But it's good for now.

We get a list of tweets including a number between 0 and 1, representing the likelihood, by Bayes, that I would like the tweet. In the end, it's turned into an integer between 0 and 100. You can also run this against your normal timeline to pull out tweets you would've liked but missed. I often do this

I run the follows_me version on occasion. So far, it is clear to me that the people I don't follow, I don't follow for a reason, and that remains valid.

If you use this and find value in it, please tell me below. Thanks and good coding.

2016/10/28

Using the Symbol Table: "Help"?

I've been looking at command-line code for both fun and work. I know I can have one module handle just the interface, and have the module where the functionality happens pass the functionality along.

#!/usr/bin/env perl

use feature qw'say state' ;
use strict ;
use warnings ;
use utf8 ;

my $w = Wit->new( @ARGV ) ;
$w->run() ;

package Wit ;
use lib '/home/jacoby/lib' ;
use base 'Witter' ;
use Witter::Twitter ;

1;

package Witter ;

# highly adapted from perlbrew.

use feature qw{ say } ;
use strict ;
use warnings ;

sub new {
    my ( $class, @argv ) = @_ ;
    my $self ;
    $self->{foo}  = 'bar' ;
    $self->{args} = [] ;
    if (@argv) {
        $self->{args} = \@argv ;
        }
    return bless $self, $class ;
    }

sub run {
    my ($self) = @_ ;
    $self->run_command( $self->{args} ) ;
    }

sub run_command {
    my ( $self, $args ) = @_ ;

    if (   scalar @$args == 0
        || lc $args->[0] eq 'help'
        || $self->{help} ) {
        $self->help(@$args) ;
        exit ;
        }

    if ( lc $args->[0] eq 'commands' ) {
        say join "\n\t", '', $self->commands() ;
        exit ;
        }

    my $command = $args->[0] ;

    my $s = $self->can("twitter_$command") ;
    unless ($s) {
        $command =~ y/-/_/ ;
        $s = $self->can("twitter_$command") ;
        }

    unless ($s) {

        my @commands = $self->find_similar_commands($command) ;
        if ( @commands > 1 ) {
            @commands = map { '    ' . $_ } @commands ;
            die
                "Unknown command: `$command`. Did you mean one of the following?\n"
                . join( "\n", @commands )
                . "\n" ;
            }
        elsif ( @commands == 1 ) {
            die "Unknown command: `$command`. Did you mean `$commands[0]`?\n"
                ;
            }
        else {
            die "Unknown command: `$command`. Typo?\n" ;
            }
        }

    unless ( 'CODE' eq ref $s ) { say 'Not a valid command' ; exit ; }

    $self->$s(@$args) ;
    }

sub help {
    my ($self,$me,@args) = @_ ;
    say 'HELP!' ;
    say join "\t", @args;
    }

sub commands {
    my ($self) = @_ ;
    my @commands ;

    my $package = ref $self ? ref $self : $self ;
    my $symtable = do {
        no strict 'refs' ;
        \%{ $package . '::' } ;
        } ;

    foreach my $sym ( sort keys %$symtable ) {
        if ( $sym =~ /^twitter_/ ) {
            my $glob = $symtable->{$sym} ;
            if ( defined *$glob{CODE} ) {
                $sym =~ s/^twitter_// ;
                $sym =~ s/_/-/g ;
                push @commands, $sym ;
                }
            }
        }

    return @commands ;
    }

# Some functions removed for sake of brevity


package Witter::Twitter ;

use strict ;
use feature qw{ say state } ;
use warnings FATAL => 'all' ;

use Exporter qw{import} ;
use Net::Twitter ;
use JSON::XS ;


our $VERSION = 0.1 ;

our @EXPORT ;
for my $entry ( keys %Witter::Twitter:: ) {
    next if $entry !~ /^twitter_/mxs ;
    push @EXPORT, $entry ;
    }

sub twitter_foo {
    my ( $self, @args ) = @_ ;
    say "foo" ;
    say join '|', @args ;
    }
1 ;


And the above works when called as below.
jacoby@oz 13:49 60°F 51.24,-112.49 ~ 
$ ./witter 
HELP!


jacoby@oz 13:52 60°F 51.25,-94.51 ~ 
$ ./witter help 
HELP!


jacoby@oz 13:53 60°F 50.59,-88.64 ~ 
$ ./witter commands

 foo

jacoby@oz 13:53 60°F 50.59,-88.64 ~ 
$ ./witter help foo
HELP!
foo

jacoby@oz 13:53 60°F 50.59,-88.64 ~ 
$ ./witter foo
foo
foo

jacoby@oz 13:53 60°F 50.59,-88.64 ~ 
$ ./witter moo
Unknown command: `moo`. Did you mean `foo`?

In the above example, I'm just doing the one add-on module, Witter::Twitter and one function, Witter::Twitter::foo, but clearly, I would want it open-ended, so that if someone wanted to add Witter::Facebook, all the information about the Facebook functions would be in that module.

Then, of course, I would have to use another prefix than twitter_, but we'll leave that, and ensuring that modules don't step on each others' names, to another day.

The part that concerns me is help. Especially help foo. It should be part of the the module it's in; If Witter::Twitter is the module with foo(), only it should be expected to know about foo().

But how to communicate it? I'm flashing on our %docs and $docs{foo}= 'This is foo, silly' but the point of the whole thing is to allow the addition of modules that the initial module doesn't know about, and it would require knowing to look for %Witter::Twitter::docs.

I suppose adding a docs_ function that looks like this.
sub docs_foo {
    return q{
    This explains the use of the 'foo' command 

...
};
END
}


I'm diving into this in part because I have code that uses basically this code, and I need to add functionality to it, and while I'm in there, I might as well make user documentation better. Or even possible.

I'm also parallel-inspired by looking at a Perl project built on and using old Perl ("require 5.005") and recent blog posts about Linus Torvalds and "Good Taste". There's something tasteful about being able to add use Your::Module and nothing else to code, but if the best I can do is knowledge that there's a foo command, with no sense of what it does, that seems like the kind of thing that Linus would rightfully curse me for.

Is there a better way of doing things like this? I recall there being interesting things in Net::Tumblr that would require me to step up and learn Moose or the like. This is yet another important step toward me becoming a better and more tasteful programmer, but not germane to today's ranting.

2016/10/15

Gender and Wearables?


First I heard about modern wearables was at Indiana Linuxfest, where the speaker went on about the coming wave of microcontrollers and posited a belt buckle that was aware of when it was pointing toward magnetic north and activate a haptic feedback device, so that, for the wearer, eventually true sense of direction would eventually become another sense.

I'm sure I could find a sensor that could tell me that, that could ship from China and cost me less than a gumball. I'm sure I could easily get a buzzer, that I could control it all with a small Arduino board like a Trinket or Flora or Nano or the like. and Jimmy DiResta has already taught me how to make a belt buckle. And I actually kinda want one. But I haven't made it.

In part it's because my available resources to push toward projects like this are small at the moment. In part, though, it's because, once I put on my watch, my tablet, my glasses and the Leatherman on my belt, I'm accessorized out.

I think most American men are about the same. 

2016/09/26

Perl on Ubuntu on Windows: A Solution

I suppose I should've just been more patient.

After a while of waiting and watching and trying to think of a better bug report, one that might get a response, and failing, I got a response.

You can't install the module because File::Find can not recurse directories 
on the file system that is in use by Ubuntu on Windows.

The solution is to edit Config.pm:

sudo vi /usr/lib/perl/5.18.2/Config.pm
Set dont_use_nlink to 'define':

dont_use_nlink => 'define',
Now it's possible to install all modules you want!
(this is a dupliceate of #186)

I haven't made this change yet. I am loathe to change core modules like that, although I have done so in the past. Because I have done so in the past, and it ended up stupid. But I will likely do it.

I was mentally throwing it to the kernel, but was wrong, which is interesting. Makes me think that, rather than running Ubuntu on Windows, doing something with Vagrant might be the better plan.


2016/09/09

Net::Twitter Cookbook: Tweetstorm!

I had planned to get into following, lists and blocking, but a wild hair came over me and I decided I wanted to make a tweetstorm app.

What's a tweetstorm?


And this is the inner loop of it.
    my @tweetstorm_text ; # previously generated. URL shortening and numbering are assumed
    my $screen_name ; #your screen name as text
    my $status_id ; #created but not set, so first tweet won't be a reply-to

    for my $status ( @tweetstorm_text } ) {

        my $tweet ; # the object containing the twee
        $tweet->{status} = $status ;
        $tweet->{in_reply_to_status_id} = $status_id if $status_id ;

        if ( $twit->update($tweet) ) { #if Tweet is successful

            # we get your profile, which contains your most recent tweet
            # $status_id gets set to the ID of that tweet, which then
            # gets used for the next tweet.

            my $profiles
                = $twit->lookup_users( { screen_name => $config->{user} } ) ;
            my $profile = shift @$profiles ;
            my $prev    = $profile->{status} ;
            $status_id = $profile->{status}->{id} ;
            }
        }

So, that's it. I've thrown the whole thing on GitHub if you want to play with it yourself.

2016/09/04

Thoughts on ML Techniques to better handle Twitter

Thinking things through afk and thus not as polished as my normal posts.

Been thinking about grouping my friends (those I follow) strictly by relationship mapping. In part, I haven't done this because I can't read the math in the paper describing it, and in part because there are points where I serve as connecting node between two clusters and they have started interacting independently. I know a joy of Twitter is that it allows people to connect by interest and personality, not geography, but when a programmer in Washington and an activist in Indiana talk food and cats with each other, it makes my "programmer" cluster and my "Indiana" cluster less distinct.

So, what to do?

Topic Modelling.

I know about this via the Talking Machines podcast, and, without mathematic notation, if you take a body of text as a collection of words, the words it contains will vary by subject. If the topic is "politics", the text might contain "vote" and "veto" and "election" and "impeach". If the topic is "football", we'd see "lateral", "quarterback", "tackle" and "touchdown".

Rather than separating Twitter followers into groups simply by interactions, I could start with both certain lists I have curated (and yes, there are both "local tweeters" and "programmer" lists) and hashtags (because if you hashtag your tweet #perl, you likely are talking about Perl) to start to identify what words are more likely to come up when discussing certain subjects, then start adding then to those lists automatically.

If I can work this out 140 characters at a time.

2016/09/01

Perl on Ubuntu on Windows: Finding The Right Test Case



I'm still hung on getting CPAN working for Ubuntu on Windows. In the comments, Chas. Owens gave me great advice for proceeding with this problem:
Write a minimal test case, run the test case with strace on Ubuntu on Windows and Ubuntu on Linux. The outputs are supposed be identical, if they aren't (and we expect them not to be because of the error), then you have a good shot at identifying the misbehaving syscall (which is the only difference between Ubuntu on Windows and Ubuntu on Linux).

Once you see the difference, look into what the syscall does and try to determine which system is implementing it correctly (probably Linux). If it is Windows post to the github tracker with the test case and the identified syscall. If it is Linux, then report it to Ubuntu.
My first thought was to do this within cpanm, but I thought sudo su then strace -o platform -ff strace cpanm YAML::XS was a bit much. In part because when platform was Linux, it generated one file and hundreds on Windows.

Then it struck me that instead, I should just focus on the tests themselves. I cloned Ingy döt Net's YAML repo and tried to run prove test/ in both places. Went fine with Linux, failed with Windows. Butrealized after a second, it succeeded while using my personal perl, not system perl. /usr/bin/prove test/ failed on Ubuntu. apt-get install libtest-base-perl on both systems helped a lot, but now it wants Test::More, (I know because I searched for what modules the tests are asking for.)

For all I know, there's a package that provides Test::More, but it isn't libtest-more-perl, and digging further into that seems like a waste.

So I'm thinking it through again, looking at a failing test in YAML::XS:

use t::TestYAMLTests tests => 2;
use utf8;

is Dump("1234567890\n1234567890\n1234567890\n"), "--- |
  1234567890
  1234567890
  1234567890
", 'Literal Scalar';

is Dump("A\nB\nC\n"), q{--- "A\nB\nC\n"} . "\n", 'Double Quoted Scalar';


By "failing test" I am saying it works in natural Linux but not Ubuntu on Windows. And it's striking me: I need to find Dump. Where is Dump? In the C. It is an XS module, is it not? So, it's striking me that the solution is in C.

Which means I have to write C.

More later.

I think there's only been one time when I coded C on the clock, and only one time when my C knowledge was required on the clock.

The former was at a former workplace, where I wrote and compiled some C to compare UltraEdit with another editor, so I could help decide which we were going to buy a site license for. As I can only remember UltraEdit, I can say that's the one I liked better. The code itself was scarcely better than Hello World.

The latter was at another former workplace, where there was a tool that allowed mechanical engineers to drag together components like traces, and then first turned those traces into C code and compiled them. There was an issue where it wouldn't work, and I found the error logs and worked back.

I'm looking at perl_libyaml.c. I'm looking at perl_libyaml.h. I don't have nearly enough C skills to start here.
/*
 * This is the main Dump function.
 * Take zero or more Perl objects and return a YAML stream (as a string)
 */
void
Dump(SV *dummy, ...)
{
    dXSARGS;
    perl_yaml_dumper_t dumper;
    yaml_event_t event_stream_start;
    yaml_event_t event_stream_end;
    int i;
    SV *yaml = sv_2mortal(newSVpvn("", 0));
    sp = mark;

    set_dumper_options(&dumper);

    /* Set up the emitter object and begin emitting */
    yaml_emitter_initialize(&dumper.emitter);
    yaml_emitter_set_unicode(&dumper.emitter, 1);
    yaml_emitter_set_width(&dumper.emitter, 2);
    yaml_emitter_set_output(
        &dumper.emitter,
        &append_output,
        (void *) yaml
    );
    yaml_stream_start_event_initialize(
        &event_stream_start,
        YAML_UTF8_ENCODING
    );
    yaml_emitter_emit(&dumper.emitter, &event_stream_start);

    dumper.anchors = newHV();
    dumper.shadows = newHV();

    sv_2mortal((SV *)dumper.anchors);
    sv_2mortal((SV *)dumper.shadows);

    for (i = 0; i < items; i++) {
        dumper.anchor = 0;

        dump_prewalk(&dumper, ST(i));
        dump_document(&dumper, ST(i));

        hv_clear(dumper.anchors);
        hv_clear(dumper.shadows);
    }

    /* End emitting and destroy the emitter object */
    yaml_stream_end_event_initialize(&event_stream_end);
    yaml_emitter_emit(&dumper.emitter, &event_stream_end);
    yaml_emitter_delete(&dumper.emitter);

    /* Put the YAML stream scalar on the XS output stack */
    if (yaml) {
        SvUTF8_off(yaml);
        XPUSHs(yaml);
    }
    PUTBACK;
}

Hrm. Maybe back to the Perl.

Looking at YAML.pm, YAML::Dumper.pm and YAML::Dumper::Base.pm make it seem like you could make a master class on how modules go together. Previously, when I've dived in to figure out issues, yeah, there's lots of confusing parts but I could find my way to the broken thing. perl_libyaml.c is looking much more approachable now.

2016/08/31

Overkill III: Permutations of Overkill (Perl6)


Long-time readers remember my previous code with the Magic Box. In short, given the numbers [ 3, 4, 5, 6, 7, 8, 9, 10, 11 ] , arrange them in a 3x3 square such that every row, column and diagonal adds up to 21.

My first pass, on a very old computer, had the C version taking a second and a half, the Perl5 version taking a minute and a half, and the Perl6 version taking an hour and a half. But I no longer use that computer, and am now using a much newer one, so I took the opportunity to try again.

I have Nvidia cards in my box, so I could eventually write Overkill IV about that, but I don't have the knowledge or time right now. I did install rakudobrew and the newest version of Perl6, but otherwise, everything was unchanged.

C dropped to 0.6 seconds. Perl5 now takes 16 seconds. Perl6 now takes 10 minutes. So, the newer computer brought substantial speed gains for each, but the most dramatic is with Perl6, which tells us that the Perl6 developers have not only worked on correctness and completeness, but speed. Good for them! Good for us!

But wait! There's more!

@loltimo saw the post and suggested that I use permutations.

"What are permutations?", you ask? That's what basically happens in recursive_magic_box(), where all the possible variations are created. Given A, B and C, there are six possible orderings: ABC ACB BAC BCA CAB CBA. If I was given all possible orderings of an array, rather than (mis)handling it on my own and having to check that I got it right, would that make things faster?

As it turns out, yes. 47 seconds, rather than 10 minutes. More than 10x faster. But still, not as fast as Perl5. I now want to implement permutations for Perl5 and see if it makes that one faster.

And, it makes me feel much better about Perl6.

Code Samples Below