2015/07/29

Justifying My Existence: Indent Style

I just got dinked for my indent style on StackOverflow.

I wrote an answer to someone's question, even though I figured there are much better ways to handle his greater issue than storing everything in text files like that.

Nearly immediately, I see this: "For uncuddled elses, the practice is to let the closing block brace } be vertically aligned with the following elsif."

I remember when I started coding oh so many years ago. I remember looking at GNU style and not liking it.

if ($boolean)
{
    ...
}

"You've disconnected the beginning brace from the conditional", I thought. "I can't code like that."

The other primary style people talk about is K&R.

if ($boolean) {
    ...
}

"Better", I thought. "The beginning of the block is connected to the conditional, so that's good. But the end. The brace at the end of the block won't tell you it's connected to the block at all. Nope."

It's about readability. The part that belongs in the main flow of the code is the if statement. The block should be separate from the surrounding code, and this style (I'm told it's Ratliff style) is what I've been using since.

if ($boolean) {
    ...
    }

My first degree was in journalism, and part of how I proved myself in my first computing jobs was making the large text blocks of the early web attractive and readable. At least, as far as early web browsers let you. And, while I am a vocal Python hater, and a hater of significant whitespace in programming languages in general, by-the-book Python is close to how I like to format code. (Just be sure to tell your editor that \t is the devil's work and should be shunned.)

Below is my .perltidyrc. I believe I first started using the tool soon after I read Perl Best Practices by Damian Conway. Ironically, perhaps, I moved to the long-form options because I found the terse form used in PBP unreadable and uneditable.
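(The gist with the actual file isn't embedded here, so, as an illustration only, here is roughly what a long-form .perltidyrc can look like. These specific settings are examples of the long-versus-terse point, not my real configuration.)

# illustrative .perltidyrc, long option names (terse equivalents in comments)
# -l=78
--maximum-line-length=78
# -i=4
--indent-columns=4
# -icb : indent the closing brace, Ratliff style
--indent-closing-brace
# -nt : spaces only, never tabs
--notabs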


If you have a problem with my code alignment, perltidy exists. Use it.

I'd rather offend you with substance than with style, anyway.

2015/07/28

Threads Unspooling, or "What's my problem NOW?"

I have never done much with access control lists (or ACLs), as most of my time as a Linux and Unix user has been in positions where everything needed to control access could be done with standard Unix permissions: owner/group/other and read/write/execute.

Also, most of the file systems were not set up to support them, which makes the barrier to entry enough that I never got beyond "I wonder how ACLs work".

I work with genomics data on the top university supercomputing cluster in the US, and we generate lots of data for lots of users. We had been doing some ugly hacks to share data with our users, but with the new file system, we have ACLs, which make sharing as easy as setfacl -R -m "u:username:r-x" /path/to/research.

ACLs are not actually my problem.

The length of time it takes to set ACLs on a large data set is my problem.

Running the tool to set everything takes five minutes. With a subset of our total data. Which is only going to get bigger. If we're talking about a daily "get everything back to proper shape", that's well within bounds. If it's something a user is supposed to run, then no.

So, I'm looking into threads, and I can set all my ACLs in parallel using Parallel::ForkManager. While I'm not sure that's the asynchronous solution for Modern Perl, it works, and I can get a number of directories getting their ACLs recursively set at once.
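(A hedged sketch of the parallel setfacl idea, not the production code; the username, paths, and worker count are made up.)

#!/usr/bin/env perl
# Sketch only: recursively set ACLs on several top-level directories at once.
use strict ;
use warnings ;
use Parallel::ForkManager ;

my $user = 'username' ;                      # placeholder
my @dirs = glob '/path/to/research/*' ;      # placeholder paths
my $pm   = Parallel::ForkManager->new( 8 ) ; # eight workers at a time

for my $dir ( @dirs ) {
    $pm->start and next ;                    # parent loops on; child does the work
    system 'setfacl', '-R', '-m', "u:$user:r-x", $dir ;
    $pm->finish ;
    }
$pm->wait_all_children ;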

Sometimes, however, because machines go down or NFS mounts get hosed or you kill a process just to watch it die, the setting process gets interrupted. Or, you do more work and generate more data, and that goes into a subdirectory. Then, the ACLs at the top of the directory tree may be correct, but the deep nodes will be wrong, and it's best to not wait until the end-of-the-day "set everything" process to get your bits in order.

So you want to set a flag. If the flag is set, you do it all. And when I try to set flags in the threaded version, I get an error.

Threads are not actually my problem.

I have the threads in the database, which makes both the incomplete-pass and the add-new-data options equally easy to handle. And, to make databases easier to handle, I have a module I call oDB which handles database access so I don't have to worry about correctness or having passwords in plaintext in my code. It uses another module I wrote, called MyDB, to connect to MySQL in the first place. I share the gist above, but I cut to the chase below.

my $_dbh ;               # Save the handle so it can be reused across calls.

sub db_connect {
    my ( $param_ptr, $attr_ptr ) = @_ ;
    my $port = '3306' ;

    # ...

    # Called with no parameters? Return the cached handle.
    if ( defined $_dbh
        && ( !defined $param_ptr || $param_ptr eq '' ) ) {
        return $_dbh ;
        }

    # ...

    # Same parameters as last time? Return the cached handle.
    if ( defined $_dbh && $new_db_params eq $_db_params ) {
        return $_dbh ;
        }

    # ...

    # Only now do we create a brand-new connection.
    $_dbh = DBI->connect(
        $source,
        $params{ user },
        $params{ password }, \%attr )
        or croak $DBI::errstr ;

    return $_dbh ;
    }    # End of db_connect


Essentially, the "right thing" in this case is to generate a new DB handle each and every time, and my code is doing everything in its power to avoid creating a new DB handle.
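(One possible fix, sketched under the assumption that the workers are forked processes: connect fresh, or at least key the cache by process ID so each child builds its own handle. The names below are illustrative, not the actual oDB or MyDB code.)

# Sketch: cache one handle per process instead of one handle overall,
# so forked workers never reuse the parent's connection.
use strict ;
use warnings ;
use Carp ;
use DBI ;

my %dbh_for_pid ;

sub db_connect_per_process {
    my ( %params ) = @_ ;    # host, database, user, password: all illustrative

    # The cache key is $$, the current process ID, so a handle created
    # before a fork is never handed to a child.
    return $dbh_for_pid{$$} if $dbh_for_pid{$$} ;

    my $source = "dbi:mysql:$params{database}:$params{host}:3306" ;
    $dbh_for_pid{$$} = DBI->connect(
        $source,
        $params{user},
        $params{password},
        { PrintError => 0 },
        ) or croak $DBI::errstr ;

    return $dbh_for_pid{$$} ;
    }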

My problem is that I didn't write this as thread-safe. Because doing so was the furthest thing from my mind.

My problem is a failure of imagination.

2015/07/27

Things I learned from perlbrew: Config

Config.

Mostly, I haven't developed for Perl, I've developed for the perl on this machine. Sometimes, my perl on this machine, with this set of modules.

With perlbrew, you're starting with whatever Perl is available, and sometimes, upgrading the perl you're using. So, it's good to know something about that perl, isn't it?

So, how do we do that?

Config.

Look in there and you get api_revision, api_version and api_subversion, which let you know which version of perl you are running.
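(A quick sketch of pulling those values out, using nothing beyond core:)

# report the API version of the running perl
use strict ;
use warnings ;
use Config ;

print join( '.', @Config{ qw{ api_revision api_version api_subversion } } ), "\n" ;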

Which makes me think that there are options here, if you're deploying software to places where they're not using the most recent perl.

In JavaScript, they have the concept of polyfills: if your page is loaded on an older browser with no HTML5 support, you can load a script that gives that old browser the capabilities it's missing.

Now, honestly, that seems a nicer way to handle things than
use v5.14 ; # Here's a nickel, buy yourself a Modern Perl
Doesn't it?

# pseudocode ahead. I'm just thinking through this
# polyfill_say.pl
use strict ;
use warnings ;
use Config ;

use lib "$ENV{HOME}/lib" ;    # where Tools::JankySayReplacement would live

BEGIN {
    if ( $Config{ 'api_revision' } < 5
        || ( $Config{ 'api_revision' } == 5 && $Config{ 'api_version' } < 10 ) ) {
        # pre-5.10 perl: pull in a home-made say()
        require Tools::JankySayReplacement ;
        Tools::JankySayReplacement->import( 'say' ) ;
        }
    else {
        # 5.10 or later: turn on the real thing
        require feature ;
        feature->import( 'say' ) ;
        }
    }

say 'OK' ;

Of course there's perlbrew, plenv and just putting a modern perl in /opt/bin/perl or ~/bin/perl and being done with it. Just because I'm jazzed by an idea, that doesn't mean it's a good idea. But aren't a lot of the new cool things in Perl 5 just polyfilled back from Perl 6 these days? Shouldn't we be as nice to those stuck in 5.old as the Perl 6 people are to us?

Anyway, Config. It is in Core and it is cool.

2015/07/26

What I learned from perlbrew

I signed up for Neil Bowers' CPAN Pull Request Challenge, and the first module I got was App::perlbrew. After some looking and guessing, gugod pointed me to one of the open issues, and after some time reading and understanding how things work, I got it done.

It took me a while to figure out how it worked. I had seen and used something like it — I had found out about dispatch tables from my local friendly neighborhood Perl Mongers — and I have started to use old-school Perl object orientation on occasion, but this combined them in a very interesting way.

A lot of the cleverness, however, isn't where I thought it was, which I didn't realize until now. The symbol-table manipulation isn't about making the commands work, but rather about guessing what you meant if you give a command it can't handle. The "magic" is all about $s = $self->can($command) and $self->$s(@$args).

I wrote a quick stub of an application that shows off how this works, with lots of comments that are meant to explain what's meant to happen instead of how it's supposed to work, as "Most comments in code are in fact a pernicious form of code duplication".

If you try symtest.pl foo, it will print 1 and foo. If you try symtest.pl food, it'll just print 1. If you instead try symtest.pl fod, it'll print "unknown command" and suggest foo and food as alternate suggestions. Like a boss.
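(The gist itself isn't embedded here, so below is a hedged re-creation of the idea rather than gugod's code or my original stub. The package and command names are made up; the can()-based dispatch and the near-miss suggestions are the point, and the commands live in the same package for brevity.)

#!/usr/bin/env perl
# symtest.pl -- sketch of dispatching commands via can() and the symbol table
use strict ;
use warnings ;
use feature qw{ say } ;

package App::SymTest ;

sub new { return bless {}, shift }

sub run {
    my ( $self, $command, @args ) = @_ ;
    $command = '' unless defined $command ;

    # The "magic": if a method with the command's name exists, call it.
    if ( my $method = $self->can( $command ) ) {
        return $self->$method( @args ) ;
        }

    # Otherwise, walk the symbol table and suggest near misses.
    my @commands = grep { $self->can( $_ ) && $_ !~ /^(?:new|run)$/ }
        sort keys %App::SymTest:: ;
    my $prefix  = substr $command, 0, 2 ;
    my @guesses = grep { 0 == index $_, $prefix } @commands ;
    say qq{unknown command "$command"} ;
    say "did you mean: @guesses ?" if @guesses ;
    return ;
    }

# user-facing commands
sub foo  { say 1 ; say 'foo' }
sub food { say 1 }

package main ;
App::SymTest->new()->run( @ARGV ) ;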

One of the coolest things, I think, is that you can put your user-facing methods in a second module. Or, perhaps I just have a low threshold for cool.

If you have questions about the code, or understand the things I handwave and know you can do better, please comment below.


2015/07/11

Interview-Style Coding Problem: Estimate Pi

Saw this as an example of the kind of programming test you get in interviews, so I decided to give it a try.

Just to report, it gets there at $i = 130657.

#!/usr/bin/env perl

use feature qw{ say  } ;
use strict ;
use warnings ;
use utf8 ;

# Given that Pi can be estimated using the function 
#   4 * (1 – 1/3 + 1/5 – 1/7 + …) 
# with more terms giving greater accuracy, 
# write a function that calculates Pi 
# to an accuracy of 5 decimal places.

my $pi = '3.14159' ;

my $c = 0 ;    # running sum of the series terms
for my $i ( 0..1_000_000 ) {
    my $j = 2 * $i + 1 ;
    if ( $i % 2 == 1 ) { $c -= 1 / $j  ; }
    else { $c += 1 / $j ; }
    my $p = 4 * $c ;
    my $p2 = sprintf '%.05f' , $p ;
    say join ' ' , $i , $pi , $p2 , $p  ;
    exit if $p2 eq $pi ;
    }

2015/07/08

Because everything can jump the rails...


I will have to do a write-up. (While you wait for me, read Net::OpenSSH on MetaCPAN and know that the key is key_path.) The thing to remember is that this means I can write complex programs that connect to other machines while I'm not there.

I've been able to do similar things with bash scripts for a while, but there's a complexity you can only get once you step away from a shell and move to a language.

That complexity has consequences. If you've never written a thing that went out of control and had unexpected destructive consequences, you're not a programmer. I'd go as far as to say that everyone has written rm -rf foo. * instead of rm -rf foo.* at least once.

This is why computer people strive to be very detail-oriented. We wanted to remove all the things, be they file or directory, if they start with the string "foo.", not REMOVE ALL THE THINGS!!! BUT GET THAT 'foo.' THING ESPECIALLY!!!! The stereotypical geek response starts with "Well, actually...", because "Well, actually, there's a space in there that'll ruin everyone's day" keeps everyone from spending the next few hours pulling everything off tape backup, or from never getting those pictures from your wedding back at all.

One of the arguments toward "AI means we're doomed" is that of the stamp collector. Watch the Computerphile video, but the short version is that the collector wants stamps and tells his AI "I want more stamps to fill out my collection". This is clearly a general statement, conversationally a wildcard, and the AI can take it several different ways, ranging from going to eBay and buying a few things with your credit card, to hacking several printing presses and printing billions and billions of stamps, to harvesting living beings to be turned into paper, ink and glue.

I have a response to this thought experiment, but a part of my problem that I didn't get into is that deleting all your files is easy, spending all your money on eBay is slightly harder, but controlling things on another computer is far more difficult. If you have an open API on a machine, all I can do is things that the API lets me do, and if you have a delete everything option, you've probably done it wrong. (Or you're a Snowdenesque paranoid hacker, in which case, you know what you're doing and that's fine.)

Which brings us back to Net::OpenSSH. The first step is "connect to that server"; once you realize it's going to prompt you for a password, the second step becomes "hard-code your password to make it work", and the quick follow-up is "use config files or enable SSH keys or anything that allows you to not have your password in clear text, you idiot!"
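(A minimal sketch of the passwordless version, assuming an SSH key is already set up; the host, user, and key location are placeholders, not anything from my setup.)

#!/usr/bin/env perl
# Sketch: connect with a key instead of a password, then run one command.
use strict ;
use warnings ;
use Net::OpenSSH ;

my $ssh = Net::OpenSSH->new(
    'user@remote.example.com',
    key_path => "$ENV{HOME}/.ssh/id_rsa",    # no plaintext password anywhere
    ) ;
$ssh->error and die 'SSH connection failed: ' . $ssh->error ;

my @listing = $ssh->capture( 'ls -l' ) ;
print @listing ;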

Because, with an SSH shell controlled by a program, you grant the program permissions to do every command you're capable of on that system, and for many systems, you have the ability to be very destructive.

And I have that between a number of systems, because I'm trying to make a thing work whose API is SSH, not AJAX and JSON, and I need to know it works outside of that. I do know, however, that it means I have the capability to run code on another machine.

Which I'm not necessarily logged on to and not necessarily monitoring.

Where I'm not the admin, nor the sole user.

Where I can ruin the days of myself and many many others.

So while I code, I feel the same fear I feel while standing in line for that rickety-looking wooden roller coaster at an amusement park. 

2015/07/01

Unstuck in Time and Space: An Investigation into Location over WiFi.

I track my location with Google and my phone, because I lack sufficient paranoia. To the right is my June 30.

I swear that I didn't leave the Greater Lafayette area. I certainly didn't teleport to the southern suburbs of Indianapolis.

This happens to me all the time, and it has bugged me a lot. But, normally I've just looked and griped, rather than trying to work it out.

Today, however, I'm watching a compiler or two, so I have some time I can use to work this out.

The protocol is KML, and this is what it looks like:

That isn't the whole day's results, merely the point in time where I jumped 67 miles to the southeast. I was going to try to use a KML-specific Perl module, but found that the ones I could find were more about generating it than parsing it, and it's XML, so I figured what the heck.

I had previous code to work out the distance between two points, so it was an easy case of parsing to find the jump:
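(The parsing gist isn't embedded here either, so this is a hedged sketch of the approach rather than the code I actually ran. It assumes the Google export is a gx:Track with paired when and gx:coord elements, leans on XML::Twig because it's handy, and flags any consecutive pair of points more than 50 miles apart.)

#!/usr/bin/env perl
# Sketch: find big jumps between consecutive points in a location-history KML.
use strict ;
use warnings ;
use feature qw{ say } ;
use XML::Twig ;

my $file = shift or die "usage: $0 history.kml\n" ;
my $twig = XML::Twig->new() ;
$twig->parsefile( $file ) ;

my @when  = map { $_->text } $twig->root->descendants( 'when' ) ;
my @coord = map { [ split ' ', $_->text ] } $twig->root->descendants( 'gx:coord' ) ;

for my $i ( 1 .. $#coord ) {
    # gx:coord is "longitude latitude altitude", so swap to lat/lon order
    my $miles = haversine( $coord[ $i - 1 ][1], $coord[ $i - 1 ][0],
        $coord[$i][1], $coord[$i][0] ) ;
    say join ' ', $when[$i], sprintf( '%.1f miles', $miles ) if $miles > 50 ;
    }

# great-circle distance in miles between two lat/lon points
sub haversine {
    my ( $lat1, $lon1, $lat2, $lon2 ) = @_ ;
    my $rad  = atan2( 1, 1 ) / 45 ;    # degrees to radians
    my $dlat = ( $lat2 - $lat1 ) * $rad ;
    my $dlon = ( $lon2 - $lon1 ) * $rad ;
    my $a    = sin( $dlat / 2 )**2
        + cos( $lat1 * $rad ) * cos( $lat2 * $rad ) * sin( $dlon / 2 )**2 ;
    return 3959 * 2 * atan2( sqrt( $a ), sqrt( 1 - $a ) ) ;
    }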

Breaking it down, at 2015-06-30T13:13:05.103-07:00 I go 67 miles to Greenwood, and at 2015-06-30T13:53:31.467-07:00 I pop back.

Let me bring up another map.

I didn't have any mapping software going, and I was using wifi, so this data is location via wifi, not GPS. I know, though, that the group that runs my servers has a weekly "coffee break" on Tuesdays, that I met with my admin there, and that I walked around toward his office before going back to mine. His building is off S. Grant St., and I walked next to Hawkins Hall, in front of Pao Hall, near the Forestry Building and down to my office in Whistler.

So, the question is, how does location over WiFi work? I recall hearing that routers and access points report location, but I'm not sure of the protocols involved. I can imagine two possible scenarios that could cause this.

First is that one of Purdue's routers is misreporting location, either in Forestry or Pao. This is possible; I have another issue that I haven't worked through yet where I leap instantly to the EE building, and it seems that's entirely based on location within my building.

The second scenario, one I'm taking more and more seriously, is that there's a duplicated MAC address or something shared between an access point in the apartments across from Pao Hall and one in that neighborhood south of Indy. I say "or something" because MAC addresses should be unique. The thing that makes me suspect this is that it points me to a residential neighborhood south of Indy, and I could see that mistake happening with two residential routers or two experimental electronics projects.

I'm curious about how to test this, because I want to know it has something to do with Purdue's networking gear before I complain. I'm also curious about how these things actually work. I could very much see myself walking around, looking at Google Maps and tripping over things, then trying to dump my ARP tables or something.