Cookie Notice

As far as I know, and as far as I remember, nothing in this page does anything with Cookies.

2015/08/01

What Language Should I Learn? Three Answers

A friend of mine, who works in IT but is not a developer, asked me a question during lunch today.

"I want to learn to program. What language should I learn first?"

This is a common question, one I have answered before. But, because I've been blogging a lot recently, I've decided to write it up and post it here.

I told him I have three answers.

The first is somewhat sarcastic: "I'm going to Europe. What language should I learn?" There just isn't enough information to really answer that question, because many languages are very context-specific. If you're hoping to get into programming apps for the iPhone, your best choice is Objective-C. If you want to code for Arduino microcontrollers, you'll want to start with the Arduino IDE and its very C-like language. And, of course, there's JavaScript, your only choice on web browsers.

Where you want to go determines what you should learn.

But there's more to it than that.

There's a thing called the Church-Turing thesis, which states that any real-world calculation can be computed by a Turing machine. Turing framed it with his Turing Machine, while Church framed it with the lambda calculus.

From that we get a concept called Turing Completeness: anything that can be computed on one Turing-complete machine can be simulated on another Turing-complete machine. The first real use of this was the creation of compilers: higher-level languages that developers can write in, which compile down to machine code the hardware itself can run. What it means for those learning is that it doesn't really matter what language you learn; anything one language can do, another language can do.

So the second answer is that Alan Turing would tell you it just doesn't matter which language you choose: what you do and learn in one language can be simulated or applied in another.

When Jeff Atwood of Coding Horror coined Atwood's Law -- any application that can be written in JavaScript will eventually be written in JavaScript -- he didn't know the half of it. He knew that graphical applications like Gmail were starting to be done within web browsers. He didn't know that web server applications and even command-line applications could be written in JavaScript via Node.js. He didn't know that Cordova, a framework for creating cross-platform mobile applications using web technologies including JavaScript, would come along. He didn't know that Microsoft would allow developers to create Windows applications using HTML and JavaScript. He didn't know that open-source microcontrollers such as Arduino would be developed, and that frameworks such as Johnny-Five would come along to let you develop Internet of Things projects and even robots with JavaScript. It might be a bit more complex to set these things up with JavaScript, but they are possible.

Plus, if your code plans are more functional and computer-theoretical, you'd be glad to know that JavaScript is a Lisp.

If you want to code Objective-C, you need a Mac and the Apple development tools. If you want to code C#, you'll need to install Visual Studio tools from Microsoft (or Mono on Linux). If you want to code JavaScript, you'll need a text editor (and one comes with your computer, I promise) and a web browser (and one comes with your computer, I promise), plus there are places like CodeBin where you can enter your code into the browser itself.

If you're going to be writing an operating system or device drivers, you will want something that compiles to native machine code. If you're looking to get into a specific project, you'll want to know the language of that project. But the corners of the development landscape where JavaScript is the wrong choice are small and shrinking. So, the third answer is: it might as well be JavaScript.

This rubs me a bit wrong. I've made my rep as a Perl Monger, and I always feel like that's where you should start. But while my heart feels that, my mind argues the above: the greater forces of modern computing are pushing to give JavaScript a front-row seat in the language arena.

But I'm willing to be wrong, and if I am, I want to know. Where am I wrong? What would you tell someone wanting to learn to program?

2015/07/29

Justifying My Existence: Indent Style

I just got dinked for my indent style on StackOverflow.

I wrote an answer to someone's question, even though I figured there are much better ways to handle his greater issue than storing everything in text files like that.

Nearly immediately, I see this: "For uncuddled elses the practice is to let the closing block brace } be vertically aligned with the following elsif."

I remember when I started coding oh so many years ago. I remember looking at GNU style and not liking it.

if ($boolean)
{
    ...
}

"You've disconnected the beginning brace from the conditional", I thought. "I can't code like that."

The other primary style people talk about is K&R.

if ($boolean) {
    ...
}

"Better", I thought. "The beginning of the block is connected to the conditional, so that's good. But the end. The brace at the end of the block won't tell you it's connected to the block at all. Nope."

It's about the readability. The part that belongs to the main flow of the code is the if statement. The block should be set off from the surrounding code, and this style (I'm told it's Ratliff style) is what I've been using since.

if ($boolean) {
    ...
    }

My first degree was in journalism, and part of how I proved myself in my first computing jobs was making the large text blocks of the early web attractive and readable. At least, as far as early web browsers let you. And, while I am a vocal Python hater, and a hater of significant whitespace in programming languages in general, by-the-book Python is close to how I like to format code. (Just be sure to tell your editor that \t is the devil's work and should be shunned.)

Below is my .perltidyrc. I believe I first started using the tool soon after I read Perl Best Practices by Damian Conway. Ironically, perhaps, I moved to the long-form options because I found the terse form given in PBP unreadable and uneditable, anyway.
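
(The gist with the actual file isn't embedded here. To illustrate what I mean by the long form, here are a few representative lines; the specific values are examples, not necessarily mine.)

# long-form perltidy options read much better than the terse -i=4 -l=78 style
--indent-columns=4
--maximum-line-length=78
--indent-closing-brace      # keep the closing brace indented with its block
--nocuddled-else            # put "else" on its own line, not "} else {"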


If you have a problem with my code alignment, perltidy exists. Use it.

I'd rather offend you with substance than with style, anyway.

2015/07/28

Threads Unspooling, or "What's my problem NOW?"

I have never done much with access control lists (or ACLs), as most of my time as a Linux and Unix user has been in positions where everything needed to control access could be done with standard unix permissions: owner/group/all and read/write/execute.

Also, most of the file systems I used were not set up to support them, which made the barrier to entry high enough that I never got beyond "I wonder how ACLs work".

I work with genomics data on the top university supercomputing cluster in the US, and we generate lots of data for lots of users. We had been doing some ugly hacks to share data with our users, but the new file system supports ACLs, which makes it as easy as setfacl -R -m "u:username:r-x" /path/to/research.

ACLs are not actually my problem.

The length of time it takes to set ACLs on a large data set is my problem.

Running the tool to set everything takes five minutes. With a subset of our total data. Which is only going to get bigger. If we're talking about a daily "get everything back to proper shape", that's well within bounds. If it's something a user is supposed to run, then no.

So, I'm looking into threads, and I can set all my ACLs in parallel using Parallel::ForkManager (which, strictly speaking, forks worker processes rather than spawning threads). While I'm not sure that's the asynchronous solution for Modern Perl, it works, and I can get a number of directories having their ACLs recursively set at once.
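
A minimal sketch of what I mean; the username and the directory glob are made up, not the real job:

use strict ;
use warnings ;
use Parallel::ForkManager ;

my $user = 'collaborator' ;                      # hypothetical username
my @dirs = glob '/path/to/research/*' ;          # one worker per top-level directory

my $pm = Parallel::ForkManager->new( 8 ) ;       # at most eight children at once
for my $dir ( @dirs ) {
    $pm->start and next ;                        # parent: move on to the next directory
    system( 'setfacl', '-R', '-m', "u:$user:r-x", $dir ) == 0
        or warn "setfacl failed for $dir: $?" ;
    $pm->finish ;                                # child: done
    }
$pm->wait_all_children ;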

Sometimes, however, because machines go down or NFS mounts get hosed or you kill a process just to watch it die, the setting process gets interrupted. Or you do more work and generate more data, and that goes into a subdirectory. Then the ACLs at the top of the directory tree may be correct, but the deep nodes will be wrong, and it's best not to wait until the end-of-the-day "set everything" process to get your bits in order.

So you want to set a flag. If the flag is set, you do it all. And when I try to set flags in the threaded version, I get an error.

Threads are not actually my problem.

I keep the flags in the database, which makes both the incomplete-pass and the add-new-data cases equally easy to handle. And, to make databases easier to handle, I have a module I call oDB, which handles database access so I don't have to worry about correctness or about having passwords in plaintext in my code. It uses another module I wrote, called MyDB, to connect to MySQL in the first place. I share the gist above, but I cut to the chase below.

use DBI ;
use Carp ;

my $_dbh ;               # Save the handle.

sub db_connect {
    my ( $param_ptr, $attr_ptr ) = @_ ;
    my $port = '3306' ;

    # ...

    if ( defined $_dbh
        && ( !defined $param_ptr || $param_ptr eq '' ) ) {
        return $_dbh ;
        }

    # ...

    if ( defined $_dbh && $new_db_params eq $_db_params ) {
        return $_dbh ;
        }

    # ...

    $_dbh = DBI->connect( 
        $source, 
        $params{ user }, 
        $params{ password }, \%attr )
        or croak $DBI::errstr ;

    return $_dbh ;
    }    # End of db_connect


Essentially, the "right thing" in this case is to generate a new DB handle each and every time, and my code is doing everything in its power to avoid creating a new DB handle.
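
Stepping back, the fix is fairly clear once you see it. This isn't the oDB/MyDB code, and the sub name is made up; just a minimal sketch, assuming (as with Parallel::ForkManager) that the workers are forked processes: cache one handle per process ID, so a child never reuses a handle its parent opened.

my %dbh_for_pid ;    # one cached handle per process

sub db_connect_forksafe {
    my ( $source, $user, $password, $attr ) = @_ ;
    my $pid = $$ ;

    # a handle this process opened itself can be reused ...
    return $dbh_for_pid{ $pid } if defined $dbh_for_pid{ $pid } ;

    # ... but a handle inherited across a fork cannot, so connect fresh
    $dbh_for_pid{ $pid } = DBI->connect( $source, $user, $password, $attr )
        or croak $DBI::errstr ;

    return $dbh_for_pid{ $pid } ;
    }    # End of db_connect_forksafe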

My problem is that I didn't write this as thread-safe. Because doing so was the furthest thing from my mind.

My problem is a failure of imagination.

2015/07/27

Things I learned from perlbrew: Config

Config.

Mostly, I haven't developed for Perl, I've developed for the perl on this machine. Sometimes, my perl on this machine, with this set of modules.

With perlbrew, you're starting with whatever Perl is available, and sometimes, upgrading the perl you're using. So, it's good to know something about that perl, isn't it?

So, how do we do that?

Config.

Look in there and you get api_revision, api_version and api_subversion, which tell you which version of perl you are running.
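
For example, something like this (a trivial sketch; print rather than say, since the whole point is that say might not be available):

use Config ;

print join( '.',
    $Config{ 'api_revision' },
    $Config{ 'api_version' },
    $Config{ 'api_subversion' } ), "\n" ;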

Which makes me think that there are options here, if you're deploying software to places where they're not using the most recent perl.

In JavaScript, they have a concept of polyfills: if your page is loaded on an older browser with no HTML5 support, you can load a script that gives the old browser the capabilities it's missing.

Now, honestly, that seems a nicer way to handle things than
use v5.14 ; # Here's a nickel, buy yourself a Modern Perl
Doesn't it?

# pseudocode ahead. I'm just thinking through this
# polyfill_say.pl
use strict ;
use warnings ;
use Config ;

use lib "$ENV{HOME}/lib" ;    # wherever Tools::JankySayReplacement lives

BEGIN {
    if ( $Config{ 'api_revision' } < 5
        || ( $Config{ 'api_revision' } == 5 && $Config{ 'api_version' } < 10 ) ) {
        # too old for the say feature: pull in the homemade replacement
        require Tools::JankySayReplacement ;
        Tools::JankySayReplacement->import( qw{ say } ) ;
        }
    else {
        # new enough: just turn the real feature on
        require feature ;
        feature->import( 'say' ) ;
        }
    }

say 'OK' ;

Of course there's perlbrew, plenv and just putting a modern perl in /opt/bin/perl or ~/bin/perl and being done with it. Just because I'm jazzed by an idea, that doesn't mean it's a good idea. But aren't a lot of the new cool things in Perl 5 just polyfilled back from Perl 6 these days? Shouldn't we be as nice to those stuck in 5.old as the Perl 6 people are to us?

Anyway, Config. It is in Core and it is cool.

2015/07/26

What I learned from perlbrew

I signed up for Neil Bowers' CPAN Pull Request Challenge, and the first module I got was App::perlbrew. After some looking and guessing, gugod pointed me to one of his problems, and after some time reading and understanding how things work, I got it done.

It took me a while to figure out how it worked. I had seen and used something like it — I had found out about dispatch tables from my local friendly neighborhood Perl Mongers — and I have started to use old-school Perl object orientation on occasion, but this combined them in a very interesting way.

A lot of the clever, however, isn't where I thought it was, which I didn't realize until now. The symbol-table manipulation isn't about making the commands work, but rather about guessing what you meant if you give it a command it can't handle. The "magic" is all about $s = $self->can($command) and $self->$s(@$args).

I wrote a quick stub of an application that shows off how this works, with lots of comments that are meant to explain what's supposed to happen rather than restate how it works, because "Most comments in code are in fact a pernicious form of code duplication".

If you try symtest.pl foo, it will print 1 and foo. If you try symtest.pl food, it'll just print 1. If you instead try symtest.pl fod, it'll print "unknown command" and suggest foo and food as alternate suggestions. Like a boss.
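
The stub itself lived in a gist that isn't embedded here. As a minimal sketch of the pattern (the package name, the commands, and the crude first-two-characters "did you mean" heuristic are mine; App::perlbrew's real suggestion logic is smarter):

#!/usr/bin/env perl
# symtest.pl : a sketch of can()-based dispatch plus symbol-table guessing.
# Not the actual App::perlbrew code.
use strict ;
use warnings ;
use feature qw{ say } ;

package App::SymTest ;

sub new { return bless {}, shift }

sub run {
    my ( $self, $command, @args ) = @_ ;
    $command = '' unless defined $command ;
    my $method = 'run_command_' . $command ;

    # the "magic": if a method for this command exists, call it
    if ( my $code = $self->can( $method ) ) {
        return $self->$code( @args ) ;
        }

    # otherwise, walk the symbol table for the commands we do know about
    no strict 'refs' ;
    my @commands ;
    for my $name ( keys %{ ref( $self ) . '::' } ) {
        next unless $name =~ /^run_command_(.+)$/ ;
        push @commands, $1 ;
        }

    # crude guess: suggest commands sharing the first two characters
    my @guesses = grep { substr( $_, 0, 2 ) eq substr( $command, 0, 2 ) } @commands ;

    say qq{unknown command "$command"} ;
    say 'did you mean: ' . join( ', ', sort @guesses ) if @guesses ;
    return ;
    }

sub run_command_foo  { say 1 ; say 'foo' }
sub run_command_food { say 1 }

package main ;
App::SymTest->new->run( @ARGV ) ;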

One of the coolest things, I think, is that you can put your user-facing methods in a second module. Or, perhaps I just have a low threshold for cool.

If you have questions about the code, or understand the things I handwave and know you can do better, please comment below.


2015/07/11

Interview-Style Coding Problem: Estimate Pi

Saw this as an example of the kind of programming test you get in interviews, so I decided to give it a try.

Just to report, it gets there at $i = 130657.

#!/usr/bin/env perl

use feature qw{ say  } ;
use strict ;
use warnings ;
use utf8 ;

# Given that Pi can be estimated using the function 
#   4 * (1 – 1/3 + 1/5 – 1/7 + …) 
# with more terms giving greater accuracy, 
# write a function that calculates Pi 
# to an accuracy of 5 decimal places.

my $pi = '3.14159' ;

my $c = 0 ;    # running total of the series
for my $i ( 0..1_000_000 ) {
    my $j = 2 * $i + 1 ;
    if ( $i % 2 == 1 ) { $c -= 1 / $j  ; }
    else { $c += 1 / $j ; }
    my $p = 4 * $c ;
    my $p2 = sprintf '%.05f' , $p ;
    say join ' ' , $i , $pi , $p2 , $p  ;
    exit if $p2 eq $pi ;
    }

2015/07/08

Because everything can jump the rails...


I will have to do a write-up. (While you wait for me, read Net::OpenSSH on MetaCPAN and know the key is key_path.) The thing to remember is that this means I can write complex programs that connect to other machines while I'm not there.
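
In the meantime, a minimal sketch of the shape of it; the host, user and key location here are placeholders, not my setup:

use strict ;
use warnings ;
use Net::OpenSSH ;

my $ssh = Net::OpenSSH->new(
    'remote.example.com',                     # hypothetical host
    user     => 'someuser',                   # hypothetical user
    key_path => "$ENV{HOME}/.ssh/id_rsa",     # a key, not a plaintext password
    ) ;
$ssh->error and die 'ssh connection failed: ' . $ssh->error ;

my @output = $ssh->capture( 'uptime' ) ;      # run a command on the far side
print @output ;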

I've been able to do similar things with bash scripts for a while, but there's a complexity you can only get once you step away from a shell and move to a language.

That complexity has consequences. If you've never written a thing that went out of control and had unexpected destructive consequences, you're not a programmer. I'd go as far as to say that everyone has written rm -rf foo. * instead of rm -rf foo.* at least once.

This is why computer people strive to be very detail oriented. We wanted to remove all the things, be they file or directory, if they start with the string "foo.", not REMOVE ALL THE THINGS!!! BUT GET THAT 'foo.' THING ESPECIALLY!!!! The stereotypical geek response starts with "Well, actually...", because "Well, actually, there's a space in there that'll ruin everyone's day" keeps everyone from spending the next few hours pulling everything off tape backup, or potentially never having those pictures from your wedding ever again.

One of the arguments toward "AI means we're doomed" is that of the stamp collector. Watch the Computerphile video, but in short: the collector wants stamps and tells his AI "I want more stamps to fill out my collection". This is clearly a general statement, conversationally a wildcard, and the AI can take it several different ways, ranging from going to eBay and buying a few things with your credit card, to hacking several printing presses and printing billions and billions of stamps, to harvesting living beings to be turned into paper, ink and glue.

I have a response to this thought experiment, but part of my problem with it that I didn't get into is that deleting all your files is easy, spending all your money on eBay is slightly harder, but controlling things on another computer is far more difficult. If you have an open API on a machine, all I can do is things that the API lets me do, and if you have a delete-everything option, you've probably done it wrong. (Or you're a Snowdenesque paranoid hacker, in which case, you know what you're doing and that's fine.)

Which brings us back to Net::OpenSSH. The first step is "connect to that server", and once you realize it's going to prompt you for a password, the second step becomes "hard code your password to make it work" and the quick follow up is "Use config files or enable ssh keys or anything that allows you to not have your password in clear text, you idiot!"

Because, with an SSH shell controlled by a program, you grant the program permissions to do every command you're capable of on that system, and for many systems, you have the ability to be very destructive.

And I have that between a number of systems, because I'm trying to make a thing work that uses SSH, not AJAX and JSON, as its API, and I need to know it works outside of that. I do know, however, that it means I have the capability to run code on another machine.

Which I'm not necessarily logged on and not necessarily monitoring.

Where I'm not the admin, nor the sole user.

Where I can ruin the days of myself and many many others.

So while I code, I feel the same fear I feel while standing in line for that rickety-looking wooden roller coaster at an amusement park. 

2015/07/01

Unstuck in Time and Space: An Investigation into Location over WiFi.

I track my location with Google and my phone, because I lack sufficient paranoia. To the right is my June 30.

I swear that I didn't leave the Greater Lafayette area. I certainly didn't teleport to the southern suburbs of Indianapolis.

This happens to me all the time, and it has bugged me a lot. But, normally I've just looked and griped, rather than trying to work it out.

Today, however, I'm watching a compiler or two, so I have some time I can use to work this out.

The protocol is KML, and this is what it looks like:

That isn't the whole day's results, merely the point in time where I jumped 67 miles to the southeast. I was going to try to use a KML-specific Perl module, but the ones I could find were more about generating KML than parsing it, and it's XML anyway, so I figured what the heck.

I had previous code to work out the distance between two points, so it was an easy case of parsing to find the jump:
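
That script lived in a gist that isn't embedded here. As a rough sketch of that kind of parsing (not the original code), using XML::Twig and assuming the export is a gx:Track with paired <when> and <gx:coord> entries; the filename and the ten-mile threshold are made up:

use strict ;
use warnings ;
use feature qw{ say } ;
use XML::Twig ;

my $twig = XML::Twig->new() ;
$twig->parsefile( 'history-2015-06-30.kml' ) ;    # hypothetical filename

# gx:coord holds "longitude latitude altitude"; <when> holds the timestamp
my @whens  = map { $_->text } $twig->root->descendants( 'when' ) ;
my @coords = map { [ split ' ', $_->text ] } $twig->root->descendants( 'gx:coord' ) ;

for my $i ( 1 .. $#coords ) {
    my $miles = haversine_miles(
        $coords[ $i - 1 ][1], $coords[ $i - 1 ][0],
        $coords[ $i ][1],     $coords[ $i ][0],
        ) ;
    say join ' ', $whens[ $i ], sprintf( '%.1f miles', $miles ) if $miles > 10 ;
    }

# standard haversine formula, with the Earth's radius in miles
sub haversine_miles {
    my ( $lat1, $lon1, $lat2, $lon2 ) = @_ ;
    my $rad = ( 4 * atan2( 1, 1 ) ) / 180 ;
    my ( $dlat, $dlon ) = ( ( $lat2 - $lat1 ) * $rad, ( $lon2 - $lon1 ) * $rad ) ;
    my $a = sin( $dlat / 2 )**2
        + cos( $lat1 * $rad ) * cos( $lat2 * $rad ) * sin( $dlon / 2 )**2 ;
    return 3959 * 2 * atan2( sqrt( $a ), sqrt( 1 - $a ) ) ;
    }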

Breaking it down, at 2015-06-30T13:13:05.103-07:00 I go 67 miles to Greenwood, and at 2015-06-30T13:53:31.467-07:00 I pop back.

Let me bring up another map.

I didn't have any mapping software going, and I was using WiFi, so this data is location via WiFi, not GPS. I know, though, that the group that runs my servers has a weekly "coffee break" on Tuesdays, that I met with my admin there, and that I walked around toward his office before going back to mine. His building is off S. Grant St., and I walked next to Hawkins Hall, in front of Pao Hall, near the Forestry Building and down to my office in Whistler.

So, the question is: how does location over WiFi work? I recall hearing that routers and access points report location, but I'm not sure of the protocols involved. I can imagine two possible scenarios that could cause this.

First is that one of Purdue's routers is misreporting location, either in Forestry or Pao. This is possible; I have another issue that I haven't worked through yet where I leap instantly to the EE building, and it seems that's entirely based on location within my building.

The second scenario, one I'm taking more and more seriously, is that there's a replicated MAC address or something between the apartments across from Pao Hall. I say "or something" because MAC addresses should be unique. The thing that makes me suspect this is that it points me to a residential neighborhood south of Indy, and I could see that mistake happening with two residential routers or two experimental electronics projects.

I'm curious about how to test this, because I do want to know it has something to do with Purdue's networking gear before I complain. I'm also curious about how these things actually work. I could very much see me walking around, looking at Google Maps and tripping over things, then trying to dump my ARP tables or something.

2015/06/29

Fixing an old logic issue

I am not especially proud of the code below. It does its job. Give it a request and a number of accessions and the names you want them to go by, and it changes them in the database.

Except...

Accessions are defined as zero-padded six-digit numbers, so instead of 99999, you'd have 099999. If you're strict, everything's fine.

But users are not always strict. Sometimes they just put in 99999, expecting it to just work.

Oh, if only it were that easy.

I have requests here for the purpose of ensuring that, for request 09999, you can only change accessions associated with that request. This is what lines 27-29 are for: to get the intersection of the accessions entered by the user and the given request's accessions.

Yes, requests are defined as zero-padded five-digit numbers.

If I don't zero-pad the accessions, I get nothing in @accessions.

But if I do zero pad, I get no library name from $param->{ $acc }.

There is a fix for it. I could go back to the source and ensure that this sees no un-padded numbers. I could run through the $param hashref again. But clearly, this is something I should've built in at first.
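
The shape of the fix, as a minimal sketch with made-up data rather than the production code: normalize everything to the canonical zero-padded form before doing the intersection or the hash lookups, so '99999' and '099999' are the same key.

use strict ;
use warnings ;

sub normalize_accession {
    my ( $acc ) = @_ ;
    return sprintf '%06d', $acc ;    # '99999' becomes '099999'
    }

# hypothetical input: what the user typed, and what the request owns
my %param              = ( '99999' => 'my_library' ) ;
my @request_accessions = qw( 099999 100000 ) ;

# re-key the user input on the padded form
my %library_for = map { normalize_accession( $_ ) => $param{ $_ } } keys %param ;

# now the intersection and the lookups agree
my %is_in_request = map { $_ => 1 } @request_accessions ;
my @accessions    = grep { $is_in_request{ $_ } } keys %library_for ;

for my $acc ( @accessions ) {
    printf "%s => %s\n", $acc, $library_for{ $acc } ;
    }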

2015/06/22

"Well, That Was Strange": Hunting Gremlins in SQL and Perl

The query base is 90 lines.

Depending on what it's used for, one specific entry or the whole lot, it has different endings, but the main body is 90 lines. There are 20 left joins in it.

It is an ugly thing.

So ugly, in fact, that I am loath to include it here.

So ugly that I felt it necessary to comment and explain my use of joins.

This is where the trouble started.

I noticed it when I was running tests, getting the following error.


Clearly, it needed a bind variable, but something along the line blocked it.

I had this problem on Friday morning on our newer server, then it stopped. Honestly, it was such a fire-fighting day that I lost track of what was happening with it.

Then the module was put on the old server and the problem rose up again.

Whether that makes me Shatner or Lithgow remains an exercise for the reader. I said "my code has gremlins" and went home at 5pm.

When I got back to the lab this morning, I made three test scripts, identical except for the hashbang: one for the system Perl, which is 5.10; one for the perl we hardcode into most of our web and cron uses, which is 5.16; and one for the perl we have as env perl, currently 5.20.

The cooler solution would've been to have several versions of Perl installed with Perlbrew, then running perlbrew exec perl myprogram.pl instead, but I don't have Perlbrew installed on that system.

The error occurs with 5.10. It does not with 5.16 or 5.20.

And when I run it against a version without the comments in the query, it works everywhere.

I don't have any clue if the issue is with Perl 5.10 or with the version of DBI currently installed with 5.10, and I don't expect to. The old system is a Sun machine that was off support before I was hired in, and the admin for it reminds us each time we talk to him that it's only a matter of time before it falls and can no longer get up. I haven't worked off that machine for around two years, and this query's move to the old server is part of the move of certain services to the new machine.

And, as everything is fine with Perls 5.16 or higher, I must regard this as a solved problem except with legacy installs.

I know that MySQL accepts # as the comment character, but Sublime Text prefers to make -- mean SQL comments, so when I commented the query, I used the double-dash, and our solution is to remove the comments when deploying to the old server. It's a temporary solution, to be sure, but deploying to the old server is only temporary, too.
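
If stripping them by hand gets old, something like this at deploy time would do the job; a minimal sketch, assuming each comment sits on its own line and $query holds the query text:

# drop whole-line "--" comments before handing the query to the old
# server's DBD::mysql
my $stripped_query = join "\n", grep { !m/^\s*--/ } split /\n/, $query ;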

It's a sad and strange situation where the solution is to uncomment code, but here, that seems to be it.

Update: Matt S. Trout pushed me to check into the DBD::mysql versions, to see which versions corresponded to the error. The offending 5.10 perl used DBD::mysql v. 4.013, and looking at the DBD::mysql change log, I see bug #30033: Fixed handling of comments to allow comments that contain characters that might otherwise cause placeholder detection to not work properly. Matt suggests adding "use DBD::mysql 4.014;", which is more than reasonable.