2015/07/01

Unstuck in Time and Space: An Investigation into Location over WiFi.

I track my location with Google and my phone, because I lack sufficient paranoia. To the right is my June 30.

I swear that I didn't leave the Greater Lafayette area. I certainly didn't teleport to the southern suburbs of Indianapolis.

This happens to me all the time, and it has bugged me a lot. But normally I've just looked and griped rather than trying to work it out.

Today, however, I'm watching a compiler or two, so I have some time I can use to work this out.

The format is KML, and this is what it looks like:
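
(What follows is a representative sketch rather than my actual export; the coordinates are made up, but it's roughly the shape of the thing.)

<gx:Track>
  <when>2015-06-30T13:12:29.000-07:00</when>
  <gx:coord>-86.914 40.424 0</gx:coord>
  <when>2015-06-30T13:13:05.103-07:00</when>
  <gx:coord>-86.108 39.613 0</gx:coord>
  <when>2015-06-30T13:53:31.467-07:00</when>
  <gx:coord>-86.914 40.424 0</gx:coord>
</gx:Track>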

That isn't the whole day's worth of results, merely the point in time when I jumped 67 miles to the southeast. I was going to try a KML-specific Perl module, but the ones I could find were more about generating KML than parsing it, and it's XML anyway, so I figured what the heck.

I had previous code to work out the distance between two points, so it was an easy case of parsing to find the jump:
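
Something like this does the job (a sketch of the approach; the file name and the ten-mile threshold are stand-ins):

#!/usr/bin/env perl
use strict ;
use warnings ;
use feature qw{ say } ;
use Mojo::DOM ;
use Math::Trig qw{ deg2rad great_circle_distance } ;

# slurp the export; 'history.kml' is a stand-in name
my $kml = do { local $/ ; open my $fh, '<', 'history.kml' or die $! ; <$fh> } ;
my $dom = Mojo::DOM->new->xml(1)->parse($kml) ;

# <when> and <gx:coord> are parallel arrays within the track
my @when  = $dom->find('when')->map('text')->each ;
my @coord = $dom->find('gx\:coord')->map('text')->each ;

my $prev ;
for my $i ( 0 .. $#when ) {
    my ( $lon, $lat ) = split ' ', $coord[$i] ;
    if ( $prev ) {
        # Math::Trig wants longitude and ( 90 - latitude ), in radians
        my $miles = great_circle_distance(
            deg2rad( $prev->[0] ), deg2rad( 90 - $prev->[1] ),
            deg2rad( $lon ),       deg2rad( 90 - $lat ),
            3959 ) ;    # Earth's radius in miles
        say "$when[$i]: jumped $miles miles" if $miles > 10 ;
    }
    $prev = [ $lon, $lat ] ;
}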

Breaking it down, at 2015-06-30T13:13:05.103-07:00 I go 67 miles to Greenwood, and at 2015-06-30T13:53:31.467-07:00 I pop back.

Let me bring up another map.

I didn't have any mapping software going, and I was using WiFi, so this data is location via WiFi, not GPS. I know, though, that the group that runs my servers has a weekly "coffee break" on Tuesdays, that I met with my admin there, and that I walked around toward his office before going back to mine. His building is off S. Grant St., and I walked next to Hawkins Hall, in front of Pao Hall, near the Forestry Building, and down to my office in Whistler.

So, the question is: how does location over WiFi work? I recall hearing that routers and access points report location, but I'm not sure of the protocols involved. I can imagine two possible scenarios that would cause this.

First is that one of Purdue's routers is misreporting location, either in Forestry or Pao. This is possible; I have another issue that I haven't worked through yet where I leap instantly to the EE building, and it seems that's entirely based on location within my building.

The second scenario, the one I'm taking more and more seriously, is that there's a duplicated MAC address or something in the apartments across from Pao Hall. I say "or something" because MAC addresses should be unique. What makes me suspect this is that it points me to a residential neighborhood south of Indy, and I could see that mistake happening with two residential routers or two experimental electronics projects.

I'm curious about how to test this, because I do want to know whether it actually has something to do with Purdue's networking gear before I complain. I'm also curious about how these things actually work. I could very much see myself walking around, looking at Google Maps and tripping over things, then trying to dump my ARP tables or something.

2015/06/29

Fixing an old logic issue

I am not especially proud of the code below.
It does its job. Give it a request, a number of accessions, and the names you want them to go by, and it changes them in the database.
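
Roughly, its shape is this (a sketch; the sub name and schema are stand-ins, not the original listing):

# hypothetical names throughout; $param maps accession => new library name
sub rename_accessions {
    my ( $dbh, $request, $param ) = @_ ;

    # every accession tied to this request, zero-padded in the DB
    my @request_accessions = map { $_->[0] } @{
        $dbh->selectall_arrayref(
            'SELECT accession FROM request_accession WHERE request = ?',
            undef, $request ) } ;

    # keep only the accessions the user actually entered
    my @accessions = grep { exists $param->{ $_ } } @request_accessions ;

    for my $acc ( @accessions ) {
        $dbh->do( 'UPDATE accession SET library = ? WHERE accession = ?',
            undef, $param->{ $acc }, $acc ) ;
    }
}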

Except...

Accessions are defined as zero-padded six-digit numbers, so instead of 99999, you'd have 099999. If you're strict, everything's fine.

But users are not always strict. Sometimes they just put in 99999, expecting it to just work.

Oh, if only it were that easy.

Requests are in there to ensure that, for request 09999, you can only change accessions associated with that request. This is what the grep over the request's accessions (lines 27-29 in the original) is for: getting the set of accessions that were both entered by the user and belong to the given request.

Yes, requests are defined as zero-padded five-digit numbers.

If I don't zero-pad the accessions, I get nothing in @accessions.

But if I do zero pad, I get no library name from $param->{ $acc }.

There is a fix for it. I could go back to the source and ensure that this code never sees un-padded numbers, or I could run through the $param hashref again. But clearly, this is something I should have built in from the start.
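
Running through the hashref would look something like this (a sketch):

# normalize the user's keys up front so everything downstream
# sees zero-padded six-digit accessions
my %padded = map { sprintf( '%06d', $_ ) => $param->{ $_ } } keys %$param ;
$param = \%padded ;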

2015/06/22

"Well, That Was Strange": Hunting Gremlins in SQL and Perl

The query base is 90 lines.

Depending on what it's used for, one specific entry or the whole lot, it has different endings, but the main body is 90 lines. There are 20 left joins in it.

It is an ugly thing.

So ugly, in fact, that I am loath to include it here.

So ugly that I felt it necessary to comment and explain my use of joins.

This is where the trouble started.

I noticed it when I was running tests, getting the following error.
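
It went something like this (paraphrased from memory; the original paste is gone):

DBD::mysql::st execute failed: called with 1 bind variables when 0 are needed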


Clearly, it needed a bind variable, but something along the line blocked it.

I had this problem on Friday morning on our newer server, then it stopped. Honestly, it was such a fire-fighting day that I lost track of what was happening with it.

Then the module was put on the old server and the problem rose up again.

Whether that makes me Shatner or Lithgow remains an exercise for the reader.
I said "my code has gremlins" and went home at 5pm.

When I got back to the lab this morning, I made three test scripts, identical except for the hashbang: one for system Perl, which is 5.10; one for the Perl we hardcode into most of our web and cron uses, which is 5.16; and one for what we have as env perl, currently 5.20.

The cooler solution would've been to have several versions of Perl installed with Perlbrew and then run perlbrew exec perl myprogram.pl instead, but I don't have Perlbrew installed on that system.

The error occurs with 5.10. It does not with 5.16 or 5.20.

And when I run it against a version without the comments in the query, it works everywhere.

I don't have any clue if the issue is with Perl 5.10 or with the version of DBI currently installed with 5.10, and I don't expect to. The old system is a Sun machine that was off support before I was hired in, and the admin for it reminds us each time we talk to him that it's only a matter of time before it falls and can no longer get up. I haven't worked off that machine for around two years, and this query's move to the old server is part of the move of certain services to the new machine.

And, as everything is fine with Perls 5.16 or higher, I must regard this as a solved problem except with legacy installs.

I know that MySQL accepts # as a comment character, but Sublime Text prefers -- for SQL comments, so when I commented the query, I used the double-dash. Our solution is to remove the comments when deploying to the old server. It's a temporary solution, to be sure, but deploying to the old server is only temporary, too.
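
For reference, the two styles, in a hypothetical query:

SELECT r.id, a.library
FROM request r
-- a double-dash comment, which MySQL wants followed by whitespace
LEFT JOIN accession a ON a.request = r.id
# a hash comment, which MySQL also accepts
WHERE r.id = ?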

It's a sad and strange situation where the solution is to uncomment code, but here, that seems to be it.

Update: Matt S. Trout pushed me to check into the DBD::mysql versions, to see which versions corresponded to the error. The offending 5.10 Perl used DBD::mysql 4.013, and looking at the DBD::mysql change log, I see bug #30033: "Fixed handling of comments to allow comments that contain characters that might otherwise cause placeholder detection to not work properly." Matt suggests adding "use DBD::mysql 4.014;", which is more than reasonable.

2015/06/17

Head-to-Head Web Scraping with Perl: Mojo::DOM vs Web::Query

In the last meeting of Purdue Perl Mongers, Joe Kline mentioned Sawyer X's YAPC::NA talk on Modern Web Scraping, where he talked about Web::Query, which uses CSS selectors, compared to the XPath selectors he uses for his own web scraping.

I had just written and posted code where I used Mojo::DOM to scrape YouTube, so I decided to do a head-to-head parsing of the same corpus.

And I found that, except for wq($file) and Mojo::DOM->new($file), the code is identical.

Seriously, the only difference is a small string saying whether it's using Web::Query or Mojo::DOM.
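
Side by side, it's something like this (a sketch; the file name is a stand-in, and the selector is borrowed from the YouTube scrape below):

use strict ;
use warnings ;
use Benchmark qw{ cmpthese } ;
use Mojo::DOM ;
use Web::Query ;

# 'yapcna.html' is a stand-in for the saved corpus
my $html = do { local $/ ; open my $fh, '<', 'yapcna.html' or die $! ; <$fh> } ;
my $selector = 'h3.yt-lockup-title a' ;

cmpthese( 100, {
    'Mojo::DOM' => sub {
        my @titles = Mojo::DOM->new($html)->find($selector)
            ->map( attr => 'title' )->each ;
        },
    'Web::Query' => sub {
        my @titles ;
        wq($html)->find($selector)->each( sub {
            my $i = shift ;
            push @titles, $_->attr('title') ;
            } ) ;
        },
    } ) ;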

When run, though, Mojo::DOM is a little bit faster.



2015/06/15

Making a list of YAPC::NA 2015 Videos using Mojo::DOM

Last week, Perl devs from all over North America, and some from other continents, met in Salt Lake City, Utah, for YAPC::NA 2015.

Last week, I took vacation. But I spent the time with family, going to Ohio.

So, of course, I wanted to get a list of all the talks that they were able to record and put on YouTube, to list for my local Mongers group, Purdue.pm, and to allow me to go back and watch at my leisure.

I could've parsed the HTML with regular expressions, but that isn't protocol, so I used this as an excuse to work with Mojo::DOM. I generally prefer finding code examples to reading documentation, so here's my code.

To get the HTML, I opened https://www.youtube.com/user/yapcna/videos in Chrome, clicked Load More until I had all of this year's videos (and some of last year's), then grabbed the HTML from Chrome Dev Tools and pasted it into the __DATA__ section of the program. Grabbing the HTML with LWP or the like wouldn't have gotten it all.
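
The shape of it is below (a sketch; the selector is my best guess at 2015-era YouTube markup):

#!/usr/bin/env perl
use strict ;
use warnings ;
use feature qw{ say } ;
use Mojo::DOM ;

my $html = do { local $/ ; <DATA> } ;
my $dom  = Mojo::DOM->new( $html ) ;

# each video tile's title link holds the talk title and the watch URL
for my $a ( $dom->find('h3.yt-lockup-title a')->each ) {
    say $a->text, "\t", 'https://www.youtube.com' . $a->attr('href') ;
}

__DATA__
(the HTML pasted from Chrome Dev Tools goes here)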

Enjoy!



2015/06/03

Testing AJAX APIs with Perl

In my lab, we have an AJAX-laden web tool that loads a certain JSON API on page load. It was judged that what we had was too slow, so I created a program that wrote that JSON to a static file at regular intervals. The problem with that, of course, is that changes to the data would not show up in the static file until the next scheduled update.

So, we created a third version, which checks a checksum against the database; if it has changed, it regenerates the file and sends the data. Otherwise, it opens the existing file and sends the data.
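
In outline, the decision looks like this (a sketch; the helper functions are hypothetical):

my $cache_file = '/path/to/cache.json' ;    # stand-in path
my $checksum   = current_db_checksum() ;    # a cheap query
if ( !-e $cache_file or $checksum ne cached_checksum() ) {
    my $json = generate_json() ;            # the slow part
    write_cache( $cache_file, $json, $checksum ) ;
    print $json ;
}
else {
    print read_cache( $cache_file ) ;       # the fast path
}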

I tested with Chrome Dev Tools, which told a bit of the story, but at the scale where it's closer to anecdotes than data. I wanted to go into the hundreds of hits, not just one. I pulled out Benchmark, which told a story, but wasn't quite what I wanted. It started the clock, ran it n times, then stopped the clock, while I wanted to get clock data on each GET.

I also realized I needed to test to be sure that the data I was getting was the same, so I used Test::Most to compare the objects I pulled out of the JSON. That was useful, but most useful was the program I wrote using Time::HiRes to grab the times more accurately, then Statistics::Basic and List::Util to take the collected arrays of sub-second response times and show me how much faster it is to cache.
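
That program looked something like this (a sketch; the endpoints are stand-ins):

#!/usr/bin/env perl
use strict ;
use warnings ;
use feature qw{ say } ;
use JSON ;
use List::Util qw{ min max } ;
use LWP::UserAgent ;
use Statistics::Basic qw{ mean stddev } ;
use Test::Most ;
use Time::HiRes qw{ time } ;

# stand-in endpoints for the three versions
my @urls = map { "http://example.com/api/$_" } qw{ live cached static } ;
my $ua   = LWP::UserAgent->new ;
my $reference ;
my %times ;

for my $url ( @urls ) {
    for ( 1 .. 200 ) {
        my $start    = time ;
        my $response = $ua->get( $url ) ;
        push @{ $times{ $url } }, time - $start ;    # per-GET wall clock

        my $data = decode_json( $response->content ) ;
        $reference //= $data ;
        cmp_deeply( $data, $reference, "$url returns the same data" ) ;
    }
}

for my $url ( @urls ) {
    my @t = @{ $times{ $url } } ;
    say join "\t", $url,
        'min: ' . min( @t ), 'max: ' . max( @t ),
        'mean: ' . mean( @t ), 'stddev: ' . stddev( @t ) ;
}
done_testing ;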

And it is fairly significant. The best and worst performances were comparable, but in the average case, the cached version is about twice as fast, and the static file about seven times faster. With, of course, the same problems.

If I wasn't about to take time out of the office, I'd start looking into other methods to get things faster. Good to know, though, that I have the means to test and benchmark it once I get back next week.

2015/05/19

01010100 01101000 01100001 01110100 00100111 01110011 00100000 01110111 01100101 01101001 01110010 01100100 00101110 00101110 00101110

Slight language warning.

The PaleoFuture blog on Gizmodo had a review of Tomorrowland from a Robot from the Future today, and of course it's in binary. I saw that shortly after coming in, and I thought "Hey, that'd be an interesting exercise to use to prime my programming pump, get my head in the right mindset."

So, I wrote code to decode it. I read perldoc -f pack and thought "Nope", so I found Data::Translate, which I now know is a very simple wrapper.

Once I wrote that, I decided the thing to do was to write b2a.pl and a2b.pl, because why not? But I found that this
a2b.pl Bite my shiny metal ass
gave me this
01000010 01101001 01110100 01100101 00100000 01101101 01111001 00100000 01110011 01101000 01101001 01101110 01111001 00100000 01101101 01100101 01110100 01100001 01101100 00100000 01100001 01110011 01110011
but this
b2a.pl 01000010 01101001 01110100 01100101 00100000 01101101 01111001 00100000 01110011 01101000 01101001 01101110 01111001 00100000 01101101 01100101 01110100 01100001 01101100 00100000 01100001 01110011 01110011
gave me this
Bitemyshinymetalass
So, what went wrong? Where did the spaces go?

This will take some time with me trying to grok pack, which is so unrelated to my daily responsibilities that I installed Data::Translate simply so I wouldn't have to learn a thing, but it seems that, somewhere along the b2a path, the space character in binary (00100000) comes back as an empty string. This is a somewhat strange thing, but once you understand the behavior, you can react to it.
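
For contrast, here's a pack-only round trip (a sketch, no Data::Translate involved), which does hang on to the space:

use strict ;
use warnings ;
use feature qw{ say } ;

my $text = 'Bite my shiny metal ass' ;

# ASCII to binary: unpack each byte into its 8-bit string
my $binary = join ' ', map { unpack 'B8', $_ } split //, $text ;
say $binary ;

# binary to ASCII: pack each 8-bit token back into a byte;
# '00100000' is 0x20, and plain pack gives the space back
my $ascii = join '', map { pack 'B8', $_ } split ' ', $binary ;
say $ascii ;    # Bite my shiny metal ass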


(The catchphrase for Bender seemed the most appropriate text to use.)


2015/05/18

Some Wisdom From Twitter

2015/05/14

There are Two Javas

So, this happened...

On the one hand, I wrote this on a day when I woke up after watching this fantastic conference talk on Neo4J and RNeo4J from Nicole White before bed. This had served to stoke the fires of my Big Data / Data Science jones and had moved graph databases from the corner of NoSQL that I just didn't get to something I had the concepts for but needed to play with to understand.

On the other hand, I wrote this on a day when a tech in my lab was upgrading an instrument machine (a computer that controls a data-generating scientific instrument) and, in the process, upgraded Java. This led that corner of the lab into a period of complete shutdown while nothing worked.

I told someone that day, saying "one of my users updated Java", and without any further explanation he knew exactly what the problem was and exactly how bad it could be. "They didn't," he said.

This is not a case of me hating Java as a language. I mean, I do, but nobody is asking me to write it, at least not in this context. With Neo4J, there are modules such as RNeo4J serving as an interface between the database and the language you're doing your work in. I don't care what language MySQL or MongoDB is written in; I just care that they'll store my data and give it back when I need it.

The one thing that distinguishes Java is "write once, run anywhere": you write it and compile it so it runs in a VM, abstracting away all connections to underlying systems. Except there are systems created years ago whose software was written for a version of the JVM that was replaced and abandoned long ago, where they still deploy computers running XP on hardware built for Windows Vista or even Windows 7. These exist in part because schools of engineering and computer science have to teach the basics of computing with something, and somewhere along the line the go-to something for introductory courses became Java. So, if you're making complex, computer-controlled scientific instruments, you design for what students know rather than spend the time and money teaching them a new language as well as the interface to your science. And because once you sell the instrument you don't get money for upgrading the interface, you just don't.

That is my biggest fear. That running a big program on top of another big program will tend to choke a computer is an issue, but not as big an issue in this case. It'd run on an older computer, and that's a far newer computer than the ones I started to develop my Java hate on, decades ago. I mostly think that, in the context of modern, well-backed projects like Neo4J and Cassandra, I wouldn't end up in the place where I can't upgrade the JVM because it'll break my tools, but where every time I use the mouse a popup tells me I'm several major versions behind, and where I couldn't download and store the "right" version for my software if I wanted to.

Those are the two Javas: The bright shiny one that promises a good future, and the previous version that keeps you stuck in the past.

2015/05/13

Being Open, Being Wrong, Being Corrected, Being Smarter: Response about MongoDB

Just heard a podcast from Developer Tea where Ben Lesh of Netflix explained how he likes it when he makes mistakes, because then he learns, and we should always be learning. (I listened to it while driving, so I didn't write down the quote, and I'm not sure if it's in part 1 or part 2 of the interview.)

Yesterday, I posted on issues I had with MongoDB and Perl. I had done what I had wanted before, I was sure, but my code no longer reflected it, and Shub-Internet was not responding with the answers I desired. If you don't recall:

# This worked to get one value:
my $find_one = $collection->find_one( { date => $today }) ;

# but if I wanted the whole thing, this didn't work:
my $find = $collection->find ;

As it turns out, there's a specific piece of information I did not find, but David Golden of MongoDB set me straight.

use Data::Dumper ; # for the dump below
# To get the whole thing, find gives you a cursor...
my $find = $collection->find ;
# ...and all drains that cursor into a list:
my @all = $find->all ;
# and in the case I'd want (a fresh cursor, since all exhausted the first):
my %all =
    map { $_->{ date } => $_ } # map, grep and sort are all-powerful
    $collection->find->all ;
say Dumper \%all ;

This is great. Wonderful, even. I had a failure in understanding, I was open about it, I was corrected, and now I am a better programmer for it.

I do sorta wish I had given Stack Overflow a deeper search first, though....