Cookie Notice

As far as I know, and as far as I remember, nothing on this page does anything with cookies.


Considering committing Clever with jQuery XHR code

A code snippet:

    // if a previous request is still in flight, abort it
    if ( xhr ) { xhr.abort() ; }
    xhr = $.ajax({
        url: url ,
        data: object ,
        method: 'POST',
        dataType: 'json'
        }) ;

Seems that Syntax Highlighter is happy with JavaScript. OK.

I'm using jQuery's .ajax() interface, and what this does is abort an unfinished AJAX call if the function holding it gets called again. This is wonderful when it's called from one place, but it sucks if you call it from several places to do a few different things.

And, by wonderful, I mean useful for the UX. If you tell the web server to do a thing, you cannot go back and say "No, I didn't mean it." You can tell it "Do something else to get us back to the first state", but you cannot stop it once it has started, unless you're logged into the server and can kill -9.

So, I am considering making xhr into an object keyed on a unique identifier for what's actually being done, which would give me xhr[id], so I could have several XHR tasks going on in parallel.
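
A sketch of that idea, with a hypothetical fetchData() standing in for my real functions: the XHRs live in an object keyed by task, so aborting one task's stale request leaves the others alone.

    var xhr = {} ;

    function fetchData ( id , url , object ) {
        // abort only this task's unfinished request
        if ( xhr[id] ) { xhr[id].abort() ; }
        xhr[id] = $.ajax({
            url: url ,
            data: object ,
            method: 'POST',
            dataType: 'json'
            }) ;
        }

That way, a "save" can't kill an in-flight "search", and each caller just has to pick a unique id.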


Trying to Read MongoDB dump files using BSON

I've been looking at increasing the amount of MongoDB we use at work, and this includes backing up the data. Due to my own confusion, I had a little issue getting mongodump to work, but I have been able to dump from one Mongo database and restore to another.

mongodump writes in a format called Binary JSON, or BSON. I installed a Perl BSON module with the intention of reading the BSON file and ensuring it works. With small tests, I was able to put objects into BSON, write to file, read from file, and use Data::Dumper to confirm the object was what I wanted it to be.

But, I find I cannot read the file, because decode() reports it as having an incorrect length.

I fully expect that I'm doing something subtly stupid, but what it could be isn't immediately obvious. The contents of $bson should be exactly as written to file, and mongorestore was happy with it. encode() and decode() had worked acceptably, but admittedly on a much smaller dataset than the one I'm working with here, which contains several months of status updates.
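
For reference, here's my understanding of the format, and a guess at my problem: a .bson file from mongodump is just documents laid end to end, each starting with a four-byte little-endian length that counts the whole document, itself included, so handing decode() the entire file means the length field only describes the first document. This is an untested sketch, the file name is made up, and newer releases of the module spell the call $codec->decode_one() rather than BSON::decode():

    # walk a mongodump .bson file one document at a time
    use strict;
    use warnings;
    use BSON;
    use Data::Dumper;

    my $file = 'collection.bson';    # hypothetical file name
    open my $fh, '<:raw', $file or die "Cannot open $file: $!";

    while ( read( $fh, my $prefix, 4 ) == 4 ) {
        my $length = unpack 'V', $prefix;    # int32, little-endian
        read( $fh, my $body, $length - 4 ) == $length - 4
            or die 'truncated BSON document';
        my $object = BSON::decode( $prefix . $body );
        print Dumper($object);
    }
    close $fh;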

I suppose I don't need to read the BSON files, but I do like being able to check the contents of a database before restoring.


Long, Boring To-Do Post

I'm considering stepping up my web presence, and as a precursor to that, I've created a Twitter account specifically connected to this blog, @varlogrant. So, I need to do things to make it look more like I'm using it and less like it's an egg. (If I made the account picture the cover of that one Wilco album, would people get the joke?)

I certainly can automate a lot of stuff, going through the back catalog and retweeting old posts, but the question is, how much of that is just spammy? And, to what extent should I move my opinionated tech tweeting from @jacobydave to @varlogrant?

Beyond that, it strikes me that blogs where the blogger is more-or-less talking to himself are self-limiting, so I should start blogging more about certain subjects and less about things that are annoying me right this minute. 

That being said:
  • I am involved in a group creating an annual event. Specifically, I'm the web guy. There are some administrivia things going on, creating the pages for the team. This is a small matter of WordPress time, so not hard. 
  • A harder thing is this: We have photos of previous presenters, which were existing head-shots from before their presentations. We also have a large library of photos from the events. I've decided that the smart move is to use Flickr's API and WebService::Simple to grab all the old photos, use smartcrop.js to crop them to the appropriate size, and either personally choose a good one or make a web tool to crowdsource that amongst the group (there's a sketch of the Flickr part after this list). This process seems more fun to me than the other thing.
  • I promised a while ago to contribute some R-specific stuff to Bobby Tables, and have done jack since. I made some progress on it recently, but need to install a more modern version of R to do appropriate testing before I make a PR. When I first looked into it, I saw no SQL escaping and no placeholders, but now I'm seeing some progress. Nothing's quite up to snuff, in my opinion, but it's better. 
  • A side project I'm involved in has need of Bluetooth Low Energy support, and I've done the slightest poking with it. I need to do more. It seems that a few necessary tools for debugging are Unix/Linux/Mac only, and my dev laptop is Windows, so I need to either get going with VirtualBox, figure things out in Visual Studio or let it go.
  • There's also need for a smartphone app, and my experiences with Eclipse and Android Studio haven't been pleasant. I know there's Cordova integration with Visual Studio, so that seems to be the quick way in. I don't know if I can do any BLE stuff within a Cordova app, but we'll get there when we get there.
  • There's another side project I'm involved in, called Lafayettech. Specifically, I'm in the proto-makerspace corner, Lafayettech Labs. And it seems like I'm the only one involved in it. So, I am thinking of stopping. Right now, there's a few self-perpetuating scripts in a crontab file that do much of the work. I need to decide something about this.
There are a few more things that should be here, but I don't have them together enough to even make a lame bullet point.
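
On the photo harvesting: WebService::Simple's own documentation uses Flickr as its running example, so the skeleton would look something like this. The API key, the user ID, and what happens to each photo afterward are placeholders, not working values.

    # sketch: pull a page of a user's photos from the Flickr API
    use strict;
    use warnings;
    use WebService::Simple;

    my $flickr = WebService::Simple->new(
        base_url => 'https://api.flickr.com/services/rest/',
        param    => { api_key => 'MY_API_KEY' },    # placeholder
    );

    my $response = $flickr->get(
        {
            method  => 'flickr.photos.search',
            user_id => 'MY_USER_ID',    # placeholder
            extras  => 'url_o',         # ask for original-size URLs
        }
    );

    my $parsed = $response->parse_response;    # XML::Simple tree by default
    # ...then walk $parsed->{photos}{photo}, fetch each url_o,
    # and hand the files to smartcrop.js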


Thinking Aloud: Power Issues for a Raspberry Pi as a Car Computer

We could switch from a Raspberry Pi to an oDroid or another sort of low-power computer-on-a-board. My Pi has a task right now, so if I were to go forward with this, I'd have to get something new anyway, but for the sake of this discussion, we'll assume this is it.

I own a USB GPS unit. I own an OBDII-to-USB unit. I own a small VGA monitor for Pi use. Something that does networking over the cellphone network would also be useful, but if the Pi just dumps to my home network when I get home, that'd be good enough.

Here's a niggly bit for me: I start the vehicle and the Pi gets power. I stop the vehicle and the power cuts, leading to the computer shutting down suddenly. This is not a happy thing with computers. In fact, I think I can say they hate that, and eventually, the SD card will say enough with this and not boot.

So, the proper solution is a power circuit with a battery: something that lets the Pi boot when the car starts, sends the shutdown signal when it stops, and holds enough juice for the Pi to shut down nicely.

Google told me how to trigger the shutdown when wanted. I just need to figure out how to know what's going on with power.
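
The sensing half is the part I haven't worked out, but if the power board pulled a GPIO pin high while the ignition is on, the watching half is small. A sketch, using the kernel's sysfs GPIO interface; the pin number and the wiring are hypothetical:

    # watch a pin that follows ignition power; shut down clean when it drops
    use strict;
    use warnings;

    my $pin = 17;    # hypothetical pin wired to sense switched power
    my $dir = "/sys/class/gpio/gpio$pin";

    unless ( -e "$dir/value" ) {    # export the pin if it isn't visible yet
        open my $export, '>', '/sys/class/gpio/export' or die $!;
        print $export $pin;
        close $export;
        open my $direction, '>', "$dir/direction" or die $!;
        print $direction 'in';
        close $direction;
    }

    while (1) {
        open my $value, '<', "$dir/value" or die $!;
        chomp( my $state = <$value> );
        close $value;
        if ( !$state ) {    # ignition gone; we're on battery now
            system 'shutdown -h now';
            last;
        }
        sleep 1;
    }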


Overkill II: The Quickening

Previously on /var/log/rant, I talked about using recursion to brute-force a middle-school math problem. Because I learned a little bit about using the Xeon Phi co-processor (the part that grew out of Intel's video-card project), I thought I'd try C++. And found that, while the Perl version ran for about a minute and a half, the C++ version took about a second and a half.

I then tried a Python version, using the same workflow as with the C++. I backed off on the clever for the testing because I am not as sure about using multidimensional arrays in C++ and Python as I am in Perl. When you only code in a language about once every 15 years, you begin to forget the finer points.

Anyway, the code follows. I don't believe I'm doing a particularly stupid thing with my Perl here, but it's hard to ID particularly stupid things in unfamiliar languages sometimes. Here's the code, now with your USDA requirement of Node.js.


Overkill: Using the Awesome Power of Modern Computing to Solve Middle School Math Problems

I was helping my son with his math the other night and we hit a question called The Magic Box. You are given a 3x3 square and the numbers 3, 4, 5, 6, 7, 8, 9, 10, 11, and are expected to find a way of placing them such that each row, each column, and each diagonal adds up to 21.

I'm a programmer, so my first thought was, hey, I'll make a recursive algorithm to do this. The previous question, measuring 4 quarts when you have a 3 quart measure and a 5 quart measure, was solved due to insights remembered from Die Hard With A Vengeance, so clearly, I'm not coming at these questions from the textbook.

With a little more insight, I solved it. 7 is a third of 21, and it is the middle of that nine-number sequence, so clearly, it is meant to be the center. There is only one way you can use 11 on a side, with 4 and 6, so a center column or row will be 3 7 11. If you know that column and the 4 11 6 row, you know at least this:

.  3  .
.  7  .
4 11  6

And because you know the diagonals, you know that it'll be:

8  3 10 
.  7  .
4 11  6

And you only have 5 and 9 left, and they're easy to just plug in:

8  3 10
9  7  5
4 11  6

So, that's the logical way to solve it. Clearly, orientation isn't important; the grid could be mirrored on the x or y axis and still be a solution. But, once the thought of solving with a recursive algorithm came into my head, I could not leave well enough alone. So, here's a recursive program that finds all possible solutions for this problem.
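
What follows is a sketch of that recursion, reconstructed from the description above rather than lifted from my original program: drop each unused number into the next empty cell, recurse, and print any filled grid where every row, column, and diagonal hits 21.

    use strict;
    use warnings;

    my @digits = ( 3 .. 11 );
    solve( [] );

    # place each unused number in the next cell and recurse
    sub solve {
        my ($grid) = @_;
        if ( @$grid == 9 ) {
            print_grid($grid) if is_magic($grid);
            return;
        }
        for my $n (@digits) {
            next if grep { $_ == $n } @$grid;
            solve( [ @$grid, $n ] );
        }
    }

    # true if every row, column, and diagonal sums to 21
    sub is_magic {
        my ($g) = @_;
        my @lines = (
            [ 0, 1, 2 ], [ 3, 4, 5 ], [ 6, 7, 8 ],    # rows
            [ 0, 3, 6 ], [ 1, 4, 7 ], [ 2, 5, 8 ],    # columns
            [ 0, 4, 8 ], [ 2, 4, 6 ],                 # diagonals
        );
        for my $line (@lines) {
            my $sum = 0;
            $sum += $g->[$_] for @$line;
            return 0 if $sum != 21;
        }
        return 1;
    }

    sub print_grid {
        my ($g) = @_;
        printf "%2d %2d %2d\n", @{$g}[ $_ * 3, $_ * 3 + 1, $_ * 3 + 2 ]
            for 0 .. 2;
        print "\n";
    }

It's pure brute force over all 362,880 orderings, which is exactly the overkill the title promises.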


You Know Where You Stand In Your Hellhole

Sawyer X tweeted this:
I said "Deep".

This can be a variation of "He says he has 5 years of experience, but he really has 1 year of experience 5 times." Except not really.

I've worked as a developer for years, and it took me years before I started writing modules. It took a while after that before those modules were more than bags of vaguely related functions. And it was just this year that I looked into open source projects and started contributing patches.

So, one way of looking at this is "I have one year experience as a newbie which I repeated for five years, one year of being a coder which I repeated for five years, and I've just finished a year of being a developer making modern tools for other developers, which I haven't repeated." Or the like.

There isn't necessarily anything wrong with this. In each of those repeated years, you're doing a thing. You aren't growing, you aren't taking it to the next level, but you are creating and maintaining code, and you are making something that provides value to someone.

Or, you can think of Sawyer's statement more like: I've been coding, working at a well-trod level, bit-twiddling and the like, but not doing anything interesting. This is the feeling I get when I get close to more computer-theoretical things. I have access to computers with massive amounts of cores and massive amounts of memory, but don't see where my problems map to those resources. Until I do interesting things with interesting data on interesting hardware, I'm a coder, not a programmer.

I'm interested in improving, in coding less and programming more. Or, perhaps, interested in aspects of improvement but less interested in doing the work. There's a certain safety in knowing that you're doing what you're experienced with, not reaching out. Perhaps David St. Hubbins and Nigel Tufnel say it best in the last chorus: "you know where you stand in a hell hole".

I'm trying to take steps to move forward. Maybe find myself a new, more interesting hell hole to program in. 



I've been working on a tool. I discussed a lot of it yesterday.

I had a model to get the information based on the PI (primary investigator; more on that term below), and I wanted to get to what I considered the interesting bit, so it was only after the performance went from suck to SUCK that I dove back in, which is what I did yesterday. Starting with the project instead of the PI made the whole thing easier. But now I'm stuck.

The only differences between these queries are on lines 14, 19, and 20. Which gets to the problem: I know that I don't need half of what I get in lines 1-11, but when I pull stuff out, I now have two places to pull it from.

I have a great 90-line majesty of a query that includes six left joins, and I append different endings depending on whether I want to get everything, or a segment defined by A or B or something. I could probably tighten it up so I have SELECT FROM, the different blocks, then AND AND ORDER BY. But there we're adding complexity, and we're butting Don't Repeat Yourself (DRY) against Keep It Simple, Stupid (KISS).
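
To make that concrete, the tightened version would look something like this, with hypothetical table and column names standing in for the real 90 lines, and DBI doing the work at the end:

    # one shared base, a few endings, glued together at call time
    my $base_query = q{
        SELECT project.id, project.name, pi.name
        FROM project
        LEFT JOIN pi ON pi.id = project.owner_id
        WHERE project.is_old_system = 0
    };

    my %ending = (
        everything => q{ ORDER BY project.id },
        by_pi      => q{ AND pi.id = ? ORDER BY project.id },
    );

    my $query = $base_query . $ending{by_pi};    # then prepare and execute

Whether that's better than two honest copies is exactly the DRY-versus-KISS question.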

I'm happy to keep it as-is, as a pair of multi-line variables inside my program. I think I'd rather have the two like this than gin up a way to give me both, so KISS over DRY, in part because I cannot imagine a third way I'd want to access this data, so we hit You Ain't Gonna Need It (YAGNI).

But if there are strong reasons why I should make the change, rather than package it up and put it in cron, feel free to tell me.


This is only kinda like thinking in algorithms

I have a program. It has two parts: get the data and use the data.

"Get the data" involves several queries to the database to gather the data, then I munge it into the form I need. Specifically, it's about people who generate samples of DNA data (called "primary investigator" or PI for those not involved in research), a little about the samples themselves, and those that the data are shared with.

"Use the data" involves seeing how closely the reality of the file system ACLs is aligned with the ideal as expressed by the database.

I expected that I'd spend 5% of my time, at worst, in "get the data" and 95% of my time in "use the data". So much so that I found a way to parallelize the "use the data" part so I could do it n projects at a time.
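
The parallel part looks roughly like this, assuming Parallel::ForkManager, with @projects and check_project() as hypothetical stand-ins:

    # run the "use the data" step n projects at a time
    use Parallel::ForkManager;

    my $n  = 8;    # hypothetical worker count
    my $pm = Parallel::ForkManager->new($n);

    for my $project (@projects) {
        $pm->start and next;          # parent loops on; child works
        check_project($project);      # the ACL-comparing step
        $pm->finish;
    }
    $pm->wait_all_children;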

In reality, it's running 50-50.

It might have something to do with the lag I've added, trying to throw in debugging code. That might've made it worse.

It might have something to do with database access. For this, I think we take a step back.

We have several database tables, and while each one rarely changes, it might. So, instead of having queries all over the place, we write dump_that_table() or the like. That way, instead of digging all over the code base for SELECT * FROM that_table (which, in and of itself, is a bug waiting to happen), you go to one function and get it from one place.

So, I have get_all_pi_ids() and get_pi(), which could not be pulled into a single function until I rewrote the DB handling code, which now allows me to make { 1: { id: 1, name: "Aaron A. Aaronson", ... }, ... }, to put it in JSON terms. Right now, though, this means I make 1 + 475 database calls to get that list.

Then I get each PI's share info. This is done in two forms: when a PI shares a project and when a PI shares everything. I start with get_own_projects() and get_other_pi_projects(), which get both cases (a project the PI owns and a project shared with the PI). That makes it 1 + (3 * 475) database calls.

I think I'll stop now, because the amount of shame I feel is still (barely) surmountable, and I'm now trying to look at the solutions.

A solution is to start with the projects themselves. Many projects are on an old system that we can't do this mess with, and there's a nice boolean where we can say AND project.is_old_system = 0 and just ignore them. Each project has an owner, so if we add the PI to the query, we no longer have to fetch it specially. Come to think of it, if we make each PI share with herself, we say goodbye to special cases altogether.

I'm suspecting that we cannot meaningfully handle both the "share all" and the "share one" parts in one query. I'm beginning to want to add joins to MongoDB or something, which might just be possible, but my data is in MySQL. Anyway, if we get this down to 2 queries instead of the nearly 1500, that should fix a lot of the issues with DB access lag.

As, of course, will making sure the script keeps its DB handles alive, which I think my first interface did, but which I removed due to a now-forgotten bug.

So, the first step in fixing this mess is to make better "get this" interfaces, which will allow me to get it all with as few steps as possible.
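
I think it lands somewhere like this: one query with guessed table and column names, run through a DBI handle in $dbh, feeding the keyed structure from above. The "share all" list can be its own small second query, which keeps us at two round trips.

    # one pass: every current project, its owner, and who it's shared with
    my $sql = q{
        SELECT project.id, project.name,
               pi.id AS pi_id, pi.name AS pi_name,
               share.grantee_id
        FROM project
        JOIN pi ON pi.id = project.owner_id
        LEFT JOIN share ON share.project_id = project.id
        WHERE project.is_old_system = 0
    };

    my %project;
    my $rows = $dbh->selectall_arrayref( $sql, { Slice => {} } );
    for my $row (@$rows) {
        my $p = $project{ $row->{id} } //= {
            name     => $row->{name},
            pi       => { id => $row->{pi_id}, name => $row->{pi_name} },
            grantees => [],
        };
        push @{ $p->{grantees} }, $row->{grantee_id}
            if defined $row->{grantee_id};
    }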

(As an aside, I'll say I wish Blogger had a "code" button along with the "B" "I" and "U" buttons.)


Not Done, But Done For Now

I spent some more time on it, and I figured something out.

I looked at the data, and instead of getting 1 2 3 4 NULL NULL 5 6 7, I was getting 1 2 3 4 NULL NULL 7 1 2, starting over at the beginning. So, I figured out how to do loops and made a series of vectors, with the dates in one and the load averages for each VM in others.

Lev suggested that this is not how a real R person would do it. True. But this works, and I know how to plot vectors but not data tables. So, a few more changes (having the date in the title is good) and I can finish it up and put it into my workflow. Yay me.