Cookie Notice

As far as I know, and as far as I remember, nothing in this page does anything with Cookies.


Logging, Plotting and Shoshin: A Developer's Journey

I heard about Log::Log4perl and decided that this would be a good thing to learn and to integrate into the lab's workflow.

We were having problems with our VMs and it was suggested I start logging performance metrics, so when we go to our support people, we have something more than "this sucks" and "everything's slow".

So, I had a reason and I had an excuse, so I started logging. But logs without graphs are boring. I mean, some logs can tell you "here, right here, is the line number where you are an idiot", but this log is just performance metrics, so if you don't get a graph, you're not seeing it.

That tells a story. That tells us that there was something goofy going on with genomics-test (worse, I can tell you, because we had nothing going on with genomics-test, because the software we want to test is not working yet. There was a kernel bug and a few other things that had fixed the other VMs, but not that one, so our admin took it down and started from scratch.

Look at that graph. Do you see the downtime? No?

That's the problem. This shows the last 100 records for each VM, but for hours where is no record, there should be -1 or a discontinuity or something.

I generally use R as a plotting library, because all the preformatting is something I know how to do in Perl, my language of choice, but I've been trying to do more and more in R, and I'm hitting the limits of my knowledge. My code, including commented-out bad trails, follows.

My thought was, hey, we have all the dates in the column final$datetime, and I can make a unique array of them. My next step would be to go through each entry for dates, and, if there was no genomicstest$datetime that equalled that date, I would throw in a null or a -1 or something. That's what the ifelse() stuff is all about.

But, I found, this removes the association between the datetime and the load average, and the plots I was getting were not the above with gaps, as it should be, but one where I'm still getting high loads today.

Clearly, I am looking at R as an experienced Perl programmer. You can write FORTRAN in any language, they say, but you cannot write Perl in R. The disjunct between a Perl coder codes Perl and an R coder codes R is significant. As a Perl person, I want to create a system that's repeatable and packaged, to get the data I know is there and put it into the form I want it to be. The lineage of Perl is from shell tools like sed and awk, but it has aspirations toward being a systems programming language.

R users are about the opposite, I think. R is usable as a scripting language but the general case is as an interactive environment. Like a data shell. You start munging and plotting data in order to discover what it tells you. In this case, I can tell you that I was expecting a general high (but not terribly high; we have had load averages into the hundreds from these VMs), and that the periodicity of the load came as a complete surprise to me.

(There are other other differences, of course. Perl thinks of everything as a scalar, meaning either a string or a number, or an array of scalars, or a special fast array of scalars. R thinks of everything as data structures and vectors and such. Things I need to integrate into my head, but not things I wish to blog on right now.)

The difference between making a tool to give you expected results and using a tool to identify unexpected aspects is the difference, I believe, between a computer programmer and a data scientist. And I'm finding the latter is more where I want to be.

So, I want to try to learn R as she is spoke, to allow myself to think in it's terms. Certainly there's a simple transform that gives me what I want, but I do not know it yet. As I do it, I will have to let go of my feelings of mastery of Perl and allow myself to become a beginner in R.

But seriously, if anyone has the answer, I'll take that, too.