/var/log/rant: Threads Unspooling, or "What's my problem NOW?"

2015/07/28

I have never done much with access control lists (ACLs), as most of my time as a Linux and Unix user has been in positions where everything needed to control access could be done with standard Unix permissions: owner/group/other and read/write/execute.

Also, most of the file systems were not set up to support them, which makes the barrier to entry enough that I never got beyond "I wonder how ACLs work".

I work with genomics data on the top university supercomputing cluster in the US, and we generate lots of data for lots of users. We had been doing some ugly hacks to share data with those users, but the new file system gives us ACLs, which makes sharing as easy as setfacl -R -m "u:username:r-x" /path/to/research.

ACLs are not actually my problem.

The length of time it takes to set ACLs on a large data set is my problem.

Running the tool to set everything takes five minutes. With a subset of our total data. Which is only going to get bigger. If we're talking about a daily "get everything back to proper shape", that's well within bounds. If it's something a user is supposed to run, then no.

So, I'm looking into threads. I can set all my ACLs in parallel using Parallel::ForkManager (which forks child processes rather than creating true threads), and while I'm not sure threads are the asynchronous solution for Modern Perl, they work, and I can have a number of directories getting their ACLs set recursively at once.
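As a rough illustration of the parallel approach described above, here is a minimal sketch using Parallel::ForkManager. The subroutine name, the worker count, and the directory list are assumptions made for the example, not the actual tool; the setfacl arguments mirror the command quoted earlier.

```perl
use strict ;
use warnings ;
use Parallel::ForkManager ;

# Hypothetical helper: run "setfacl -R -m u:USER:r-x DIR" for each
# directory, with at most $max_procs children running at once.
# Returns the number of directories whose setfacl run failed.
sub set_acls_in_parallel {
    my ( $user, $max_procs, @dirs ) = @_ ;
    my $failed = 0 ;

    my $pm = Parallel::ForkManager->new( $max_procs ) ;

    # Runs in the parent each time a child is reaped.
    $pm->run_on_finish( sub {
        my ( $pid, $exit_code ) = @_ ;
        $failed++ if $exit_code != 0 ;
        } ) ;

    for my $dir ( @dirs ) {
        $pm->start and next ;    # parent: go on to the next directory

        # Child: list-form system() bypasses the shell, so odd
        # characters in $dir need no quoting.
        my $status = system 'setfacl', '-R', '-m', "u:$user:r-x", $dir ;
        $pm->finish( $status == 0 ? 0 : 1 ) ;
        }

    $pm->wait_all_children ;
    return $failed ;
    }

# Example call (paths are placeholders):
# my $failed = set_acls_in_parallel( 'username', 8, glob '/path/to/research/*/' ) ;
```

Splitting the work by top-level directory keeps each child's recursive pass independent, at the cost of uneven run times when one directory is much larger than the rest.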

Sometimes, however, because machines go down or NFS mounts get hosed or you kill a process just to watch it die, the setting process gets interrupted. Or, you do more work and generate more data, and that goes into a subdirectory. Then, the ACLs at the top of the directory tree may be correct, but the deep nodes will be wrong, and it's best to not wait until the end-of-the-day "set everything" process to get your bits in order.

So you want to set a flag. If the flag is set, you do it all. And when I try to set flags in the threaded version, I get an error.

Threads are not actually my problem.

I have the flags in the database, which makes both the incomplete-pass and the add-new-data options equally easy to handle. And, to make databases easier to handle, I have a module I call oDB which handles database access so I don't have to worry about correctness or having passwords in plaintext in my code. It uses another module I wrote, called MyDB, to connect to MySQL in the first place. I share the gist above, but I cut to the chase below.

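my $_dbh ;               # Cached handle, shared by every caller in this process.

sub db_connect {
    my ( $param_ptr, $attr_ptr ) = @_ ;
    my $port = '3306' ;

    # ...

    # No parameters passed: reuse the cached handle if there is one.
    if ( defined $_dbh
        && ( !defined $param_ptr || $param_ptr eq '' ) ) {
        return $_dbh ;
        }

    # ...

    # Same parameters as last time: reuse the cached handle again.
    if ( defined $_dbh && $new_db_params eq $_db_params ) {
        return $_dbh ;
        }

    # ...

    # Only now is a new connection opened, and it replaces the cached one.
    $_dbh = DBI->connect(
        $source,
        $params{ user },
        $params{ password }, \%attr )
        or croak $DBI::errstr ;

    return $_dbh ;
    }    # End of db_connect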

Essentially, the "right thing" in this case is to generate a new DB handle each and every time, and my code is doing everything in its power to avoid creating a new DB handle.
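One possible middle ground, sketched here as an assumption rather than as what oDB or MyDB actually do, is to keep the cache but tie it to the process that opened the handle, so a forked child opens its own connection on first use. The subroutine name and the $connect code ref are hypothetical; InactiveDestroy is a standard DBI handle attribute. Because the check is keyed on the process ID, this covers forked children (as with Parallel::ForkManager) but not interpreter threads, which share one PID.

```perl
use strict ;
use warnings ;

my $_dbh ;        # Cached handle.
my $_dbh_pid ;    # PID of the process that opened it.

# $connect is a hypothetical code ref that opens and returns a new
# handle, e.g. sub { DBI->connect( $source, $user, $password, \%attr ) }
sub db_connect_per_process {
    my ( $connect ) = @_ ;

    # Reuse the handle only if this same process opened it.
    if ( defined $_dbh && defined $_dbh_pid && $_dbh_pid == $$ ) {
        return $_dbh ;
        }

    # A handle inherited across fork belongs to the parent. Mark it so
    # that discarding this copy does not close the parent's connection.
    $_dbh->{ InactiveDestroy } = 1 if defined $_dbh ;

    $_dbh     = $connect->() ;
    $_dbh_pid = $$ ;
    return $_dbh ;
    }
```

Within one process the second call returns the cached handle; the first call made in a forked child sees a different PID and connects afresh.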

My problem is that I didn't write this as thread-safe. Because doing so was the furthest thing from my mind.

My problem is a failure of imagination.
