We had a program that would go and get all the data off said machine in a serial manner. It looked pretty much like this:
    my $ftp = Net::FTP->new(login stuff) ;
    $ftp->binary() ;
    my @list = $ftp->ls() ;
    my @files ;
    for ( @list ) {
        if ( ! $downloaded{$_} ) {
            $ftp->get($_) ;
            push @files , $_ ;
        }
    }
    for my $file ( @files ) {
        open my $fh , '<' , $file ;
        while (<$fh>) {
            chomp ;
            my @line = split m{\t} , $_ ;
            dump_into_database( $line[0] , $line[3] , $line[27] , $line[491] ) ;
        }
        $downloaded{ $file }++ ;
    }

Yes, it has been substantially simplified.
There are two problems with the original program (not the simplified version shown here). First, there's no clear separation between sections, which makes figuring out what each specific part is doing awfully confusing. Second, it was modified and hacked upon several times by a clueless idiot.
(waves hands.)
My thought is to put everything in subroutines, each small enough to fit in my head and in a screen-height window. And I am doing that now.
    sub handle_ftp {
        my $ftp = Net::FTP->new(login stuff) ;
        $ftp->binary() ;
        my @list = $ftp->ls() ;
        for ( @list ) {
            download( $ftp , $_ ) ;
        }
    }

    sub download {
        my ( $ftp , $file ) = @_ ;
        if ( !$downloaded{ $file } ) {
            $ftp->get( $file ) ;
            split_file( $file ) ;
            $downloaded{ $file }++ ;
        }
    }

    # named split_file because a plain split(...) call resolves to Perl's builtin split
    sub split_file {
        my ( $file ) = @_ ;
        open my $fh , '<' , $file or die "Cannot open $file: $!" ;
        while (<$fh>) {
            chomp ;
            my @line = split m{\t} , $_ ;
            dump_into_database( $line[0] , $line[3] , $line[27] , $line[491] ) ;
        }
    }

My concern is that, in the first version, we get through the FTP session as fast as possible, while in the second version, the FTP session stays open until the last split is done. Here, my car is parked farther away from the instrument machine than the server room is, and these are all reasonably small text files, so run time shouldn't be too much of a problem. There are actually three variations on dump_into_database, including one that generates an image in R.

I think I like the second better because it seems more recursive, even though it really isn't.
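If the open FTP session ever does become a worry, one possible middle ground keeps the small subroutines but finishes all the downloads before any parsing starts. This is only a rough sketch under assumptions: the names fetch_new_files, parse_file and run are made up for illustration, $ftp can be any object with ls(), get() and quit() (a connected Net::FTP handle, for instance), and $handler is a code reference standing in for whichever dump_into_database variation is wanted.

```perl
use strict;
use warnings;

# Hypothetical helper: download every file not yet recorded in %$downloaded.
# $ftp is assumed to behave like a connected Net::FTP object.
sub fetch_new_files {
    my ( $ftp, $downloaded ) = @_;
    my @files;
    for my $file ( $ftp->ls() ) {
        next if $downloaded->{$file};
        # Net::FTP's get() returns the local file name, or undef on failure
        push @files, $file if $ftp->get($file);
    }
    return @files;
}

# Hypothetical helper: split one tab-separated file and hand the
# columns of interest to $handler (a stand-in for dump_into_database).
sub parse_file {
    my ( $file, $handler ) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    while ( my $row = <$fh> ) {
        chomp $row;
        my @line = split m{\t}, $row;
        $handler->( @line[ 0, 3, 27, 491 ] );
    }
    close $fh;
    return;
}

# Hypothetical driver: the FTP session is closed before parsing begins.
sub run {
    my ( $ftp, $downloaded, $handler ) = @_;
    my @files = fetch_new_files( $ftp, $downloaded );
    $ftp->quit();
    for my $file (@files) {
        parse_file( $file, $handler );
        $downloaded->{$file}++;
    }
    return @files;
}
```

Calling it would look something like run( $ftp , \%downloaded , \&dump_into_database ). The trade-off is that nothing reaches the database until every download has finished, which seems acceptable for small text files.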
Without dumping the whole thing here, does anyone see a problem with my approach?