2012/02/08

Talking Through A Problem

Current task is as follows: We have a series of barcodes. Barcodes are genetic markers attached to the beginning of each sample to identify it, so you can run several different samples through the same process and still know what's what. My analogy is the different colored shirts schools give kids for a trip to the zoo or something, so the children from Turing Elementary don't get mixed up with those from Watson Elementary.

Specifically, we're looking at words of six letters, each being A, C, G or T. A and C are m while G and T are k. We're looking for groups of two, three and four barcodes where none of the six positions is exclusively m or exclusively k. From a corpus of 19 barcodes out of a possible 48, using the sequencers available, we have developed a set of 247,528 triplets of triplets where:

  • the barcodes within each triplet do not conflict on the m and k thing
  • a barcode used within triplet 1 is not used in triplet 2 or triplet 3, and so on 
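
To make the two rules above concrete, here is a minimal Python sketch. It assumes my reading of the m/k rule: a group "does not conflict" when every one of the six positions has at least one m (A or C) and at least one k (G or T). The function names are made up for illustration and are not from the real code.

```python
# Assumed mapping from the text: A and C are "m", G and T are "k".
MK = {"A": "m", "C": "m", "G": "k", "T": "k"}


def mk_pattern(barcode):
    """Translate a barcode such as 'ACGTAC' into its m/k pattern."""
    return "".join(MK[base] for base in barcode.upper())


def is_balanced(group):
    """True if no position is exclusively m or exclusively k (assumed rule)."""
    patterns = [mk_pattern(barcode) for barcode in group]
    # zip(*patterns) walks the group one position (column) at a time.
    return all(len(set(column)) == 2 for column in zip(*patterns))


def triplets_disjoint(triplets):
    """True if no barcode appears in more than one triplet."""
    used = [barcode for triplet in triplets for barcode in triplet]
    return len(used) == len(set(used))
```

A triplet of triplets would then qualify if each triplet passes is_balanced and the three together pass triplets_disjoint.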

Now we want to cut that list down further. If, for example, samples A and B from triplet 1 and sample C from triplet 3 fail, can the samples with those barcodes be used together? My coworker believes that it should be possible to find groups where all 9 can be used interchangeably, but his code so far fails to select them.

I look at that and think that subdividing them into groups that work together and giving up the ones with the longest list is probably the way to go. 
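
Roughly, the idea could be sketched like this in Python. This is only a sketch under assumptions: "work together" is taken to mean that every position has both an m and a k, each candidate is a flat list of barcodes, and working_groups and rank_candidates are hypothetical names, not the code I will end up with.

```python
from itertools import combinations

# Assumed mapping from the text: A and C are "m", G and T are "k".
MK = {"A": "m", "C": "m", "G": "k", "T": "k"}


def is_balanced(group):
    """True if every position has at least one m and one k (assumed rule)."""
    patterns = ["".join(MK[base] for base in barcode) for barcode in group]
    return all(len(set(column)) == 2 for column in zip(*patterns))


def working_groups(barcodes, size=3):
    """List every size-sized combination of barcodes that can be used together."""
    return [group for group in combinations(barcodes, size) if is_balanced(group)]


def rank_candidates(candidates, size=3):
    """Score each candidate set by how many working groups it contains,
    longest list first."""
    scored = [(len(working_groups(candidate, size)), candidate)
              for candidate in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

One thing this strict reading suggests: at any single position, at least five of nine barcodes must share a class, and any three of those five would conflict. So if that really is the rule, no set of nine can be fully interchangeable in groups of three, and ranking by the number of working combinations may be the most we can ask for.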

Which is what I'm now about to implement.