diff --git a/readme.md b/readme.md
index 42a91ce..31cb0d3 100644
--- a/readme.md
+++ b/readme.md
@@ -331,7 +331,6 @@ Options when making a Sample Plate file:
 * Standard deviation size
 * Exponential
 * Lambda value
- * *(Based on the slope of the graph in Figure 4C of the pairSEQ paper, the distribution of the original experiment was very roughly exponential with a lambda ~0.6. (Howie, et al. 2015) The actual distribution was certainly quite different.)*
 * Total number of wells on the plate
 * Well populations random or fixed
 * If random, minimum and maximum population sizes
@@ -474,28 +473,7 @@ Several BiGpairSEQ simulations were performed on a home computer with the follow
 * 2TB PCIe 3.0 SSD
 * Linux Mint 21 (5.15 kernel)
-### Simulation 1
-This simulation was an attempt to replicate the conditions of experiment 1 from the 2015 pairSEQ paper: a matching was found for a
-96-well sample plate with 4,000 T cells/well comprising ~11,900 TCRAs and TCRBs, taken from a sample of 8,400,000
-distinct cells with an exponential frequency distribution (lambda 0.6). The sequence dropout rate was 10%, as the analysis
-from the original paper concluded that most TCR sequences "have less than a 10% chance of going unobserved." (Howie, et al. 2015)
-
-The original paper does not contain (or the author of this document failed to identify) information on sequencing depth,
-read error probability, or the probabilities of different kinds of read error collisions. As the pre-filtering of BiGpairSEQ
-has successfully filtered out all such errors for any reasonable error rates the author has yet tested, this simulation was
-done without any sequencing errors, to reduce the processing time.
-
-With min/max occupancy thresholds of 3 and 95 wells respectively for matching, BiGpairSEQ identified:
-* 8,495 correct pairings
-* 5 incorrect pairings
-
-for an overall pairing accuracy of 99.9992%.
-
-The total simulation time (excluding file I/O) was 28m52. The total elapsed time with file I/O was 41m23s.
-Calculation of p-values was enabled for this simulation, increasing the overall processing time.
-
-
-## BEHAVIOR WITH RANDOMIZED WELL POPULATIONS (old results, need updating for new version of the simulator (though resilience to varying well populations is unchanged))
+### SAMPLE PLATES WITH VARYING NUMBERS OF CELLS PER WELL (old results, need updating for new version of the simulator (though resilience to varying well populations is unchanged))
 A series of BiGpairSEQ simulations were conducted using a cell sample file of 3.5 million unique T cells. From these cells, 10 sample plate files were created. All of these sample plates had 96 wells, used an exponential distribution with a lambda of 0.6, and
@@ -540,6 +518,32 @@ The average results for the randomized plates are closest to the constant plate
 This and several other tests indicate that BiGpairSEQ treats a sample plate with a highly variable number of T cells/well roughly as though it had a constant well population equal to the plate's average well population.
+### EXPERIMENTS FROM THE 2015 pairSEQ PAPER
+#### Experiment 1
+This simulation was an attempt to replicate the conditions of experiment 1 from the 2015 pairSEQ paper: a matching was found for a
+96-well sample plate with 4,000 T cells/well comprising ~11,900 TCRAs and TCRBs, taken from a sample of 8,400,000
+distinct cells with an exponential frequency distribution (lambda 0.6). The sequence dropout rate was 10%, as the analysis
+from the original paper concluded that most TCR sequences "have less than a 10% chance of going unobserved." (Howie, et al. 2015)
+
+The original paper does not contain (or the author of this document failed to identify) information on sequencing depth,
+read error probability, or the probabilities of different kinds of read error collisions. As the pre-filtering of BiGpairSEQ
+has successfully filtered out all such errors for any reasonable error rates the author has yet tested, this simulation was
+done without any sequencing errors, to reduce the processing time.
+
+With min/max occupancy thresholds of 3 and 95 wells respectively for matching, BiGpairSEQ identified:
+* 8,495 correct pairings
+* 5 incorrect pairings
+
+for an overall pairing accuracy of 99.9992%.
+
+The total simulation time (excluding file I/O) was 28m52. The total elapsed time with file I/O was 41m23s.
+Calculation of p-values was enabled for this simulation, increasing the overall processing time.
+
+Note that the frequency distribution of T cell clones in this simulation is only roughly that of
+
+#### Experiment 2
+
+
 ## TODO
 * ~~Try invoking GC at end of workloads to reduce paging to disk~~ DONE