Remove questionable claim, reorder simulation experiments

2022-10-01 15:46:22 -05:00
parent e7e85a4542
commit 98ce708825
1 changed files with 27 additions and 23 deletions
--- a/readme.md
+++ b/readme.md
@@ -331,7 +331,6 @@ Options when making a Sample Plate file:
    * Standard deviation size 
  * Exponential
    * Lambda value
-      * *(Based on the slope of the graph in Figure 4C of the pairSEQ paper, the distribution of the original experiment was very roughly exponential with a lambda ~0.6. (Howie, et al. 2015) The actual distribution was certainly quite different.)*
 * Total number of wells on the plate
 * Well populations random or fixed
  * If random, minimum and maximum population sizes
@@ -474,28 +473,7 @@ Several BiGpairSEQ simulations were performed on a home computer with the follow
 * 2TB PCIe 3.0 SSD
 * Linux Mint 21 (5.15 kernel)

-### Simulation 1
-This simulation was an attempt to replicate the conditions of experiment 1 from the 2015 pairSEQ paper: a matching was found for a 
-96-well sample plate with 4,000 T cells/well comprising ~11,900 TCRAs and TCRBs, taken from a sample of 8,400,000 
-distinct cells with an exponential frequency distribution (lambda 0.6). The sequence dropout rate was 10%, as the analysis
-from the original paper concluded that most TCR sequences "have less than a 10% chance of going unobserved." (Howie, et al. 2015)
-
-The original paper does not contain (or the author of this document failed to identify) information on sequencing depth, 
-read error probability, or the probabilities of different kinds of read error collisions. As the pre-filtering of BiGpairSEQ
-has successfully filtered out all such errors for any reasonable error rates the author has yet tested, this simulation was
-done without any sequencing errors, to reduce the processing time.
-
-With min/max occupancy thresholds of 3 and 95 wells respectively for matching, BiGpairSEQ identified:
-* 8,495 correct pairings 
-* 5 incorrect pairings 
-
-for an overall pairing accuracy of approximately 99.94% (8,495 of 8,500 pairings).
-
-The total simulation time (excluding file I/O) was 28m52s. The total elapsed time with file I/O was 41m23s.
-Calculation of p-values was enabled for this simulation, increasing the overall processing time.
-
-
-## BEHAVIOR WITH RANDOMIZED WELL POPULATIONS (old results, need updating for new version of the simulator (though resilience to varying well populations is unchanged))
+### SAMPLE PLATES WITH VARYING NUMBERS OF CELLS PER WELL (old results, need updating for new version of the simulator (though resilience to varying well populations is unchanged))

 A series of BiGpairSEQ simulations were conducted using a cell sample file of 3.5 million unique T cells. From these cells,
 10 sample plate files were created. All of these sample plates had 96 wells, used an exponential distribution with a lambda of 0.6, and
@@ -540,6 +518,32 @@ The average results for the randomized plates are closest to the constant plate
 This and several other tests indicate that BiGpairSEQ treats a sample plate with a highly variable number of T cells/well
 roughly as though it had a constant well population equal to the plate's average well population.

+### EXPERIMENTS FROM THE 2015 pairSEQ PAPER
+#### Experiment 1
+This simulation was an attempt to replicate the conditions of experiment 1 from the 2015 pairSEQ paper: a matching was found for a 
+96-well sample plate with 4,000 T cells/well comprising ~11,900 TCRAs and TCRBs, taken from a sample of 8,400,000 
+distinct cells with an exponential frequency distribution (lambda 0.6). The sequence dropout rate was 10%, as the analysis
+from the original paper concluded that most TCR sequences "have less than a 10% chance of going unobserved." (Howie, et al. 2015)
+
+The original paper does not contain (or the author of this document failed to identify) information on sequencing depth, 
+read error probability, or the probabilities of different kinds of read error collisions. As the pre-filtering of BiGpairSEQ
+has successfully filtered out all such errors for any reasonable error rates the author has yet tested, this simulation was
+done without any sequencing errors, to reduce the processing time.
+
+With min/max occupancy thresholds of 3 and 95 wells respectively for matching, BiGpairSEQ identified:
+* 8,495 correct pairings 
+* 5 incorrect pairings 
+
+for an overall pairing accuracy of approximately 99.94% (8,495 of 8,500 pairings).
+
+The total simulation time (excluding file I/O) was 28m52s. The total elapsed time with file I/O was 41m23s.
+Calculation of p-values was enabled for this simulation, increasing the overall processing time.
+
+Note that the frequency distribution of T cell clones in this simulation is only roughly that of 
+
+#### Experiment 2
+
+
 ## TODO

 * ~~Try invoking GC at end of workloads to reduce paging to disk~~ DONE