Remove questionable claim, reorder simulation experiments

eugenefischer
2022-10-01 15:46:22 -05:00
parent e7e85a4542
commit 98ce708825


@@ -331,7 +331,6 @@ Options when making a Sample Plate file:
* Standard deviation size
* Exponential
* Lambda value
* *(Based on the slope of the graph in Figure 4C of the pairSEQ paper, the distribution of the original experiment was very roughly exponential with a lambda ~0.6. (Howie, et al. 2015) The actual distribution was certainly quite different.)*
* Total number of wells on the plate
* Well populations random or fixed
* If random, minimum and maximum population sizes
@@ -474,28 +473,7 @@ Several BiGpairSEQ simulations were performed on a home computer with the follow
* 2TB PCIe 3.0 SSD
* Linux Mint 21 (5.15 kernel)
### Simulation 1
This simulation was an attempt to replicate the conditions of experiment 1 from the 2015 pairSEQ paper: a matching was found for a
96-well sample plate with 4,000 T cells/well comprising ~11,900 TCRAs and TCRBs, taken from a sample of 8,400,000
distinct cells with an exponential frequency distribution (lambda 0.6). The sequence dropout rate was 10%, as the analysis
from the original paper concluded that most TCR sequences "have less than a 10% chance of going unobserved." (Howie, et al. 2015)
The original paper does not contain (or the author of this document failed to identify) information on sequencing depth,
read error probability, or the probabilities of different kinds of read error collisions. As the pre-filtering of BiGpairSEQ
has successfully filtered out all such errors for any reasonable error rates the author has yet tested, this simulation was
done without any sequencing errors, to reduce the processing time.
With min/max occupancy thresholds of 3 and 95 wells respectively for matching, BiGpairSEQ identified:
* 8,495 correct pairings
* 5 incorrect pairings
for an overall pairing accuracy of 99.94% (8,495 of 8,500 pairings).
The total simulation time (excluding file I/O) was 28m52s. The total elapsed time including file I/O was 41m23s.
Calculation of p-values was enabled for this simulation, increasing the overall processing time.
## BEHAVIOR WITH RANDOMIZED WELL POPULATIONS (old results, need updating for new version of the simulator (though resilience to varying well populations is unchanged))
### SAMPLE PLATES WITH VARYING NUMBERS OF CELLS PER WELL (old results, need updating for new version of the simulator (though resilience to varying well populations is unchanged))
A series of BiGpairSEQ simulations were conducted using a cell sample file of 3.5 million unique T cells. From these cells,
10 sample plate files were created. All of these sample plates had 96 wells, used an exponential distribution with a lambda of 0.6, and
@@ -540,6 +518,32 @@ The average results for the randomized plates are closest to the constant plate
This and several other tests indicate that BiGpairSEQ treats a sample plate with a highly variable number of T cells/well
roughly as though it had a constant well population equal to the plate's average well population.
### EXPERIMENTS FROM THE 2015 pairSEQ PAPER
#### Experiment 1
This simulation was an attempt to replicate the conditions of experiment 1 from the 2015 pairSEQ paper: a matching was found for a
96-well sample plate with 4,000 T cells/well comprising ~11,900 TCRAs and TCRBs, taken from a sample of 8,400,000
distinct cells with an exponential frequency distribution (lambda 0.6). The sequence dropout rate was 10%, as the analysis
from the original paper concluded that most TCR sequences "have less than a 10% chance of going unobserved." (Howie, et al. 2015)
The original paper does not contain (or the author of this document failed to identify) information on sequencing depth,
read error probability, or the probabilities of different kinds of read error collisions. As the pre-filtering of BiGpairSEQ
has successfully filtered out all such errors for any reasonable error rates the author has yet tested, this simulation was
done without any sequencing errors, to reduce the processing time.
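
As an illustration of the 10% sequence dropout rate described above, here is a minimal Python sketch. It is not taken from the BiGpairSEQ source: the function name `apply_dropout`, the representation of a cell as an `(alpha_id, beta_id)` tuple, and the assumption that each chain is dropped independently per cell are all hypothetical choices made for this example.

```python
import random


def apply_dropout(well, dropout_rate, rng):
    """Return the sets of alpha and beta sequences observed in one well.

    `well` is a list of (alpha_id, beta_id) tuples, one per cell.
    Assumption for this sketch: each chain of each cell goes unobserved
    independently with probability `dropout_rate`.
    """
    alphas, betas = set(), set()
    for alpha_id, beta_id in well:
        if rng.random() >= dropout_rate:
            alphas.add(alpha_id)
        if rng.random() >= dropout_rate:
            betas.add(beta_id)
    return alphas, betas


# Hypothetical well of 4,000 distinct cells, matching the well size above.
rng = random.Random(42)
well = [(i, i) for i in range(4000)]
alphas, betas = apply_dropout(well, 0.10, rng)
observed_fraction = len(alphas) / len(well)
```

With a dropout rate of 0.10, roughly 90% of the alpha sequences in a well of distinct cells should remain observed; the exact fraction varies with the random seed.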
With min/max occupancy thresholds of 3 and 95 wells respectively for matching, BiGpairSEQ identified:
* 8,495 correct pairings
* 5 incorrect pairings
for an overall pairing accuracy of 99.94% (8,495 of 8,500 pairings).
The total simulation time (excluding file I/O) was 28m52s. The total elapsed time including file I/O was 41m23s.
Calculation of p-values was enabled for this simulation, increasing the overall processing time.
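
The min/max occupancy thresholds used for matching in this experiment can be pictured with the following Python sketch. It is an assumption-laden illustration rather than the simulator's actual code: it assumes that "occupancy" means the number of wells in which a sequence is observed and that both thresholds are inclusive. The name `filter_by_occupancy` is hypothetical.

```python
from collections import Counter


def filter_by_occupancy(wells, min_wells, max_wells):
    """Return the sequences whose well occupancy lies within the thresholds.

    `wells` is a list of sets, each holding the sequence ids observed in
    one well. Assumption for this sketch: both bounds are inclusive.
    """
    occupancy = Counter()
    for observed in wells:
        occupancy.update(observed)  # each well counts a sequence once
    return {seq for seq, n in occupancy.items() if min_wells <= n <= max_wells}


# Toy plate with four wells; "a" is in every well, "c" in only one.
wells = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"a"}]
kept = filter_by_occupancy(wells, min_wells=2, max_wells=3)
```

In this toy plate only `"b"` survives the filter: `"a"` occupies all four wells and `"c"` only one, so under these assumptions neither falls inside the 2 to 3 well range.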
Note that the frequency distribution of T cell clones in this simulation is only roughly that of
#### Experiment 2
## TODO
* ~~Try invoking GC at end of workloads to reduce paging to disk~~ DONE