From 582dc3ef4086516723e1372805952a2ac116c8ba Mon Sep 17 00:00:00 2001
From: efischer
Date: Wed, 2 Mar 2022 12:39:40 -0600
Subject: [PATCH] Update readme

---
 readme.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/readme.md b/readme.md
index 2c07a33..adc07ae 100644
--- a/readme.md
+++ b/readme.md
@@ -264,17 +264,16 @@ Example output:
 P-values are calculated *after* BiGpairSEQ matching is completed, for purposes of comparison only, using the (2021 corrected) formula from the original pairSEQ paper. (Howie, et al. 2015)
-### PERFORMANCE
-Performance details of the example excerpted above:
+## PERFORMANCE
 On a home computer with a Ryzen 5600X CPU, 64GB of 3200MHz DDR4 RAM (half of which was allocated to the Java Virtual Machine), and a PCIe 3.0 SSD, running Linux Mint 20.3 Edge (5.13 kernel), the author ran a BiGpairSEQ simulation of a 96-well sample plate with 30,000 T cells/well comprising ~11,800 alphas and betas,
-taken from a sample of 4,000,000 distinct cells with an exponential frequency distribution.
+taken from a sample of 4,000,000 distinct cells with an exponential frequency distribution (lambda 0.6).
 With min/max occupancy threshold of 3 and 94 wells for matching, and no other pre-filtering, BiGpairSEQ identified 5,151 correct pairings and 18 incorrect pairings, for an accuracy of 99.652%.
-The simulation time was 14'22". If intermediate results were held in memory, this would be equivalent to the total elapsed time.
+The total simulation time was 14'22". If intermediate results were held in memory, this would be equivalent to the total elapsed time.
 Since this implementation of BiGpairSEQ writes intermediate results to disk (to improve the efficiency of *repeated* simulations with different filtering options), the actual elapsed time was greater. File I/O time was not measured, but took
@@ -286,7 +285,7 @@ slightly less time than the simulation itself. Real elapsed time from start to f
 * ~~Hold graph data in memory until another graph is read-in? ABANDONED UNABANDONED~~ DONE
     * ~~*No, this won't work, because BiGpairSEQ simulations alter the underlying graph based on filtering constraints. Changes would cascade with multiple experiments.*~~
     * Might have figured out a way to do it, by taking edges out and then putting them back into the graph. This may actually be possible.
-    * It is possible, though the modifications to the graph incur their own performance penalties. Need testing to see which option is best.
+    * It is possible, though the modifications to the graph incur their own performance penalties. Need testing to see which option is best. It may be computer-specific.
 * ~~Test whether pairing heap (currently used) or Fibonacci heap is more efficient for priority queue in current matching algorithm~~ DONE
     * ~~in theory Fibonacci heap should be more efficient, but complexity overhead may eliminate theoretical advantage~~
     * ~~Add controllable heap-type parameter?~~
@@ -300,6 +299,7 @@ slightly less time than the simulation itself. Real elapsed time from start to f
     * _Got this working, but at the cost of a profoundly strange bug in graph occupancy filtering. Have reverted the repo until I can figure out what caused that. Given how easily Thingiverse transposes CSV matrices in R, might not even be worth fixing.
 * ~~Enable GraphML output in addition to serialized object binaries, for data portability~~ DONE
 * ~~Custom vertex type with attribute for sequence occupancy?~~ ABANDONED
+    * Advantage: would eliminate the need to use maps to associate vertices with sequences, which would make the code easier to understand.
     * Have a branch where this is implemented, but there's a bug that broke matching. Don't currently have time to fix.
 * ~~Re-implement command line arguments, to enable scripting and statistical simulation studies~~ DONE
 * Re-implement CDR1 matching method
@@ -319,7 +319,7 @@ slightly less time than the simulation itself. Real elapsed time from start to f
 * [JGraphT](https://jgrapht.org) -- Graph theory data structures and algorithms
 * [JHeaps](https://www.jheaps.org) -- For pairing heap priority queue used in maximum weight matching algorithm
 * [Apache Commons CSV](https://commons.apache.org/proper/commons-csv/) -- For CSV file output
-* [Apache Commons CLI](https://commons.apache.org/proper/commons-cli/) -- To enable command line arguments for scripting. (**Awaiting re-implementation**.)
+* [Apache Commons CLI](https://commons.apache.org/proper/commons-cli/) -- To enable command line arguments for scripting.

 ## ACKNOWLEDGEMENTS
 BiGpairSEQ was conceived in collaboration with Dr. Alice MacQueen, who brought the original
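The accuracy quoted in the PERFORMANCE section can be checked from the two counts it reports. Dividing correct pairings by all reported pairings, 5,151 / (5,151 + 18), reproduces the 99.652% figure. The Java sketch below only checks that arithmetic under that assumed definition. The class and method names are hypothetical and are not part of the BiGpairSEQ codebase or its reporting code.

```java
import java.util.Locale;

/**
 * Hypothetical helper (not part of BiGpairSEQ) that checks the accuracy figure
 * from the PERFORMANCE section: correct pairings divided by all reported pairings.
 */
public final class PairingAccuracy {

    private PairingAccuracy() {}

    /** Returns correct / (correct + incorrect) as a fraction between 0 and 1. */
    public static double accuracy(int correct, int incorrect) {
        if (correct < 0 || incorrect < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        int total = correct + incorrect;
        if (total == 0) {
            throw new IllegalArgumentException("no pairings reported");
        }
        return (double) correct / total;
    }

    public static void main(String[] args) {
        // Counts reported for the 96-well example run: 5,151 correct, 18 incorrect.
        double acc = accuracy(5151, 18);
        // Locale.ROOT keeps the decimal separator a period regardless of system locale.
        System.out.printf(Locale.ROOT, "accuracy = %.3f%%%n", acc * 100.0);
    }
}
```

Running `main` prints `accuracy = 99.652%`. Under this definition, the figure measures the share of reported pairings that are correct. It does not measure how many of the true pairs in the sample were found.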
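The TODO notes in this patch describe a way to reuse one in-memory graph across experiments. Edges are taken out before a filtered run and put back afterwards, so filtering changes do not cascade into the next experiment. The following is a minimal sketch of that idea in plain Java, with several assumptions. It does not use JGraphT. The `Edge` record and the minimum-weight filter are hypothetical stand-ins for BiGpairSEQ's vertices, edge weights, and occupancy filters. It is not the author's implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;

/**
 * Hypothetical sketch, not the BiGpairSEQ implementation and not the JGraphT API:
 * one graph is kept in memory across experiments by temporarily removing the
 * edges a filter rejects and restoring them before the next experiment.
 */
public final class ReversibleEdgeFilter {

    /** Illustrative weighted edge between an alpha and a beta sequence. */
    public record Edge(String alpha, String beta, double weight) {}

    // Edges currently present in the shared graph.
    private final List<Edge> edges = new ArrayList<>();
    // Edges taken out by the most recent filter, held so they can be put back.
    private final List<Edge> removed = new ArrayList<>();

    public void addEdge(Edge e) {
        edges.add(Objects.requireNonNull(e));
    }

    /** Read-only snapshot of the edges currently in the graph. */
    public List<Edge> edges() {
        return List.copyOf(edges);
    }

    /** Takes out every edge whose weight is below minWeight; returns how many were removed. */
    public int applyMinWeightFilter(double minWeight) {
        if (!removed.isEmpty()) {
            // Filtering again without restoring would let changes cascade between experiments.
            throw new IllegalStateException("call restore() before applying another filter");
        }
        Iterator<Edge> it = edges.iterator();
        while (it.hasNext()) {
            Edge e = it.next();
            if (e.weight() < minWeight) {
                removed.add(e);
                it.remove();
            }
        }
        return removed.size();
    }

    /** Puts the removed edges back; their order may differ from the original insertion order. */
    public void restore() {
        edges.addAll(removed);
        removed.clear();
    }

    public static void main(String[] args) {
        ReversibleEdgeFilter graph = new ReversibleEdgeFilter();
        graph.addEdge(new Edge("A1", "B1", 12.0));
        graph.addEdge(new Edge("A2", "B2", 2.0));
        graph.addEdge(new Edge("A3", "B3", 7.0));

        int dropped = graph.applyMinWeightFilter(5.0);
        System.out.println("dropped " + dropped + ", kept " + graph.edges().size());
        // ... run the matching experiment on the filtered graph here ...
        graph.restore();
        System.out.println("after restore: " + graph.edges().size());
    }
}
```

Running `main` prints `dropped 1, kept 2` and then `after restore: 3`. Each filter-and-restore cycle touches every edge. This is one concrete form of the performance penalty the notes mention. As the notes say, only testing can show whether this is cheaper than reloading the graph from disk.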