Add performance section to readme

2022-02-20 23:31:25 -06:00
parent 601e141fd0
commit 94b54b3416
1 changed files with 21 additions and 5 deletions
--- a/readme.md
+++ b/readme.md
@@ -20,12 +20,12 @@ The problem of pairing TCRA/TCRB sequences thus reduces to the "assignment probl
 matching on a bipartite graph--the subset of vertex-disjoint edges whose weights sum to the maximum possible value.

 This is a well-studied combinatorial optimization problem, with many known solutions.
-The best currently-known algorithm for bipartite graphs with integer weights--which is what BiGpairSEQ uses--is 
-from Duan and Su (2012). For a graph with m edges, n vertices per side, and maximum integer edge weight N, 
-their algorithm runs in **O(m sqrt(n) log(N))** time. This is the best known efficiency for finding a maximum weight
-matching on a bipartite graph, and the integer edge weight requirement makes it ideal for BiGpairSEQ.
+The most efficient known algorithm for maximum weight matching is from Duan and Su (2012), and requires a bipartite graph
+with strictly integer edge weights. For a graph with m edges, n vertices per side, and maximum integer edge weight N, 
+their algorithm runs in **O(m sqrt(n) log(N))** time. As the graph representation of a pairSEQ experiment is 
+bipartite with integer weights, this algorithm is ideal for BiGpairSEQ.

-Unfortunately, it's a fairly new algorithm. It is not implemented by the graph theory library used in this simulator.
+Unfortunately, it's a fairly new algorithm, and not yet implemented by the graph theory library used in this simulator.
 So this program instead uses the Fibonacci heap-based algorithm of Fredman and Tarjan (1987), which has a worst-case
 runtime of **O(n (n log(n) + m))**. The algorithm is implemented as described in Mehlhorn and Näher (1999).

@@ -218,6 +218,22 @@ Example output:
 P-values are calculated *after* BiGpairSEQ matching is completed, for purposes of comparison, 
 using the (2021 corrected) formula from the original pairSEQ paper. (Howie, et al. 2015)

+### PERFORMANCE
+Performance details of the example excerpted above:
+
+On a home computer with a Ryzen 5600X CPU, 64GB of 3200MHz DDR4 RAM, and a PCIe 3.0 SSD, running Linux Mint 20.3 Edge (5.13 kernel), 
+the author ran a BiGpairSEQ simulation of a 96-well sample plate with 30,000 T cells/well comprising ~11,800 alphas and betas,
+taken from a sample of 4,000,000 distinct cells with an exponential frequency distribution.
+
+With minimum and maximum occupancy thresholds of 3 and 94 wells for matching, and no other pre-filtering, BiGpairSEQ
+identified 5,151 correct pairings and 18 incorrect pairings, for an accuracy of 99.652%.
+
+The simulation time was 14'22". Had intermediate results been held in memory, this would have been the total elapsed time.
+
+Since this implementation of BiGpairSEQ writes intermediate results to disk to improve the efficiency of *repeated* simulations,
+the actual elapsed time was greater. File I/O time was not measured separately, but took slightly less time than the simulation itself.
+Real elapsed time from start to finish was under 30 minutes.
+
 ## TODO

 * ~~Try invoking GC at end of workloads to reduce paging to disk~~ DONE
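
The figures quoted in the PERFORMANCE section added by this commit can be checked with a few lines of arithmetic. The sketch below is not part of BiGpairSEQ; the variable and function names are hypothetical, and it assumes "accuracy" means correct pairings divided by all pairings identified (correct plus incorrect). It also derives the upper bound on file I/O time implied by "under 30 minutes" of total elapsed time.

```python
# Figures quoted in the PERFORMANCE section of the readme.
correct_pairings = 5151
incorrect_pairings = 18
simulation_seconds = 14 * 60 + 22  # 14'22" of simulation time
elapsed_limit_seconds = 30 * 60    # "under 30 minutes" start to finish


def accuracy_percent(correct: int, incorrect: int) -> float:
    """Assumed definition: share of identified pairings that are correct."""
    return 100.0 * correct / (correct + incorrect)


accuracy = accuracy_percent(correct_pairings, incorrect_pairings)

# Whatever time was not spent simulating is an upper bound on file I/O time.
io_upper_bound = elapsed_limit_seconds - simulation_seconds

print(f"accuracy: {accuracy:.3f}%")
print(f"file I/O upper bound: {io_upper_bound // 60}'{io_upper_bound % 60:02d}\"")
```

Under these assumptions the accuracy rounds to the 99.652% stated in the readme, and file I/O is bounded above by 15'38", which is consistent with (though it does not by itself prove) the statement that I/O took slightly less time than the simulation.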