Add performance section to readme
26
readme.md
@@ -20,12 +20,12 @@ The problem of pairing TCRA/TCRB sequences thus reduces to the "assignment probl
 matching on a bipartite graph--the subset of vertex-disjoint edges whose weights sum to the maximum possible value.

 This is a well-studied combinatorial optimization problem, with many known solutions.
-The best currently-known algorithm for bipartite graphs with integer weights--which is what BiGpairSEQ uses--is
-from Duan and Su (2012). For a graph with m edges, n vertices per side, and maximum integer edge weight N,
-their algorithm runs in **O(m sqrt(n) log(N))** time. This is the best known efficiency for finding a maximum weight
-matching on a bipartite graph, and the integer edge weight requirement makes it ideal for BiGpairSEQ.
+The most efficient known algorithm for maximum weight matching is from Duan and Su (2012), and requires a bipartite graph
+with strictly integer edge weights. For a graph with m edges, n vertices per side, and maximum integer edge weight N,
+their algorithm runs in **O(m sqrt(n) log(N))** time. As the graph representation of a pairSEQ experiment is
+bipartite with integer weights, this algorithm is ideal for BiGpairSEQ.

-Unfortunately, it's a fairly new algorithm. It is not implemented by the graph theory library used in this simulator.
+Unfortunately, it's a fairly new algorithm, and not yet implemented by the graph theory library used in this simulator.
 So this program instead uses the Fibonacci heap-based algorithm of Fredman and Tarjan (1987), which has a worst-case
 runtime of **O(n (n log(n) + m))**. The algorithm is implemented as described in Mehlhorn and Näher (1999).
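
To make the matching objective in the hunk above concrete, here is a minimal Python sketch that brute-forces a maximum weight matching on a tiny bipartite graph. It is neither the Duan and Su algorithm nor the Fredman and Tarjan algorithm, and it is not BiGpairSEQ's code. The weight matrix and the function name `max_weight_matching` are made up for illustration, and the approach is only feasible for a handful of vertices.

```python
from itertools import permutations

# Hypothetical integer edge weights (illustrative values only):
# weights[i][j] is the weight of the edge between alpha i and beta j,
# and 0 means the two vertices are not connected.
weights = [
    [5, 4, 0],
    [4, 1, 0],
    [0, 3, 2],
]


def max_weight_matching(weights):
    """Brute-force search over all one-to-one assignments (O(n!))."""
    n = len(weights)
    best_total, best_pairs = 0, []
    for perm in permutations(range(n)):
        # Keep only pairs joined by a real edge; each vertex appears at
        # most once, so the kept edges are vertex-disjoint.
        pairs = [(i, j) for i, j in enumerate(perm) if weights[i][j] > 0]
        total = sum(weights[i][j] for i, j in pairs)
        if total > best_total:
            best_total, best_pairs = total, pairs
    return best_total, best_pairs


total, pairs = max_weight_matching(weights)
print(total, pairs)  # 10 [(0, 1), (1, 0), (2, 2)]
```

Note that greedily taking the heaviest edge first (weight 5) would cap the total at 8 here, which is why a proper matching algorithm is needed rather than a simple sort.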
@@ -218,6 +218,22 @@ Example output:
 P-values are calculated *after* BiGpairSEQ matching is completed, for purposes of comparison,
 using the (2021 corrected) formula from the original pairSEQ paper. (Howie, et al. 2015)

+### PERFORMANCE
+Performance details of the example excerpted above:
+
+On a home computer with a Ryzen 5600X CPU, 64GB of 3200MHz DDR4 RAM, and a PCIe 3.0 SSD, running Linux Mint 20.3 Edge (5.13 kernel),
+the author ran a BiGpairSEQ simulation of a 96-well sample plate with 30,000 T cells/well comprising ~11,800 alphas and betas,
+taken from a sample of 4,000,000 distinct cells with an exponential frequency distribution.
+
+With min/max occupancy threshold of 3 and 94 wells for matching, and no other pre-filtering, BiGpairSEQ identified 5,151
+correct pairings and 18 incorrect pairings, for an accuracy of 99.652%.
+
+The simulation time was 14'22". If intermediate results were held in memory, this would be equivalent to the total elapsed time.
+
+Since this implementation of BiGpairSEQ writes intermediate results to improve the efficiency of *repeated* simulations,
+the actual elapsed time was greater. File I/O time was not measured, but took slightly less time than the simulation itself.
+Real elapsed time from start to finish was under 30 minutes.
+
 ## TODO

 * ~~Try invoking GC at end of workloads to reduce paging to disk~~ DONE
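
The accuracy figure in the PERFORMANCE section above can be reproduced from the reported pairing counts. The short Python check below assumes accuracy means correct pairings divided by all reported pairings, which is consistent with the numbers given; the variable names are illustrative.

```python
# Counts reported in the PERFORMANCE section.
correct = 5151
incorrect = 18

# Assumed definition: fraction of reported pairings that are correct.
accuracy = correct / (correct + incorrect)
print(f"{accuracy:.3%}")  # 99.652%

# The reported simulation time of 14'22", expressed in seconds.
simulation_seconds = 14 * 60 + 22
print(simulation_seconds)  # 862
```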