From df047267ee8c40c63dcf64857842dd1f5d550e01 Mon Sep 17 00:00:00 2001
From: efischer
Date: Wed, 2 Mar 2022 22:54:17 -0600
Subject: [PATCH] Add data on randomized well population behavior

---
 readme.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/readme.md b/readme.md
index ece0552..894ea2d 100644
--- a/readme.md
+++ b/readme.md
@@ -264,6 +264,7 @@ Example output:
 P-values are calculated *after* BiGpairSEQ matching is completed, for purposes of comparison only,
 using the (2021 corrected) formula from the original pairSEQ paper. (Howie, et al. 2015)
 
+
 ## PERFORMANCE
 
 On a home computer with a Ryzen 5600X CPU, 64GB of 3200MHz DDR4 RAM (half of which was allocated to the Java Virtual Machine), and a PCIe 3.0 SSD, running Linux Mint 20.3 Edge (5.13 kernel),
@@ -279,6 +280,9 @@ Since this implementation of BiGpairSEQ writes intermediate results to disk (to
 with different filtering options), the actual elapsed time was greater. File I/O time was not measured, but took
 slightly less time than the simulation itself. Real elapsed time from start to finish was under 30 minutes.
 
+As mentioned in the theory section, performance could be improved by implementing a more efficient algorithm for finding
+the maximum weighted matching.
+
 ## BEHAVIOR WITH RANDOMIZED WELL POPULATIONS
 
 A series of BiGpairSEQ simulations were conducted using a cell sample file of 3.5 million unique T cells. From these cells,
@@ -294,6 +298,7 @@ The well populations of the plates were:
 * Five sample plates with each individual well's population randomized, from 1000 to 5000 T cells. (Average population ~3000 T cells/well.)
 
 All BiGpairSEQ simulations were run with a low overlap threshold of 3 and a high overlap threshold of 94.
+No optional filters were used, so pairing was attempted for all sequences with overlaps within the threshold values.
 
 Constant well population plate results:
 