diff --git a/readme.md b/readme.md
index 7d89ad8..6a442f8 100644
--- a/readme.md
+++ b/readme.md
@@ -267,6 +267,8 @@ using the (2021 corrected) formula from the original pairSEQ paper. (Howie, et a
 ## PERFORMANCE
 
+(NOTE: These results are from an older, less efficient version of the simulator and need to be updated.)
+
 On a home computer with a Ryzen 5600X CPU, 64GB of 3200MHz DDR4 RAM (half of which was allocated to the Java
 Virtual Machine), and a PCIe 3.0 SSD, running Linux Mint 20.3 Edge (5.13 kernel), the author ran a BiGpairSEQ
 simulation of a 96-well sample plate with 30,000 T cells/well comprising ~11,800 alphas and betas, taken from a
 sample of 4,000,000 distinct cells with an exponential frequency distribution (lambda 0.6).
@@ -353,10 +355,17 @@ roughly as though it had a constant well population equal to the plate's average
   * Advantage: would eliminate the need to use maps to associate vertices with sequences, which would make the code easier to understand.
   * When using the same algorithm, this also seems to be faster than the version with lots of maps, which is a nice bonus!
 * Re-implement CDR1 matching method
-* Implement simulation of read depth, and of read errors. Pre-filter graph for difference in read count to eliminate spurious sequences.
+* ~~Implement simulation of read depth, and of read errors. Pre-filter graph for difference in read count to eliminate spurious sequences.~~ DONE
+  * Pre-filtering based on comparing (read depth) * (occupancy) to (read count) for each sequence works extremely well (see the sketch below the diff)
+* Add read depth simulation options to the CLI
+* Update the performance data in this readme
+* Refactor the simulator code to collect all needed data in a single scan of the plate (single-pass sketch below)
+  * Currently it scans once for the vertices and then again for the edge weights. This made simulating read depth awkward and incompatible with caching of plate files.
+  * This would be a fairly major rewrite of the simulator code, but it could make things faster, and it would definitely make them cleaner.
 * Implement Duan and Su's maximum weight matching algorithm
   * Add controllable algorithm-type parameter?
   * This would be fun and valuable, but it would probably take more time than I have for a hobby project.
+* Implement an auction algorithm for maximum weight matching (see the sketch below)
 * Implement an algorithm for approximating a maximum weight matching (greedy sketch below)
   * Some of these run in linear or near-linear time
   * Given that the underlying biological samples have many, many sources of error, this would probably be the most useful option in practice. It seems less mathematically elegant, though, and so less fun for me.
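
The pre-filtering entry above is terse, so here is a minimal sketch of the idea as stated: at a simulated read depth `d`, a sequence occupying `k` wells should contribute roughly `d * k` reads, so a sequence whose observed read count falls well short of that expectation is likely spurious and can be dropped before graph construction. Every name here (`ReadCountPreFilter`, `tolerance`, the map-shaped inputs) is hypothetical; the simulator's real types are not shown in the diff.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/**
 * Sketch of the read-count pre-filter described in the TODO list: a
 * sequence seen in `occupancy` wells at read depth `readDepth` should
 * produce about readDepth * occupancy reads, so sequences whose
 * observed read count falls below a tolerance fraction of that
 * expectation are treated as spurious and excluded from the graph.
 */
final class ReadCountPreFilter {

    private final int readDepth;     // simulated reads per sequence per well
    private final double tolerance;  // required fraction of expected reads, e.g. 0.5

    ReadCountPreFilter(int readDepth, double tolerance) {
        this.readDepth = readDepth;
        this.tolerance = tolerance;
    }

    /** Returns the sequences that pass the filter and become graph vertices. */
    <S> List<S> filter(Map<S, Integer> occupancy, Map<S, Long> readCounts) {
        return occupancy.entrySet().stream()
                .filter(e -> {
                    long expected = (long) readDepth * e.getValue();
                    long observed = readCounts.getOrDefault(e.getKey(), 0L);
                    return observed >= tolerance * expected;
                })
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```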
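
The single-scan refactor entry imagines replacing the two passes over the plate (one for vertices, one for edge weights) with one pass that accumulates both. A sketch of what that single pass could collect, assuming a plate is a list of wells and a well is a pair of sequence-ID sets; both are assumptions, since the real plate representation is not shown here:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Sketch of the proposed single-scan refactor: one pass over the wells
 * collects both per-sequence occupancy (vertex data) and per-pair
 * co-occurrence counts (edge weights), instead of scanning the plate
 * once for each.
 */
final class SinglePassPlateScan {

    final Map<String, Integer> occupancy = new HashMap<>();
    final Map<String, Integer> coOccurrence = new HashMap<>(); // key: "alpha|beta"

    /** alphaWells.get(w) and betaWells.get(w) are the sequences seen in well w. */
    void scan(List<Set<String>> alphaWells, List<Set<String>> betaWells) {
        for (int w = 0; w < alphaWells.size(); w++) {
            Set<String> alphas = alphaWells.get(w);
            Set<String> betas = betaWells.get(w);
            // Vertex data: how many wells each sequence appears in.
            for (String a : alphas) occupancy.merge(a, 1, Integer::sum);
            for (String b : betas) occupancy.merge(b, 1, Integer::sum);
            // Edge data: how often each alpha/beta pair shares a well.
            for (String a : alphas)
                for (String b : betas)
                    coOccurrence.merge(a + "|" + b, 1, Integer::sum);
        }
    }
}
```

Accumulating per-sequence read counts in the same loop would let the pre-filter above run without a second scan, which is presumably the point of the refactor.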
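
For the auction-algorithm entry, this is a textbook sketch of Bertsekas' auction for the n-by-n assignment problem (a special case of maximum weight matching), not anything taken from the project: each unassigned "bidder" bids on its most valuable "object", raising its price by the gap to the second-best option plus a small epsilon. With integer weights and epsilon below 1/n, the result is an optimal assignment.

```java
import java.util.Arrays;

/** Gauss-Seidel auction algorithm for the dense n-by-n assignment problem. */
final class AuctionAssignment {

    /** Returns assigned[bidder] = object, maximizing total weight. */
    static int[] solve(double[][] weight) {
        int n = weight.length;
        double eps = 1.0 / (n + 1);   // below 1/n: exact for integer weights
        double[] price = new double[n];
        int[] ownerOf = new int[n];   // ownerOf[object] = bidder, or -1
        int[] assigned = new int[n];  // assigned[bidder] = object, or -1
        Arrays.fill(ownerOf, -1);
        Arrays.fill(assigned, -1);

        boolean anyUnassigned = true;
        while (anyUnassigned) {
            anyUnassigned = false;
            for (int bidder = 0; bidder < n; bidder++) {
                if (assigned[bidder] != -1) continue;
                anyUnassigned = true;
                // Find the best and second-best net values for this bidder.
                int best = -1;
                double bestVal = Double.NEGATIVE_INFINITY;
                double secondVal = Double.NEGATIVE_INFINITY;
                for (int obj = 0; obj < n; obj++) {
                    double val = weight[bidder][obj] - price[obj];
                    if (val > bestVal) {
                        secondVal = bestVal;
                        bestVal = val;
                        best = obj;
                    } else if (val > secondVal) {
                        secondVal = val;
                    }
                }
                // Bid: raise the price by the value gap plus eps, evicting any owner.
                double gap = (secondVal == Double.NEGATIVE_INFINITY) ? 0 : bestVal - secondVal;
                price[best] += gap + eps;
                if (ownerOf[best] != -1) assigned[ownerOf[best]] = -1;
                ownerOf[best] = bidder;
                assigned[bidder] = best;
            }
        }
        return assigned;
    }
}
```

A production version would add epsilon scaling, and the pairing graph is sparse and unbalanced rather than dense n-by-n, so this is only the shape of the idea.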
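
And for the approximation entry, the simplest well-known option is the greedy 1/2-approximation: sort the edges by weight and keep each edge whose endpoints are both still free. It runs in O(E log E); the path-growing algorithm of Drake and Hougardy gives the same 1/2 guarantee in linear time, which matches the "linear or near-linear time" note above. The `Edge` record and integer vertex IDs are placeholders:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Greedy 1/2-approximation of a maximum weight matching. */
final class GreedyMatching {

    record Edge(int u, int v, double weight) {}

    static List<Edge> match(List<Edge> edges) {
        List<Edge> sorted = new ArrayList<>(edges);
        sorted.sort(Comparator.comparingDouble((Edge e) -> e.weight).reversed());
        Set<Integer> used = new HashSet<>();   // vertices already matched
        List<Edge> matching = new ArrayList<>();
        for (Edge e : sorted) {
            if (!used.contains(e.u) && !used.contains(e.v)) {
                used.add(e.u);
                used.add(e.v);
                matching.add(e);
            }
        }
        return matching;
    }
}
```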