From 3ba305abdb2324ef42fe4d87e29804c181b04e35 Mon Sep 17 00:00:00 2001
From: eugenefischer <66030419+eugenefischer@users.noreply.github.com>
Date: Wed, 21 Sep 2022 13:30:30 -0500
Subject: [PATCH] Update ToDo

---
 readme.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/readme.md b/readme.md
index 6347ba4..4908318 100644
--- a/readme.md
+++ b/readme.md
@@ -330,7 +330,6 @@ roughly as though it had a constant well population equal to the plate's average
 
 ## TODO
 
-* Enable post-filtering instead of pre-filtering. Pre-filtering of things like singleton sequences or saturating-occupancy sequences reduces graph size, but could conceivably reduce pairing accuracy by throwing away data. While these sequences have very little signal, it would be interesting to compare unfiltered results to filtered results. This would require a much, much faster MWM algorithm, though, to handle the much larger graphs. Possible one of the linear-time approximation algorithms.
 * ~~Try invoking GC at end of workloads to reduce paging to disk~~ DONE
 * ~~Hold graph data in memory until another graph is read-in? ABANDONED UNABANDONED~~ DONE
     * ~~*No, this won't work, because BiGpairSEQ simulations alter the underlying graph based on filtering constraints. Changes would cascade with multiple experiments.*~~
@@ -356,9 +355,13 @@ roughly as though it had a constant well population equal to the plate's average
 * Implement Duan and Su's maximum weight matching algorithm
     * Add controllable algorithm-type parameter?
     * This would be fun and valuable, but probably take more time than I have for a hobby project.
+* Implement an algorithm for approximating a maximum weight matching
+    * Some of these run in linear or near-linear time
+    * given that the underlying biological samples have many, many sources of error, this would probably be the most useful option in practice. It seems less mathematically elegant, though, and so less fun for me.
 * Implement Vose's alias method for arbitrary statistical distributions of cells
     * Should probably refactor to use apache commons rng for this
 * Use commons JCS for caching
+* Enable post-filtering instead of pre-filtering. Pre-filtering of things like singleton sequences or saturating-occupancy sequences reduces graph size, but could conceivably reduce pairing accuracy by throwing away data. While these sequences have very little signal, it would be interesting to compare unfiltered results to filtered results. This would require a much, much faster MWM algorithm, though, to handle the much larger graphs. Possible one of the linear-time approximation algorithms.
 
 ## CITATIONS