From 3a47efd361aa377d66e65544d8f755e320b81d9f Mon Sep 17 00:00:00 2001
From: eugenefischer <66030419+eugenefischer@users.noreply.github.com>
Date: Wed, 28 Sep 2022 03:01:03 -0500
Subject: [PATCH] Update TODO

---
 readme.md | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/readme.md b/readme.md
index bbc6f90..4d4f616 100644
--- a/readme.md
+++ b/readme.md
@@ -200,6 +200,10 @@ then use it for multiple different BiGpairSEQ simulations.
 Options for creating a Graph/Data file:
 * The Cell Sample file to use
 * The Sample Plate file to use. (This must have been generated from the selected Cell Sample file.)
+* Whether to simulate sequence read depth. If simulated:
+    * The read depth (number of times each sequence is read)
+    * The read error rate (probability a sequence is misread)
+    * The error collision rate (probability two misreads produce the same spurious sequence)
 
 These files do not have a human-readable structure, and are not portable to other programs.
 
@@ -265,9 +269,7 @@ P-values are calculated *after* BiGpairSEQ matching is completed, for purposes o
 using the (2021 corrected) formula from the original pairSEQ paper. (Howie, et al. 2015)
 
 
-## PERFORMANCE
-
-(NOTE: These results are from an older, less efficient version of the simulator, and need to be updated.)
+## PERFORMANCE (old results; need updating to reflect current, improved simulator performance)
 
 On a home computer with a Ryzen 5600X CPU, 64GB of 3200MHz DDR4 RAM (half of which was allocated to the Java Virtual Machine), and a PCIe 3.0 SSD, running Linux Mint 20.3 Edge (5.13 kernel), the author ran a BiGpairSEQ simulation of a 96-well sample plate with 30,000 T cells/well comprising ~11,800 alphas and betas,
@@ -357,8 +359,9 @@ roughly as though it had a constant well population equal to the plate's average
 * ~~Implement simulation of read depth, and of read errors. Pre-filter graph for difference in read count to eliminate spurious sequences.~~ DONE
     * Pre-filtering based on comparing (read depth) * (occupancy) to (read count) for each sequence works extremely well
 * ~~Add read depth simulation options to CLI~~ DONE
+* ~~Update graphml output to reflect current Vertex class attributes~~ DONE
+    * Individual well data from the SequenceRecords could be included, if there's ever a reason for it
 * Update matching metadata output options in CLI
-* Update graphml output to reflect current Vertex class attributes
 * Update performance data in this readme
 * Re-implement CDR1 matching method
 * Refactor simulator code to collect all needed data in a single scan of the plate
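
The read depth options and the pre-filter mentioned in this patch can be illustrated with a minimal sketch. It is written in Python for brevity and is not the simulator's own code. The function names, the negative IDs used for spurious sequences, the way collisions are drawn, and the `min_fraction` threshold are all assumptions made for illustration; only the three parameters (read depth, read error rate, error collision rate) and the comparison of (read depth) * (occupancy) to (read count) come from the readme text.

```python
import random
from collections import defaultdict


def simulate_reads(wells, read_depth, error_rate, collision_rate, rng):
    """Simulate reads for every sequence in every well (hypothetical model).

    wells: list of sets of positive integer sequence IDs.
    Returns (occupancy, read_count): for each observed sequence, the number
    of wells it was seen in and its total number of reads.
    """
    occupancy = defaultdict(int)
    read_count = defaultdict(int)
    next_spurious_id = -1  # assumption: spurious sequences get fresh negative IDs
    for well in wells:
        observed_in_well = set()
        spurious_in_well = []
        for seq in sorted(well):
            for _ in range(read_depth):
                if rng.random() >= error_rate:
                    observed = seq  # correct read
                elif spurious_in_well and rng.random() < collision_rate:
                    # Collision: this misread repeats an earlier spurious sequence.
                    observed = rng.choice(spurious_in_well)
                else:
                    # Misread producing a new spurious sequence.
                    observed = next_spurious_id
                    next_spurious_id -= 1
                    spurious_in_well.append(observed)
                read_count[observed] += 1
                observed_in_well.add(observed)
        for observed in observed_in_well:
            occupancy[observed] += 1
    return dict(occupancy), dict(read_count)


def prefilter(occupancy, read_count, read_depth, min_fraction=0.5):
    """Keep sequences whose read count is close to read_depth * occupancy.

    min_fraction is an illustrative threshold, not a value from the readme.
    """
    kept = set()
    for seq, wells_seen in occupancy.items():
        expected = read_depth * wells_seen
        if read_count[seq] >= min_fraction * expected:
            kept.add(seq)
    return kept


rng = random.Random(0)
wells = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]
occupancy, read_count = simulate_reads(wells, 100, 0.05, 0.1, rng)
kept = prefilter(occupancy, read_count, read_depth=100)
```

A genuine sequence is read roughly `read_depth` times in each well it occupies, so its read count stays near the expected product. A spurious sequence typically appears in one well with only a handful of reads, which is why this comparison separates the two groups.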