Update TODO

eugenefischer
2022-09-28 03:01:03 -05:00
parent 58bb04c431
commit 3a47efd361

@@ -200,6 +200,10 @@ then use it for multiple different BiGpairSEQ simulations.
 Options for creating a Graph/Data file:
 * The Cell Sample file to use
 * The Sample Plate file to use. (This must have been generated from the selected Cell Sample file.)
+* Whether to simulate sequence read depth. If simulated:
+    * The read depth (number of times each sequence is read)
+    * The read error rate (probability a sequence is misread)
+    * The error collision rate (probability two misreads produce the same spurious sequence)
 These files do not have a human-readable structure, and are not portable to other programs.
@@ -265,9 +269,7 @@ P-values are calculated *after* BiGpairSEQ matching is completed, for purposes o
 using the (2021 corrected) formula from the original pairSEQ paper. (Howie, et al. 2015)
-## PERFORMANCE
-(NOTE: These results are from an older, less efficient version of the simulator, and need to be updated.)
+## PERFORMANCE (old results; need updating to reflect current, improved simulator performance)
 On a home computer with a Ryzen 5600X CPU, 64GB of 3200MHz DDR4 RAM (half of which was allocated to the Java Virtual Machine), and a PCIe 3.0 SSD, running Linux Mint 20.3 Edge (5.13 kernel),
 the author ran a BiGpairSEQ simulation of a 96-well sample plate with 30,000 T cells/well comprising ~11,800 alphas and betas,
@@ -357,8 +359,9 @@ roughly as though it had a constant well population equal to the plate's average
 * ~~Implement simulation of read depth, and of read errors. Pre-filter graph for difference in read count to eliminate spurious sequences.~~ DONE
 * Pre-filtering based on comparing (read depth) * (occupancy) to (read count) for each sequence works extremely well
 * ~~Add read depth simulation options to CLI~~ DONE
+* ~~Update graphml output to reflect current Vertex class attributes~~ DONE
+* Individual well data from the SequenceRecords could be included, if there's ever a reason for it
 * Update matching metadata output options in CLI
-* Update graphml output to reflect current Vertex class attributes
 * Update performance data in this readme
 * Re-implement CDR1 matching method
 * Refactor simulator code to collect all needed data in a single scan of the plate
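The TODO notes that pre-filtering by comparing (read depth) * (occupancy) to (read count) for each sequence works extremely well, but the exact cutoff is not stated in this commit. The Java sketch below assumes a sequence is kept when its plate-wide read count is at least a chosen fraction (`minFraction`) of read depth times occupancy. `ReadCountPrefilterSketch`, `SequenceStats`, `prefilter`, and `minFraction` are hypothetical names, and the counts in `main` are made-up inputs, not simulator results. It uses a record, so it needs Java 16 or later.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical read-count pre-filter; the threshold rule is an assumption,
// not the BiGpairSEQ simulator's actual criterion.
public class ReadCountPrefilterSketch {

    /** Plate-wide totals for one sequence: wells it occupies and total reads observed. */
    public record SequenceStats(int occupancy, int readCount) {}

    /** Keeps sequences whose read count is at least minFraction * readDepth * occupancy. */
    public static Map<String, SequenceStats> prefilter(Map<String, SequenceStats> stats,
            int readDepth, double minFraction) {
        Map<String, SequenceStats> kept = new LinkedHashMap<>(); // preserves input order
        for (Map.Entry<String, SequenceStats> e : stats.entrySet()) {
            SequenceStats s = e.getValue();
            // A true sequence is read about readDepth times in each well it occupies.
            double expected = (double) readDepth * s.occupancy();
            if (s.readCount() >= minFraction * expected) {
                kept.put(e.getKey(), s);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, SequenceStats> stats = new LinkedHashMap<>();
        stats.put("CASSLGQ", new SequenceStats(40, 3960));    // made-up: true sequence, a few misreads
        stats.put("CASSLGQ*err", new SequenceStats(3, 3));    // made-up: spurious, one read in each of 3 wells
        System.out.println(prefilter(stats, 100, 0.5).keySet()); // prints [CASSLGQ]
    }
}
```

The comparison works because a spurious sequence created by an occasional misread collects only a handful of reads in each well it appears in, far short of the read depth that a genuine sequence accumulates per occupied well.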