Add read depth attributes to graphml output

Update TODO
2022-09-28 03:01:52 -05:00 · 2022-09-28 03:01:03 -05:00
2 changed files with 12 additions and 4 deletions
--- a/readme.md
+++ b/readme.md
@@ -200,6 +200,10 @@ then use it for multiple different BiGpairSEQ simulations.
 Options for creating a Graph/Data file:
 * The Cell Sample file to use
 * The Sample Plate file to use. (This must have been generated from the selected Cell Sample file.)
+* Whether to simulate sequence read depth. If simulated:
+  * The read depth (number of times each sequence is read)
+  * The read error rate (probability a sequence is misread)
+  * The error collision rate (probability two misreads produce the same spurious sequence)

 These files do not have a human-readable structure, and are not portable to other programs.

@@ -265,9 +269,7 @@ P-values are calculated *after* BiGpairSEQ matching is completed, for purposes o
 using the (2021 corrected) formula from the original pairSEQ paper. (Howie, et al. 2015)


-## PERFORMANCE
-
-(NOTE: These results are from an older, less efficient version of the simulator, and need to be updated.)
+## PERFORMANCE (old results; need updating to reflect current, improved simulator performance)

 On a home computer with a Ryzen 5600X CPU, 64GB of 3200MHz DDR4 RAM (half of which was allocated to the Java Virtual Machine), and a PCIe 3.0 SSD, running Linux Mint 20.3 Edge (5.13 kernel), 
 the author ran a BiGpairSEQ simulation of a 96-well sample plate with 30,000 T cells/well comprising ~11,800 alphas and betas,
@@ -357,8 +359,9 @@ roughly as though it had a constant well population equal to the plate's average
 * ~~Implement simulation of read depth, and of read errors. Pre-filter graph for difference in read count to eliminate spurious sequences.~~ DONE
  * Pre-filtering based on comparing (read depth) * (occupancy) to (read count) for each sequence works extremely well
 * ~~Add read depth simulation options to CLI~~ DONE
+* ~~Update graphml output to reflect current Vertex class attributes~~ DONE
+  * Individual well data from the SequenceRecords could be included, if there's ever a reason for it
 * Update matching metadata output options in CLI
-* Update graphml output to reflect current Vertex class attributes
 * Update performance data in this readme
 * Re-implement CDR1 matching method
 * Refactor simulator code to collect all needed data in a single scan of the plate
--- a/src/main/java/GraphMLFileWriter.java
+++ b/src/main/java/GraphMLFileWriter.java
@@ -56,6 +56,9 @@ public class GraphMLFileWriter {
        }
        String wellPopulationsString = populationsStringBuilder.toString();
        ga.put("well populations", DefaultAttribute.createAttribute(wellPopulationsString));
+        ga.put("read depth", DefaultAttribute.createAttribute(data.getReadDepth().toString()));
+        ga.put("read error rate", DefaultAttribute.createAttribute(data.getReadErrorRate().toString()));
+        ga.put("error collision rate", DefaultAttribute.createAttribute(data.getErrorCollisionRate().toString()));
        return ga;
    }

@@ -75,6 +78,7 @@ public class GraphMLFileWriter {
                attributes.put("type", DefaultAttribute.createAttribute(v.getType().name()));
                attributes.put("sequence", DefaultAttribute.createAttribute(v.getSequence()));
                attributes.put("occupancy", DefaultAttribute.createAttribute(v.getOccupancy()));
+                attributes.put("read count", DefaultAttribute.createAttribute(v.getReadCount()));
                return attributes;
            });
            //register the attributes
@@ -84,6 +88,7 @@ public class GraphMLFileWriter {
            exporter.registerAttribute("type", AttributeCategory.NODE, AttributeType.STRING);
            exporter.registerAttribute("sequence", AttributeCategory.NODE, AttributeType.STRING);
            exporter.registerAttribute("occupancy", AttributeCategory.NODE, AttributeType.STRING);
+            exporter.registerAttribute("read count", AttributeCategory.NODE, AttributeType.STRING);
            //export the graph
            exporter.exportGraph(graph, writer);
        } catch(IOException ex){
Author	SHA1	Message	Date
eugenefischer	b7c86f20b3	Add read depth attributes to graphml output	2022-09-28 03:01:52 -05:00
eugenefischer	3a47efd361	Update TODO	2022-09-28 03:01:03 -05:00
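
The TODO entry in the readme diff mentions pre-filtering by comparing (read depth) * (occupancy) to (read count) for each sequence. The sketch below shows one way such a comparison could look; it is not the simulator's actual code. The class `ReadCountPrefilter`, the nested `SeqStats` type, and the `minFraction` threshold are all hypothetical names and parameters introduced for illustration, and the exact rule the simulator applies is not shown in this commit.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: none of these names come from the BiGpairSEQ source.
public class ReadCountPrefilter {

    // Hypothetical stand-in for the per-sequence data a Vertex carries.
    static final class SeqStats {
        final String sequence;
        final int occupancy;  // number of wells in which the sequence was seen
        final int readCount;  // total number of reads of the sequence

        SeqStats(String sequence, int occupancy, int readCount) {
            this.sequence = sequence;
            this.occupancy = occupancy;
            this.readCount = readCount;
        }
    }

    // Reads expected if the sequence were read readDepth times in every well it occupies.
    static long expectedReads(int readDepth, int occupancy) {
        return (long) readDepth * occupancy;
    }

    // Assumed rule: keep a sequence when its read count reaches at least
    // minFraction of the expected reads. The threshold is an assumption.
    static boolean looksGenuine(SeqStats s, int readDepth, double minFraction) {
        return s.readCount >= minFraction * expectedReads(readDepth, s.occupancy);
    }

    // Returns only the sequences that pass the assumed rule.
    static List<SeqStats> filter(List<SeqStats> all, int readDepth, double minFraction) {
        List<SeqStats> kept = new ArrayList<>();
        for (SeqStats s : all) {
            if (looksGenuine(s, readDepth, minFraction)) {
                kept.add(s);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<SeqStats> all = new ArrayList<>();
        all.add(new SeqStats("genuine", 5, 48));  // close to 10 * 5 expected reads
        all.add(new SeqStats("spurious", 5, 5));  // far below 10 * 5 expected reads
        for (SeqStats s : filter(all, 10, 0.5)) {
            System.out.println(s.sequence);
        }
    }
}
```

The intuition is that a sequence created by a misread tends to accumulate far fewer reads than read depth times occupancy would predict, so a large shortfall marks it as likely spurious. The "read count" and "occupancy" vertex attributes added to the graphml output in this commit are the per-sequence values such a comparison would use.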