2 Commits

Author         SHA1        Message                                       Date
eugenefischer  b7c86f20b3  Add read depth attributes to graphml output   2022-09-28 03:01:52 -05:00
eugenefischer  3a47efd361  Update TODO                                   2022-09-28 03:01:03 -05:00
2 changed files with 12 additions and 4 deletions

@@ -200,6 +200,10 @@ then use it for multiple different BiGpairSEQ simulations.
Options for creating a Graph/Data file:
* The Cell Sample file to use
* The Sample Plate file to use. (This must have been generated from the selected Cell Sample file.)
+* Whether to simulate sequence read depth. If simulated:
+    * The read depth (number of times each sequence is read)
+    * The read error rate (probability a sequence is misread)
+    * The error collision rate (probability two misreads produce the same spurious sequence)
These files do not have a human-readable structure, and are not portable to other programs.
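A minimal sketch of how the three read depth options added in this hunk could interact. This is not the simulator's code: `ReadDepthSketch`, `simulateReads`, and the `*n` suffix used to label spurious sequences are hypothetical, and the collision model (a misread repeats the most recent spurious sequence with probability equal to the error collision rate) is a simplifying assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

final class ReadDepthSketch {

    /**
     * Hypothetical helper: reads one true sequence readDepth times and returns
     * the number of reads observed per sequence (true or spurious).
     */
    static Map<String, Integer> simulateReads(String sequence, int readDepth,
            double readErrorRate, double errorCollisionRate, Random rng) {
        Map<String, Integer> readCounts = new HashMap<>();
        String lastSpurious = null; // most recent misread product, if any
        int spuriousCreated = 0;    // used only to label distinct spurious sequences
        for (int read = 0; read < readDepth; read++) {
            if (rng.nextDouble() < readErrorRate) {
                // Misread. Assumption: with probability errorCollisionRate it
                // repeats the previous spurious sequence, otherwise it is new.
                boolean collides = lastSpurious != null
                        && rng.nextDouble() < errorCollisionRate;
                if (!collides) {
                    spuriousCreated++;
                    lastSpurious = sequence + "*" + spuriousCreated;
                }
                readCounts.merge(lastSpurious, 1, Integer::sum);
            } else {
                // Correct read of the true sequence.
                readCounts.merge(sequence, 1, Integer::sum);
            }
        }
        return readCounts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                simulateReads("CASSLGQAYEQYF", 100, 0.1, 0.05, new Random(42));
        counts.forEach((seq, n) -> System.out.println(seq + "\t" + n));
    }
}
```

Under this model the true sequence keeps most of the reads while each spurious sequence collects only a few, which is the gap a read count pre-filter can exploit.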
@@ -265,9 +269,7 @@ P-values are calculated *after* BiGpairSEQ matching is completed, for purposes o
using the (2021 corrected) formula from the original pairSEQ paper. (Howie, et al. 2015)
-## PERFORMANCE
-(NOTE: These results are from an older, less efficient version of the simulator, and need to be updated.)
+## PERFORMANCE (old results; need updating to reflect current, improved simulator performance)
On a home computer with a Ryzen 5600X CPU, 64GB of 3200MHz DDR4 RAM (half of which was allocated to the Java Virtual Machine), and a PCIe 3.0 SSD, running Linux Mint 20.3 Edge (5.13 kernel),
the author ran a BiGpairSEQ simulation of a 96-well sample plate with 30,000 T cells/well comprising ~11,800 alphas and betas,
@@ -357,8 +359,9 @@ roughly as though it had a constant well population equal to the plate's average
* ~~Implement simulation of read depth, and of read errors. Pre-filter graph for difference in read count to eliminate spurious sequences.~~ DONE
* Pre-filtering based on comparing (read depth) * (occupancy) to (read count) for each sequence works extremely well
* ~~Add read depth simulation options to CLI~~ DONE
+* ~~Update graphml output to reflect current Vertex class attributes~~ DONE
+    * Individual well data from the SequenceRecords could be included, if there's ever a reason for it
* Update matching metadata output options in CLI
-* Update graphml output to reflect current Vertex class attributes
* Update performance data in this readme
* Re-implement CDR1 matching method
* Refactor simulator code to collect all needed data in a single scan of the plate
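
The pre-filtering item in the TODO list above compares (read depth) * (occupancy) to (read count) for each sequence. A minimal sketch of that comparison follows. It is an illustration, not the simulator's implementation: `ReadCountPreFilter`, `SeqVertex`, and the `minFraction` threshold are hypothetical, and the actual rule used to decide when a sequence is spurious is not shown in this diff.

```java
import java.util.ArrayList;
import java.util.List;

final class ReadCountPreFilter {

    /** Hypothetical stand-in for the simulator's Vertex class. */
    static final class SeqVertex {
        final String sequence;
        final int occupancy; // number of wells containing the sequence
        final int readCount; // total reads of the sequence across the plate

        SeqVertex(String sequence, int occupancy, int readCount) {
            this.sequence = sequence;
            this.occupancy = occupancy;
            this.readCount = readCount;
        }
    }

    /**
     * Keeps a vertex only if its read count reaches minFraction of the count
     * expected for a genuine sequence, readDepth * occupancy.
     * minFraction is an assumed tuning parameter, not a simulator option.
     */
    static List<SeqVertex> filter(List<SeqVertex> vertices, int readDepth,
            double minFraction) {
        List<SeqVertex> kept = new ArrayList<>();
        for (SeqVertex v : vertices) {
            long expected = (long) readDepth * v.occupancy;
            if (v.readCount >= minFraction * expected) {
                kept.add(v);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<SeqVertex> vertices = List.of(
                new SeqVertex("CASSLGQAYEQYF", 10, 95), // close to 10 * 10 expected reads
                new SeqVertex("CASSLGQAYEQYL", 3, 3));  // one read per well: likely a misread
        for (SeqVertex v : filter(vertices, 10, 0.5)) {
            System.out.println(v.sequence);
        }
    }
}
```

A genuine sequence is read about `readDepth` times in every well it occupies, while a misread product typically appears only once or twice per well, so its read count falls far below `readDepth * occupancy`.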

@@ -56,6 +56,9 @@ public class GraphMLFileWriter {
         }
         String wellPopulationsString = populationsStringBuilder.toString();
         ga.put("well populations", DefaultAttribute.createAttribute(wellPopulationsString));
+        ga.put("read depth", DefaultAttribute.createAttribute(data.getReadDepth().toString()));
+        ga.put("read error rate", DefaultAttribute.createAttribute(data.getReadErrorRate().toString()));
+        ga.put("error collision rate", DefaultAttribute.createAttribute(data.getErrorCollisionRate().toString()));
         return ga;
     }
@@ -75,6 +78,7 @@ public class GraphMLFileWriter {
                 attributes.put("type", DefaultAttribute.createAttribute(v.getType().name()));
                 attributes.put("sequence", DefaultAttribute.createAttribute(v.getSequence()));
                 attributes.put("occupancy", DefaultAttribute.createAttribute(v.getOccupancy()));
+                attributes.put("read count", DefaultAttribute.createAttribute(v.getReadCount()));
                 return attributes;
             });
             //register the attributes
@@ -84,6 +88,7 @@ public class GraphMLFileWriter {
             exporter.registerAttribute("type", AttributeCategory.NODE, AttributeType.STRING);
             exporter.registerAttribute("sequence", AttributeCategory.NODE, AttributeType.STRING);
             exporter.registerAttribute("occupancy", AttributeCategory.NODE, AttributeType.STRING);
+            exporter.registerAttribute("read count", AttributeCategory.NODE, AttributeType.STRING);
             //export the graph
             exporter.exportGraph(graph, writer);
         } catch(IOException ex){