4 Commits

View File

@@ -12,7 +12,7 @@ Unlike pairSEQ, which calculates p-values for every TCR alpha/beta overlap and c
against a null distribution, BiGpairSEQ does not do any statistical calculations against a null distribution, BiGpairSEQ does not do any statistical calculations
directly. directly.
BiGpairSEQ creates a [weightd bipartite graph](https://en.wikipedia.org/wiki/Bipartite_graph) representing the sample plate. BiGpairSEQ creates a [weighted bipartite graph](https://en.wikipedia.org/wiki/Bipartite_graph) representing the sample plate.
The distinct TCRA and TCRB sequences form the two sets of vertices. Every TCRA/TCRB pair that share a well The distinct TCRA and TCRB sequences form the two sets of vertices. Every TCRA/TCRB pair that share a well
are connected by an edge, with the edge weight set to the number of wells in which both sequences appear. are connected by an edge, with the edge weight set to the number of wells in which both sequences appear.
(Sequences present in *all* wells are filtered out prior to creating the graph, as there is no signal in their occupancy pattern.) (Sequences present in *all* wells are filtered out prior to creating the graph, as there is no signal in their occupancy pattern.)
@@ -131,11 +131,14 @@ Options when making a Sample Plate file:
* Standard deviation size * Standard deviation size
* Exponential * Exponential
* Lambda value * Lambda value
* *(Based on the slope of the graph in Figure 4C of the pairSEQ paper, the distribution of the original experiment was exponential with a lambda of approximately 0.6. (Howie, et al. 2015))* * *(Based on the slope of the graph in Figure 4C of the pairSEQ paper, the distribution of the original experiment was approximately exponential with a lambda ~0.6. (Howie, et al. 2015))*
* Total number of wells on the plate * Total number of wells on the plate
* Number of sections on plate * Well populations random or fixed
* Number of T cells per well * If random, minimum and maximum population sizes
* per section, if more than one section * If fixed
* Number of sections on plate
* Number of T cells per well
* per section, if more than one section
* Dropout rate * Dropout rate
Files are in CSV format. There are no header labels. Every row represents a well. Files are in CSV format. There are no header labels. Every row represents a well.
@@ -251,7 +254,7 @@ slightly less time than the simulation itself. Real elapsed time from start to f
## TODO ## TODO
* ~~Try invoking GC at end of workloads to reduce paging to disk~~ DONE * ~~Try invoking GC at end of workloads to reduce paging to disk~~ DONE
* Hold graph data in memory until another graph is read-in? ~~ABANDONED~~ ~~UNABANDONED~~ DONE * ~~Hold graph data in memory until another graph is read-in? ABANDONED UNABANDONED~~ DONE
* ~~*No, this won't work, because BiGpairSEQ simulations alter the underlying graph based on filtering constraints. Changes would cascade with multiple experiments.*~~ * ~~*No, this won't work, because BiGpairSEQ simulations alter the underlying graph based on filtering constraints. Changes would cascade with multiple experiments.*~~
* Might have figured out a way to do it, by taking edges out and then putting them back into the graph. This may actually be possible. * Might have figured out a way to do it, by taking edges out and then putting them back into the graph. This may actually be possible.
* It is possible, though the modifications to the graph incur their own performance penalties. Need testing to see which option is best. * It is possible, though the modifications to the graph incur their own performance penalties. Need testing to see which option is best.
@@ -260,7 +263,7 @@ slightly less time than the simulation itself. Real elapsed time from start to f
* ~~Apache Commons CSV library writes entries a row at a time~~ * ~~Apache Commons CSV library writes entries a row at a time~~
* _Got this working, but at the cost of a profoundly strange bug in graph occupancy filtering. Have reverted the repo until I can figure out what caused that. Given how easily Thingiverse transposes CSV matrices in R, might not even be worth fixing._ * _Got this working, but at the cost of a profoundly strange bug in graph occupancy filtering. Have reverted the repo until I can figure out what caused that. Given how easily Thingiverse transposes CSV matrices in R, might not even be worth fixing._
* Re-implement command line arguments, to enable scripting and statistical simulation studies * Re-implement command line arguments, to enable scripting and statistical simulation studies
* Implement sample plates with random numbers of T cells per well. * ~~Implement sample plates with random numbers of T cells per well.~~ DONE
* Possible BiGpairSEQ advantage over pairSEQ: BiGpairSEQ is resilient to variations in well population sizes on a sample plate; pairSEQ is not. * Possible BiGpairSEQ advantage over pairSEQ: BiGpairSEQ is resilient to variations in well population sizes on a sample plate; pairSEQ is not.
* preliminary data suggests that BiGpairSEQ behaves roughly as though the whole plate had whatever the *average* well concentration is, but that's still speculative. * preliminary data suggests that BiGpairSEQ behaves roughly as though the whole plate had whatever the *average* well concentration is, but that's still speculative.
* Enable GraphML output in addition to serialized object binaries, for data portability * Enable GraphML output in addition to serialized object binaries, for data portability