Revert attempt to switch plate output format. It worked, but it introduced a bug in graph filtering that I don't want to chase down.

2022-02-20 20:45:35 -06:00
parent 7558455f39
commit 63ef6aa7a0
5 changed files with 30 additions and 62 deletions
--- a/readme.md
+++ b/readme.md
@@ -94,7 +94,7 @@ Options when making a Cell Sample file:
 Files are in CSV format. Rows are distinct T cells, columns are sequences within the cells.
 Comments are preceded by `#`

-Structure example:
+Structure:

 ---
    # Sample contains 1 unique CDR1 for every 4 unique CDR3s.
@@ -136,20 +136,20 @@ Every column represents an individual cell, containing four sequences, represent
 Notice that the Alpha CDR1 is missing in the cell above, due to sequence dropout.
 Dropouts are represented by replacing sequences with the value `-1`. Comments are preceded by `#`

-Structure Example:
+Structure:

 ---
 ```
-# Cell source file name: 4MilCells.csv
-# Plate size: 96
-# Error rate: 0.1
-# Concentrations: 10000 5000 500
-# Lambda: 0.6
+# Cell source file name:
+# Each row represents one well on the plate
+# Plate size:
+# Concentrations:
+# Lambda: 
 ```
-| well 1 | well 2 | well 3| ... |
+| Well 1, cell 1 | Well 1, cell 2 | Well 1, cell 3| ... |
 |---|---|---|---|
-| [105383, 786528, 959247, 925928] | [525902, 791533, -1, 866282] | [409236, 132303, 804465, 942261]| ... |
-| [249930, 301502, 970003, 881099] | [523787, 552952, 997194, 970507]| [425363, 417411, 845399, -1]| ... |
+| **Well 2, cell 1** | **Well 2, cell 2** | **Well 2, cell 3**| ... |
+| **Well 3, cell 1** | **Well 3, cell 2** | **Well 3, cell 3**| ... |
 | ... | ... | ... | ... |

 ---
@@ -222,10 +222,9 @@ using the (2021 corrected) formula from the original pairSEQ paper. (Howie, et a

 ## TODO

-* ~~Try invoking GC at end of workloads to reduce paging to disk~~ DONE
+* Try invoking GC at end of workloads to reduce paging to disk
 * ~~Hold graph data in memory until another graph is read-in?~~
  * No, this won't work, because BiGpairSEQ simulations alter the underlying graph based on filtering constraints. Changes would cascade with multiple experiments.
-* ~~See if there's a reasonable way to reformat Sample Plate files so that wells are columns instead of rows~~ DONE
 * Enable GraphML output in addition to serialized object binaries, for data portability
  * Custom vertex type with attribute for sequence occupancy?
 * Re-implement CDR1 matching method
@@ -238,7 +237,10 @@ using the (2021 corrected) formula from the original pairSEQ paper. (Howie, et a
 * Implement sample plates with random numbers of T cells per well
  * Possible BiGpairSEQ advantage over pairSEQ: BiGpairSEQ is resilient to variations in well populations; pairSEQ is not.
    * preliminary data suggests that BiGpairSEQ behaves roughly as though the whole plate had whatever the *average* well concentration is, but that's still speculative.
-
+* See if there's a reasonable way to reformat Sample Plate files so that wells are columns instead of rows
+  * Problem is variable number of cells in a well
+  * Apache Commons CSV library writes entries a row at a time
+    * Can possibly sort the wells by length first, then construct entries 
  
 ## CITATIONS
 * Howie, B., Sherwood, A. M., et al. ["High-throughput pairing of T cell receptor alpha and beta sequences."](https://pubmed.ncbi.nlm.nih.gov/26290413/) Sci. Transl. Med. 7, 301ra131 (2015)
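A note on the re-opened TODO item about writing wells as columns: the idea of sorting the wells by length before building row-wise entries can be sketched roughly as below. This is only an illustration under stated assumptions, not the project's code. The class `WellColumns`, the method `toRows`, and the representation of a well as a list of cell strings are all hypothetical. The sketch uses plain Java collections rather than Apache Commons CSV, so each resulting row would still need to be handed to whatever CSV writer is in use. Sorting also reorders the wells, so the original well numbers would have to be recorded separately (for example in a header row) if they matter.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: names and data layout are assumptions, not taken from the project.
class WellColumns {
    // Converts wells (each a list of cell strings) into rows where column j is a well
    // and row i holds the i-th cell of every well that has at least i + 1 cells.
    static List<List<String>> toRows(List<List<String>> wells) {
        // Sort a copy by descending cell count so shorter wells end up on the right.
        // List.sort is stable, so wells of equal size keep their relative order.
        List<List<String>> sorted = new ArrayList<>(wells);
        sorted.sort(Comparator.comparingInt((List<String> w) -> w.size()).reversed());

        int maxCells = sorted.isEmpty() ? 0 : sorted.get(0).size();
        List<List<String>> rows = new ArrayList<>();
        for (int i = 0; i < maxCells; i++) {
            List<String> row = new ArrayList<>();
            for (List<String> well : sorted) {
                if (well.size() <= i) {
                    break; // every remaining well is at least as short, so the row ends here
                }
                row.add(well.get(i));
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<List<String>> wells = List.of(
                List.of("a1"),
                List.of("b1", "b2", "b3"),
                List.of("c1", "c2"));
        // Each row can be written one at a time, which suits a row-oriented CSV writer.
        for (List<String> row : toRows(wells)) {
            System.out.println(String.join(",", row));
        }
    }
}
```

Because the wells are sorted longest first, every row is a gap-free prefix of the columns, so no padding values are needed for wells with fewer cells.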