Remove artifacts from tracking to stop repo bloat.

update readme
2022-02-22 16:14:50 -06:00 · 2022-02-22 16:10:22 -06:00 · 2022-02-22 16:09:50 -06:00 · 2022-02-21 11:19:54 -06:00 · 2022-02-21 06:53:30 +00:00 · 2022-02-21 00:29:34 -06:00
9 changed files with 128 additions and 144 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+/out/
--- a/out/artifacts/BiGpairSEQ_Sim_jar/BiGpairSEQ_Sim.jar
+++ b/out/artifacts/BiGpairSEQ_Sim_jar/BiGpairSEQ_Sim.jar
--- a/readme.md
+++ b/readme.md
@@ -4,7 +4,7 @@
 ## ABOUT

 This program simulates BiGpairSEQ (Bipartite Graph pairSEQ), a graph theory-based adaptation
-of the pairSEQ algorithm (Howie et al. 2015) for pairing T cell receptor sequences.
+of the pairSEQ algorithm (Howie, et al. 2015) for pairing T cell receptor sequences.

 ## THEORY

@@ -15,19 +15,18 @@ directly.
 BiGpairSEQ creates a [simple bipartite weighted graph](https://en.wikipedia.org/wiki/Bipartite_graph) representing the sample plate.
 The distinct TCRA and TCRB sequences form the two sets of vertices. Every TCRA/TCRB pair that share a well
 are connected by an edge, with the edge weight set to the number of wells in which both sequences appear.
-(Sequences in all wells are filtered out prior to creating the graph, as there is no signal in their occupancy pattern.)
+(Sequences present in *all* wells are filtered out prior to creating the graph, as there is no signal in their occupancy pattern.)
 The problem of pairing TCRA/TCRB sequences thus reduces to the "assignment problem" of finding a maximum weight
 matching on a bipartite graph--the subset of vertex-disjoint edges whose weights sum to the maximum possible value.

 This is a well-studied combinatorial optimization problem, with many known solutions.
-The best currently-known algorithm for bipartite graphs with integer weights--which is what BiGpairSEQ uses--
+The most efficient algorithm known to the author for maximum weight matching of a bipartite graph with strictly integral weights
 is from Duan and Su (2012). For a graph with m edges, n vertices per side, and maximum integer edge weight N, 
-their algorithm runs in **O(m sqrt(n) log(N))** time. This is the best known efficiency for finding a maximum weight
-matching on a bipartite graph, and the integer edge weight requirement makes it ideal for BiGpairSEQ.
+their algorithm runs in **O(m sqrt(n) log(N))** time. As the graph representation of a pairSEQ experiment is 
+bipartite with integer weights, this algorithm is ideal for BiGpairSEQ.

-Unfortunately, it's a fairly new algorithm, and the integer edge weight requirement makes it less generically useful.
-It is not implemented by the graph theory library used in this simulator. So this program
-instead uses the Fibonacci heap-based algorithm of Fredman and Tarjan (1987), which has a worst-case
+Unfortunately, it's a fairly new algorithm, and not yet implemented by the graph theory library used in this simulator.
+So this program instead uses the Fibonacci heap-based algorithm of Fredman and Tarjan (1987), which has a worst-case
 runtime of **O(n (n log(n) + m))**. The algorithm is implemented as described in Melhorn and Näher (1999).

 The current version of the program uses a pairing heap instead of a Fibonacci heap for its priority queue,
@@ -56,7 +55,7 @@ main menu looks like this:

 ```
 --------BiGPairSEQ SIMULATOR--------
-ALPHA/BETA T-CELL RECEPTOR MATCHING
+ALPHA/BETA T CELL RECEPTOR MATCHING
  USING WEIGHTED BIPARTITE GRAPHS
 ------------------------------------
 Please select an option:
@@ -81,9 +80,9 @@ writing files, the program will automatically add the correct extension to any f

 #### Cell Sample Files
 Cell Sample files consist of any number of distinct "T cells." Every cell contains 
-four sequences: Alpha CDR3, Beta CDR, Alpha CDR1, Beta CDR1. The sequences are represented by
-random integers. CDR3 Alpha and Beta sequences are all unique. CDR1 Alpha and Beta sequences
-are not necessarily unique; the relative diversity can be set when making a Cell Sample file.
+four sequences: Alpha CDR3, Beta CDR3, Alpha CDR1, Beta CDR1. The sequences are represented by
+random integers. CDR3 Alpha and Beta sequences are all unique within a given Cell Sample file. CDR1 Alpha and Beta sequences
+are not necessarily unique; the relative diversity can be set when making the file.

 (Note: though cells still have CDR1 sequences, matching of CDR1s is currently awaiting re-implementation.)

@@ -94,7 +93,7 @@ Options when making a Cell Sample file:
 Files are in CSV format. Rows are distinct T cells, columns are sequences within the cells.
 Comments are preceded by `#`

-Structure example:
+Structure:

 ---
    # Sample contains 1 unique CDR1 for every 4 unique CDR3s.
@@ -108,9 +107,9 @@ Structure example:
 Sample Plate files consist of any number of "wells" containing any number of T cells (as 
 described above). The wells are filled randomly from a Cell Sample file, according to a selected
 frequency distribution. Additionally, every individual sequence within each cell may, with some
-given dropout probability, be omitted from the file. This simulates the effect of amplification errors
-prior to sequencing. Plates can also be partitioned into any number of (approximately) evenly-sized
-sections, each of which can have a different number of T cells per well.
+given dropout probability, be omitted from the file; this simulates the effect of amplification errors
+prior to sequencing. Plates can also be partitioned into any number of sections, each of which can have a 
+different concentration of T cells per well.

 Options when making a Sample Plate file:
 * Cell Sample file to use
@@ -120,7 +119,7 @@ Options when making a Sample Plate file:
    * Standard deviation size 
  * Exponential
    * Lambda value
-      * Based on the slope of the graph in Figure 4C of the pairSEQ paper, the distribution of the original experiment was exponential with a lambda of approximately 0.6. (Howie et al. 2015)
+      * (Based on the slope of the graph in Figure 4C of the pairSEQ paper, the distribution of the original experiment was exponential with a lambda of approximately 0.6. (Howie, et al. 2015))
 * Total number of wells on the plate
 * Number of sections on plate
 * Number of T cells per well
@@ -128,42 +127,42 @@ Options when making a Sample Plate file:
 * Dropout rate

 Files are in CSV format. There are no header labels. Every row represents a well. 
-Every column represents an individual cell, containing four sequences, represented by an array string:
+Every column represents an individual cell, containing four sequences, depicted as an array string:
 `[CDR3A, CDR3B, CDR1A, CDR1B]`. So a representative cell might look like this: 

 `[525902, 791533, -1, 866282]`

-Notice that the Alpha CDR1 is missing in the cell above, due to sequence dropout.
-Dropouts are represented by replacing sequences with the value `-1`. Comments are preceded by `#`
+Notice that the CDR1 Alpha is missing in the cell above--sequence dropout from simulated amplification error.
+Dropout sequences are replaced with the value `-1`. Comments are preceded by `#`

-Structure Example:
+Structure:

 ---
 ```
-# Cell source file name: 4MilCells.csv
-# Plate size: 96
-# Error rate: 0.1
-# Concentrations: 10000 5000 500
-# Lambda: 0.6
+# Cell source file name:
+# Each row represents one well on the plate
+# Plate size:
+# Concentrations:
+# Lambda -or- StdDev: 
 ```
-| well 1 | well 2 | well 3| ... |
+| Well 1, cell 1 | Well 1, cell 2 | Well 1, cell 3| ... |
 |---|---|---|---|
-| [105383, 786528, 959247, 925928] | [525902, 791533, -1, 866282] | [409236, 132303, 804465, 942261]| ... |
-| [249930, 301502, 970003, 881099] | [523787, 552952, 997194, 970507]| [425363, 417411, 845399, -1]| ... |
-| ... | ... | ... | ... |
+| **Well 2, cell 1** | **Well 2, cell 2** | **Well 2, cell 3**| **...** |
+| **Well 3, cell 1** | **Well 3, cell 2** | **Well 3, cell 3**| **...** |
+| **...** | **...** | **...** | **...** |

 ---

 #### Graph and Data Files
-Graph and Data files are serialized binaries of a Java object containing the graph representation of a
-Sample Plate and necessary metadata for matching and results output. Making them requires a Cell Sample file (to construct a list of correct sequence pairs for checking
-the accuracy of BiGpairSEQ simulations) and a Sample Plate file (to construct the associated
-occupancy graph). These files can be several gigabytes in size. Writing them to a file lets us generate a graph and 
-its metadata once, then use it for multiple different BiGpairSEQ simulations.
+Graph and Data files are serialized binaries of a Java object containing the weigthed bipartite graph representation of a
+Sample Plate, along with the necessary metadata for matching and results output. Making them requires a Cell Sample file 
+(to construct a list of correct sequence pairs for checking the accuracy of BiGpairSEQ simulations) and a 
+Sample Plate file (to construct the associated occupancy graph). These files can be several gigabytes in size.
+Writing them to a file lets us generate a graph and its metadata once, then use it for multiple different BiGpairSEQ simulations.

 Options for creating a Graph and Data file:
 * The Cell Sample file to use
-* The Sample Plate file (generated from the given Cell Sample file) to use.
+* The Sample Plate file to use. (This must have been generated from the selected Cell Sample file.)

 These files do not have a human-readable structure, and are not portable to other programs. (Export of graphs in a 
 portable data format may be implemented in the future. The tricky part is encoding the necessary metadata.)
@@ -181,10 +180,9 @@ Options when running a BiGpairSEQ simulation of CDR3 alpha/beta matching:
 * The maximum number of alpha/beta overlap wells to attempt to match
  * (must be <= the number of wells on the plate - 1)
 * The maximum difference in alpha/beta occupancy to attempt to match
-  * (To skip using this filter, enter a value >= the number of wells on the plate)
-* The minimum percentage of a sequence's occupied wells shared by another sequence to attempt to match
-  * given value from 0 to 100
-  * (To skip using this filter, enter 0)
+  * (Optional. To skip using this filter, enter a value >= the number of wells on the plate)
+* The minimum overlap percentage--the percentage of a sequence's occupied wells shared by another sequence--to attempt to match. Given as value in range 0 - 100.
+  * (Optional. To skip using this filter, enter 0)

 Example output:

@@ -217,34 +215,54 @@ Example output:
 ---

 **NOTE: The p-values in the output are not used for matching**—they aren't part of the BiGpairSEQ algorithm at all. 
-P-values are calculated *after* BiGpairSEQ matching is completed, for purposes of comparison, 
+P-values are calculated *after* BiGpairSEQ matching is completed, for purposes of comparison only, 
 using the (2021 corrected) formula from the original pairSEQ paper. (Howie, et al. 2015)

+### PERFORMANCE
+Performance details of the example excerpted above:
+
+On a home computer with a Ryzen 5600X CPU, 64GB of 3200MHz DDR4 RAM (half of which was allocated to the Java Virtual Machine), and a PCIe 3.0 SSD, running Linux Mint 20.3 Edge (5.13 kernel), 
+the author ran a BiGpairSEQ simulation of a 96-well sample plate with 30,000 T cells/well comprising ~11,800 alphas and betas,
+taken from a sample of 4,000,000 distinct cells with an exponential frequency distribution.
+
+With min/max occupancy threshold of 3 and 94 wells for matching, and no other pre-filtering, BiGpairSEQ identified 5,151 
+correct pairings and 18 incorrect pairings, for an accuracy of 99.652%.
+
+The simulation time was 14'22". If intermediate results were held in memory, this would be equivalent to the total elapsed time.
+
+Since this implementation of BiGpairSEQ writes intermediate results to disk (to improve the efficiency of *repeated* simulations
+with different filtering options), the actual elapsed time was greater. File I/O time was not measured, but took 
+slightly less time than the simulation itself. Real elapsed time from start to finish was under 30 minutes.
+
 ## TODO

 * ~~Try invoking GC at end of workloads to reduce paging to disk~~ DONE
-* ~~Hold graph data in memory until another graph is read-in?~~
-  * No, this won't work, because BiGpairSEQ simulations alter the underlying graph based on filtering constraints. Changes would cascade with multiple experiments.
-* ~~See if there's a reasonable way to reformat Sample Plate files so that wells are columns instead of rows~~ DONE
+* ~~Hold graph data in memory until another graph is read-in?~~ ABANDONED
+  * *No, this won't work, because BiGpairSEQ simulations alter the underlying graph based on filtering constraints. Changes would cascade with multiple experiments.*
+* See if there's a reasonable way to reformat Sample Plate files so that wells are columns instead of rows. 
+  * ~~Problem is variable number of cells in a well~~
+  * ~~Apache Commons CSV library writes entries a row at a time~~ 
+    * _Got this working, but at the cost of a profoundly strange bug in graph occupancy filtering. Have reverted the repo until I can figure out what caused that. Given how easily Thingiverse transposes CSV matrices in R, might not even be worth fixing._
+* Re-implement command line arguments, to enable scripting and statistical simulation studies
+* Implement sample plates with random numbers of T cells per well.
+  * Possible BiGpairSEQ advantage over pairSEQ: BiGpairSEQ is resilient to variations in well population sizes on a sample plate; pairSEQ is not.
+    * preliminary data suggests that BiGpairSEQ behaves roughly as though the whole plate had whatever the *average* well concentration is, but that's still speculative.
 * Enable GraphML output in addition to serialized object binaries, for data portability
  * Custom vertex type with attribute for sequence occupancy?
 * Re-implement CDR1 matching method
-* Re-implement command line arguments, to enable scripting and statistical simulation studies
-* Implement Duan and Su's maximum weight matching algorithms
+* Implement Duan and Su's maximum weight matching algorithm
  * Add controllable algorithm-type parameter?
-* Test whether pairing heap (currently used) or Fibonacci heap is more efficient for current matching algorithm
+* Test whether pairing heap (currently used) or Fibonacci heap is more efficient for priority queue in current matching algorithm
  * in theory Fibonacci heap should be more efficient, but complexity overhead may eliminate theoretical advantage
  * Add controllable heap-type parameter?
-* Implement sample plates with random numbers of T cells per well
-  * Possible BiGpairSEQ advantage over pairSEQ: BiGpairSEQ is resilient to variations in well populations; pairSEQ is not.
-    * preliminary data suggests that BiGpairSEQ behaves roughly as though the whole plate had whatever the *average* well concentration is, but that's still speculative.
+

  
 ## CITATIONS
 * Howie, B., Sherwood, A. M., et al. ["High-throughput pairing of T cell receptor alpha and beta sequences."](https://pubmed.ncbi.nlm.nih.gov/26290413/) Sci. Transl. Med. 7, 301ra131 (2015)
 * Duan, R., Su H. ["A Scaling Algorithm for Maximum Weight Matching in Bipartite Graphs."](https://web.eecs.umich.edu/~pettie/matching/Duan-Su-scaling-bipartite-matching.pdf) Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, p. 1413-1424. (2012)
-* K. Melhorn, St. Näher. [The LEDA Platform of Combinatorial and Geometric Computing.](https://people.mpi-inf.mpg.de/~mehlhorn/LEDAbook.html) Cambridge University Press. Chapter 7, Graph Algorithms; p. 132-162 (1999)
-* M. Fredman, R. Tarjan. ["Fibonacci heaps and their uses in improved network optimization algorithms."](https://www.cl.cam.ac.uk/teaching/1011/AlgorithII/1987-FredmanTar-fibonacci.pdf) J. ACM, 34(3):596–615 (1987))
+* Melhorn, K., Näher, St. [The LEDA Platform of Combinatorial and Geometric Computing.](https://people.mpi-inf.mpg.de/~mehlhorn/LEDAbook.html) Cambridge University Press. Chapter 7, Graph Algorithms; p. 132-162 (1999)
+* Fredman, M., Tarjan, R. ["Fibonacci heaps and their uses in improved network optimization algorithms."](https://www.cl.cam.ac.uk/teaching/1011/AlgorithII/1987-FredmanTar-fibonacci.pdf) J. ACM, 34(3):596–615 (1987))

 ## EXTERNAL LIBRARIES USED
 * [JGraphT](https://jgrapht.org) -- Graph theory data structures and algorithms
@@ -257,4 +275,4 @@ BiGpairSEQ was conceived in collaboration with Dr. Alice MacQueen, who brought t
 pairSEQ paper to the author's attention and explained all the biology terms he didn't know.

 ## AUTHOR
-Eugene Fischer, 2021. UI improvements and documentation, 2022.
+BiGpairSEQ algorithm and simulation by Eugene Fischer, 2021. UI improvements and documentation, 2022.
--- a/src/main/java/Equations.java
+++ b/src/main/java/Equations.java
@@ -8,6 +8,9 @@ public abstract class Equations {
        return (int) ((Math.random() * (max - min)) + min);
    }

+    //pValue calculation as described in original pairSEQ paper.
+    //Included for comparison with original results.
+    //Not used by BiGpairSEQ for matching.
    public static double pValue(Integer w, Integer w_a, Integer w_b, double w_ab_d) {
        int w_ab = (int) w_ab_d;
        double pv = 0.0;
@@ -18,6 +21,9 @@ public abstract class Equations {
        return pv;
    }

+    //Implementation of the (corrected) probability equation from pairSEQ paper.
+    //Included for comparison with original results.
+    //Not used by BiGpairSEQ for matching.
    private static double probPairedByChance(Integer w, Integer w_a, Integer w_b, Integer w_ab){
        BigInteger numer1 = choose(w, w_ab);
        BigInteger numer2 = choose(w - w_ab, w_a - w_ab);
@@ -30,10 +36,9 @@ public abstract class Equations {
        return prob.doubleValue();
    }

-    /*
-     * This works because nC(k+1) = nCk * (n-k)/(k+1)
-     * Since nC0 = 1, can start there and generate all the rest.
-     */
+
+     //This works because nC(k+1) = nCk * (n-k)/(k+1)
+     //Since nC0 = 1, can start there and generate all the rest.
     public static BigInteger choose(final int N, final int K) {
         BigInteger nCk = BigInteger.ONE;
         for (int k = 0; k < K; k++) {
--- a/src/main/java/Plate.java
+++ b/src/main/java/Plate.java
@@ -68,7 +68,7 @@ public class Plate {
                    }
                    Integer[] cellToAdd = cells.get(n).clone();
                    for(int k = 0; k < cellToAdd.length; k++){
-                        if(Math.abs(rand.nextDouble()) < error){//error applied to each peptide
+                        if(Math.abs(rand.nextDouble()) < error){//error applied to each seqeunce
                            cellToAdd[k] = -1;
                        }
                    }
@@ -98,7 +98,7 @@ public class Plate {
                    n = (int) Math.floor(m);
                    Integer[] cellToAdd = cells.get(n).clone();
                    for(int k = 0; k < cellToAdd.length; k++){
-                        if(Math.abs(rand.nextDouble()) < error){//error applied to each peptide
+                        if(Math.abs(rand.nextDouble()) < error){//error applied to each sequence
                            cellToAdd[k] = -1;
                        }
                    }
@@ -134,30 +134,30 @@ public class Plate {
        return wells;
    }

-    //returns a map of the counts of the peptide at cell index pIndex, in all wells
-    public Map<Integer, Integer> assayWellsPeptideP(int... pIndices){
-        return this.assayWellsPeptideP(0, size, pIndices);
+    //returns a map of the counts of the sequence at cell index sIndex, in all wells
+    public Map<Integer, Integer> assayWellsSequenceS(int... sIndices){
+        return this.assayWellsSequenceS(0, size, sIndices);
    }

-    //returns a map of the counts of the peptide at cell index pIndex, in a specific well
-    public Map<Integer, Integer> assayWellsPeptideP(int n, int... pIndices) { return this.assayWellsPeptideP(n, n+1, pIndices);}
+    //returns a map of the counts of the sequence at cell index sIndex, in a specific well
+    public Map<Integer, Integer> assayWellsSequenceS(int n, int... sIndices) { return this.assayWellsSequenceS(n, n+1, sIndices);}

-    //returns a map of the counts of the peptide at cell index pIndex, in a range of wells
-    public Map<Integer, Integer> assayWellsPeptideP(int start, int end, int... pIndices) {
+    //returns a map of the counts of the sequence at cell index sIndex, in a range of wells
+    public Map<Integer, Integer> assayWellsSequenceS(int start, int end, int... sIndices) {
        Map<Integer,Integer> assay = new HashMap<>();
-        for(int pIndex: pIndices){
+        for(int pIndex: sIndices){
            for(int i = start; i < end; i++){
-                countPeptides(assay, wells.get(i), pIndex);
+                countSequences(assay, wells.get(i), pIndex);
            }
        }
        return assay;
    }
-    //For the peptides at cell indices pIndices, counts number of unique peptides in the given well into the given map
-    private void countPeptides(Map<Integer, Integer> wellMap, List<Integer[]> well, int... pIndices) {
+    //For the sequences at cell indices sIndices, counts number of unique sequences in the given well into the given map
+    private void countSequences(Map<Integer, Integer> wellMap, List<Integer[]> well, int... sIndices) {
        for(Integer[] cell : well) {
-            for(int pIndex: pIndices){
-                if(cell[pIndex] != -1){
-                    wellMap.merge(cell[pIndex], 1, (oldValue, newValue) -> oldValue + newValue);
+            for(int sIndex: sIndices){
+                if(cell[sIndex] != -1){
+                    wellMap.merge(cell[sIndex], 1, (oldValue, newValue) -> oldValue + newValue);
                }
            }
        }
--- a/src/main/java/PlateFileReader.java
+++ b/src/main/java/PlateFileReader.java
@@ -31,54 +31,23 @@ public class PlateFileReader {
            BufferedReader reader = Files.newBufferedReader(Path.of(filename));
            CSVParser parser = new CSVParser(reader, plateFileFormat);
        ){
-            //old code for wells as rows
-//            for(CSVRecord record: parser.getRecords()) {
-//                List<Integer[]> well = new ArrayList<>();
-//                for(String s: record) {
-//                    if(!"".equals(s)) {
-//                        String[] intString = s.replaceAll("\\[", "")
-//                                .replaceAll("]", "")
-//                                .replaceAll(" ", "")
-//                                .split(",");
-//                        //System.out.println(intString);
-//                        Integer[] arr = new Integer[intString.length];
-//                        for (int i = 0; i < intString.length; i++) {
-//                            arr[i] = Integer.valueOf(intString[i]);
-//                        }
-//                        well.add(arr);
-//                    }
-//                }
-//                wells.add(well);
            for(CSVRecord record: parser.getRecords()) {
-                if (wells.size() == 0) {
-                    int num = 0;
-                    for (String s: record) {
-                        num++;
-                    }
-                    for (int i = 0; i < num; i++) {
-                        wells.add(new ArrayList<>());
-                    }
-                } else {
-                    int i = 0;
-                    for (String s : record) {
-                        if (!"".equals(s)) { //if value isn't the empty string
-                            //get rid of brackets, split at commas into a string array
-                            String[] intsAsStrings = s.replaceAll("\\[", "")
+                List<Integer[]> well = new ArrayList<>();
+                for(String s: record) {
+                    if(!"".equals(s)) {
+                        String[] intString = s.replaceAll("\\[", "")
                                .replaceAll("]", "")
                                .replaceAll(" ", "")
                                .split(",");
-                            //Make Integer array with the same values
-                            Integer[] arr = new Integer[intsAsStrings.length];
-                            for (int j = 0; j < intsAsStrings.length; j++) {
-                                arr[j] = Integer.valueOf(intsAsStrings[j]);
+                        //System.out.println(intString);
+                        Integer[] arr = new Integer[intString.length];
+                        for (int i = 0; i < intString.length; i++) {
+                            arr[i] = Integer.valueOf(intString[i]);
                        }
-                            //Add Integer array to the correct well
-                            wells.get(i).add(arr);
-                            i++;
+                        well.add(arr);
                    }
                }
-
-                }
+                wells.add(well);
            }
        } catch(IOException ex){
            System.out.println("plate file " + filename + " not found.");
--- a/src/main/java/PlateFileWriter.java
+++ b/src/main/java/PlateFileWriter.java
@@ -59,7 +59,7 @@ public class PlateFileWriter {
            }
        }

-        //this took forever, and I don't use it, because it makes reading data in a huge pain
+        //this took forever
        List<List<String>> rows = new ArrayList<>();
        List<String> tmp = new ArrayList<>();
        for(int i = 0; i < wellsAsStrings.size(); i++){//List<Integer[]> w: wells){
@@ -89,6 +89,7 @@ public class PlateFileWriter {
            CSVPrinter printer = new CSVPrinter(writer, plateFileFormat);
        ){
            printer.printComment("Cell source file name: " + sourceFileName);
+            printer.printComment("Each row represents one well on the plate.");
            printer.printComment("Plate size: " + size);
            printer.printComment("Error rate: " + error);
            printer.printComment("Concentrations: " + concenString);
@@ -98,7 +99,7 @@ public class PlateFileWriter {
            else {
                printer.printComment("Std. dev.: " + stdDev);
            }
-            printer.printRecords(rows);
+            printer.printRecords(wellsAsStrings);
        } catch(IOException ex){
            System.out.println("Could not make new file named "+filename);
            System.err.println(ex);
--- a/src/main/java/Simulator.java
+++ b/src/main/java/Simulator.java
@@ -57,8 +57,8 @@ public class Simulator {
        if(verbose){System.out.println("Cell maps made");}

        if(verbose){System.out.println("Making well maps");}
-        Map<Integer, Integer> allAlphas = samplePlate.assayWellsPeptideP(alphaIndex);
-        Map<Integer, Integer> allBetas = samplePlate.assayWellsPeptideP(betaIndex);
+        Map<Integer, Integer> allAlphas = samplePlate.assayWellsSequenceS(alphaIndex);
+        Map<Integer, Integer> allBetas = samplePlate.assayWellsSequenceS(betaIndex);
        int alphaCount = allAlphas.size();
        if(verbose){System.out.println("All alphas count: " + alphaCount);}
        int betaCount = allBetas.size();
@@ -296,8 +296,8 @@ public class Simulator {
        System.out.println("Cell maps made");

        System.out.println("Making well maps");
-        Map<Integer, Integer> allCDR3s = samplePlate.assayWellsPeptideP(cdr3Indices);
-        Map<Integer, Integer> allCDR1s = samplePlate.assayWellsPeptideP(cdr1Indices);
+        Map<Integer, Integer> allCDR3s = samplePlate.assayWellsSequenceS(cdr3Indices);
+        Map<Integer, Integer> allCDR1s = samplePlate.assayWellsSequenceS(cdr1Indices);
        int CDR3Count = allCDR3s.size();
        System.out.println("all CDR3s count: " + CDR3Count);
        int CDR1Count = allCDR1s.size();
@@ -591,26 +591,26 @@ public class Simulator {
                                                    Map<Integer, Integer> rowSequenceCounts,
                                                    Map<Integer,Integer> columnSequenceCounts,
                                                    double[][] weights){
-        Map<Integer, Integer> wellNRowPeptides = null;
-        Map<Integer, Integer> wellNColumnPeptides = null;
+        Map<Integer, Integer> wellNRowSequences = null;
+        Map<Integer, Integer> wellNColumnSequences = null;
        int vertexStartValue = rowSequenceToVertexMap.size();
        int numWells = samplePlate.getSize();
        for (int n = 0; n < numWells; n++) {
-            wellNRowPeptides = samplePlate.assayWellsPeptideP(n, rowSequenceIndices);
-            for (Integer a : wellNRowPeptides.keySet()) {
+            wellNRowSequences = samplePlate.assayWellsSequenceS(n, rowSequenceIndices);
+            for (Integer a : wellNRowSequences.keySet()) {
                if(allRowSequences.containsKey(a)){
                    rowSequenceCounts.merge(a, 1, (oldValue, newValue) -> oldValue + newValue);
                }
            }
-            wellNColumnPeptides = samplePlate.assayWellsPeptideP(n, colSequenceIndices);
-            for (Integer b : wellNColumnPeptides.keySet()) {
+            wellNColumnSequences = samplePlate.assayWellsSequenceS(n, colSequenceIndices);
+            for (Integer b : wellNColumnSequences.keySet()) {
                if(allColumnSequences.containsKey(b)){
                    columnSequenceCounts.merge(b, 1, (oldValue, newValue) -> oldValue + newValue);
                }
            }
-            for (Integer i : wellNRowPeptides.keySet()) {
+            for (Integer i : wellNRowSequences.keySet()) {
                if(allRowSequences.containsKey(i)){
-                    for (Integer j : wellNColumnPeptides.keySet()) {
+                    for (Integer j : wellNColumnSequences.keySet()) {
                        if(allColumnSequences.containsKey(j)){
                            weights[rowSequenceToVertexMap.get(i)][columnSequenceToVertexMap.get(j) - vertexStartValue] += 1.0;
                        }
--- a/src/main/java/UserInterface.java
+++ b/src/main/java/UserInterface.java
@@ -258,7 +258,7 @@ public class UserInterface {
            while (!quit) {
                System.out.println();
                System.out.println("--------BiGPairSEQ SIMULATOR--------");
-                System.out.println("ALPHA/BETA T-CELL RECEPTOR MATCHING");
+                System.out.println("ALPHA/BETA T CELL RECEPTOR MATCHING");
                System.out.println("  USING WEIGHTED BIPARTITE GRAPHS  ");
                System.out.println("------------------------------------");
                System.out.println("Please select an option:");
@@ -681,18 +681,8 @@ public class UserInterface {
        System.out.println("This program simulates BiGpairSEQ, a graph theory based adaptation");
        System.out.println("of the pairSEQ algorithm for pairing T cell receptor sequences.");
        System.out.println();
-        System.out.println("Unlike pairSEQ, which calculates p-values for every TCR alpha/beta overlap and compares");
-        System.out.println("against a null distribution, BiGpairSEQ does not do any statistical calculations");
-        System.out.println("directly. Instead, BiGpairSEQ creates a simple bipartite weighted graph representing");
-        System.out.println("the sample plate. The distinct TCRA and TCRB sequences form the two sets of vertices.");
-        System.out.println("Every TCRA/TCRB pair that share a well are connected by an edge, with the edge weight");
-        System.out.println("set to the number of wells in which both sequences appear. (Sequences in all wells are");
-        System.out.println("filtered out prior to creating the graph, as there is no signal in their occupancy");
-        System.out.println("pattern.) The problem of pairing TCRA/TCRB sequences thus reduces to the \"assignment");
-        System.out.println("problem\" of finding a maximum weight matching on a bipartite graph--the subset of");
-        System.out.println("vertex-disjoint edges whose weights sum to the maximum possible value.");
-        System.out.println();
-        System.out.println("For full documentation, see: https://gitea.ejsf.synology.me/efischer/BiGpairSEQ");
+        System.out.println("For full documentation, view readme.md file distributed with this code");
+        System.out.println("or visit https://gitea.ejsf.synology.me/efischer/BiGpairSEQ.");
        System.out.println();
        System.out.println("pairSEQ citation:");
        System.out.println("Howie, B., Sherwood, A. M., et. al.");
Author	SHA1	Message	Date
efischer	4b053e6ec4	Remove artifacts from tracking to stop repo bloat.	2022-02-22 16:14:50 -06:00
efischer	44784b7976	Remove artifacts from tracking to stop repo bloat.	2022-02-22 16:10:22 -06:00
efischer	7c19896dc9	update readme	2022-02-22 16:09:50 -06:00
efischer	aec7e3016f	Typos in documentation	2022-02-21 11:19:54 -06:00
efischer	5c75c1ac09	Update readme.md	2022-02-21 06:53:30 +00:00
efischer	cb1f7adece	Change "peptide" references in code to "sequence", adding comments	2022-02-21 00:29:34 -06:00
efischer	370de79546	Add performance section to readme	2022-02-21 00:02:49 -06:00
efischer	a803336f56	Add performance section to readme	2022-02-21 00:01:20 -06:00
efischer	94b54b3416	Add performance section to readme	2022-02-20 23:31:25 -06:00
efischer	601e141fd0	Update readme	2022-02-20 22:51:49 -06:00
efischer	8f9c6b7d33	Update readme TODO	2022-02-20 20:59:05 -06:00
efischer	e5ddc73723	Finish reverting back to wells-as-rows	2022-02-20 20:54:44 -06:00
efischer	9b18fac74f	Invoke garbage collection	2022-02-20 20:47:12 -06:00
efischer	63ef6aa7a0	Revert attempt to switch plate output format. It worked, but introduced a bug in graph filtering I don't want to chase down	2022-02-20 20:45:35 -06:00