13 Commits
v1.0 ... v1.1

13 changed files with 988 additions and 758 deletions

View File

@@ -37,6 +37,8 @@ in practice.
### RUNNING THE PROGRAM
[Download the current version of BiGpairSEQ_Sim.](https://gitea.ejsf.synology.me/efischer/BiGpairSEQ/releases)
BiGpairSEQ_Sim is an executable .jar file. It requires Java 11 or higher; [OpenJDK 17](https://jdk.java.net/17/)
is recommended.
@@ -157,8 +159,10 @@ Structure:
Graph and Data files are serialized binaries of a Java object containing the weighted bipartite graph representation of a
Sample Plate, along with the necessary metadata for matching and results output. Making them requires a Cell Sample file
(to construct a list of correct sequence pairs for checking the accuracy of BiGpairSEQ simulations) and a
Sample Plate file (to construct the associated occupancy graph). These files can be several gigabytes in size.
Writing them to a file lets us generate a graph and its metadata once, then use it for multiple different BiGpairSEQ simulations.
Sample Plate file (to construct the associated occupancy graph).
These files can be several gigabytes in size. Writing them to a file lets us generate a graph and its metadata once,
then use it for multiple different BiGpairSEQ simulations.
Options for creating a Graph and Data file:
* The Cell Sample file to use
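The round trip is plain Java object serialization. Below is a minimal sketch, assuming `GraphWithMapData` implements `java.io.Serializable` (as the GraphDataObject reader and writer classes later in this diff imply); the helper class and filename are illustrative, not part of the project:

```java
import java.io.*;

// Hypothetical helper showing the write-once / read-many workflow for
// Graph and Data files (the project's real classes are GraphDataObjectWriter/Reader).
class GraphFileRoundTrip {
    static void write(GraphWithMapData data, String filename) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(filename)))) {
            out.writeObject(data); // serialize the whole graph plus its metadata in one call
        }
    }

    static GraphWithMapData read(String filename) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(filename)))) {
            return (GraphWithMapData) in.readObject(); // deserialize back into memory
        }
    }
}
```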
@@ -170,7 +174,11 @@ portable data format may be implemented in the future. The tricky part is encodi
---
#### Matching Results Files
Matching results files consist of the results of a BiGpairSEQ matching simulation.
Matching results files consist of the results of a BiGpairSEQ matching simulation. Making them requires a Graph and
Data file. To save file I/O time, the data from the most recent Graph and Data file read or generated is cached
by the simulator. Subsequent BiGpairSEQ simulations run with the same input filename will use the cached version
rather than reading it in again from disk.
Files are in CSV format. Rows are sequence pairings with extra relevant data. Columns are pairing-specific details.
Metadata about the matching simulation is included as comments. Comments are preceded by `#`.
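Because the comment marker is declared in the CSV format, the metadata lines can be skipped transparently when a results file is read back. A minimal sketch, using the same Apache Commons CSV library the project's writer classes use and assuming a header row follows the comments; `results.csv` is an illustrative filename:

```java
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical reader: iterates the pairing rows of a matching results file
// while '#'-prefixed metadata comments are handled by the parser.
class ResultsFileExample {
    public static void main(String[] args) throws IOException {
        CSVFormat format = CSVFormat.Builder.create()
                .setCommentMarker('#')   // lines starting with '#' are comments
                .setHeader()             // first non-comment row holds the column names
                .setSkipHeaderRecord(true)
                .build();
        try (CSVParser parser = new CSVParser(new FileReader("results.csv"), format)) {
            for (CSVRecord pairing : parser) {
                System.out.println(pairing); // one sequence pairing per row
            }
        }
    }
}
```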
@@ -237,8 +245,9 @@ slightly less time than the simulation itself. Real elapsed time from start to f
## TODO
* ~~Try invoking GC at end of workloads to reduce paging to disk~~ DONE
* ~~Hold graph data in memory until another graph is read-in?~~ ABANDONED
* *No, this won't work, because BiGpairSEQ simulations alter the underlying graph based on filtering constraints. Changes would cascade with multiple experiments.*
* Hold graph data in memory until another graph is read-in? ~~ABANDONED~~ ~~UNABANDONED~~ DONE
* ~~*No, this won't work, because BiGpairSEQ simulations alter the underlying graph based on filtering constraints. Changes would cascade with multiple experiments.*~~
* Might have figured out a way to do it, by taking edges out and then putting them back into the graph. This may actually be possible. If so, awesome.
* See if there's a reasonable way to reformat Sample Plate files so that wells are columns instead of rows.
* ~~Problem is variable number of cells in a well~~
* ~~Apache Commons CSV library writes entries a row at a time~~

View File

@@ -0,0 +1,42 @@
//main class. Only job is to choose which interface to use, and hold graph data in memory
public class BiGpairSEQ {
private static GraphWithMapData graphInMemory = null;
private static String graphFilename = null;
public static void main(String[] args) {
if (args.length == 0) {
InteractiveInterface.startInteractive();
}
else {
//This will be uncommented when command line arguments are re-implemented.
//CommandLineInterface.startCLI(args);
System.out.println("Command line arguments are still being re-implemented.");
}
}
public static GraphWithMapData getGraph() {
return graphInMemory;
}
public static void setGraph(GraphWithMapData g) {
if (graphInMemory != null) {
clearGraph();
}
graphInMemory = g;
}
public static void clearGraph() {
graphInMemory = null;
System.gc();
}
public static String getGraphFilename() {
return graphFilename;
}
public static void setGraphFilename(String filename) {
graphFilename = filename;
}
}
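A minimal sketch of how a caller can take advantage of this cache; `loadOrReuse` is a hypothetical helper that mirrors the logic `InteractiveInterface.matchCDR3s()` uses later in this diff:

```java
// Hypothetical helper: reuse the cached graph when the filename matches,
// otherwise release the old graph and read the requested file from disk.
static GraphWithMapData loadOrReuse(String requestedFile) {
    if (requestedFile.equals(BiGpairSEQ.getGraphFilename()) && BiGpairSEQ.getGraph() != null) {
        return BiGpairSEQ.getGraph();           // cache hit: no file I/O
    }
    BiGpairSEQ.clearGraph();                    // drop the previous graph and hint GC
    GraphWithMapData data = new GraphDataObjectReader(requestedFile).getData();
    BiGpairSEQ.setGraph(data);
    BiGpairSEQ.setGraphFilename(requestedFile);
    return data;
}
```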

View File

@@ -0,0 +1,328 @@
import org.apache.commons.cli.*;
/*
* Class for parsing options passed to program from command line
*
* Top-level flags:
* cells : to make a cell sample file
* plate : to make a sample plate file
* graph : to make a graph and data file
* match : to do a cdr3 matching (WITH OR WITHOUT MAKING A RESULTS FILE. May just want to print summary for piping.)
*
* Cell flags:
* count : number of cells to generate
* diversity factor : factor by which CDR3s are more diverse than CDR1s
* output : name of the output file
*
* Plate flags:
* cellfile : name of the cell sample file to use as input
* wells : the number of wells on the plate
* dist : the statistical distribution to use
* (if exponential) lambda : the lambda value of the exponential distribution
* (if gaussian) stddev : the standard deviation of the gaussian distribution
* rand : randomize well populations, take a minimum argument and a maximum argument
* populations : number of t cells per well per section (number of arguments determines number of sections)
* dropout : plate dropout rate, double from 0.0 to 1.0
* output : name of the output file
*
* Graph flags:
* cellfile : name of the cell sample file to use as input
* platefile : name of the sample plate file to use as input
* output : name of the output file
*
* Match flags:
* graphFile : name of graph and data file to use as input
* min : minimum number of overlap wells to attempt a matching
* max : the maximum number of overlap wells to attempt a matching
* maxdiff : (optional) the maximum difference in occupancy to attempt a matching
* minpercent : (optional) the minimum percent overlap to attempt a matching.
* writefile : (optional) the filename to write results to
* output : the values to print to System.out for piping
*
*/
public class CommandLineInterface {
public static void startCLI(String[] args) {
//These command line options are a big mess
//Really, I don't think command line tools are expected to work in this many different modes
//making cells, making plates, and matching are the sort of thing that UNIX philosophy would say
//should be three separate programs.
//There might be a way to do it with option parameters?
//main options set
Options mainOptions = new Options();
Option makeCells = Option.builder("cells")
.longOpt("make-cells")
.desc("Makes a file of distinct cells")
.build();
Option makePlate = Option.builder("plates")
.longOpt("make-plates")
.desc("Makes a sample plate file")
.build();
Option makeGraph = Option.builder("graph")
.longOpt("make-graph")
.desc("Makes a graph and data file")
.build();
Option matchCDR3 = Option.builder("match")
.longOpt("match-cdr3")
.desc("Match CDR3s. Requires a cell sample file and any number of plate files.")
.build();
OptionGroup mainGroup = new OptionGroup();
mainGroup.addOption(makeCells);
mainGroup.addOption(makePlate);
mainGroup.addOption(makeGraph);
mainGroup.addOption(matchCDR3);
mainGroup.setRequired(true);
mainOptions.addOptionGroup(mainGroup);
//Reuse clones of this for other options groups, rather than making it lots of times
Option outputFile = Option.builder("o")
.longOpt("output-file")
.hasArg()
.argName("filename")
.desc("Name of output file")
.build();
mainOptions.addOption(outputFile);
//Options cellOptions = new Options();
Option numCells = Option.builder("nc")
.longOpt("num-cells")
.desc("The number of distinct cells to generate")
.hasArg()
.argName("number")
.build();
mainOptions.addOption(numCells);
Option cdr1Freq = Option.builder("d")
.longOpt("peptide-diversity-factor")
.hasArg()
.argName("number")
.desc("Number of distinct CDR3s for every CDR1")
.build();
mainOptions.addOption(cdr1Freq);
//Option cellOutput = (Option) outputFile.clone();
//cellOutput.setRequired(true);
//mainOptions.addOption(cellOutput);
//Options plateOptions = new Options();
Option inputCells = Option.builder("c")
.longOpt("cell-file")
.hasArg()
.argName("file")
.desc("The cell sample file used for filling wells")
.build();
mainOptions.addOption(inputCells);
Option numWells = Option.builder("w")
.longOpt("num-wells")
.hasArg()
.argName("number")
.desc("The number of wells on each plate")
.build();
mainOptions.addOption(numWells);
Option numPlates = Option.builder("np")
.longOpt("num-plates")
.hasArg()
.argName("number")
.desc("The number of plate files to output")
.build();
mainOptions.addOption(numPlates);
//Option plateOutput = (Option) outputFile.clone();
//plateOutput.setRequired(true);
//plateOutput.setDescription("Prefix for plate output filenames");
//mainOptions.addOption(plateOutput);
Option plateErr = Option.builder("err")
.longOpt("drop-out-rate")
.hasArg()
.argName("number")
.desc("Well drop-out rate. (Probability between 0 and 1)")
.build();
mainOptions.addOption(plateErr);
Option plateConcentrations = Option.builder("t")
.longOpt("t-cells-per-well")
.hasArgs()
.argName("number 1, number 2, ...")
.desc("Number of T cells per well for each plate section")
.build();
mainOptions.addOption(plateConcentrations);
//different distributions, mutually exclusive
OptionGroup plateDistributions = new OptionGroup();
Option plateExp = Option.builder("exponential")
.desc("Sample from distinct cells with exponential frequency distribution")
.build();
plateDistributions.addOption(plateExp);
Option plateGaussian = Option.builder("gaussian")
.desc("Sample from distinct cells with gaussain frequency distribution")
.build();
plateDistributions.addOption(plateGaussian);
Option platePoisson = Option.builder("poisson")
.desc("Sample from distinct cells with poisson frequency distribution")
.build();
plateDistributions.addOption(platePoisson);
mainOptions.addOptionGroup(plateDistributions);
Option plateStdDev = Option.builder("stddev")
.desc("Standard deviation for gaussian distribution")
.hasArg()
.argName("number")
.build();
mainOptions.addOption(plateStdDev);
Option plateLambda = Option.builder("lambda")
.desc("Lambda for exponential distribution")
.hasArg()
.argName("number")
.build();
mainOptions.addOption(plateLambda);
//
// String cellFile, String filename, Double stdDev,
// Integer numWells, Integer numSections,
// Integer[] concentrations, Double dropOutRate
//
//Options matchOptions = new Options();
inputCells.setDescription("The cell sample file to be used for matching.");
mainOptions.addOption(inputCells);
Option lowThresh = Option.builder("low")
.longOpt("low-threshold")
.hasArg()
.argName("number")
.desc("Sets the minimum occupancy overlap to attempt matching")
.build();
mainOptions.addOption(lowThresh);
Option highThresh = Option.builder("high")
.longOpt("high-threshold")
.hasArg()
.argName("number")
.desc("Sets the maximum occupancy overlap to attempt matching")
.build();
mainOptions.addOption(highThresh);
Option occDiff = Option.builder("occdiff")
.longOpt("occupancy-difference")
.hasArg()
.argName("Number")
.desc("Maximum difference in alpha/beta occupancy to attempt matching")
.build();
mainOptions.addOption(occDiff);
Option overlapPer = Option.builder("ovper")
.longOpt("overlap-percent")
.hasArg()
.argName("Percent")
.desc("Minimum overlap percent to attempt matching (0 -100)")
.build();
mainOptions.addOption(overlapPer);
Option inputPlates = Option.builder("p")
.longOpt("plate-files")
.hasArgs()
.desc("Plate files to match")
.build();
mainOptions.addOption(inputPlates);
CommandLineParser parser = new DefaultParser();
try {
CommandLine line = parser.parse(mainOptions, args);
if(line.hasOption("match")){
//line = parser.parse(mainOptions, args);
//String cellFile = line.getOptionValue("c");
String graphFile = line.getOptionValue("g");
Integer lowThreshold = Integer.valueOf(line.getOptionValue(lowThresh));
Integer highThreshold = Integer.valueOf(line.getOptionValue(highThresh));
Integer occupancyDifference = Integer.valueOf(line.getOptionValue(occDiff));
Integer overlapPercent = Integer.valueOf(line.getOptionValue(overlapPer));
for(String plate: line.getOptionValues("p")) {
matchCDR3s(graphFile, lowThreshold, highThreshold, occupancyDifference, overlapPercent);
}
}
else if(line.hasOption("cells")){
//line = parser.parse(mainOptions, args);
String filename = line.getOptionValue("o");
Integer numDistCells = Integer.valueOf(line.getOptionValue("nc"));
Integer freq = Integer.valueOf(line.getOptionValue("d"));
makeCells(filename, numDistCells, freq);
}
else if(line.hasOption("plates")){
//line = parser.parse(mainOptions, args);
String cellFile = line.getOptionValue("c");
String filenamePrefix = line.getOptionValue("o");
Integer numWellsOnPlate = Integer.valueOf(line.getOptionValue("w"));
Integer numPlatesToMake = Integer.valueOf(line.getOptionValue("np"));
String[] concentrationsToUseString = line.getOptionValues("t");
Integer numSections = concentrationsToUseString.length;
Integer[] concentrationsToUse = new Integer[numSections];
for(int i = 0; i <numSections; i++){
concentrationsToUse[i] = Integer.valueOf(concentrationsToUseString[i]);
}
Double dropOutRate = Double.valueOf(line.getOptionValue("err"));
if(line.hasOption("exponential")){
Double lambda = Double.valueOf(line.getOptionValue("lambda"));
for(int i = 1; i <= numPlatesToMake; i++){
makePlateExp(cellFile, filenamePrefix + i, lambda, numWellsOnPlate,
concentrationsToUse,dropOutRate);
}
}
else if(line.hasOption("gaussian")){
Double stdDev = Double.valueOf(line.getOptionValue("stddev"));
for(int i = 1; i <= numPlatesToMake; i++){
makePlate(cellFile, filenamePrefix + i, stdDev, numWellsOnPlate,
concentrationsToUse,dropOutRate);
}
}
else if(line.hasOption("poisson")){
for(int i = 1; i <= numPlatesToMake; i++){
makePlatePoisson(cellFile, filenamePrefix + i, numWellsOnPlate,
concentrationsToUse,dropOutRate);
}
}
}
}
catch (ParseException exp) {
System.err.println("Parsing failed. Reason: " + exp.getMessage());
}
}
//for calling from command line
public static void makeCells(String filename, Integer numCells, Integer cdr1Freq){
CellSample sample = Simulator.generateCellSample(numCells, cdr1Freq);
CellFileWriter writer = new CellFileWriter(filename, sample);
writer.writeCellsToFile();
}
public static void makePlateExp(String cellFile, String filename, Double lambda,
Integer numWells, Integer[] concentrations, Double dropOutRate){
CellFileReader cellReader = new CellFileReader(cellFile);
Plate samplePlate = new Plate(numWells, dropOutRate, concentrations);
samplePlate.fillWellsExponential(cellReader.getFilename(), cellReader.getCells(), lambda);
PlateFileWriter writer = new PlateFileWriter(filename, samplePlate);
writer.writePlateFile();
}
private static void makePlatePoisson(String cellFile, String filename, Integer numWells,
Integer[] concentrations, Double dropOutRate){
CellFileReader cellReader = new CellFileReader(cellFile);
Double stdDev = Math.sqrt(cellReader.getCellCount());
Plate samplePlate = new Plate(numWells, dropOutRate, concentrations);
samplePlate.fillWells(cellReader.getFilename(), cellReader.getCells(), stdDev);
PlateFileWriter writer = new PlateFileWriter(filename, samplePlate);
writer.writePlateFile();
}
private static void makePlate(String cellFile, String filename, Double stdDev,
Integer numWells, Integer[] concentrations, Double dropOutRate){
CellFileReader cellReader = new CellFileReader(cellFile);
Plate samplePlate = new Plate(numWells, dropOutRate, concentrations);
samplePlate.fillWells(cellReader.getFilename(), cellReader.getCells(), stdDev);
PlateFileWriter writer = new PlateFileWriter(filename, samplePlate);
writer.writePlateFile();
}
private static void matchCDR3s(String graphFile, Integer lowThreshold, Integer highThreshold,
Integer occupancyDifference, Integer overlapPercent) {
}
}
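For reference, a hypothetical invocation of the cell-generation mode once the CLI is re-enabled; the flag names come from the option definitions above, and the argument values are illustrative:

```java
// Hypothetical driver: builds an args array for the "cells" mode and hands it to startCLI.
public class CliExample {
    public static void main(String[] args) {
        String[] cliArgs = {"-cells", "-nc", "10000", "-d", "10", "-o", "cells.csv"};
        CommandLineInterface.startCLI(cliArgs);
    }
}
```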

View File

@@ -13,6 +13,8 @@ public class GraphDataObjectReader {
BufferedInputStream fileIn = new BufferedInputStream(new FileInputStream(filename));
ObjectInputStream in = new ObjectInputStream(fileIn))
{
System.out.println("Reading graph data from file. This may take some time");
System.out.println("File I/O time is not included in results");
data = (GraphWithMapData) in.readObject();
} catch (FileNotFoundException | ClassNotFoundException ex) {
ex.printStackTrace();

View File

@@ -18,8 +18,11 @@ public class GraphDataObjectWriter {
public void writeDataToFile() {
try (BufferedOutputStream bufferedOut = new BufferedOutputStream(new FileOutputStream(filename));
ObjectOutputStream out = new ObjectOutputStream(bufferedOut);
){
System.out.println("Writing graph and occupancy data to file. This may take some time.");
System.out.println("File I/O time is not included in results.");
out.writeObject(data);
} catch (IOException ex) {
ex.printStackTrace();

View File

@@ -0,0 +1,90 @@
import org.jgrapht.graph.DefaultWeightedEdge;
import org.jgrapht.graph.SimpleWeightedGraph;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
public abstract class GraphModificationFunctions {
//remove over- and under-weight edges
public static List<Integer[]> filterByOverlapThresholds(SimpleWeightedGraph<Integer, DefaultWeightedEdge> graph,
int low, int high) {
List<Integer[]> removedEdges = new ArrayList<>();
for(DefaultWeightedEdge e: graph.edgeSet()){
if ((graph.getEdgeWeight(e) > high) || (graph.getEdgeWeight(e) < low)){
Integer source = graph.getEdgeSource(e);
Integer target = graph.getEdgeTarget(e);
Integer weight = (int) graph.getEdgeWeight(e);
Integer[] edge = {source, target, weight};
removedEdges.add(edge);
}
}
for (Integer[] edge : removedEdges) {
graph.removeEdge(edge[0], edge[1]);
}
return removedEdges;
}
//Remove edges for pairs with large occupancy discrepancy
public static List<Integer[]> filterByRelativeOccupancy(SimpleWeightedGraph<Integer, DefaultWeightedEdge> graph,
Map<Integer, Integer> alphaWellCounts,
Map<Integer, Integer> betaWellCounts,
Map<Integer, Integer> plateVtoAMap,
Map<Integer, Integer> plateVtoBMap,
Integer maxOccupancyDifference) {
List<Integer[]> removedEdges = new ArrayList<>();
for (DefaultWeightedEdge e : graph.edgeSet()) {
Integer alphaOcc = alphaWellCounts.get(plateVtoAMap.get(graph.getEdgeSource(e)));
Integer betaOcc = betaWellCounts.get(plateVtoBMap.get(graph.getEdgeTarget(e)));
if (Math.abs(alphaOcc - betaOcc) >= maxOccupancyDifference) {
Integer source = graph.getEdgeSource(e);
Integer target = graph.getEdgeTarget(e);
Integer weight = (int) graph.getEdgeWeight(e);
Integer[] edge = {source, target, weight};
removedEdges.add(edge);
}
}
for (Integer[] edge : removedEdges) {
graph.removeEdge(edge[0], edge[1]);
}
return removedEdges;
}
//Remove edges for pairs where overlap size is significantly lower than the well occupancy
public static List<Integer[]> filterByOverlapPercent(SimpleWeightedGraph<Integer, DefaultWeightedEdge> graph,
Map<Integer, Integer> alphaWellCounts,
Map<Integer, Integer> betaWellCounts,
Map<Integer, Integer> plateVtoAMap,
Map<Integer, Integer> plateVtoBMap,
Integer minOverlapPercent) {
List<Integer[]> removedEdges = new ArrayList<>();
for (DefaultWeightedEdge e : graph.edgeSet()) {
Integer alphaOcc = alphaWellCounts.get(plateVtoAMap.get(graph.getEdgeSource(e)));
Integer betaOcc = betaWellCounts.get(plateVtoBMap.get(graph.getEdgeTarget(e)));
double weight = graph.getEdgeWeight(e);
double min = minOverlapPercent / 100.0;
if ((weight / alphaOcc < min) || (weight / betaOcc < min)) {
Integer source = graph.getEdgeSource(e);
Integer target = graph.getEdgeTarget(e);
Integer intWeight = (int) graph.getEdgeWeight(e);
Integer[] edge = {source, target, intWeight};
removedEdges.add(edge);
}
}
for (Integer[] edge : removedEdges) {
graph.removeEdge(edge[0], edge[1]);
}
return removedEdges;
}
public static void addRemovedEdges(SimpleWeightedGraph<Integer, DefaultWeightedEdge> graph,
List<Integer[]> removedEdges) {
for (Integer[] edge : removedEdges) {
DefaultWeightedEdge e = graph.addEdge(edge[0], edge[1]);
graph.setEdgeWeight(e, (double) edge[2]);
}
}
}
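A minimal sketch of the filter-then-restore workflow these functions enable (the same pattern `Simulator.matchCDR3s()` uses later in this diff); the vertices, weights, and thresholds are illustrative:

```java
import org.jgrapht.graph.DefaultWeightedEdge;
import org.jgrapht.graph.SimpleWeightedGraph;
import java.util.List;

// Hypothetical demo: filter a tiny weighted graph by overlap thresholds,
// then put the removed edges back so the graph can be reused.
class FilterRestoreDemo {
    public static void main(String[] args) {
        SimpleWeightedGraph<Integer, DefaultWeightedEdge> graph =
                new SimpleWeightedGraph<>(DefaultWeightedEdge.class);
        graph.addVertex(0);
        graph.addVertex(1);
        graph.addVertex(2);
        graph.setEdgeWeight(graph.addEdge(0, 1), 2.0); // within [1, 5], kept
        graph.setEdgeWeight(graph.addEdge(0, 2), 9.0); // above high threshold, removed

        List<Integer[]> removed =
                GraphModificationFunctions.filterByOverlapThresholds(graph, 1, 5);
        System.out.println("Edges after filtering: " + graph.edgeSet().size()); // 1

        GraphModificationFunctions.addRemovedEdges(graph, removed);
        System.out.println("Edges after restore: " + graph.edgeSet().size());   // 2
    }
}
```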

View File

@@ -1,260 +1,17 @@
import org.apache.commons.cli.*;
import java.io.IOException;
import java.util.List;
import java.util.Scanner;
import java.util.InputMismatchException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
//
public class UserInterface {
public class InteractiveInterface {
final static Scanner sc = new Scanner(System.in);
static int input;
static boolean quit = false;
public static void main(String[] args) {
//for now, commenting out all the command line argument stuff.
// Refactoring to output files of graphs, so it would all need to change anyway.
public static void startInteractive() {
// if(args.length != 0){
// //These command line options are a big mess
// //Really, I don't think command line tools are expected to work in this many different modes
// //making cells, making plates, and matching are the sort of thing that UNIX philosophy would say
// //should be three separate programs.
// //There might be a way to do it with option parameters?
//
// Options mainOptions = new Options();
// Option makeCells = Option.builder("cells")
// .longOpt("make-cells")
// .desc("Makes a file of distinct cells")
// .build();
// Option makePlate = Option.builder("plates")
// .longOpt("make-plates")
// .desc("Makes a sample plate file")
// .build();
// Option matchCDR3 = Option.builder("match")
// .longOpt("match-cdr3")
// .desc("Match CDR3s. Requires a cell sample file and any number of plate files.")
// .build();
// OptionGroup mainGroup = new OptionGroup();
// mainGroup.addOption(makeCells);
// mainGroup.addOption(makePlate);
// mainGroup.addOption(matchCDR3);
// mainGroup.setRequired(true);
// mainOptions.addOptionGroup(mainGroup);
//
// //Reuse clones of this for other options groups, rather than making it lots of times
// Option outputFile = Option.builder("o")
// .longOpt("output-file")
// .hasArg()
// .argName("filename")
// .desc("Name of output file")
// .build();
// mainOptions.addOption(outputFile);
//
// //Options cellOptions = new Options();
// Option numCells = Option.builder("nc")
// .longOpt("num-cells")
// .desc("The number of distinct cells to generate")
// .hasArg()
// .argName("number")
// .build();
// mainOptions.addOption(numCells);
// Option cdr1Freq = Option.builder("d")
// .longOpt("peptide-diversity-factor")
// .hasArg()
// .argName("number")
// .desc("Number of distinct CDR3s for every CDR1")
// .build();
// mainOptions.addOption(cdr1Freq);
// //Option cellOutput = (Option) outputFile.clone();
// //cellOutput.setRequired(true);
// //mainOptions.addOption(cellOutput);
//
// //Options plateOptions = new Options();
// Option inputCells = Option.builder("c")
// .longOpt("cell-file")
// .hasArg()
// .argName("file")
// .desc("The cell sample file used for filling wells")
// .build();
// mainOptions.addOption(inputCells);
// Option numWells = Option.builder("w")
// .longOpt("num-wells")
// .hasArg()
// .argName("number")
// .desc("The number of wells on each plate")
// .build();
// mainOptions.addOption(numWells);
// Option numPlates = Option.builder("np")
// .longOpt("num-plates")
// .hasArg()
// .argName("number")
// .desc("The number of plate files to output")
// .build();
// mainOptions.addOption(numPlates);
// //Option plateOutput = (Option) outputFile.clone();
// //plateOutput.setRequired(true);
// //plateOutput.setDescription("Prefix for plate output filenames");
// //mainOptions.addOption(plateOutput);
// Option plateErr = Option.builder("err")
// .longOpt("drop-out-rate")
// .hasArg()
// .argName("number")
// .desc("Well drop-out rate. (Probability between 0 and 1)")
// .build();
// mainOptions.addOption(plateErr);
// Option plateConcentrations = Option.builder("t")
// .longOpt("t-cells-per-well")
// .hasArgs()
// .argName("number 1, number 2, ...")
// .desc("Number of T cells per well for each plate section")
// .build();
// mainOptions.addOption(plateConcentrations);
//
////different distributions, mutually exclusive
// OptionGroup plateDistributions = new OptionGroup();
// Option plateExp = Option.builder("exponential")
// .desc("Sample from distinct cells with exponential frequency distribution")
// .build();
// plateDistributions.addOption(plateExp);
// Option plateGaussian = Option.builder("gaussian")
// .desc("Sample from distinct cells with gaussain frequency distribution")
// .build();
// plateDistributions.addOption(plateGaussian);
// Option platePoisson = Option.builder("poisson")
// .desc("Sample from distinct cells with poisson frequency distribution")
// .build();
// plateDistributions.addOption(platePoisson);
// mainOptions.addOptionGroup(plateDistributions);
//
// Option plateStdDev = Option.builder("stddev")
// .desc("Standard deviation for gaussian distribution")
// .hasArg()
// .argName("number")
// .build();
// mainOptions.addOption(plateStdDev);
//
// Option plateLambda = Option.builder("lambda")
// .desc("Lambda for exponential distribution")
// .hasArg()
// .argName("number")
// .build();
// mainOptions.addOption(plateLambda);
//
//
//
////
//// String cellFile, String filename, Double stdDev,
//// Integer numWells, Integer numSections,
//// Integer[] concentrations, Double dropOutRate
////
//
// //Options matchOptions = new Options();
// inputCells.setDescription("The cell sample file to be used for matching.");
// mainOptions.addOption(inputCells);
// Option lowThresh = Option.builder("low")
// .longOpt("low-threshold")
// .hasArg()
// .argName("number")
// .desc("Sets the minimum occupancy overlap to attempt matching")
// .build();
// mainOptions.addOption(lowThresh);
// Option highThresh = Option.builder("high")
// .longOpt("high-threshold")
// .hasArg()
// .argName("number")
// .desc("Sets the maximum occupancy overlap to attempt matching")
// .build();
// mainOptions.addOption(highThresh);
// Option occDiff = Option.builder("occdiff")
// .longOpt("occupancy-difference")
// .hasArg()
// .argName("Number")
// .desc("Maximum difference in alpha/beta occupancy to attempt matching")
// .build();
// mainOptions.addOption(occDiff);
// Option overlapPer = Option.builder("ovper")
// .longOpt("overlap-percent")
// .hasArg()
// .argName("Percent")
// .desc("Minimum overlap percent to attempt matching (0 -100)")
// .build();
// mainOptions.addOption(overlapPer);
// Option inputPlates = Option.builder("p")
// .longOpt("plate-files")
// .hasArgs()
// .desc("Plate files to match")
// .build();
// mainOptions.addOption(inputPlates);
//
//
//
// CommandLineParser parser = new DefaultParser();
// try {
// CommandLine line = parser.parse(mainOptions, args);
// if(line.hasOption("match")){
// //line = parser.parse(mainOptions, args);
// String cellFile = line.getOptionValue("c");
// Integer lowThreshold = Integer.valueOf(line.getOptionValue(lowThresh));
// Integer highThreshold = Integer.valueOf(line.getOptionValue(highThresh));
// Integer occupancyDifference = Integer.valueOf(line.getOptionValue(occDiff));
// Integer overlapPercent = Integer.valueOf(line.getOptionValue(overlapPer));
// for(String plate: line.getOptionValues("p")) {
// matchCDR3s(cellFile, plate, lowThreshold, highThreshold, occupancyDifference, overlapPercent);
// }
// }
// else if(line.hasOption("cells")){
// //line = parser.parse(mainOptions, args);
// String filename = line.getOptionValue("o");
// Integer numDistCells = Integer.valueOf(line.getOptionValue("nc"));
// Integer freq = Integer.valueOf(line.getOptionValue("d"));
// makeCells(filename, numDistCells, freq);
// }
// else if(line.hasOption("plates")){
// //line = parser.parse(mainOptions, args);
// String cellFile = line.getOptionValue("c");
// String filenamePrefix = line.getOptionValue("o");
// Integer numWellsOnPlate = Integer.valueOf(line.getOptionValue("w"));
// Integer numPlatesToMake = Integer.valueOf(line.getOptionValue("np"));
// String[] concentrationsToUseString = line.getOptionValues("t");
// Integer numSections = concentrationsToUseString.length;
//
// Integer[] concentrationsToUse = new Integer[numSections];
// for(int i = 0; i <numSections; i++){
// concentrationsToUse[i] = Integer.valueOf(concentrationsToUseString[i]);
// }
// Double dropOutRate = Double.valueOf(line.getOptionValue("err"));
// if(line.hasOption("exponential")){
// Double lambda = Double.valueOf(line.getOptionValue("lambda"));
// for(int i = 1; i <= numPlatesToMake; i++){
// makePlateExp(cellFile, filenamePrefix + i, lambda, numWellsOnPlate,
// concentrationsToUse,dropOutRate);
// }
// }
// else if(line.hasOption("gaussian")){
// Double stdDev = Double.valueOf(line.getOptionValue("std-dev"));
// for(int i = 1; i <= numPlatesToMake; i++){
// makePlate(cellFile, filenamePrefix + i, stdDev, numWellsOnPlate,
// concentrationsToUse,dropOutRate);
// }
//
// }
// else if(line.hasOption("poisson")){
// for(int i = 1; i <= numPlatesToMake; i++){
// makePlatePoisson(cellFile, filenamePrefix + i, numWellsOnPlate,
// concentrationsToUse,dropOutRate);
// }
// }
// }
// }
// catch (ParseException exp) {
// System.err.println("Parsing failed. Reason: " + exp.getMessage());
// }
// }
// else {
while (!quit) {
System.out.println();
System.out.println("--------BiGPairSEQ SIMULATOR--------");
@@ -289,7 +46,6 @@ public class UserInterface {
}
}
sc.close();
// }
}
private static void makeCells() {
@@ -319,44 +75,8 @@ public class UserInterface {
assert filename != null;
CellFileWriter writer = new CellFileWriter(filename, sample);
writer.writeCellsToFile();
System.gc();
}
// //for calling from command line
// private static void makeCells(String filename, Integer numCells, Integer cdr1Freq){
// CellSample sample = Simulator.generateCellSample(numCells, cdr1Freq);
// CellFileWriter writer = new CellFileWriter(filename, sample);
// writer.writeCellsToFile();
// }
//
// private static void makePlateExp(String cellFile, String filename, Double lambda,
// Integer numWells, Integer[] concentrations, Double dropOutRate){
// CellFileReader cellReader = new CellFileReader(cellFile);
// Plate samplePlate = new Plate(numWells, dropOutRate, concentrations);
// samplePlate.fillWellsExponential(cellReader.getFilename(), cellReader.getCells(), lambda);
// PlateFileWriter writer = new PlateFileWriter(filename, samplePlate);
// writer.writePlateFile();
// }
//
// private static void makePlatePoisson(String cellFile, String filename, Integer numWells,
// Integer[] concentrations, Double dropOutRate){
// CellFileReader cellReader = new CellFileReader(cellFile);
// Double stdDev = Math.sqrt(cellReader.getCellCount());
// Plate samplePlate = new Plate(numWells, dropOutRate, concentrations);
// samplePlate.fillWells(cellReader.getFilename(), cellReader.getCells(), stdDev);
// PlateFileWriter writer = new PlateFileWriter(filename, samplePlate);
// writer.writePlateFile();
// }
//
// private static void makePlate(String cellFile, String filename, Double stdDev,
// Integer numWells, Integer[] concentrations, Double dropOutRate){
// CellFileReader cellReader = new CellFileReader(cellFile);
// Plate samplePlate = new Plate(numWells, dropOutRate, concentrations);
// samplePlate.fillWells(cellReader.getFilename(), cellReader.getCells(), stdDev);
// PlateFileWriter writer = new PlateFileWriter(filename, samplePlate);
// writer.writePlateFile();
// }
//Output a CSV of sample plate
private static void makePlate() {
String cellFile = null;
@@ -466,7 +186,6 @@ public class UserInterface {
System.out.println("Writing Sample Plate to file");
writer.writePlateFile();
System.out.println("Sample Plate written to file: " + filename);
System.gc();
}
}
@@ -502,7 +221,7 @@ public class UserInterface {
System.out.println("No cell sample found.");
System.out.println("Returning to main menu.");
}
else if(plate.getWells().size() == 0 || plate.getConcentrations().length == 0){
else if(plate.getWells().size() == 0 || plate.getPopulations().length == 0){
System.out.println("No sample plate found.");
System.out.println("Returning to main menu.");
}
@@ -511,18 +230,18 @@ public class UserInterface {
GraphWithMapData data = Simulator.makeGraph(cells, plate, true);
assert filename != null;
GraphDataObjectWriter dataWriter = new GraphDataObjectWriter(filename, data);
System.out.println("Writing graph and occupancy data to file. This may take some time.");
System.out.println("File I/O time is not included in results.");
dataWriter.writeDataToFile();
System.out.println("Graph and Data file written to: " + filename);
System.gc();
BiGpairSEQ.setGraph(data);
BiGpairSEQ.setGraphFilename(filename);
System.out.println("Graph and Data file " + filename + " cached.");
}
}
//Simulate matching and output CSV file of results
private static void matchCDR3s() throws IOException {
String filename = null;
String dataFilename = null;
String graphFilename = null;
Integer lowThreshold = 0;
Integer highThreshold = Integer.MAX_VALUE;
Integer maxOccupancyDiff = Integer.MAX_VALUE;
@@ -530,7 +249,7 @@ public class UserInterface {
try {
System.out.println("\nBiGpairSEQ simulation requires an occupancy data and overlap graph file");
System.out.println("Please enter name of an existing graph and occupancy data file: ");
dataFilename = sc.next();
graphFilename = sc.next();
System.out.println("The matching results will be written to a file.");
System.out.print("Please enter a name for the output file: ");
filename = sc.next();
@@ -553,16 +272,23 @@ public class UserInterface {
System.out.println(ex);
sc.next();
}
//read object data from file
System.out.println("Reading graph data from file. This may take some time");
System.out.println("File I/O time is not included in results");
assert dataFilename != null;
GraphDataObjectReader dataReader = new GraphDataObjectReader(dataFilename);
GraphWithMapData data = dataReader.getData();
//set source file name
data.setSourceFilename(dataFilename);
assert graphFilename != null;
//check if this is the same graph we already have in memory.
GraphWithMapData data;
if(!(graphFilename.equals(BiGpairSEQ.getGraphFilename())) || BiGpairSEQ.getGraph() == null) {
BiGpairSEQ.clearGraph();
//read object data from file
GraphDataObjectReader dataReader = new GraphDataObjectReader(graphFilename);
data = dataReader.getData();
//set new graph in memory and new filename
BiGpairSEQ.setGraph(data);
BiGpairSEQ.setGraphFilename(graphFilename);
}
else {
data = BiGpairSEQ.getGraph();
}
//simulate matching
MatchingResult results = Simulator.matchCDR3s(data, dataFilename, lowThreshold, highThreshold, maxOccupancyDiff,
MatchingResult results = Simulator.matchCDR3s(data, graphFilename, lowThreshold, highThreshold, maxOccupancyDiff,
minOverlapPercent, true);
//write results to file
assert filename != null;
@@ -570,7 +296,6 @@ public class UserInterface {
System.out.println("Writing results to file");
writer.writeResultsToFile();
System.out.println("Results written to file: " + filename);
System.gc();
}
///////

View File

@@ -0,0 +1,3 @@
Manifest-Version: 1.0
Main-Class: BiGpairSEQ

View File

@@ -7,13 +7,10 @@ import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.regex.Pattern;
public class MatchingFileWriter {
private String filename;
private String sourceFileName;
private List<String> comments;
private List<String> headers;
private List<List<String>> allResults;
@@ -23,7 +20,6 @@ public class MatchingFileWriter {
filename = filename + ".csv";
}
this.filename = filename;
this.sourceFileName = result.getSourceFileName();
this.comments = result.getComments();
this.headers = result.getHeaders();
this.allResults = result.getAllResults();

View File

@@ -1,18 +1,41 @@
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
public class MatchingResult {
private String sourceFile;
private List<String> comments;
private List<String> headers;
private List<List<String>> allResults;
private Map<Integer, Integer> matchMap;
private Duration time;
public MatchingResult(String sourceFileName, List<String> comments, List<String> headers, List<List<String>> allResults, Map<Integer, Integer>matchMap, Duration time){
this.sourceFile = sourceFileName;
this.comments = comments;
private final Map<String, String> metadata;
private final List<String> comments;
private final List<String> headers;
private final List<List<String>> allResults;
private final Map<Integer, Integer> matchMap;
private final Duration time;
public MatchingResult(Map<String, String> metadata, List<String> headers,
List<List<String>> allResults, Map<Integer, Integer>matchMap, Duration time){
/*
* POSSIBLE KEYS FOR METADATA MAP ARE:
* sample plate filename *
* graph filename *
* well populations *
* total alphas found *
* total betas found *
* high overlap threshold
* low overlap threshold
* maximum occupancy difference
* minimum overlap percent
* pairing attempt rate
* correct pairing count
* incorrect pairing count
* pairing error rate
* simulation time
*/
this.metadata = metadata;
this.comments = new ArrayList<>();
for (String key : metadata.keySet()) {
comments.add(key +": " + metadata.get(key));
}
this.headers = headers;
this.allResults = allResults;
this.matchMap = matchMap;
@@ -20,6 +43,8 @@ public class MatchingResult {
}
public Map<String, String> getMetadata() {return metadata;}
public List<String> getComments() {
return comments;
}
@@ -40,7 +65,32 @@ public class MatchingResult {
return time;
}
public String getSourceFileName() {
return sourceFile;
public String getPlateFilename() {
return metadata.get("sample plate filename");
}
public String getGraphFilename() {
return metadata.get("graph filename");
}
public Integer[] getWellPopulations() {
List<Integer> wellPopulations = new ArrayList<>();
String popString = metadata.get("well populations");
for (String p : popString.split(", ")) {
wellPopulations.add(Integer.parseInt(p));
}
Integer[] popArray = new Integer[wellPopulations.size()];
return wellPopulations.toArray(popArray);
}
public Integer getAlphaCount() {
return Integer.parseInt(metadata.get("total alphas found"));
}
public Integer getBetaCount() {
return Integer.parseInt(metadata.get("total betas found"));
}
//put in the rest of these methods following the same pattern
}
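Following the pattern noted in the comment above, the remaining accessors can parse their values out of the metadata map. A sketch of two of them, with key names taken from the metadata written by `Simulator.matchCDR3s()` later in this diff:

```java
// Hypothetical continuations of the getter pattern; keys match the metadata map.
public Double getPairingErrorRate() {
    return Double.parseDouble(metadata.get("pairing error rate"));
}

public Integer getCorrectPairingCount() {
    return Integer.parseInt(metadata.get("correct pairing count"));
}
```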

View File

@@ -1,26 +1,28 @@
import java.util.*;
/*
TODO: Implement exponential distribution using inversion method - DONE
TODO: Implement discrete frequency distributions using Vose's Alias Method
*/
import java.util.*;
public class Plate {
private String sourceFile;
private List<List<Integer[]>> wells;
private Random rand = new Random();
private int size;
private double error;
private Integer[] concentrations;
private Integer[] populations;
private double stdDev;
private double lambda;
boolean exponential = false;
public Plate(int size, double error, Integer[] concentrations) {
public Plate(int size, double error, Integer[] populations) {
this.size = size;
this.error = error;
this.concentrations = concentrations;
this.populations = populations;
wells = new ArrayList<>();
}
@@ -35,9 +37,9 @@ public class Plate {
concentrations.add(w.size());
}
}
this.concentrations = new Integer[concentrations.size()];
for (int i = 0; i < this.concentrations.length; i++) {
this.concentrations[i] = concentrations.get(i);
this.populations = new Integer[concentrations.size()];
for (int i = 0; i < this.populations.length; i++) {
this.populations[i] = concentrations.get(i);
}
}
@@ -45,7 +47,7 @@ public class Plate {
this.lambda = lambda;
exponential = true;
sourceFile = sourceFileName;
int numSections = concentrations.length;
int numSections = populations.length;
int section = 0;
double m;
int n;
@@ -53,7 +55,7 @@ public class Plate {
while (section < numSections){
for (int i = 0; i < (size / numSections); i++) {
List<Integer[]> well = new ArrayList<>();
for (int j = 0; j < concentrations[section]; j++) {
for (int j = 0; j < populations[section]; j++) {
do {
//inverse transform sampling: for random number u in [0,1), x = log(1-u) / (-lambda)
m = (Math.log10((1 - rand.nextDouble()))/(-lambda)) * Math.sqrt(cells.size());
@@ -84,14 +86,14 @@ public class Plate {
public void fillWells(String sourceFileName, List<Integer[]> cells, double stdDev) {
this.stdDev = stdDev;
sourceFile = sourceFileName;
int numSections = concentrations.length;
int numSections = populations.length;
int section = 0;
double m;
int n;
while (section < numSections){
for (int i = 0; i < (size / numSections); i++) {
List<Integer[]> well = new ArrayList<>();
for (int j = 0; j < concentrations[section]; j++) {
for (int j = 0; j < populations[section]; j++) {
do {
m = (rand.nextGaussian() * stdDev) + (cells.size() / 2);
} while (m >= cells.size() || m < 0);
@@ -110,8 +112,8 @@ public class Plate {
}
}
public Integer[] getConcentrations(){
return concentrations;
public Integer[] getPopulations(){
return populations;
}
public int getSize(){

View File

@@ -7,7 +7,6 @@ import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.*;
import java.util.regex.Pattern;
public class PlateFileWriter {
private int size;
@@ -17,8 +16,7 @@ public class PlateFileWriter {
private Double error;
private String filename;
private String sourceFileName;
private String[] headers;
private List<Integer> concentrations;
private Integer[] concentrations;
private boolean isExponential = false;
public PlateFileWriter(String filename, Plate plate) {
@@ -37,8 +35,8 @@ public class PlateFileWriter {
}
this.error = plate.getError();
this.wells = plate.getWells();
this.concentrations = Arrays.asList(plate.getConcentrations());
concentrations.sort(Comparator.reverseOrder());
this.concentrations = plate.getPopulations();
Arrays.sort(concentrations);
}
public void writePlateFile(){
@@ -59,28 +57,34 @@ public class PlateFileWriter {
}
}
//this took forever
List<List<String>> rows = new ArrayList<>();
List<String> tmp = new ArrayList<>();
for(int i = 0; i < wellsAsStrings.size(); i++){//List<Integer[]> w: wells){
tmp.add("well " + (i+1));
}
rows.add(tmp);
for(int row = 0; row < maxLength; row++){
tmp = new ArrayList<>();
for(List<String> c: wellsAsStrings){
tmp.add(c.get(row));
}
rows.add(tmp);
}
//build string of well concentrations
StringBuilder concen = new StringBuilder();
for(Integer i: concentrations){
concen.append(i.toString());
concen.append(" ");
}
String concenString = concen.toString();
// //this took forever and I don't use it
// //if I wanted to use it, I'd replace printer.printRecords(wellsAsStrings) with printer.printRecords(rows)
// List<List<String>> rows = new ArrayList<>();
// List<String> tmp = new ArrayList<>();
// for(int i = 0; i < wellsAsStrings.size(); i++){//List<Integer[]> w: wells){
// tmp.add("well " + (i+1));
// }
// rows.add(tmp);
// for(int row = 0; row < maxLength; row++){
// tmp = new ArrayList<>();
// for(List<String> c: wellsAsStrings){
// tmp.add(c.get(row));
// }
// rows.add(tmp);
// }
//get list of well populations (wrapped in an ArrayList so remove() is supported)
List<Integer> wellPopulations = new ArrayList<>(Arrays.asList(concentrations));
//make string out of populations list
StringBuilder populationsStringBuilder = new StringBuilder();
populationsStringBuilder.append(wellPopulations.remove(0).toString());
for(Integer i: wellPopulations){
populationsStringBuilder.append(", ");
populationsStringBuilder.append(i.toString());
}
String wellPopulationsString = populationsStringBuilder.toString();
//set CSV format
CSVFormat plateFileFormat = CSVFormat.Builder.create()
.setCommentMarker('#')
.build();
@@ -92,7 +96,7 @@ public class PlateFileWriter {
printer.printComment("Each row represents one well on the plate.");
printer.printComment("Plate size: " + size);
printer.printComment("Error rate: " + error);
printer.printComment("Concentrations: " + concenString);
printer.printComment("Well populations: " + wellPopulationsString);
if(isExponential){
printer.printComment("Lambda: " + lambda);
}

View File

@@ -1,3 +1,4 @@
import org.jgrapht.Graph;
import org.jgrapht.alg.interfaces.MatchingAlgorithm;
import org.jgrapht.alg.matching.MaximumWeightBipartiteMatching;
import org.jgrapht.generate.SimpleWeightedBipartiteGraphMatrixGenerator;
@@ -49,6 +50,7 @@ public class Simulator {
Instant start = Instant.now();
int[] alphaIndex = {cdr3AlphaIndex};
int[] betaIndex = {cdr3BetaIndex};
int numWells = samplePlate.getSize();
if(verbose){System.out.println("Making cell maps");}
@@ -63,15 +65,11 @@ public class Simulator {
if(verbose){System.out.println("All alphas count: " + alphaCount);}
int betaCount = allBetas.size();
if(verbose){System.out.println("All betas count: " + betaCount);}
if(verbose){System.out.println("Well maps made");}
//Remove saturating-occupancy sequences because they have no signal value.
//Remove sequences with total occupancy below minimum pair overlap threshold
if(verbose){System.out.println("Removing sequences present in all wells.");}
//if(verbose){System.out.println("Removing sequences with occupancy below the minimum overlap threshold");}
filterByOccupancyThreshold(allAlphas, 1, numWells - 1);
filterByOccupancyThreshold(allBetas, 1, numWells - 1);
filterByOccupancyThresholds(allAlphas, 1, numWells - 1);
filterByOccupancyThresholds(allBetas, 1, numWells - 1);
if(verbose){System.out.println("Sequences removed");}
int pairableAlphaCount = allAlphas.size();
if(verbose){System.out.println("Remaining alphas count: " + pairableAlphaCount);}
@@ -131,10 +129,15 @@ public class Simulator {
Instant stop = Instant.now();
Duration time = Duration.between(start, stop);
//return GraphWithMapData object
return new GraphWithMapData(graph, numWells, samplePlate.getConcentrations(), alphaCount, betaCount,
//create GraphWithMapData object
GraphWithMapData output = new GraphWithMapData(graph, numWells, samplePlate.getPopulations(), alphaCount, betaCount,
distCellsMapAlphaKey, plateVtoAMap, plateVtoBMap, plateAtoVMap,
plateBtoVMap, alphaWellCounts, betaWellCounts, time);
//Set source file name in graph to name of sample plate
output.setSourceFilename(samplePlate.getSourceFileName());
//return GraphWithMapData object
return output;
}
//match CDR3s.
@@ -142,6 +145,8 @@ public class Simulator {
Integer highThreshold, Integer maxOccupancyDifference,
Integer minOverlapPercent, boolean verbose) {
Instant start = Instant.now();
//Integer arrays contain SOURCE VERTEX, TARGET VERTEX, and WEIGHT (cast back to double when the edge is restored)
List<Integer[]> removedEdges = new ArrayList<>();
int numWells = data.getNumWells();
Integer alphaCount = data.getAlphaCount();
Integer betaCount = data.getBetaCount();
@@ -152,24 +157,26 @@ public class Simulator {
Map<Integer, Integer> betaWellCounts = data.getBetaWellCounts();
SimpleWeightedGraph<Integer, DefaultWeightedEdge> graph = data.getGraph();
//remove weights outside given overlap thresholds
//remove edges with weights outside given overlap thresholds, add those to removed edge list
if(verbose){System.out.println("Eliminating edges with weights outside overlap threshold values");}
filterByOccupancyThreshold(graph, lowThreshold, highThreshold);
if(verbose){System.out.println("Over- and under-weight edges set to 0.0");}
removedEdges.addAll(GraphModificationFunctions.filterByOverlapThresholds(graph, lowThreshold, highThreshold));
if(verbose){System.out.println("Over- and under-weight edges removed");}
//Filter by overlap size
//remove edges between vertices with too small an overlap size, add those to removed edge list
if(verbose){System.out.println("Eliminating edges with weights less than " + minOverlapPercent.toString() +
" percent of vertex occupancy value.");}
filterByOverlapSize(graph, alphaWellCounts, betaWellCounts, plateVtoAMap, plateVtoBMap, minOverlapPercent);
if(verbose){System.out.println("Edges with weights too far below vertex occupancy values set to 0.0");}
removedEdges.addAll(GraphModificationFunctions.filterByOverlapPercent(graph, alphaWellCounts, betaWellCounts,
plateVtoAMap, plateVtoBMap, minOverlapPercent));
if(verbose){System.out.println("Edges with weights too far below a vertex occupancy value removed");}
//Filter by relative occupancy
if(verbose){System.out.println("Eliminating edges between vertices with occupancy difference > "
+ maxOccupancyDifference);}
filterByRelativeOccupancy(graph, alphaWellCounts, betaWellCounts, plateVtoAMap, plateVtoBMap,
maxOccupancyDifference);
removedEdges.addAll(GraphModificationFunctions.filterByRelativeOccupancy(graph, alphaWellCounts, betaWellCounts,
plateVtoAMap, plateVtoBMap, maxOccupancyDifference));
if(verbose){System.out.println("Edges between vertices of with excessively different occupancy values " +
"set to 0.0");}
"removed");}
//Find Maximum Weighted Matching
//using jheaps library class PairingHeap for improved efficiency
if(verbose){System.out.println("Finding maximum weighted matching");}
@@ -233,351 +240,376 @@ public class Simulator {
allResults.add(result);
}
//Metadate comments for CSV file
//Metadata comments for CSV file
int min = Math.min(alphaCount, betaCount);
//rate of attempted matching
double attemptRate = (double) (trueCount + falseCount) / min;
BigDecimal attemptRateTrunc = new BigDecimal(attemptRate, mc);
//rate of pairing error
double pairingErrorRate = (double) falseCount / (trueCount + falseCount);
BigDecimal pairingErrorRateTrunc = new BigDecimal(pairingErrorRate, mc);
//get list of well concentrations
List<Integer> wellConcentrations = Arrays.asList(data.getWellConcentrations());
Integer[] wellPopulations = data.getWellConcentrations();
//make string out of concentrations list
StringBuilder concentrationStringBuilder = new StringBuilder();
for(Integer i: wellConcentrations){
concentrationStringBuilder.append(i.toString());
concentrationStringBuilder.append(" ");
StringBuilder populationsStringBuilder = new StringBuilder();
populationsStringBuilder.append(wellPopulations[0].toString());
for(int i = 1; i < wellPopulations.length; i++){
populationsStringBuilder.append(", ");
populationsStringBuilder.append(wellPopulations[i].toString());
}
String concentrationString = concentrationStringBuilder.toString();
List<String> comments = new ArrayList<>();
comments.add("Source Sample Plate filename: " + data.getSourceFilename());
comments.add("Source Graph and Data filename: " + dataFilename);
comments.add("T cell counts in sample plate wells: " + concentrationString);
comments.add("Total alphas found: " + alphaCount);
comments.add("Total betas found: " + betaCount);
comments.add("High overlap threshold: " + highThreshold);
comments.add("Low overlap threshold: " + lowThreshold);
comments.add("Minimum overlap percent: " + minOverlapPercent);
comments.add("Maximum occupancy difference: " + maxOccupancyDifference);
comments.add("Pairing attempt rate: " + attemptRateTrunc);
comments.add("Correct pairings: " + trueCount);
comments.add("Incorrect pairings: " + falseCount);
comments.add("Pairing error rate: " + pairingErrorRateTrunc);
String wellPopulationsString = populationsStringBuilder.toString();
//total simulation time
Duration time = Duration.between(start, stop);
time = time.plus(data.getTime());
comments.add("Simulation time: " + nf.format(time.toSeconds()) + " seconds");
Map<String, String> metadata = new LinkedHashMap<>();
metadata.put("sample plate filename", data.getSourceFilename());
metadata.put("graph filename", dataFilename);
metadata.put("well populations", wellPopulationsString);
metadata.put("total alphas found", alphaCount.toString());
metadata.put("total betas found", betaCount.toString());
metadata.put("high overlap threshold", highThreshold.toString());
metadata.put("low overlap threshold", lowThreshold.toString());
metadata.put("maximum occupancy difference", maxOccupancyDifference.toString());
metadata.put("minimum overlap percent", minOverlapPercent.toString());
metadata.put("pairing attempt rate", attemptRateTrunc.toString());
metadata.put("correct pairing count", Integer.toString(trueCount));
metadata.put("incorrect pairing count", Integer.toString(falseCount));
metadata.put("pairing error rate", pairingErrorRateTrunc.toString());
metadata.put("simulation time", nf.format(time.toSeconds()));
//create MatchingResult object
MatchingResult output = new MatchingResult(metadata, header, allResults, matchMap, time);
if(verbose){
for(String s: comments){
for(String s: output.getComments()){
System.out.println(s);
}
}
return new MatchingResult(data.getSourceFilename(), comments, header, allResults, matchMap, time);
}
//Simulated matching of CDR1s to CDR3s. Requires MatchingResult from prior run of matchCDR3s.
public static MatchingResult[] matchCDR1s(List<Integer[]> distinctCells,
Plate samplePlate, Integer lowThreshold,
Integer highThreshold, MatchingResult priorResult){
Instant start = Instant.now();
Duration previousTime = priorResult.getTime();
Map<Integer, Integer> previousMatches = priorResult.getMatchMap();
int numWells = samplePlate.getSize();
int[] cdr3Indices = {cdr3AlphaIndex, cdr3BetaIndex};
int[] cdr1Indices = {cdr1AlphaIndex, cdr1BetaIndex};
//put the removed edges back on the graph
System.out.println("Restoring removed edges to graph.");
GraphModificationFunctions.addRemovedEdges(graph, removedEdges);
System.out.println("Making previous match maps");
Map<Integer, Integer> cdr3AtoBMap = previousMatches;
Map<Integer, Integer> cdr3BtoAMap = invertVertexMap(cdr3AtoBMap);
System.out.println("Previous match maps made");
System.out.println("Making cell maps");
Map<Integer, Integer> alphaCDR3toCDR1Map = makeSequenceToSequenceMap(distinctCells, cdr3AlphaIndex, cdr1AlphaIndex);
Map<Integer, Integer> betaCDR3toCDR1Map = makeSequenceToSequenceMap(distinctCells, cdr3BetaIndex, cdr1BetaIndex);
System.out.println("Cell maps made");
System.out.println("Making well maps");
Map<Integer, Integer> allCDR3s = samplePlate.assayWellsSequenceS(cdr3Indices);
Map<Integer, Integer> allCDR1s = samplePlate.assayWellsSequenceS(cdr1Indices);
int CDR3Count = allCDR3s.size();
System.out.println("all CDR3s count: " + CDR3Count);
int CDR1Count = allCDR1s.size();
System.out.println("all CDR1s count: " + CDR1Count);
System.out.println("Well maps made");
System.out.println("Removing unpaired CDR3s from well maps");
List<Integer> unpairedCDR3s = new ArrayList<>();
for(Integer i: allCDR3s.keySet()){
if(!(cdr3AtoBMap.containsKey(i) || cdr3BtoAMap.containsKey(i))){
unpairedCDR3s.add(i);
}
}
for(Integer i: unpairedCDR3s){
allCDR3s.remove(i);
}
System.out.println("Unpaired CDR3s removed.");
System.out.println("Remaining CDR3 count: " + allCDR3s.size());
System.out.println("Removing below-minimum-overlap-threshold and saturating-occupancy CDR1s");
filterByOccupancyThreshold(allCDR1s, lowThreshold, numWells - 1);
System.out.println("CDR1s removed.");
System.out.println("Remaining CDR1 count: " + allCDR1s.size());
System.out.println("Making vertex maps");
//For the SimpleWeightedBipartiteGraphMatrixGenerator, all vertices must have
// distinct numbers associated with them. Since I'm using a 2D array, that means
// distinct indices between the rows and columns. vertexStartValue lets me track where I switch
// from numbering rows to columns, so I can assign unique numbers to every vertex, and then
// subtract the vertexStartValue from CDR1s to use their vertex labels as array indices
Integer vertexStartValue = 0;
//keys are sequential integer vertices, values are CDR3s
Map<Integer, Integer> plateVtoCDR3Map = makeVertexToSequenceMap(allCDR3s, vertexStartValue);
//New start value for vertex to CDR1 map should be one more than final vertex value in CDR3 map
vertexStartValue += plateVtoCDR3Map.size();
//keys are sequential integers vertices, values are CDR1s
Map<Integer, Integer> plateVtoCDR1Map = makeVertexToSequenceMap(allCDR1s, vertexStartValue);
//keys are CDR3s, values are sequential integer vertices from previous map
Map<Integer, Integer> plateCDR3toVMap = invertVertexMap(plateVtoCDR3Map);
//keys are CDR1s, values are sequential integer vertices from previous map
Map<Integer, Integer> plateCDR1toVMap = invertVertexMap(plateVtoCDR1Map);
System.out.println("Vertex maps made");
System.out.println("Creating adjacency matrix");
//Count how many wells each CDR3 appears in
Map<Integer, Integer> cdr3WellCounts = new HashMap<>();
//count how many wells each CDR1 appears in
Map<Integer, Integer> cdr1WellCounts = new HashMap<>();
//add edges, where weights are number of wells the peptides share in common.
//If this is too slow, can make a 2d array and use the SimpleWeightedGraphMatrixGenerator class
Map<Integer, Integer> wellNCDR3s = null;
Map<Integer, Integer> wellNCDR1s = null;
double[][] weights = new double[plateVtoCDR3Map.size()][plateVtoCDR1Map.size()];
countSequencesAndFillMatrix(samplePlate, allCDR3s, allCDR1s, plateCDR3toVMap, plateCDR1toVMap,
cdr3Indices, cdr1Indices, cdr3WellCounts, cdr1WellCounts, weights);
System.out.println("Matrix created");
System.out.println("Creating graph");
SimpleWeightedGraph<Integer, DefaultWeightedEdge> graph =
new SimpleWeightedGraph<>(DefaultWeightedEdge.class);
SimpleWeightedBipartiteGraphMatrixGenerator graphGenerator = new SimpleWeightedBipartiteGraphMatrixGenerator();
List<Integer> cdr3Vertices = new ArrayList<>(plateVtoCDR3Map.keySet()); //This will work because LinkedHashMap preserves order of entry
graphGenerator.first(cdr3Vertices);
List<Integer> cdr1Vertices = new ArrayList<>(plateVtoCDR1Map.keySet());
graphGenerator.second(cdr1Vertices); //This will work because LinkedHashMap preserves order of entry
graphGenerator.weights(weights);
graphGenerator.generateGraph(graph);
System.out.println("Graph created");
System.out.println("Removing edges outside of weight thresholds");
filterByOccupancyThreshold(graph, lowThreshold, highThreshold);
System.out.println("Over- and under-weight edges set to 0.0");
System.out.println("Finding first maximum weighted matching");
MaximumWeightBipartiteMatching<Integer, DefaultWeightedEdge> firstMaxWeightMatching =
new MaximumWeightBipartiteMatching<>(graph, plateVtoCDR3Map.keySet(), plateVtoCDR1Map.keySet());
MatchingAlgorithm.Matching<Integer, DefaultWeightedEdge> graphMatching = firstMaxWeightMatching.getMatching();
System.out.println("First maximum weighted matching found");
//first processing run
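//translate the matched edges back from vertex IDs to CDR3 -> CDR1 sequence pairs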
Map<Integer, Integer> firstMatchCDR3toCDR1Map = new HashMap<>();
Iterator<DefaultWeightedEdge> weightIter = graphMatching.iterator();
DefaultWeightedEdge e;
while(weightIter.hasNext()){
e = weightIter.next();
// if(graph.getEdgeWeight(e) < lowThreshold || graph.getEdgeWeight(e) > highThreshold) {
// continue;
// }
Integer source = graph.getEdgeSource(e);
Integer target = graph.getEdgeTarget(e);
firstMatchCDR3toCDR1Map.put(plateVtoCDR3Map.get(source), plateVtoCDR1Map.get(target));
}
System.out.println("First pass matches: " + firstMatchCDR3toCDR1Map.size());
System.out.println("Removing edges from first maximum weighted matching");
//zero out the edge weights in the matching
weightIter = graphMatching.iterator();
while(weightIter.hasNext()){
graph.removeEdge(weightIter.next());
}
System.out.println("Edges removed");
//Generate a new matching
System.out.println("Finding second maximum weighted matching");
MaximumWeightBipartiteMatching<Integer, DefaultWeightedEdge> secondMaxWeightMatching =
new MaximumWeightBipartiteMatching<>(graph, plateVtoCDR3Map.keySet(), plateVtoCDR1Map.keySet());
graphMatching = secondMaxWeightMatching.getMatching();
System.out.println("Second maximum weighted matching found");
//second processing run
Map<Integer, Integer> secondMatchCDR3toCDR1Map = new HashMap<>();
weightIter = graphMatching.iterator();
while(weightIter.hasNext()){
e = weightIter.next();
// if(graph.getEdgeWeight(e) < lowThreshold || graph.getEdgeWeight(e) > highThreshold) {
// continue;
// }
Integer source = graph.getEdgeSource(e);
// if(!(CDR3AtoBMap.containsKey(source) || CDR3BtoAMap.containsKey(source))){
// continue;
// }
Integer target = graph.getEdgeTarget(e);
secondMatchCDR3toCDR1Map.put(plateVtoCDR3Map.get(source), plateVtoCDR1Map.get(target));
}
System.out.println("Second pass matches: " + secondMatchCDR3toCDR1Map.size());
System.out.println("Mapping first pass CDR3 alpha/beta pairs");
//get linked map for first matching attempt
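//keep a CDR3 alpha/beta pair only if both members received a CDR1 in the first matching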
Map<Integer, Integer> firstMatchesMap = new LinkedHashMap<>();
for(Integer alphaCDR3: cdr3AtoBMap.keySet()) {
if (!(firstMatchCDR3toCDR1Map.containsKey(alphaCDR3))) {
continue;
}
Integer betaCDR3 = cdr3AtoBMap.get(alphaCDR3);
if (!(firstMatchCDR3toCDR1Map.containsKey(betaCDR3))) {
continue;
}
firstMatchesMap.put(alphaCDR3, firstMatchCDR3toCDR1Map.get(alphaCDR3));
firstMatchesMap.put(betaCDR3, firstMatchCDR3toCDR1Map.get(betaCDR3));
}
System.out.println("First pass CDR3 alpha/beta pairs mapped");
System.out.println("Mapping second pass CDR3 alpha/beta pairs.");
System.out.println("Finding CDR3 pairs that swapped CDR1 matches between first pass and second pass.");
//Look for matches that simply swapped already-matched alpha and beta CDR3s
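//a swap means the first pass matched alpha->x, beta->y and the second pass matched alpha->y, beta->x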
Map<Integer, Integer> dualMatchesMap = new LinkedHashMap<>();
for(Integer alphaCDR3: cdr3AtoBMap.keySet()) {
if (!(firstMatchCDR3toCDR1Map.containsKey(alphaCDR3) && secondMatchCDR3toCDR1Map.containsKey(alphaCDR3))) {
continue;
}
Integer betaCDR3 = cdr3AtoBMap.get(alphaCDR3);
if (!(firstMatchCDR3toCDR1Map.containsKey(betaCDR3) && secondMatchCDR3toCDR1Map.containsKey(betaCDR3))) {
continue;
}
if(firstMatchCDR3toCDR1Map.get(alphaCDR3).equals(secondMatchCDR3toCDR1Map.get(betaCDR3))){
if(firstMatchCDR3toCDR1Map.get(betaCDR3).equals(secondMatchCDR3toCDR1Map.get(alphaCDR3))){
dualMatchesMap.put(alphaCDR3, firstMatchCDR3toCDR1Map.get(alphaCDR3));
dualMatchesMap.put(betaCDR3, firstMatchCDR3toCDR1Map.get(betaCDR3));
}
}
}
System.out.println("Second pass mapping made. Dual CDR3/CDR1 pairings found.");
Instant stop = Instant.now();
//results for first map
System.out.println("RESULTS FOR FIRST PASS MATCHING");
List<List<String>> allResults = new ArrayList<>();
Integer trueCount = 0;
Iterator<Integer> iter = firstMatchesMap.keySet().iterator();
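//firstMatchesMap was filled in alpha, beta order (LinkedHashMap), so each iteration consumes two keys: the alpha CDR3, then its paired beta CDR3
//a pairing counts as proven if the matched CDR1s agree with the ground-truth cell maps in either alpha/beta orientation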
while(iter.hasNext()){
Boolean proven = false;
List<String> tmp = new ArrayList<>();
tmp.add(iter.next().toString());
tmp.add(iter.next().toString());
tmp.add(firstMatchesMap.get(Integer.valueOf(tmp.get(0))).toString());
tmp.add(firstMatchesMap.get(Integer.valueOf(tmp.get(1))).toString());
if(alphaCDR3toCDR1Map.get(Integer.valueOf(tmp.get(0))).equals(Integer.valueOf(tmp.get(2)))){
if(betaCDR3toCDR1Map.get(Integer.valueOf(tmp.get(1))).equals(Integer.valueOf(tmp.get(3)))){
proven = true;
}
}
else if(alphaCDR3toCDR1Map.get(Integer.valueOf(tmp.get(0))).equals(Integer.valueOf(tmp.get(3)))){
if(betaCDR3toCDR1Map.get(Integer.valueOf(tmp.get(1))).equals(Integer.valueOf(tmp.get(2)))){
proven = true;
}
}
tmp.add(proven.toString());
allResults.add(tmp);
if(proven){
trueCount++;
}
}
List<String> comments = new ArrayList<>();
comments.add("Plate size: " + samplePlate.getSize() + " wells");
comments.add("Previous pairs found: " + previousMatches.size());
comments.add("CDR1 matches attempted: " + allResults.size());
double attemptRate = (double) allResults.size() / previousMatches.size();
comments.add("Matching attempt rate: " + attemptRate);
comments.add("Number of correct matches: " + trueCount);
double correctRate = (double) trueCount / allResults.size();
comments.add("Correct matching rate: " + correctRate);
NumberFormat nf = NumberFormat.getInstance(Locale.US);
Duration time = Duration.between(start, stop);
time = time.plus(previousTime);
comments.add("Simulation time: " + nf.format(time.toSeconds()) + " seconds");
for(String s: comments){
System.out.println(s);
}
List<String> headers = new ArrayList<>();
headers.add("CDR3 alpha");
headers.add("CDR3 beta");
headers.add("first matched CDR1");
headers.add("second matched CDR1");
headers.add("Correct match?");
MatchingResult firstTest = new MatchingResult(samplePlate.getSourceFileName(),
comments, headers, allResults, firstMatchesMap, time);
//results for dual map
System.out.println("RESULTS FOR SECOND PASS MATCHING");
allResults = new ArrayList<>();
trueCount = 0;
iter = dualMatchesMap.keySet().iterator();
while(iter.hasNext()){
Boolean proven = false;
List<String> tmp = new ArrayList<>();
tmp.add(iter.next().toString());
tmp.add(iter.next().toString());
tmp.add(dualMatchesMap.get(Integer.valueOf(tmp.get(0))).toString());
tmp.add(dualMatchesMap.get(Integer.valueOf(tmp.get(1))).toString());
if(alphaCDR3toCDR1Map.get(Integer.valueOf(tmp.get(0))).equals(Integer.valueOf(tmp.get(2)))){
if(betaCDR3toCDR1Map.get(Integer.valueOf(tmp.get(1))).equals(Integer.valueOf(tmp.get(3)))){
proven = true;
}
}
else if(alphaCDR3toCDR1Map.get(Integer.valueOf(tmp.get(0))).equals(Integer.valueOf(tmp.get(3)))){
if(betaCDR3toCDR1Map.get(Integer.valueOf(tmp.get(1))).equals(Integer.valueOf(tmp.get(2)))){
proven = true;
}
}
tmp.add(proven.toString());
allResults.add(tmp);
if(proven){
trueCount++;
}
}
comments = new ArrayList<>();
comments.add("Plate size: " + samplePlate.getSize() + " wells");
comments.add("Previous pairs found: " + previousMatches.size());
comments.add("High overlap threshold: " + highThreshold);
comments.add("Low overlap threshold: " + lowThreshold);
comments.add("CDR1 matches attempted: " + allResults.size());
attemptRate = (double) allResults.size() / previousMatches.size();
comments.add("Matching attempt rate: " + attemptRate);
comments.add("Number of correct matches: " + trueCount);
correctRate = (double) trueCount / allResults.size();
comments.add("Correct matching rate: " + correctRate);
comments.add("Simulation time: " + nf.format(time.toSeconds()) + " seconds");
for(String s: comments){
System.out.println(s);
}
System.out.println("Simulation time: " + nf.format(time.toSeconds()) + " seconds");
MatchingResult dualTest = new MatchingResult(samplePlate.getSourceFileName(), comments, headers,
allResults, dualMatchesMap, time);
MatchingResult[] output = {firstTest, dualTest};
//return MatchingResult object
return output;
}
//Commented out CDR1 matching until it's time to re-implement it
// //Simulated matching of CDR1s to CDR3s. Requires MatchingResult from prior run of matchCDR3s.
// public static MatchingResult[] matchCDR1s(List<Integer[]> distinctCells,
// Plate samplePlate, Integer lowThreshold,
// Integer highThreshold, MatchingResult priorResult){
// Instant start = Instant.now();
// Duration previousTime = priorResult.getTime();
// Map<Integer, Integer> previousMatches = priorResult.getMatchMap();
// int numWells = samplePlate.getSize();
// int[] cdr3Indices = {cdr3AlphaIndex, cdr3BetaIndex};
// int[] cdr1Indices = {cdr1AlphaIndex, cdr1BetaIndex};
//
// System.out.println("Making previous match maps");
// Map<Integer, Integer> cdr3AtoBMap = previousMatches;
// Map<Integer, Integer> cdr3BtoAMap = invertVertexMap(cdr3AtoBMap);
// System.out.println("Previous match maps made");
//
// System.out.println("Making cell maps");
// Map<Integer, Integer> alphaCDR3toCDR1Map = makeSequenceToSequenceMap(distinctCells, cdr3AlphaIndex, cdr1AlphaIndex);
// Map<Integer, Integer> betaCDR3toCDR1Map = makeSequenceToSequenceMap(distinctCells, cdr3BetaIndex, cdr1BetaIndex);
// System.out.println("Cell maps made");
//
// System.out.println("Making well maps");
// Map<Integer, Integer> allCDR3s = samplePlate.assayWellsSequenceS(cdr3Indices);
// Map<Integer, Integer> allCDR1s = samplePlate.assayWellsSequenceS(cdr1Indices);
// int CDR3Count = allCDR3s.size();
// System.out.println("all CDR3s count: " + CDR3Count);
// int CDR1Count = allCDR1s.size();
// System.out.println("all CDR1s count: " + CDR1Count);
// System.out.println("Well maps made");
//
// System.out.println("Removing unpaired CDR3s from well maps");
// List<Integer> unpairedCDR3s = new ArrayList<>();
// for(Integer i: allCDR3s.keySet()){
// if(!(cdr3AtoBMap.containsKey(i) || cdr3BtoAMap.containsKey(i))){
// unpairedCDR3s.add(i);
// }
// }
// for(Integer i: unpairedCDR3s){
// allCDR3s.remove(i);
// }
// System.out.println("Unpaired CDR3s removed.");
// System.out.println("Remaining CDR3 count: " + allCDR3s.size());
//
// System.out.println("Removing below-minimum-overlap-threshold and saturating-occupancy CDR1s");
// filterByOccupancyThreshold(allCDR1s, lowThreshold, numWells - 1);
// System.out.println("CDR1s removed.");
// System.out.println("Remaining CDR1 count: " + allCDR1s.size());
//
// System.out.println("Making vertex maps");
//
// //For the SimpleWeightedBipartiteGraphMatrixGenerator, all vertices must have
// // distinct numbers associated with them. Since I'm using a 2D array, that means
// // distinct indices between the rows and columns. vertexStartValue lets me track where I switch
// // from numbering rows to columns, so I can assign unique numbers to every vertex, and then
// // subtract the vertexStartValue from CDR1s to use their vertex labels as array indices
// Integer vertexStartValue = 0;
// //keys are sequential integer vertices, values are CDR3s
// Map<Integer, Integer> plateVtoCDR3Map = makeVertexToSequenceMap(allCDR3s, vertexStartValue);
// //New start value for vertex to CDR1 map should be one more than final vertex value in CDR3 map
// vertexStartValue += plateVtoCDR3Map.size();
// //keys are sequential integers vertices, values are CDR1s
// Map<Integer, Integer> plateVtoCDR1Map = makeVertexToSequenceMap(allCDR1s, vertexStartValue);
// //keys are CDR3s, values are sequential integer vertices from previous map
// Map<Integer, Integer> plateCDR3toVMap = invertVertexMap(plateVtoCDR3Map);
// //keys are CDR1s, values are sequential integer vertices from previous map
// Map<Integer, Integer> plateCDR1toVMap = invertVertexMap(plateVtoCDR1Map);
// System.out.println("Vertex maps made");
//
// System.out.println("Creating adjacency matrix");
// //Count how many wells each CDR3 appears in
// Map<Integer, Integer> cdr3WellCounts = new HashMap<>();
// //count how many wells each CDR1 appears in
// Map<Integer, Integer> cdr1WellCounts = new HashMap<>();
// //add edges, where weights are number of wells the peptides share in common.
// //If this is too slow, can make a 2d array and use the SimpleWeightedGraphMatrixGenerator class
// Map<Integer, Integer> wellNCDR3s = null;
// Map<Integer, Integer> wellNCDR1s = null;
// double[][] weights = new double[plateVtoCDR3Map.size()][plateVtoCDR1Map.size()];
// countSequencesAndFillMatrix(samplePlate, allCDR3s, allCDR1s, plateCDR3toVMap, plateCDR1toVMap,
// cdr3Indices, cdr1Indices, cdr3WellCounts, cdr1WellCounts, weights);
// System.out.println("Matrix created");
//
// System.out.println("Creating graph");
// SimpleWeightedGraph<Integer, DefaultWeightedEdge> graph =
// new SimpleWeightedGraph<>(DefaultWeightedEdge.class);
//
// SimpleWeightedBipartiteGraphMatrixGenerator graphGenerator = new SimpleWeightedBipartiteGraphMatrixGenerator();
// List<Integer> cdr3Vertices = new ArrayList<>(plateVtoCDR3Map.keySet()); //This will work because LinkedHashMap preserves order of entry
// graphGenerator.first(cdr3Vertices);
// List<Integer> cdr1Vertices = new ArrayList<>(plateVtoCDR1Map.keySet());
// graphGenerator.second(cdr1Vertices); //This will work because LinkedHashMap preserves order of entry
// graphGenerator.weights(weights);
// graphGenerator.generateGraph(graph);
// System.out.println("Graph created");
//
// System.out.println("Removing edges outside of weight thresholds");
// filterByOccupancyThreshold(graph, lowThreshold, highThreshold);
// System.out.println("Over- and under-weight edges set to 0.0");
//
// System.out.println("Finding first maximum weighted matching");
// MaximumWeightBipartiteMatching firstMaxWeightMatching =
// new MaximumWeightBipartiteMatching(graph, plateVtoCDR3Map.keySet(), plateVtoCDR1Map.keySet());
// MatchingAlgorithm.Matching<String, DefaultWeightedEdge> graphMatching = firstMaxWeightMatching.getMatching();
// System.out.println("First maximum weighted matching found");
//
//
// //first processing run
// Map<Integer, Integer> firstMatchCDR3toCDR1Map = new HashMap<>();
// Iterator<DefaultWeightedEdge> weightIter = graphMatching.iterator();
// DefaultWeightedEdge e;
// while(weightIter.hasNext()){
// e = weightIter.next();
//// if(graph.getEdgeWeight(e) < lowThreshold || graph.getEdgeWeight(e) > highThreshold) {
//// continue;
//// }
// Integer source = graph.getEdgeSource(e);
// Integer target = graph.getEdgeTarget(e);
// firstMatchCDR3toCDR1Map.put(plateVtoCDR3Map.get(source), plateVtoCDR1Map.get(target));
// }
// System.out.println("First pass matches: " + firstMatchCDR3toCDR1Map.size());
//
// System.out.println("Removing edges from first maximum weighted matching");
// //zero out the edge weights in the matching
// weightIter = graphMatching.iterator();
// while(weightIter.hasNext()){
// graph.removeEdge(weightIter.next());
// }
// System.out.println("Edges removed");
//
// //Generate a new matching
// System.out.println("Finding second maximum weighted matching");
// MaximumWeightBipartiteMatching secondMaxWeightMatching =
// new MaximumWeightBipartiteMatching(graph, plateVtoCDR3Map.keySet(), plateVtoCDR1Map.keySet());
// graphMatching = secondMaxWeightMatching.getMatching();
// System.out.println("Second maximum weighted matching found");
//
//
// //second processing run
// Map<Integer, Integer> secondMatchCDR3toCDR1Map = new HashMap<>();
// weightIter = graphMatching.iterator();
// while(weightIter.hasNext()){
// e = weightIter.next();
//// if(graph.getEdgeWeight(e) < lowThreshold || graph.getEdgeWeight(e) > highThreshold) {
//// continue;
//// }
// Integer source = graph.getEdgeSource(e);
//// if(!(CDR3AtoBMap.containsKey(source) || CDR3BtoAMap.containsKey(source))){
//// continue;
//// }
// Integer target = graph.getEdgeTarget(e);
// secondMatchCDR3toCDR1Map.put(plateVtoCDR3Map.get(source), plateVtoCDR1Map.get(target));
// }
// System.out.println("Second pass matches: " + secondMatchCDR3toCDR1Map.size());
//
// System.out.println("Mapping first pass CDR3 alpha/beta pairs");
// //get linked map for first matching attempt
// Map<Integer, Integer> firstMatchesMap = new LinkedHashMap<>();
// for(Integer alphaCDR3: cdr3AtoBMap.keySet()) {
// if (!(firstMatchCDR3toCDR1Map.containsKey(alphaCDR3))) {
// continue;
// }
// Integer betaCDR3 = cdr3AtoBMap.get(alphaCDR3);
// if (!(firstMatchCDR3toCDR1Map.containsKey(betaCDR3))) {
// continue;
// }
// firstMatchesMap.put(alphaCDR3, firstMatchCDR3toCDR1Map.get(alphaCDR3));
// firstMatchesMap.put(betaCDR3, firstMatchCDR3toCDR1Map.get(betaCDR3));
// }
// System.out.println("First pass CDR3 alpha/beta pairs mapped");
//
// System.out.println("Mapping second pass CDR3 alpha/beta pairs.");
// System.out.println("Finding CDR3 pairs that swapped CDR1 matches between first pass and second pass.");
// //Look for matches that simply swapped already-matched alpha and beta CDR3s
// Map<Integer, Integer> dualMatchesMap = new LinkedHashMap<>();
// for(Integer alphaCDR3: cdr3AtoBMap.keySet()) {
// if (!(firstMatchCDR3toCDR1Map.containsKey(alphaCDR3) && secondMatchCDR3toCDR1Map.containsKey(alphaCDR3))) {
// continue;
// }
// Integer betaCDR3 = cdr3AtoBMap.get(alphaCDR3);
// if (!(firstMatchCDR3toCDR1Map.containsKey(betaCDR3) && secondMatchCDR3toCDR1Map.containsKey(betaCDR3))) {
// continue;
// }
// if(firstMatchCDR3toCDR1Map.get(alphaCDR3).equals(secondMatchCDR3toCDR1Map.get(betaCDR3))){
// if(firstMatchCDR3toCDR1Map.get(betaCDR3).equals(secondMatchCDR3toCDR1Map.get(alphaCDR3))){
// dualMatchesMap.put(alphaCDR3, firstMatchCDR3toCDR1Map.get(alphaCDR3));
// dualMatchesMap.put(betaCDR3, firstMatchCDR3toCDR1Map.get(betaCDR3));
// }
// }
// }
// System.out.println("Second pass mapping made. Dual CDR3/CDR1 pairings found.");
//
// Instant stop = Instant.now();
// //results for first map
// System.out.println("RESULTS FOR FIRST PASS MATCHING");
// List<List<String>> allResults = new ArrayList<>();
// Integer trueCount = 0;
// Iterator iter = firstMatchesMap.keySet().iterator();
//
// while(iter.hasNext()){
// Boolean proven = false;
// List<String> tmp = new ArrayList<>();
// tmp.add(iter.next().toString());
// tmp.add(iter.next().toString());
// tmp.add(firstMatchesMap.get(Integer.valueOf(tmp.get(0))).toString());
// tmp.add(firstMatchesMap.get(Integer.valueOf(tmp.get(1))).toString());
// if(alphaCDR3toCDR1Map.get(Integer.valueOf(tmp.get(0))).equals(Integer.valueOf(tmp.get(2)))){
// if(betaCDR3toCDR1Map.get(Integer.valueOf(tmp.get(1))).equals(Integer.valueOf(tmp.get(3)))){
// proven = true;
// }
// }
// else if(alphaCDR3toCDR1Map.get(Integer.valueOf(tmp.get(0))).equals(Integer.valueOf(tmp.get(3)))){
// if(betaCDR3toCDR1Map.get(Integer.valueOf(tmp.get(1))).equals(Integer.valueOf(tmp.get(2)))){
// proven = true;
// }
// }
// tmp.add(proven.toString());
// allResults.add(tmp);
// if(proven){
// trueCount++;
// }
// }
//
// List<String> comments = new ArrayList<>();
// comments.add("Plate size: " + samplePlate.getSize() + " wells");
// comments.add("Previous pairs found: " + previousMatches.size());
// comments.add("CDR1 matches attempted: " + allResults.size());
// double attemptRate = (double) allResults.size() / previousMatches.size();
// comments.add("Matching attempt rate: " + attemptRate);
// comments.add("Number of correct matches: " + trueCount);
// double correctRate = (double) trueCount / allResults.size();
// comments.add("Correct matching rate: " + correctRate);
// NumberFormat nf = NumberFormat.getInstance(Locale.US);
// Duration time = Duration.between(start, stop);
// time = time.plus(previousTime);
// comments.add("Simulation time: " + nf.format(time.toSeconds()) + " seconds");
// for(String s: comments){
// System.out.println(s);
// }
//
//
//
// List<String> headers = new ArrayList<>();
// headers.add("CDR3 alpha");
// headers.add("CDR3 beta");
// headers.add("first matched CDR1");
// headers.add("second matched CDR1");
// headers.add("Correct match?");
//
// MatchingResult firstTest = new MatchingResult(samplePlate.getSourceFileName(),
// comments, headers, allResults, dualMatchesMap, time);
//
// //results for dual map
// System.out.println("RESULTS FOR SECOND PASS MATCHING");
// allResults = new ArrayList<>();
// trueCount = 0;
// iter = dualMatchesMap.keySet().iterator();
// while(iter.hasNext()){
// Boolean proven = false;
// List<String> tmp = new ArrayList<>();
// tmp.add(iter.next().toString());
// tmp.add(iter.next().toString());
// tmp.add(dualMatchesMap.get(Integer.valueOf(tmp.get(0))).toString());
// tmp.add(dualMatchesMap.get(Integer.valueOf(tmp.get(1))).toString());
// if(alphaCDR3toCDR1Map.get(Integer.valueOf(tmp.get(0))).equals(Integer.valueOf(tmp.get(2)))){
// if(betaCDR3toCDR1Map.get(Integer.valueOf(tmp.get(1))).equals(Integer.valueOf(tmp.get(3)))){
// proven = true;
// }
// }
// else if(alphaCDR3toCDR1Map.get(Integer.valueOf(tmp.get(0))).equals(Integer.valueOf(tmp.get(3)))){
// if(betaCDR3toCDR1Map.get(Integer.valueOf(tmp.get(1))).equals(Integer.valueOf(tmp.get(2)))){
// proven = true;
// }
// }
// tmp.add(proven.toString());
// allResults.add(tmp);
// if(proven){
// trueCount++;
// }
// }
//
// comments = new ArrayList<>();
// comments.add("Plate size: " + samplePlate.getSize() + " wells");
// comments.add("Previous pairs found: " + previousMatches.size());
// comments.add("High overlap threshold: " + highThreshold);
// comments.add("Low overlap threshold: " + lowThreshold);
// comments.add("CDR1 matches attempted: " + allResults.size());
// attemptRate = (double) allResults.size() / previousMatches.size();
// comments.add("Matching attempt rate: " + attemptRate);
// comments.add("Number of correct matches: " + trueCount);
// correctRate = (double) trueCount / allResults.size();
// comments.add("Correct matching rate: " + correctRate);
// comments.add("Simulation time: " + nf.format(time.toSeconds()) + " seconds");
//
// for(String s: comments){
// System.out.println(s);
// }
//
// System.out.println("Simulation time: " + nf.format(time.toSeconds()) + " seconds");
// MatchingResult dualTest = new MatchingResult(samplePlate.getSourceFileName(), comments, headers,
// allResults, dualMatchesMap, time);
// MatchingResult[] output = {firstTest, dualTest};
// return output;
// }
//Remove sequences based on occupancy
public static void filterByOccupancyThresholds(Map<Integer, Integer> wellMap, int low, int high){
List<Integer> noise = new ArrayList<>();
for(Integer k: wellMap.keySet()){
if((wellMap.get(k) > high) || (wellMap.get(k) < low)){
noise.add(k);
}
}
for(Integer k: noise) {
wellMap.remove(k);
}
}
//Counts the well occupancy of the row peptides and column peptides into given maps, and
//fills weights in the given 2D array
@@ -621,62 +653,6 @@ public class Simulator {
}
}
private static void filterByOccupancyThreshold(SimpleWeightedGraph<Integer, DefaultWeightedEdge> graph,
int low, int high) {
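//edges outside [low, high] are down-weighted to 0.0 rather than removed, so the graph structure stays intact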
for(DefaultWeightedEdge e: graph.edgeSet()){
if ((graph.getEdgeWeight(e) > high) || (graph.getEdgeWeight(e) < low)){
graph.setEdgeWeight(e, 0.0);
}
}
}
private static void filterByOccupancyThreshold(Map<Integer, Integer> wellMap, int low, int high){
List<Integer> noise = new ArrayList<>();
for(Integer k: wellMap.keySet()){
if((wellMap.get(k) > high) || (wellMap.get(k) < low)){
noise.add(k);
}
}
for(Integer k: noise) {
wellMap.remove(k);
}
}
//Remove edges for pairs with large occupancy discrepancy
private static void filterByRelativeOccupancy(SimpleWeightedGraph<Integer, DefaultWeightedEdge> graph,
Map<Integer, Integer> alphaWellCounts,
Map<Integer, Integer> betaWellCounts,
Map<Integer, Integer> plateVtoAMap,
Map<Integer, Integer> plateVtoBMap,
Integer maxOccupancyDifference) {
for (DefaultWeightedEdge e : graph.edgeSet()) {
Integer alphaOcc = alphaWellCounts.get(plateVtoAMap.get(graph.getEdgeSource(e)));
Integer betaOcc = betaWellCounts.get(plateVtoBMap.get(graph.getEdgeTarget(e)));
//Adjust this to something cleverer later
if (Math.abs(alphaOcc - betaOcc) >= maxOccupancyDifference) {
graph.setEdgeWeight(e, 0.0);
}
}
}
//Remove edges for pairs where overlap size is significantly lower than the well occupancy
private static void filterByOverlapSize(SimpleWeightedGraph<Integer, DefaultWeightedEdge> graph,
Map<Integer, Integer> alphaWellCounts,
Map<Integer, Integer> betaWellCounts,
Map<Integer, Integer> plateVtoAMap,
Map<Integer, Integer> plateVtoBMap,
Integer minOverlapPercent) {
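//zero out an edge unless its weight (shared-well count) is at least minOverlapPercent of both peptides' individual well occupancies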
for (DefaultWeightedEdge e : graph.edgeSet()) {
Integer alphaOcc = alphaWellCounts.get(plateVtoAMap.get(graph.getEdgeSource(e)));
Integer betaOcc = betaWellCounts.get(plateVtoBMap.get(graph.getEdgeTarget(e)));
double weight = graph.getEdgeWeight(e);
double min = minOverlapPercent / 100.0;
if ((weight / alphaOcc < min) || (weight / betaOcc < min)) {
graph.setEdgeWeight(e, 0.0);
}
}
}
private static Map<Integer, Integer> makeSequenceToSequenceMap(List<Integer[]> cells, int keySequenceIndex,
int valueSequenceIndex){
Map<Integer, Integer> keySequenceToValueSequenceMap = new HashMap<>();