Compare commits
8 Commits
7f18311054
...
c2db4f87c1
| Author | SHA1 | Date | |
|---|---|---|---|
| c2db4f87c1 | |||
| 8935407ade | |||
| 9fcc20343d | |||
| e4d094d796 | |||
| f385ebc31f | |||
| 8745550e11 | |||
| 41805135b3 | |||
| 373a5e02f9 |
77
readme.md
77
readme.md
@@ -66,6 +66,18 @@ Please select an option:
|
||||
0) Exit
|
||||
```
|
||||
|
||||
By default, the Options menu looks like this:
|
||||
```
|
||||
--------------OPTIONS---------------
|
||||
1) Turn on cell sample file caching
|
||||
2) Turn on plate file caching
|
||||
3) Turn on graph/data file caching
|
||||
4) Turn off serialized binary graph output
|
||||
5) Turn on GraphML graph output
|
||||
6) Maximum weight matching algorithm options
|
||||
0) Return to main menu
|
||||
```
|
||||
|
||||
### INPUT/OUTPUT
|
||||
|
||||
To run the simulation, the program reads and writes 4 kinds of files:
|
||||
@@ -75,21 +87,24 @@ To run the simulation, the program reads and writes 4 kinds of files:
|
||||
* Matching Results files in CSV format
|
||||
|
||||
These files are often generated in sequence. When entering filenames, it is not necessary to include the file extension
|
||||
(.csv or .ser). When reading or writing files, the program will automatically add the correct extension to any filename without one.
|
||||
(.csv or .ser). When reading or writing files, the program will automatically add the correct extension to any filename
|
||||
without one.
|
||||
|
||||
To save file I/O time, the most recent instance of each of these four
|
||||
files either generated or read from disk can be cached in program memory. This is could be important for Graph/Data files,
|
||||
which can be several gigabytes in size. Since some simulations may require running multiple,
|
||||
differently-configured BiGpairSEQ matchings on the same graph, keeping the most recent graph cached may reduce execution time.
|
||||
(The manipulation necessary to re-use a graph incurs its own performance overhead, though, which may scale with graph
|
||||
size faster than file I/O does. If so, caching is best for smaller graphs.)
|
||||
|
||||
When caching is active, subsequent uses of the same data file won't need to be read in again until another file of that type is used or generated,
|
||||
files either generated or read from disk can be cached in program memory. When caching is active, subsequent uses of the
|
||||
same data file won't need to be read in again until another file of that type is used or generated,
|
||||
or caching is turned off for that file type. The program checks whether it needs to update its cached data by comparing
|
||||
filenames as entered by the user. On encountering a new filename, the program flushes its cache and reads in the new file.
|
||||
|
||||
(Note that cached Graph/Data files must be transformed back into their original state after a matching experiment, which
|
||||
may take some time. Whether file I/O or graph transformation takes longer for graph/data files is likely to be
|
||||
device-specific.)
|
||||
|
||||
The program's caching behavior can be controlled in the Options menu. By default, all caching is OFF.
|
||||
|
||||
The program can optionally output Graph/Data files in .GraphML format (.graphml) for data portability. This can be
|
||||
turned on in the Options menu. By default, GraphML output is OFF.
|
||||
|
||||
#### Cell Sample Files
|
||||
Cell Sample files consist of any number of distinct "T cells." Every cell contains
|
||||
four sequences: Alpha CDR3, Beta CDR3, Alpha CDR1, Beta CDR1. The sequences are represented by
|
||||
@@ -181,14 +196,19 @@ Options for creating a Graph/Data file:
|
||||
* The Cell Sample file to use
|
||||
* The Sample Plate file to use. (This must have been generated from the selected Cell Sample file.)
|
||||
|
||||
These files do not have a human-readable structure, and are not portable to other programs. (Export of graphs in a
|
||||
portable data format may be implemented in the future. The tricky part is encoding the necessary metadata.)
|
||||
These files do not have a human-readable structure, and are not portable to other programs.
|
||||
|
||||
(For portability to other software, turn on GraphML output in the Options menu. This will produce a .graphml file
|
||||
for the weighted graph, with vertex attributes sequence, type, and occupancy data.)
|
||||
|
||||
---
|
||||
|
||||
#### Matching Results Files
|
||||
Matching results files consist of the results of a BiGpairSEQ matching simulation. Making them requires a Graph and
|
||||
Data file. Matching results files are in CSV format. Rows are sequence pairings with extra relevant data. Columns are pairing-specific details.
|
||||
Matching results files consist of the results of a BiGpairSEQ matching simulation. Making them requires a serialized
|
||||
binary Graph/Data file (.ser). (Because .graphML files are larger than .ser files, BiGpairSEQ_Sim supports .graphML
|
||||
output only. Graph/data input must use a serialized binary.)
|
||||
|
||||
Matching results files are in CSV format. Rows are sequence pairings with extra relevant data. Columns are pairing-specific details.
|
||||
Metadata about the matching simulation is included as comments. Comments are preceded by `#`.
|
||||
|
||||
Options when running a BiGpairSEQ simulation of CDR3 alpha/beta matching:
|
||||
@@ -258,25 +278,26 @@ slightly less time than the simulation itself. Real elapsed time from start to f
|
||||
* ~~*No, this won't work, because BiGpairSEQ simulations alter the underlying graph based on filtering constraints. Changes would cascade with multiple experiments.*~~
|
||||
* Might have figured out a way to do it, by taking edges out and then putting them back into the graph. This may actually be possible.
|
||||
* It is possible, though the modifications to the graph incur their own performance penalties. Need testing to see which option is best.
|
||||
* See if there's a reasonable way to reformat Sample Plate files so that wells are columns instead of rows.
|
||||
* ~~Problem is variable number of cells in a well~~
|
||||
* ~~Apache Commons CSV library writes entries a row at a time~~
|
||||
* _Got this working, but at the cost of a profoundly strange bug in graph occupancy filtering. Have reverted the repo until I can figure out what caused that. Given how easily Thingiverse transposes CSV matrices in R, might not even be worth fixing._
|
||||
* Re-implement command line arguments, to enable scripting and statistical simulation studies
|
||||
* ~~Implement sample plates with random numbers of T cells per well.~~ DONE
|
||||
* Possible BiGpairSEQ advantage over pairSEQ: BiGpairSEQ is resilient to variations in well population sizes on a sample plate; pairSEQ is not.
|
||||
* preliminary data suggests that BiGpairSEQ behaves roughly as though the whole plate had whatever the *average* well concentration is, but that's still speculative.
|
||||
* Enable GraphML output in addition to serialized object binaries, for data portability
|
||||
* Custom vertex type with attribute for sequence occupancy?
|
||||
* Re-implement CDR1 matching method
|
||||
* Implement Duan and Su's maximum weight matching algorithm
|
||||
* Add controllable algorithm-type parameter?
|
||||
* ~~Test whether pairing heap (currently used) or Fibonacci heap is more efficient for priority queue in current matching algorithm~~ DONE
|
||||
* ~~in theory Fibonacci heap should be more efficient, but complexity overhead may eliminate theoretical advantage~~
|
||||
* ~~Add controllable heap-type parameter?~~
|
||||
* Parameter implemented. For large graphs, Fibonacci heap wins. Now the new default.
|
||||
|
||||
|
||||
* Parameter implemented. Fibonacci heap the current default.
|
||||
* ~~Implement sample plates with random numbers of T cells per well.~~ DONE
|
||||
* Possible BiGpairSEQ advantage over pairSEQ: BiGpairSEQ is resilient to variations in well population sizes on a sample plate; pairSEQ is not.
|
||||
* preliminary data suggests that BiGpairSEQ behaves roughly as though the whole plate had whatever the *average* well concentration is, but that's still speculative.
|
||||
* See if there's a reasonable way to reformat Sample Plate files so that wells are columns instead of rows.
|
||||
* ~~Problem is variable number of cells in a well~~
|
||||
* ~~Apache Commons CSV library writes entries a row at a time~~
|
||||
* _Got this working, but at the cost of a profoundly strange bug in graph occupancy filtering. Have reverted the repo until I can figure out what caused that. Given how easily Thingiverse transposes CSV matrices in R, might not even be worth fixing.
|
||||
* ~~Enable GraphML output in addition to serialized object binaries, for data portability~~ DONE
|
||||
* ~~Custom vertex type with attribute for sequence occupancy?~~ABANDONED
|
||||
* Have a branch where this is implemented, but there's a bug that broke matching. Don't currently have time to fix.
|
||||
* Re-implement command line arguments, to enable scripting and statistical simulation studies
|
||||
* Re-implement CDR1 matching method
|
||||
* Implement Duan and Su's maximum weight matching algorithm
|
||||
* Add controllable algorithm-type parameter?
|
||||
* This would be fun and valuable, but probably take more time than I have for a hobby project.
|
||||
|
||||
|
||||
## CITATIONS
|
||||
* Howie, B., Sherwood, A. M., et al. ["High-throughput pairing of T cell receptor alpha and beta sequences."](https://pubmed.ncbi.nlm.nih.gov/26290413/) Sci. Transl. Med. 7, 301ra131 (2015)
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
import java.util.Random;
|
||||
|
||||
//main class. For choosing interface type and caching file data
|
||||
//main class. For choosing interface type and holding settings
|
||||
public class BiGpairSEQ {
|
||||
|
||||
private static final Random rand = new Random();
|
||||
@@ -14,6 +14,8 @@ public class BiGpairSEQ {
|
||||
private static boolean cachePlate = false;
|
||||
private static boolean cacheGraph = false;
|
||||
private static String priorityQueueHeapType = "FIBONACCI";
|
||||
private static boolean outputBinary = true;
|
||||
private static boolean outputGraphML = false;
|
||||
|
||||
public static void main(String[] args) {
|
||||
if (args.length == 0) {
|
||||
@@ -164,4 +166,11 @@ public class BiGpairSEQ {
|
||||
public static void setFibonacciHeap() {
|
||||
priorityQueueHeapType = "FIBONACCI";
|
||||
}
|
||||
|
||||
public static boolean outputBinary() {return outputBinary;}
|
||||
public static void setOutputBinary(boolean b) {outputBinary = b;}
|
||||
|
||||
public static boolean outputGraphML() {return outputGraphML;}
|
||||
public static void setOutputGraphML(boolean b) {outputGraphML = b;}
|
||||
|
||||
}
|
||||
|
||||
@@ -1,10 +1,37 @@
|
||||
import java.util.ArrayList;
|
||||
import java.util.Collections;
|
||||
import java.util.List;
|
||||
import java.util.stream.IntStream;
|
||||
|
||||
public class CellSample {
|
||||
|
||||
private List<Integer[]> cells;
|
||||
private Integer cdr1Freq;
|
||||
|
||||
public CellSample(Integer numDistinctCells, Integer cdr1Freq){
|
||||
this.cdr1Freq = cdr1Freq;
|
||||
List<Integer> numbersCDR3 = new ArrayList<>();
|
||||
List<Integer> numbersCDR1 = new ArrayList<>();
|
||||
Integer numDistCDR3s = 2 * numDistinctCells + 1;
|
||||
IntStream.range(1, numDistCDR3s + 1).forEach(i -> numbersCDR3.add(i));
|
||||
IntStream.range(numDistCDR3s + 1, numDistCDR3s + 1 + (numDistCDR3s / cdr1Freq) + 1).forEach(i -> numbersCDR1.add(i));
|
||||
Collections.shuffle(numbersCDR3);
|
||||
Collections.shuffle(numbersCDR1);
|
||||
|
||||
//Each cell represented by 4 values
|
||||
//two CDR3s, and two CDR1s. First two values are CDR3s (alpha, beta), second two are CDR1s (alpha, beta)
|
||||
List<Integer[]> distinctCells = new ArrayList<>();
|
||||
for(int i = 0; i < numbersCDR3.size() - 1; i = i + 2){
|
||||
Integer tmpCDR3a = numbersCDR3.get(i);
|
||||
Integer tmpCDR3b = numbersCDR3.get(i+1);
|
||||
Integer tmpCDR1a = numbersCDR1.get(i % numbersCDR1.size());
|
||||
Integer tmpCDR1b = numbersCDR1.get((i+1) % numbersCDR1.size());
|
||||
Integer[] tmp = {tmpCDR3a, tmpCDR3b, tmpCDR1a, tmpCDR1b};
|
||||
distinctCells.add(tmp);
|
||||
}
|
||||
this.cells = distinctCells;
|
||||
}
|
||||
|
||||
public CellSample(List<Integer[]> cells, Integer cdr1Freq){
|
||||
this.cells = cells;
|
||||
this.cdr1Freq = cdr1Freq;
|
||||
|
||||
@@ -288,7 +288,7 @@ public class CommandLineInterface {
|
||||
|
||||
//for calling from command line
|
||||
public static void makeCells(String filename, Integer numCells, Integer cdr1Freq){
|
||||
CellSample sample = Simulator.generateCellSample(numCells, cdr1Freq);
|
||||
CellSample sample = new CellSample(numCells, cdr1Freq);
|
||||
CellFileWriter writer = new CellFileWriter(filename, sample);
|
||||
writer.writeCellsToFile();
|
||||
}
|
||||
|
||||
@@ -1,35 +0,0 @@
|
||||
import org.jgrapht.graph.SimpleWeightedGraph;
|
||||
import org.jgrapht.nio.graphml.GraphMLImporter;
|
||||
|
||||
import java.io.BufferedReader;
|
||||
import java.io.IOException;
|
||||
import java.nio.file.Files;
|
||||
import java.nio.file.Path;
|
||||
|
||||
public class GraphMLFileReader {
|
||||
|
||||
private String filename;
|
||||
private SimpleWeightedGraph graph;
|
||||
|
||||
public GraphMLFileReader(String filename, SimpleWeightedGraph graph) {
|
||||
if(!filename.matches(".*\\.graphml")){
|
||||
filename = filename + ".graphml";
|
||||
}
|
||||
this.filename = filename;
|
||||
this.graph = graph;
|
||||
|
||||
try(//don't need to close reader bc of try-with-resources auto-closing
|
||||
BufferedReader reader = Files.newBufferedReader(Path.of(filename));
|
||||
){
|
||||
GraphMLImporter<SimpleWeightedGraph, BufferedReader> importer = new GraphMLImporter<>();
|
||||
importer.importGraph(graph, reader);
|
||||
}
|
||||
catch (IOException ex) {
|
||||
System.out.println("Graph file " + filename + " not found.");
|
||||
System.err.println(ex);
|
||||
}
|
||||
}
|
||||
|
||||
public SimpleWeightedGraph getGraph() { return graph; }
|
||||
|
||||
}
|
||||
@@ -1,4 +1,8 @@
|
||||
import org.jgrapht.graph.DefaultWeightedEdge;
|
||||
import org.jgrapht.graph.SimpleWeightedGraph;
|
||||
import org.jgrapht.nio.Attribute;
|
||||
import org.jgrapht.nio.AttributeType;
|
||||
import org.jgrapht.nio.DefaultAttribute;
|
||||
import org.jgrapht.nio.dot.DOTExporter;
|
||||
import org.jgrapht.nio.graphml.GraphMLExporter;
|
||||
|
||||
@@ -7,25 +11,69 @@ import java.io.IOException;
|
||||
import java.nio.file.Files;
|
||||
import java.nio.file.Path;
|
||||
import java.nio.file.StandardOpenOption;
|
||||
import java.util.HashMap;
|
||||
import java.util.LinkedHashMap;
|
||||
import java.util.Map;
|
||||
|
||||
public class GraphMLFileWriter {
|
||||
|
||||
String filename;
|
||||
SimpleWeightedGraph graph;
|
||||
GraphWithMapData data;
|
||||
|
||||
|
||||
public GraphMLFileWriter(String filename, SimpleWeightedGraph graph) {
|
||||
public GraphMLFileWriter(String filename, GraphWithMapData data) {
|
||||
if(!filename.matches(".*\\.graphml")){
|
||||
filename = filename + ".graphml";
|
||||
}
|
||||
this.filename = filename;
|
||||
this.graph = graph;
|
||||
this.data = data;
|
||||
}
|
||||
|
||||
// public void writeGraphToFile() {
|
||||
// try(BufferedWriter writer = Files.newBufferedWriter(Path.of(filename), StandardOpenOption.CREATE_NEW);
|
||||
// ){
|
||||
// GraphMLExporter<SimpleWeightedGraph, BufferedWriter> exporter = new GraphMLExporter<>();
|
||||
// exporter.exportGraph(graph, writer);
|
||||
// } catch(IOException ex){
|
||||
// System.out.println("Could not make new file named "+filename);
|
||||
// System.err.println(ex);
|
||||
// }
|
||||
// }
|
||||
|
||||
public void writeGraphToFile() {
|
||||
SimpleWeightedGraph graph = data.getGraph();
|
||||
Map<Integer, Integer> vertexToAlphaMap = data.getPlateVtoAMap();
|
||||
Map<Integer, Integer> vertexToBetaMap = data.getPlateVtoBMap();
|
||||
Map<Integer, Integer> alphaOccs = data.getAlphaWellCounts();
|
||||
Map<Integer, Integer> betaOccs = data.getBetaWellCounts();
|
||||
try(BufferedWriter writer = Files.newBufferedWriter(Path.of(filename), StandardOpenOption.CREATE_NEW);
|
||||
){
|
||||
GraphMLExporter<SimpleWeightedGraph, BufferedWriter> exporter = new GraphMLExporter<>();
|
||||
//create exporter. Let the vertex labels be the unique ids for the vertices
|
||||
GraphMLExporter<Integer, SimpleWeightedGraph<Vertex, DefaultWeightedEdge>> exporter = new GraphMLExporter<>(v -> v.toString());
|
||||
//set to export weights
|
||||
exporter.setExportEdgeWeights(true);
|
||||
//set type, sequence, and occupancy attributes for each vertex
|
||||
exporter.setVertexAttributeProvider( v -> {
|
||||
Map<String, Attribute> attributes = new HashMap<>();
|
||||
if(vertexToAlphaMap.containsKey(v)) {
|
||||
attributes.put("type", DefaultAttribute.createAttribute("CDR3 Alpha"));
|
||||
attributes.put("sequence", DefaultAttribute.createAttribute(vertexToAlphaMap.get(v)));
|
||||
attributes.put("occupancy", DefaultAttribute.createAttribute(
|
||||
alphaOccs.get(vertexToAlphaMap.get(v))));
|
||||
}
|
||||
else if(vertexToBetaMap.containsKey(v)) {
|
||||
attributes.put("type", DefaultAttribute.createAttribute("CDR3 Beta"));
|
||||
attributes.put("sequence", DefaultAttribute.createAttribute(vertexToBetaMap.get(v)));
|
||||
attributes.put("occupancy", DefaultAttribute.createAttribute(
|
||||
betaOccs.get(vertexToBetaMap.get(v))));
|
||||
}
|
||||
return attributes;
|
||||
});
|
||||
//register the attributes
|
||||
exporter.registerAttribute("type", GraphMLExporter.AttributeCategory.NODE, AttributeType.STRING);
|
||||
exporter.registerAttribute("sequence", GraphMLExporter.AttributeCategory.NODE, AttributeType.STRING);
|
||||
exporter.registerAttribute("occupancy", GraphMLExporter.AttributeCategory.NODE, AttributeType.STRING);
|
||||
//export the graph
|
||||
exporter.exportGraph(graph, writer);
|
||||
} catch(IOException ex){
|
||||
System.out.println("Could not make new file named "+filename);
|
||||
@@ -33,3 +81,4 @@ public class GraphMLFileWriter {
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
@@ -4,7 +4,6 @@ import org.jgrapht.graph.SimpleWeightedGraph;
|
||||
import java.util.ArrayList;
|
||||
import java.util.List;
|
||||
import java.util.Map;
|
||||
import java.util.Set;
|
||||
|
||||
public interface GraphModificationFunctions {
|
||||
|
||||
|
||||
@@ -74,7 +74,7 @@ public class InteractiveInterface {
|
||||
System.out.println(ex);
|
||||
sc.next();
|
||||
}
|
||||
CellSample sample = Simulator.generateCellSample(numCells, cdr1Freq);
|
||||
CellSample sample = new CellSample(numCells, cdr1Freq);
|
||||
assert filename != null;
|
||||
System.out.println("Writing cells to file");
|
||||
CellFileWriter writer = new CellFileWriter(filename, sample);
|
||||
@@ -252,7 +252,6 @@ public class InteractiveInterface {
|
||||
String filename = null;
|
||||
String cellFile = null;
|
||||
String plateFile = null;
|
||||
|
||||
try {
|
||||
String str = "\nGenerating bipartite weighted graph encoding occupancy overlap data ";
|
||||
str = str.concat("\nrequires a cell sample file and a sample plate file.");
|
||||
@@ -310,9 +309,16 @@ public class InteractiveInterface {
|
||||
List<Integer[]> cells = cellSample.getCells();
|
||||
GraphWithMapData data = Simulator.makeGraph(cells, plate, true);
|
||||
assert filename != null;
|
||||
GraphDataObjectWriter dataWriter = new GraphDataObjectWriter(filename, data);
|
||||
dataWriter.writeDataToFile();
|
||||
System.out.println("Graph and Data file written to: " + filename);
|
||||
if(BiGpairSEQ.outputBinary()) {
|
||||
GraphDataObjectWriter dataWriter = new GraphDataObjectWriter(filename, data);
|
||||
dataWriter.writeDataToFile();
|
||||
System.out.println("Serialized binary graph/data file written to: " + filename);
|
||||
}
|
||||
if(BiGpairSEQ.outputGraphML()) {
|
||||
GraphMLFileWriter graphMLWriter = new GraphMLFileWriter(filename, data);
|
||||
graphMLWriter.writeGraphToFile();
|
||||
System.out.println("GraphML file written to: " + filename);
|
||||
}
|
||||
if(BiGpairSEQ.cacheGraph()) {
|
||||
BiGpairSEQ.setGraphInMemory(data, filename);
|
||||
|
||||
@@ -500,7 +506,9 @@ public class InteractiveInterface {
|
||||
System.out.println("1) Turn " + getOnOff(!BiGpairSEQ.cacheCells()) + " cell sample file caching");
|
||||
System.out.println("2) Turn " + getOnOff(!BiGpairSEQ.cachePlate()) + " plate file caching");
|
||||
System.out.println("3) Turn " + getOnOff(!BiGpairSEQ.cacheGraph()) + " graph/data file caching");
|
||||
System.out.println("4) Maximum weight matching algorithm options");
|
||||
System.out.println("4) Turn " + getOnOff(!BiGpairSEQ.outputBinary()) + " serialized binary graph output");
|
||||
System.out.println("5) Turn " + getOnOff(!BiGpairSEQ.outputGraphML()) + " GraphML graph output");
|
||||
System.out.println("6) Maximum weight matching algorithm options");
|
||||
System.out.println("0) Return to main menu");
|
||||
try {
|
||||
input = sc.nextInt();
|
||||
@@ -508,7 +516,9 @@ public class InteractiveInterface {
|
||||
case 1 -> BiGpairSEQ.setCacheCells(!BiGpairSEQ.cacheCells());
|
||||
case 2 -> BiGpairSEQ.setCachePlate(!BiGpairSEQ.cachePlate());
|
||||
case 3 -> BiGpairSEQ.setCacheGraph(!BiGpairSEQ.cacheGraph());
|
||||
case 4 -> algorithmOptions();
|
||||
case 4 -> BiGpairSEQ.setOutputBinary(!BiGpairSEQ.outputBinary());
|
||||
case 5 -> BiGpairSEQ.setOutputGraphML(!BiGpairSEQ.outputGraphML());
|
||||
case 6 -> algorithmOptions();
|
||||
case 0 -> backToMain = true;
|
||||
default -> System.out.println("Invalid input");
|
||||
}
|
||||
|
||||
@@ -23,30 +23,6 @@ public class Simulator implements GraphModificationFunctions {
|
||||
private static final int cdr1AlphaIndex = 2;
|
||||
private static final int cdr1BetaIndex = 3;
|
||||
|
||||
public static CellSample generateCellSample(Integer numDistinctCells, Integer cdr1Freq) {
|
||||
//In real T cells, CDR1s have about one third the diversity of CDR3s
|
||||
List<Integer> numbersCDR3 = new ArrayList<>();
|
||||
List<Integer> numbersCDR1 = new ArrayList<>();
|
||||
Integer numDistCDR3s = 2 * numDistinctCells + 1;
|
||||
IntStream.range(1, numDistCDR3s + 1).forEach(i -> numbersCDR3.add(i));
|
||||
IntStream.range(numDistCDR3s + 1, numDistCDR3s + 1 + (numDistCDR3s / cdr1Freq) + 1).forEach(i -> numbersCDR1.add(i));
|
||||
Collections.shuffle(numbersCDR3);
|
||||
Collections.shuffle(numbersCDR1);
|
||||
|
||||
//Each cell represented by 4 values
|
||||
//two CDR3s, and two CDR1s. First two values are CDR3s (alpha, beta), second two are CDR1s (alpha, beta)
|
||||
List<Integer[]> distinctCells = new ArrayList<>();
|
||||
for(int i = 0; i < numbersCDR3.size() - 1; i = i + 2){
|
||||
Integer tmpCDR3a = numbersCDR3.get(i);
|
||||
Integer tmpCDR3b = numbersCDR3.get(i+1);
|
||||
Integer tmpCDR1a = numbersCDR1.get(i % numbersCDR1.size());
|
||||
Integer tmpCDR1b = numbersCDR1.get((i+1) % numbersCDR1.size());
|
||||
Integer[] tmp = {tmpCDR3a, tmpCDR3b, tmpCDR1a, tmpCDR1b};
|
||||
distinctCells.add(tmp);
|
||||
}
|
||||
return new CellSample(distinctCells, cdr1Freq);
|
||||
}
|
||||
|
||||
//Make the graph needed for matching CDR3s
|
||||
public static GraphWithMapData makeGraph(List<Integer[]> distinctCells, Plate samplePlate, boolean verbose) {
|
||||
Instant start = Instant.now();
|
||||
@@ -260,6 +236,7 @@ public class Simulator implements GraphModificationFunctions {
|
||||
}
|
||||
|
||||
//Metadata comments for CSV file
|
||||
String algoType = "LEDA book with heap: " + heapType;
|
||||
int min = Math.min(alphaCount, betaCount);
|
||||
//rate of attempted matching
|
||||
double attemptRate = (double) (trueCount + falseCount) / min;
|
||||
@@ -290,6 +267,7 @@ public class Simulator implements GraphModificationFunctions {
|
||||
Map<String, String> metadata = new LinkedHashMap<>();
|
||||
metadata.put("sample plate filename", data.getSourceFilename());
|
||||
metadata.put("graph filename", dataFilename);
|
||||
metadata.put("algorithm type", algoType);
|
||||
metadata.put("well populations", wellPopulationsString);
|
||||
metadata.put("total alphas found", alphaCount.toString());
|
||||
metadata.put("total betas found", betaCount.toString());
|
||||
@@ -690,7 +668,7 @@ public class Simulator implements GraphModificationFunctions {
|
||||
|
||||
private static Map<Integer, Integer> makeVertexToSequenceMap(Map<Integer, Integer> sequences, Integer startValue) {
|
||||
Map<Integer, Integer> map = new LinkedHashMap<>(); //LinkedHashMap to preserve order of entry
|
||||
Integer index = startValue;
|
||||
Integer index = startValue; //is this necessary? I don't think I use this.
|
||||
for (Integer k: sequences.keySet()) {
|
||||
map.put(index, k);
|
||||
index++;
|
||||
|
||||
@@ -1,14 +1,20 @@
|
||||
|
||||
|
||||
public class Vertex {
|
||||
private final Integer peptide;
|
||||
private final Integer vertexLabel;
|
||||
private final Integer sequence;
|
||||
private final Integer occupancy;
|
||||
|
||||
public Vertex(Integer peptide, Integer occupancy) {
|
||||
this.peptide = peptide;
|
||||
public Vertex(Integer vertexLabel, Integer sequence, Integer occupancy) {
|
||||
this.vertexLabel = vertexLabel;
|
||||
this.sequence = sequence;
|
||||
this.occupancy = occupancy;
|
||||
}
|
||||
|
||||
public Integer getPeptide() {
|
||||
return peptide;
|
||||
public Integer getVertexLabel() { return vertexLabel; }
|
||||
|
||||
public Integer getSequence() {
|
||||
return sequence;
|
||||
}
|
||||
|
||||
public Integer getOccupancy() {
|
||||
|
||||
Reference in New Issue
Block a user