Compare commits
16 Commits
v1.4
...
b4cc240048
| Author | SHA1 | Date | |
|---|---|---|---|
| b4cc240048 | |||
| ff72c9b359 | |||
| 88eb8aca50 | |||
| 98bf452891 | |||
| c2db4f87c1 | |||
| 8935407ade | |||
| 9fcc20343d | |||
| e4d094d796 | |||
| f385ebc31f | |||
| 8745550e11 | |||
| 41805135b3 | |||
| 373a5e02f9 | |||
| 7f18311054 | |||
| bcb816c3e6 | |||
| dad0fd35fd | |||
| 35d580cfcf |
96
readme.md
96
readme.md
@@ -12,7 +12,7 @@ Unlike pairSEQ, which calculates p-values for every TCR alpha/beta overlap and c
|
|||||||
against a null distribution, BiGpairSEQ does not do any statistical calculations
|
against a null distribution, BiGpairSEQ does not do any statistical calculations
|
||||||
directly.
|
directly.
|
||||||
|
|
||||||
BiGpairSEQ creates a [weightd bipartite graph](https://en.wikipedia.org/wiki/Bipartite_graph) representing the sample plate.
|
BiGpairSEQ creates a [weighted bipartite graph](https://en.wikipedia.org/wiki/Bipartite_graph) representing the sample plate.
|
||||||
The distinct TCRA and TCRB sequences form the two sets of vertices. Every TCRA/TCRB pair that share a well
|
The distinct TCRA and TCRB sequences form the two sets of vertices. Every TCRA/TCRB pair that share a well
|
||||||
are connected by an edge, with the edge weight set to the number of wells in which both sequences appear.
|
are connected by an edge, with the edge weight set to the number of wells in which both sequences appear.
|
||||||
(Sequences present in *all* wells are filtered out prior to creating the graph, as there is no signal in their occupancy pattern.)
|
(Sequences present in *all* wells are filtered out prior to creating the graph, as there is no signal in their occupancy pattern.)
|
||||||
@@ -66,6 +66,18 @@ Please select an option:
|
|||||||
0) Exit
|
0) Exit
|
||||||
```
|
```
|
||||||
|
|
||||||
|
By default, the Options menu looks like this:
|
||||||
|
```
|
||||||
|
--------------OPTIONS---------------
|
||||||
|
1) Turn on cell sample file caching
|
||||||
|
2) Turn on plate file caching
|
||||||
|
3) Turn on graph/data file caching
|
||||||
|
4) Turn off serialized binary graph output
|
||||||
|
5) Turn on GraphML graph output
|
||||||
|
6) Maximum weight matching algorithm options
|
||||||
|
0) Return to main menu
|
||||||
|
```
|
||||||
|
|
||||||
### INPUT/OUTPUT
|
### INPUT/OUTPUT
|
||||||
|
|
||||||
To run the simulation, the program reads and writes 4 kinds of files:
|
To run the simulation, the program reads and writes 4 kinds of files:
|
||||||
@@ -75,21 +87,25 @@ To run the simulation, the program reads and writes 4 kinds of files:
|
|||||||
* Matching Results files in CSV format
|
* Matching Results files in CSV format
|
||||||
|
|
||||||
These files are often generated in sequence. When entering filenames, it is not necessary to include the file extension
|
These files are often generated in sequence. When entering filenames, it is not necessary to include the file extension
|
||||||
(.csv or .ser). When reading or writing files, the program will automatically add the correct extension to any filename without one.
|
(.csv or .ser). When reading or writing files, the program will automatically add the correct extension to any filename
|
||||||
|
without one.
|
||||||
|
|
||||||
To save file I/O time, the most recent instance of each of these four
|
To save file I/O time, the most recent instance of each of these four
|
||||||
files either generated or read from disk can be cached in program memory. This is could be important for Graph/Data files,
|
files either generated or read from disk can be cached in program memory. When caching is active, subsequent uses of the
|
||||||
which can be several gigabytes in size. Since some simulations may require running multiple,
|
same data file won't need to be read in again until another file of that type is used or generated,
|
||||||
differently-configured BiGpairSEQ matchings on the same graph, keeping the most recent graph cached may reduce execution time.
|
|
||||||
(The manipulation necessary to re-use a graph incurs its own performance overhead, though, which may scale with graph
|
|
||||||
size faster than file I/O does. If so, caching is best for smaller graphs.)
|
|
||||||
|
|
||||||
When caching is active, subsequent uses of the same data file won't need to be read in again until another file of that type is used or generated,
|
|
||||||
or caching is turned off for that file type. The program checks whether it needs to update its cached data by comparing
|
or caching is turned off for that file type. The program checks whether it needs to update its cached data by comparing
|
||||||
filenames as entered by the user. On encountering a new filename, the program flushes its cache and reads in the new file.
|
filenames as entered by the user. On encountering a new filename, the program flushes its cache and reads in the new file.
|
||||||
|
|
||||||
|
(Note that cached Graph/Data files must be transformed back into their original state after a matching experiment, which
|
||||||
|
may take some time. Whether file I/O or graph transformation takes longer for graph/data files is likely to be
|
||||||
|
device-specific.)
|
||||||
|
|
||||||
The program's caching behavior can be controlled in the Options menu. By default, all caching is OFF.
|
The program's caching behavior can be controlled in the Options menu. By default, all caching is OFF.
|
||||||
|
|
||||||
|
The program can optionally output Graph/Data files in .GraphML format (.graphml) for data portability. This can be
|
||||||
|
turned on in the Options menu. By default, GraphML output is OFF.
|
||||||
|
|
||||||
|
---
|
||||||
#### Cell Sample Files
|
#### Cell Sample Files
|
||||||
Cell Sample files consist of any number of distinct "T cells." Every cell contains
|
Cell Sample files consist of any number of distinct "T cells." Every cell contains
|
||||||
four sequences: Alpha CDR3, Beta CDR3, Alpha CDR1, Beta CDR1. The sequences are represented by
|
four sequences: Alpha CDR3, Beta CDR3, Alpha CDR1, Beta CDR1. The sequences are represented by
|
||||||
@@ -107,7 +123,6 @@ Comments are preceded by `#`
|
|||||||
|
|
||||||
Structure:
|
Structure:
|
||||||
|
|
||||||
---
|
|
||||||
# Sample contains 1 unique CDR1 for every 4 unique CDR3s.
|
# Sample contains 1 unique CDR1 for every 4 unique CDR3s.
|
||||||
| Alpha CDR3 | Beta CDR3 | Alpha CDR1 | Beta CDR1 |
|
| Alpha CDR3 | Beta CDR3 | Alpha CDR1 | Beta CDR1 |
|
||||||
|---|---|---|---|
|
|---|---|---|---|
|
||||||
@@ -131,11 +146,14 @@ Options when making a Sample Plate file:
|
|||||||
* Standard deviation size
|
* Standard deviation size
|
||||||
* Exponential
|
* Exponential
|
||||||
* Lambda value
|
* Lambda value
|
||||||
* *(Based on the slope of the graph in Figure 4C of the pairSEQ paper, the distribution of the original experiment was exponential with a lambda of approximately 0.6. (Howie, et al. 2015))*
|
* *(Based on the slope of the graph in Figure 4C of the pairSEQ paper, the distribution of the original experiment was approximately exponential with a lambda ~0.6. (Howie, et al. 2015))*
|
||||||
* Total number of wells on the plate
|
* Total number of wells on the plate
|
||||||
* Number of sections on plate
|
* Well populations random or fixed
|
||||||
* Number of T cells per well
|
* If random, minimum and maximum population sizes
|
||||||
* per section, if more than one section
|
* If fixed
|
||||||
|
* Number of sections on plate
|
||||||
|
* Number of T cells per well
|
||||||
|
* per section, if more than one section
|
||||||
* Dropout rate
|
* Dropout rate
|
||||||
|
|
||||||
Files are in CSV format. There are no header labels. Every row represents a well.
|
Files are in CSV format. There are no header labels. Every row represents a well.
|
||||||
@@ -149,7 +167,6 @@ Dropout sequences are replaced with the value `-1`. Comments are preceded by `#`
|
|||||||
|
|
||||||
Structure:
|
Structure:
|
||||||
|
|
||||||
---
|
|
||||||
```
|
```
|
||||||
# Cell source file name:
|
# Cell source file name:
|
||||||
# Each row represents one well on the plate
|
# Each row represents one well on the plate
|
||||||
@@ -178,14 +195,19 @@ Options for creating a Graph/Data file:
|
|||||||
* The Cell Sample file to use
|
* The Cell Sample file to use
|
||||||
* The Sample Plate file to use. (This must have been generated from the selected Cell Sample file.)
|
* The Sample Plate file to use. (This must have been generated from the selected Cell Sample file.)
|
||||||
|
|
||||||
These files do not have a human-readable structure, and are not portable to other programs. (Export of graphs in a
|
These files do not have a human-readable structure, and are not portable to other programs.
|
||||||
portable data format may be implemented in the future. The tricky part is encoding the necessary metadata.)
|
|
||||||
|
(For portability to other software, turn on GraphML output in the Options menu. This will produce a .graphml file
|
||||||
|
for the weighted graph, with vertex attributes sequence, type, and occupancy data.)
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
#### Matching Results Files
|
#### Matching Results Files
|
||||||
Matching results files consist of the results of a BiGpairSEQ matching simulation. Making them requires a Graph and
|
Matching results files consist of the results of a BiGpairSEQ matching simulation. Making them requires a serialized
|
||||||
Data file. Matching results files are in CSV format. Rows are sequence pairings with extra relevant data. Columns are pairing-specific details.
|
binary Graph/Data file (.ser). (Because .graphML files are larger than .ser files, BiGpairSEQ_Sim supports .graphML
|
||||||
|
output only. Graph/data input must use a serialized binary.)
|
||||||
|
|
||||||
|
Matching results files are in CSV format. Rows are sequence pairings with extra relevant data. Columns are pairing-specific details.
|
||||||
Metadata about the matching simulation is included as comments. Comments are preceded by `#`.
|
Metadata about the matching simulation is included as comments. Comments are preceded by `#`.
|
||||||
|
|
||||||
Options when running a BiGpairSEQ simulation of CDR3 alpha/beta matching:
|
Options when running a BiGpairSEQ simulation of CDR3 alpha/beta matching:
|
||||||
@@ -200,7 +222,6 @@ Options when running a BiGpairSEQ simulation of CDR3 alpha/beta matching:
|
|||||||
|
|
||||||
Example output:
|
Example output:
|
||||||
|
|
||||||
---
|
|
||||||
```
|
```
|
||||||
# Source Sample Plate file: 4MilCellsPlate.csv
|
# Source Sample Plate file: 4MilCellsPlate.csv
|
||||||
# Source Graph and Data file: 4MilCellsPlateGraph.ser
|
# Source Graph and Data file: 4MilCellsPlateGraph.ser
|
||||||
@@ -251,29 +272,30 @@ slightly less time than the simulation itself. Real elapsed time from start to f
|
|||||||
## TODO
|
## TODO
|
||||||
|
|
||||||
* ~~Try invoking GC at end of workloads to reduce paging to disk~~ DONE
|
* ~~Try invoking GC at end of workloads to reduce paging to disk~~ DONE
|
||||||
* Hold graph data in memory until another graph is read-in? ~~ABANDONED~~ ~~UNABANDONED~~ DONE
|
* ~~Hold graph data in memory until another graph is read-in? ABANDONED UNABANDONED~~ DONE
|
||||||
* ~~*No, this won't work, because BiGpairSEQ simulations alter the underlying graph based on filtering constraints. Changes would cascade with multiple experiments.*~~
|
* ~~*No, this won't work, because BiGpairSEQ simulations alter the underlying graph based on filtering constraints. Changes would cascade with multiple experiments.*~~
|
||||||
* Might have figured out a way to do it, by taking edges out and then putting them back into the graph. This may actually be possible.
|
* Might have figured out a way to do it, by taking edges out and then putting them back into the graph. This may actually be possible.
|
||||||
* It is possible, though the modifications to the graph incur their own performance penalties. Need testing to see which option is best.
|
* It is possible, though the modifications to the graph incur their own performance penalties. Need testing to see which option is best.
|
||||||
* See if there's a reasonable way to reformat Sample Plate files so that wells are columns instead of rows.
|
|
||||||
* ~~Problem is variable number of cells in a well~~
|
|
||||||
* ~~Apache Commons CSV library writes entries a row at a time~~
|
|
||||||
* _Got this working, but at the cost of a profoundly strange bug in graph occupancy filtering. Have reverted the repo until I can figure out what caused that. Given how easily Thingiverse transposes CSV matrices in R, might not even be worth fixing._
|
|
||||||
* Re-implement command line arguments, to enable scripting and statistical simulation studies
|
|
||||||
* Implement sample plates with random numbers of T cells per well.
|
|
||||||
* Possible BiGpairSEQ advantage over pairSEQ: BiGpairSEQ is resilient to variations in well population sizes on a sample plate; pairSEQ is not.
|
|
||||||
* preliminary data suggests that BiGpairSEQ behaves roughly as though the whole plate had whatever the *average* well concentration is, but that's still speculative.
|
|
||||||
* Enable GraphML output in addition to serialized object binaries, for data portability
|
|
||||||
* Custom vertex type with attribute for sequence occupancy?
|
|
||||||
* Re-implement CDR1 matching method
|
|
||||||
* Implement Duan and Su's maximum weight matching algorithm
|
|
||||||
* Add controllable algorithm-type parameter?
|
|
||||||
* ~~Test whether pairing heap (currently used) or Fibonacci heap is more efficient for priority queue in current matching algorithm~~ DONE
|
* ~~Test whether pairing heap (currently used) or Fibonacci heap is more efficient for priority queue in current matching algorithm~~ DONE
|
||||||
* ~~in theory Fibonacci heap should be more efficient, but complexity overhead may eliminate theoretical advantage~~
|
* ~~in theory Fibonacci heap should be more efficient, but complexity overhead may eliminate theoretical advantage~~
|
||||||
* ~~Add controllable heap-type parameter?~~
|
* ~~Add controllable heap-type parameter?~~
|
||||||
* Parameter implemented. For large graphs, Fibonacci heap wins. Now the new default.
|
* Parameter implemented. Fibonacci heap the current default.
|
||||||
|
* ~~Implement sample plates with random numbers of T cells per well.~~ DONE
|
||||||
|
* Possible BiGpairSEQ advantage over pairSEQ: BiGpairSEQ is resilient to variations in well population sizes on a sample plate; pairSEQ is not.
|
||||||
|
* preliminary data suggests that BiGpairSEQ behaves roughly as though the whole plate had whatever the *average* well concentration is, but that's still speculative.
|
||||||
|
* See if there's a reasonable way to reformat Sample Plate files so that wells are columns instead of rows.
|
||||||
|
* ~~Problem is variable number of cells in a well~~
|
||||||
|
* ~~Apache Commons CSV library writes entries a row at a time~~
|
||||||
|
* _Got this working, but at the cost of a profoundly strange bug in graph occupancy filtering. Have reverted the repo until I can figure out what caused that. Given how easily Thingiverse transposes CSV matrices in R, might not even be worth fixing.
|
||||||
|
* ~~Enable GraphML output in addition to serialized object binaries, for data portability~~ DONE
|
||||||
|
* ~~Custom vertex type with attribute for sequence occupancy?~~ ABANDONED
|
||||||
|
* Have a branch where this is implemented, but there's a bug that broke matching. Don't currently have time to fix.
|
||||||
|
* Re-implement command line arguments, to enable scripting and statistical simulation studies
|
||||||
|
* Re-implement CDR1 matching method
|
||||||
|
* Implement Duan and Su's maximum weight matching algorithm
|
||||||
|
* Add controllable algorithm-type parameter?
|
||||||
|
* This would be fun and valuable, but probably take more time than I have for a hobby project.
|
||||||
|
|
||||||
|
|
||||||
## CITATIONS
|
## CITATIONS
|
||||||
* Howie, B., Sherwood, A. M., et al. ["High-throughput pairing of T cell receptor alpha and beta sequences."](https://pubmed.ncbi.nlm.nih.gov/26290413/) Sci. Transl. Med. 7, 301ra131 (2015)
|
* Howie, B., Sherwood, A. M., et al. ["High-throughput pairing of T cell receptor alpha and beta sequences."](https://pubmed.ncbi.nlm.nih.gov/26290413/) Sci. Transl. Med. 7, 301ra131 (2015)
|
||||||
|
|||||||
@@ -1,6 +1,6 @@
|
|||||||
import java.util.Random;
|
import java.util.Random;
|
||||||
|
|
||||||
//main class. For choosing interface type and caching file data
|
//main class. For choosing interface type and holding settings
|
||||||
public class BiGpairSEQ {
|
public class BiGpairSEQ {
|
||||||
|
|
||||||
private static final Random rand = new Random();
|
private static final Random rand = new Random();
|
||||||
@@ -14,6 +14,8 @@ public class BiGpairSEQ {
|
|||||||
private static boolean cachePlate = false;
|
private static boolean cachePlate = false;
|
||||||
private static boolean cacheGraph = false;
|
private static boolean cacheGraph = false;
|
||||||
private static String priorityQueueHeapType = "FIBONACCI";
|
private static String priorityQueueHeapType = "FIBONACCI";
|
||||||
|
private static boolean outputBinary = true;
|
||||||
|
private static boolean outputGraphML = false;
|
||||||
|
|
||||||
public static void main(String[] args) {
|
public static void main(String[] args) {
|
||||||
if (args.length == 0) {
|
if (args.length == 0) {
|
||||||
@@ -164,4 +166,11 @@ public class BiGpairSEQ {
|
|||||||
public static void setFibonacciHeap() {
|
public static void setFibonacciHeap() {
|
||||||
priorityQueueHeapType = "FIBONACCI";
|
priorityQueueHeapType = "FIBONACCI";
|
||||||
}
|
}
|
||||||
|
|
||||||
|
public static boolean outputBinary() {return outputBinary;}
|
||||||
|
public static void setOutputBinary(boolean b) {outputBinary = b;}
|
||||||
|
|
||||||
|
public static boolean outputGraphML() {return outputGraphML;}
|
||||||
|
public static void setOutputGraphML(boolean b) {outputGraphML = b;}
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -1,10 +1,37 @@
|
|||||||
|
import java.util.ArrayList;
|
||||||
|
import java.util.Collections;
|
||||||
import java.util.List;
|
import java.util.List;
|
||||||
|
import java.util.stream.IntStream;
|
||||||
|
|
||||||
public class CellSample {
|
public class CellSample {
|
||||||
|
|
||||||
private List<Integer[]> cells;
|
private List<Integer[]> cells;
|
||||||
private Integer cdr1Freq;
|
private Integer cdr1Freq;
|
||||||
|
|
||||||
|
public CellSample(Integer numDistinctCells, Integer cdr1Freq){
|
||||||
|
this.cdr1Freq = cdr1Freq;
|
||||||
|
List<Integer> numbersCDR3 = new ArrayList<>();
|
||||||
|
List<Integer> numbersCDR1 = new ArrayList<>();
|
||||||
|
Integer numDistCDR3s = 2 * numDistinctCells + 1;
|
||||||
|
IntStream.range(1, numDistCDR3s + 1).forEach(i -> numbersCDR3.add(i));
|
||||||
|
IntStream.range(numDistCDR3s + 1, numDistCDR3s + 1 + (numDistCDR3s / cdr1Freq) + 1).forEach(i -> numbersCDR1.add(i));
|
||||||
|
Collections.shuffle(numbersCDR3);
|
||||||
|
Collections.shuffle(numbersCDR1);
|
||||||
|
|
||||||
|
//Each cell represented by 4 values
|
||||||
|
//two CDR3s, and two CDR1s. First two values are CDR3s (alpha, beta), second two are CDR1s (alpha, beta)
|
||||||
|
List<Integer[]> distinctCells = new ArrayList<>();
|
||||||
|
for(int i = 0; i < numbersCDR3.size() - 1; i = i + 2){
|
||||||
|
Integer tmpCDR3a = numbersCDR3.get(i);
|
||||||
|
Integer tmpCDR3b = numbersCDR3.get(i+1);
|
||||||
|
Integer tmpCDR1a = numbersCDR1.get(i % numbersCDR1.size());
|
||||||
|
Integer tmpCDR1b = numbersCDR1.get((i+1) % numbersCDR1.size());
|
||||||
|
Integer[] tmp = {tmpCDR3a, tmpCDR3b, tmpCDR1a, tmpCDR1b};
|
||||||
|
distinctCells.add(tmp);
|
||||||
|
}
|
||||||
|
this.cells = distinctCells;
|
||||||
|
}
|
||||||
|
|
||||||
public CellSample(List<Integer[]> cells, Integer cdr1Freq){
|
public CellSample(List<Integer[]> cells, Integer cdr1Freq){
|
||||||
this.cells = cells;
|
this.cells = cells;
|
||||||
this.cdr1Freq = cdr1Freq;
|
this.cdr1Freq = cdr1Freq;
|
||||||
|
|||||||
@@ -288,7 +288,7 @@ public class CommandLineInterface {
|
|||||||
|
|
||||||
//for calling from command line
|
//for calling from command line
|
||||||
public static void makeCells(String filename, Integer numCells, Integer cdr1Freq){
|
public static void makeCells(String filename, Integer numCells, Integer cdr1Freq){
|
||||||
CellSample sample = Simulator.generateCellSample(numCells, cdr1Freq);
|
CellSample sample = new CellSample(numCells, cdr1Freq);
|
||||||
CellFileWriter writer = new CellFileWriter(filename, sample);
|
CellFileWriter writer = new CellFileWriter(filename, sample);
|
||||||
writer.writeCellsToFile();
|
writer.writeCellsToFile();
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -1,35 +0,0 @@
|
|||||||
import org.jgrapht.graph.SimpleWeightedGraph;
|
|
||||||
import org.jgrapht.nio.graphml.GraphMLImporter;
|
|
||||||
|
|
||||||
import java.io.BufferedReader;
|
|
||||||
import java.io.IOException;
|
|
||||||
import java.nio.file.Files;
|
|
||||||
import java.nio.file.Path;
|
|
||||||
|
|
||||||
public class GraphMLFileReader {
|
|
||||||
|
|
||||||
private String filename;
|
|
||||||
private SimpleWeightedGraph graph;
|
|
||||||
|
|
||||||
public GraphMLFileReader(String filename, SimpleWeightedGraph graph) {
|
|
||||||
if(!filename.matches(".*\\.graphml")){
|
|
||||||
filename = filename + ".graphml";
|
|
||||||
}
|
|
||||||
this.filename = filename;
|
|
||||||
this.graph = graph;
|
|
||||||
|
|
||||||
try(//don't need to close reader bc of try-with-resources auto-closing
|
|
||||||
BufferedReader reader = Files.newBufferedReader(Path.of(filename));
|
|
||||||
){
|
|
||||||
GraphMLImporter<SimpleWeightedGraph, BufferedReader> importer = new GraphMLImporter<>();
|
|
||||||
importer.importGraph(graph, reader);
|
|
||||||
}
|
|
||||||
catch (IOException ex) {
|
|
||||||
System.out.println("Graph file " + filename + " not found.");
|
|
||||||
System.err.println(ex);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
public SimpleWeightedGraph getGraph() { return graph; }
|
|
||||||
|
|
||||||
}
|
|
||||||
@@ -1,4 +1,8 @@
|
|||||||
|
import org.jgrapht.graph.DefaultWeightedEdge;
|
||||||
import org.jgrapht.graph.SimpleWeightedGraph;
|
import org.jgrapht.graph.SimpleWeightedGraph;
|
||||||
|
import org.jgrapht.nio.Attribute;
|
||||||
|
import org.jgrapht.nio.AttributeType;
|
||||||
|
import org.jgrapht.nio.DefaultAttribute;
|
||||||
import org.jgrapht.nio.dot.DOTExporter;
|
import org.jgrapht.nio.dot.DOTExporter;
|
||||||
import org.jgrapht.nio.graphml.GraphMLExporter;
|
import org.jgrapht.nio.graphml.GraphMLExporter;
|
||||||
|
|
||||||
@@ -7,25 +11,69 @@ import java.io.IOException;
|
|||||||
import java.nio.file.Files;
|
import java.nio.file.Files;
|
||||||
import java.nio.file.Path;
|
import java.nio.file.Path;
|
||||||
import java.nio.file.StandardOpenOption;
|
import java.nio.file.StandardOpenOption;
|
||||||
|
import java.util.HashMap;
|
||||||
|
import java.util.LinkedHashMap;
|
||||||
|
import java.util.Map;
|
||||||
|
|
||||||
public class GraphMLFileWriter {
|
public class GraphMLFileWriter {
|
||||||
|
|
||||||
String filename;
|
String filename;
|
||||||
SimpleWeightedGraph graph;
|
GraphWithMapData data;
|
||||||
|
|
||||||
|
|
||||||
public GraphMLFileWriter(String filename, SimpleWeightedGraph graph) {
|
public GraphMLFileWriter(String filename, GraphWithMapData data) {
|
||||||
if(!filename.matches(".*\\.graphml")){
|
if(!filename.matches(".*\\.graphml")){
|
||||||
filename = filename + ".graphml";
|
filename = filename + ".graphml";
|
||||||
}
|
}
|
||||||
this.filename = filename;
|
this.filename = filename;
|
||||||
this.graph = graph;
|
this.data = data;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// public void writeGraphToFile() {
|
||||||
|
// try(BufferedWriter writer = Files.newBufferedWriter(Path.of(filename), StandardOpenOption.CREATE_NEW);
|
||||||
|
// ){
|
||||||
|
// GraphMLExporter<SimpleWeightedGraph, BufferedWriter> exporter = new GraphMLExporter<>();
|
||||||
|
// exporter.exportGraph(graph, writer);
|
||||||
|
// } catch(IOException ex){
|
||||||
|
// System.out.println("Could not make new file named "+filename);
|
||||||
|
// System.err.println(ex);
|
||||||
|
// }
|
||||||
|
// }
|
||||||
|
|
||||||
public void writeGraphToFile() {
|
public void writeGraphToFile() {
|
||||||
|
SimpleWeightedGraph graph = data.getGraph();
|
||||||
|
Map<Integer, Integer> vertexToAlphaMap = data.getPlateVtoAMap();
|
||||||
|
Map<Integer, Integer> vertexToBetaMap = data.getPlateVtoBMap();
|
||||||
|
Map<Integer, Integer> alphaOccs = data.getAlphaWellCounts();
|
||||||
|
Map<Integer, Integer> betaOccs = data.getBetaWellCounts();
|
||||||
try(BufferedWriter writer = Files.newBufferedWriter(Path.of(filename), StandardOpenOption.CREATE_NEW);
|
try(BufferedWriter writer = Files.newBufferedWriter(Path.of(filename), StandardOpenOption.CREATE_NEW);
|
||||||
){
|
){
|
||||||
GraphMLExporter<SimpleWeightedGraph, BufferedWriter> exporter = new GraphMLExporter<>();
|
//create exporter. Let the vertex labels be the unique ids for the vertices
|
||||||
|
GraphMLExporter<Integer, SimpleWeightedGraph<Vertex, DefaultWeightedEdge>> exporter = new GraphMLExporter<>(v -> v.toString());
|
||||||
|
//set to export weights
|
||||||
|
exporter.setExportEdgeWeights(true);
|
||||||
|
//set type, sequence, and occupancy attributes for each vertex
|
||||||
|
exporter.setVertexAttributeProvider( v -> {
|
||||||
|
Map<String, Attribute> attributes = new HashMap<>();
|
||||||
|
if(vertexToAlphaMap.containsKey(v)) {
|
||||||
|
attributes.put("type", DefaultAttribute.createAttribute("CDR3 Alpha"));
|
||||||
|
attributes.put("sequence", DefaultAttribute.createAttribute(vertexToAlphaMap.get(v)));
|
||||||
|
attributes.put("occupancy", DefaultAttribute.createAttribute(
|
||||||
|
alphaOccs.get(vertexToAlphaMap.get(v))));
|
||||||
|
}
|
||||||
|
else if(vertexToBetaMap.containsKey(v)) {
|
||||||
|
attributes.put("type", DefaultAttribute.createAttribute("CDR3 Beta"));
|
||||||
|
attributes.put("sequence", DefaultAttribute.createAttribute(vertexToBetaMap.get(v)));
|
||||||
|
attributes.put("occupancy", DefaultAttribute.createAttribute(
|
||||||
|
betaOccs.get(vertexToBetaMap.get(v))));
|
||||||
|
}
|
||||||
|
return attributes;
|
||||||
|
});
|
||||||
|
//register the attributes
|
||||||
|
exporter.registerAttribute("type", GraphMLExporter.AttributeCategory.NODE, AttributeType.STRING);
|
||||||
|
exporter.registerAttribute("sequence", GraphMLExporter.AttributeCategory.NODE, AttributeType.STRING);
|
||||||
|
exporter.registerAttribute("occupancy", GraphMLExporter.AttributeCategory.NODE, AttributeType.STRING);
|
||||||
|
//export the graph
|
||||||
exporter.exportGraph(graph, writer);
|
exporter.exportGraph(graph, writer);
|
||||||
} catch(IOException ex){
|
} catch(IOException ex){
|
||||||
System.out.println("Could not make new file named "+filename);
|
System.out.println("Could not make new file named "+filename);
|
||||||
@@ -33,3 +81,4 @@ public class GraphMLFileWriter {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@@ -4,7 +4,6 @@ import org.jgrapht.graph.SimpleWeightedGraph;
|
|||||||
import java.util.ArrayList;
|
import java.util.ArrayList;
|
||||||
import java.util.List;
|
import java.util.List;
|
||||||
import java.util.Map;
|
import java.util.Map;
|
||||||
import java.util.Set;
|
|
||||||
|
|
||||||
public interface GraphModificationFunctions {
|
public interface GraphModificationFunctions {
|
||||||
|
|
||||||
|
|||||||
@@ -74,7 +74,7 @@ public class InteractiveInterface {
|
|||||||
System.out.println(ex);
|
System.out.println(ex);
|
||||||
sc.next();
|
sc.next();
|
||||||
}
|
}
|
||||||
CellSample sample = Simulator.generateCellSample(numCells, cdr1Freq);
|
CellSample sample = new CellSample(numCells, cdr1Freq);
|
||||||
assert filename != null;
|
assert filename != null;
|
||||||
System.out.println("Writing cells to file");
|
System.out.println("Writing cells to file");
|
||||||
CellFileWriter writer = new CellFileWriter(filename, sample);
|
CellFileWriter writer = new CellFileWriter(filename, sample);
|
||||||
@@ -252,7 +252,6 @@ public class InteractiveInterface {
|
|||||||
String filename = null;
|
String filename = null;
|
||||||
String cellFile = null;
|
String cellFile = null;
|
||||||
String plateFile = null;
|
String plateFile = null;
|
||||||
|
|
||||||
try {
|
try {
|
||||||
String str = "\nGenerating bipartite weighted graph encoding occupancy overlap data ";
|
String str = "\nGenerating bipartite weighted graph encoding occupancy overlap data ";
|
||||||
str = str.concat("\nrequires a cell sample file and a sample plate file.");
|
str = str.concat("\nrequires a cell sample file and a sample plate file.");
|
||||||
@@ -310,9 +309,16 @@ public class InteractiveInterface {
|
|||||||
List<Integer[]> cells = cellSample.getCells();
|
List<Integer[]> cells = cellSample.getCells();
|
||||||
GraphWithMapData data = Simulator.makeGraph(cells, plate, true);
|
GraphWithMapData data = Simulator.makeGraph(cells, plate, true);
|
||||||
assert filename != null;
|
assert filename != null;
|
||||||
GraphDataObjectWriter dataWriter = new GraphDataObjectWriter(filename, data);
|
if(BiGpairSEQ.outputBinary()) {
|
||||||
dataWriter.writeDataToFile();
|
GraphDataObjectWriter dataWriter = new GraphDataObjectWriter(filename, data);
|
||||||
System.out.println("Graph and Data file written to: " + filename);
|
dataWriter.writeDataToFile();
|
||||||
|
System.out.println("Serialized binary graph/data file written to: " + filename);
|
||||||
|
}
|
||||||
|
if(BiGpairSEQ.outputGraphML()) {
|
||||||
|
GraphMLFileWriter graphMLWriter = new GraphMLFileWriter(filename, data);
|
||||||
|
graphMLWriter.writeGraphToFile();
|
||||||
|
System.out.println("GraphML file written to: " + filename);
|
||||||
|
}
|
||||||
if(BiGpairSEQ.cacheGraph()) {
|
if(BiGpairSEQ.cacheGraph()) {
|
||||||
BiGpairSEQ.setGraphInMemory(data, filename);
|
BiGpairSEQ.setGraphInMemory(data, filename);
|
||||||
|
|
||||||
@@ -500,7 +506,9 @@ public class InteractiveInterface {
|
|||||||
System.out.println("1) Turn " + getOnOff(!BiGpairSEQ.cacheCells()) + " cell sample file caching");
|
System.out.println("1) Turn " + getOnOff(!BiGpairSEQ.cacheCells()) + " cell sample file caching");
|
||||||
System.out.println("2) Turn " + getOnOff(!BiGpairSEQ.cachePlate()) + " plate file caching");
|
System.out.println("2) Turn " + getOnOff(!BiGpairSEQ.cachePlate()) + " plate file caching");
|
||||||
System.out.println("3) Turn " + getOnOff(!BiGpairSEQ.cacheGraph()) + " graph/data file caching");
|
System.out.println("3) Turn " + getOnOff(!BiGpairSEQ.cacheGraph()) + " graph/data file caching");
|
||||||
System.out.println("4) Maximum weight matching algorithm options");
|
System.out.println("4) Turn " + getOnOff(!BiGpairSEQ.outputBinary()) + " serialized binary graph output");
|
||||||
|
System.out.println("5) Turn " + getOnOff(!BiGpairSEQ.outputGraphML()) + " GraphML graph output");
|
||||||
|
System.out.println("6) Maximum weight matching algorithm options");
|
||||||
System.out.println("0) Return to main menu");
|
System.out.println("0) Return to main menu");
|
||||||
try {
|
try {
|
||||||
input = sc.nextInt();
|
input = sc.nextInt();
|
||||||
@@ -508,7 +516,9 @@ public class InteractiveInterface {
|
|||||||
case 1 -> BiGpairSEQ.setCacheCells(!BiGpairSEQ.cacheCells());
|
case 1 -> BiGpairSEQ.setCacheCells(!BiGpairSEQ.cacheCells());
|
||||||
case 2 -> BiGpairSEQ.setCachePlate(!BiGpairSEQ.cachePlate());
|
case 2 -> BiGpairSEQ.setCachePlate(!BiGpairSEQ.cachePlate());
|
||||||
case 3 -> BiGpairSEQ.setCacheGraph(!BiGpairSEQ.cacheGraph());
|
case 3 -> BiGpairSEQ.setCacheGraph(!BiGpairSEQ.cacheGraph());
|
||||||
case 4 -> algorithmOptions();
|
case 4 -> BiGpairSEQ.setOutputBinary(!BiGpairSEQ.outputBinary());
|
||||||
|
case 5 -> BiGpairSEQ.setOutputGraphML(!BiGpairSEQ.outputGraphML());
|
||||||
|
case 6 -> algorithmOptions();
|
||||||
case 0 -> backToMain = true;
|
case 0 -> backToMain = true;
|
||||||
default -> System.out.println("Invalid input");
|
default -> System.out.println("Invalid input");
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -23,30 +23,6 @@ public class Simulator implements GraphModificationFunctions {
|
|||||||
private static final int cdr1AlphaIndex = 2;
|
private static final int cdr1AlphaIndex = 2;
|
||||||
private static final int cdr1BetaIndex = 3;
|
private static final int cdr1BetaIndex = 3;
|
||||||
|
|
||||||
public static CellSample generateCellSample(Integer numDistinctCells, Integer cdr1Freq) {
|
|
||||||
//In real T cells, CDR1s have about one third the diversity of CDR3s
|
|
||||||
List<Integer> numbersCDR3 = new ArrayList<>();
|
|
||||||
List<Integer> numbersCDR1 = new ArrayList<>();
|
|
||||||
Integer numDistCDR3s = 2 * numDistinctCells + 1;
|
|
||||||
IntStream.range(1, numDistCDR3s + 1).forEach(i -> numbersCDR3.add(i));
|
|
||||||
IntStream.range(numDistCDR3s + 1, numDistCDR3s + 1 + (numDistCDR3s / cdr1Freq) + 1).forEach(i -> numbersCDR1.add(i));
|
|
||||||
Collections.shuffle(numbersCDR3);
|
|
||||||
Collections.shuffle(numbersCDR1);
|
|
||||||
|
|
||||||
//Each cell represented by 4 values
|
|
||||||
//two CDR3s, and two CDR1s. First two values are CDR3s (alpha, beta), second two are CDR1s (alpha, beta)
|
|
||||||
List<Integer[]> distinctCells = new ArrayList<>();
|
|
||||||
for(int i = 0; i < numbersCDR3.size() - 1; i = i + 2){
|
|
||||||
Integer tmpCDR3a = numbersCDR3.get(i);
|
|
||||||
Integer tmpCDR3b = numbersCDR3.get(i+1);
|
|
||||||
Integer tmpCDR1a = numbersCDR1.get(i % numbersCDR1.size());
|
|
||||||
Integer tmpCDR1b = numbersCDR1.get((i+1) % numbersCDR1.size());
|
|
||||||
Integer[] tmp = {tmpCDR3a, tmpCDR3b, tmpCDR1a, tmpCDR1b};
|
|
||||||
distinctCells.add(tmp);
|
|
||||||
}
|
|
||||||
return new CellSample(distinctCells, cdr1Freq);
|
|
||||||
}
|
|
||||||
|
|
||||||
//Make the graph needed for matching CDR3s
|
//Make the graph needed for matching CDR3s
|
||||||
public static GraphWithMapData makeGraph(List<Integer[]> distinctCells, Plate samplePlate, boolean verbose) {
|
public static GraphWithMapData makeGraph(List<Integer[]> distinctCells, Plate samplePlate, boolean verbose) {
|
||||||
Instant start = Instant.now();
|
Instant start = Instant.now();
|
||||||
@@ -260,6 +236,7 @@ public class Simulator implements GraphModificationFunctions {
|
|||||||
}
|
}
|
||||||
|
|
||||||
//Metadata comments for CSV file
|
//Metadata comments for CSV file
|
||||||
|
String algoType = "LEDA book with heap: " + heapType;
|
||||||
int min = Math.min(alphaCount, betaCount);
|
int min = Math.min(alphaCount, betaCount);
|
||||||
//rate of attempted matching
|
//rate of attempted matching
|
||||||
double attemptRate = (double) (trueCount + falseCount) / min;
|
double attemptRate = (double) (trueCount + falseCount) / min;
|
||||||
@@ -290,6 +267,7 @@ public class Simulator implements GraphModificationFunctions {
|
|||||||
Map<String, String> metadata = new LinkedHashMap<>();
|
Map<String, String> metadata = new LinkedHashMap<>();
|
||||||
metadata.put("sample plate filename", data.getSourceFilename());
|
metadata.put("sample plate filename", data.getSourceFilename());
|
||||||
metadata.put("graph filename", dataFilename);
|
metadata.put("graph filename", dataFilename);
|
||||||
|
metadata.put("algorithm type", algoType);
|
||||||
metadata.put("well populations", wellPopulationsString);
|
metadata.put("well populations", wellPopulationsString);
|
||||||
metadata.put("total alphas found", alphaCount.toString());
|
metadata.put("total alphas found", alphaCount.toString());
|
||||||
metadata.put("total betas found", betaCount.toString());
|
metadata.put("total betas found", betaCount.toString());
|
||||||
@@ -690,7 +668,7 @@ public class Simulator implements GraphModificationFunctions {
|
|||||||
|
|
||||||
private static Map<Integer, Integer> makeVertexToSequenceMap(Map<Integer, Integer> sequences, Integer startValue) {
|
private static Map<Integer, Integer> makeVertexToSequenceMap(Map<Integer, Integer> sequences, Integer startValue) {
|
||||||
Map<Integer, Integer> map = new LinkedHashMap<>(); //LinkedHashMap to preserve order of entry
|
Map<Integer, Integer> map = new LinkedHashMap<>(); //LinkedHashMap to preserve order of entry
|
||||||
Integer index = startValue;
|
Integer index = startValue; //is this necessary? I don't think I use this.
|
||||||
for (Integer k: sequences.keySet()) {
|
for (Integer k: sequences.keySet()) {
|
||||||
map.put(index, k);
|
map.put(index, k);
|
||||||
index++;
|
index++;
|
||||||
|
|||||||
@@ -1,14 +1,20 @@
|
|||||||
|
|
||||||
|
|
||||||
public class Vertex {
|
public class Vertex {
|
||||||
private final Integer peptide;
|
private final Integer vertexLabel;
|
||||||
|
private final Integer sequence;
|
||||||
private final Integer occupancy;
|
private final Integer occupancy;
|
||||||
|
|
||||||
public Vertex(Integer peptide, Integer occupancy) {
|
public Vertex(Integer vertexLabel, Integer sequence, Integer occupancy) {
|
||||||
this.peptide = peptide;
|
this.vertexLabel = vertexLabel;
|
||||||
|
this.sequence = sequence;
|
||||||
this.occupancy = occupancy;
|
this.occupancy = occupancy;
|
||||||
}
|
}
|
||||||
|
|
||||||
public Integer getPeptide() {
|
public Integer getVertexLabel() { return vertexLabel; }
|
||||||
return peptide;
|
|
||||||
|
public Integer getSequence() {
|
||||||
|
return sequence;
|
||||||
}
|
}
|
||||||
|
|
||||||
public Integer getOccupancy() {
|
public Integer getOccupancy() {
|
||||||
|
|||||||
Reference in New Issue
Block a user