update readme
readme.md
@@ -12,7 +12,7 @@ Unlike pairSEQ, which calculates p-values for every TCR alpha/beta overlap and c
against a null distribution, BiGpairSEQ does not do any statistical calculations
directly.

BiGpairSEQ creates a [weighted bipartite graph](https://en.wikipedia.org/wiki/Bipartite_graph) representing the sample plate.
The distinct TCRA and TCRB sequences form the two sets of vertices. Every TCRA/TCRB pair that shares a well
is connected by an edge, with the edge weight set to the number of wells in which both sequences appear.
(Sequences present in *all* wells are filtered out prior to creating the graph, as there is no signal in their occupancy pattern.)

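Here is a minimal Java sketch of the graph construction described above. It assumes each well is given as the set of distinct TCRA and TCRB sequences it contains. The `OccupancyGraphSketch` class and its `Well` record are hypothetical, not the program's own types. Edges are stored as nested maps rather than in a graph library, and records require Java 16 or later.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: builds weighted alpha/beta edges from well contents. */
final class OccupancyGraphSketch {
    private OccupancyGraphSketch() {}

    /** One well: the distinct TCRA and TCRB sequences observed in it (assumed representation). */
    record Well(Set<String> alphas, Set<String> betas) {}

    /** Counts the number of wells each sequence appears in. */
    private static Map<String, Integer> wellCounts(List<Well> wells, boolean alpha) {
        Map<String, Integer> counts = new HashMap<>();
        for (Well w : wells) {
            for (String s : alpha ? w.alphas() : w.betas()) {
                counts.merge(s, 1, Integer::sum);
            }
        }
        return counts;
    }

    /**
     * Returns edge weights keyed by alpha, then beta: the number of wells in which both
     * sequences appear. Sequences present in every well are skipped, since their
     * occupancy pattern carries no signal.
     */
    static Map<String, Map<String, Integer>> buildEdges(List<Well> wells) {
        int wellCount = wells.size();
        Map<String, Integer> alphaCounts = wellCounts(wells, true);
        Map<String, Integer> betaCounts = wellCounts(wells, false);
        Map<String, Map<String, Integer>> edges = new HashMap<>();
        for (Well w : wells) {
            for (String a : w.alphas()) {
                if (alphaCounts.get(a) == wellCount) continue;
                for (String b : w.betas()) {
                    if (betaCounts.get(b) == wellCount) continue;
                    // Each shared well adds 1 to the weight of the a-b edge.
                    edges.computeIfAbsent(a, k -> new HashMap<>()).merge(b, 1, Integer::sum);
                }
            }
        }
        return edges;
    }
}
```

Because weights are accumulated well by well, the work grows with the number of alpha/beta combinations inside each well rather than with every possible pairing of distinct sequences.
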
@@ -69,16 +69,26 @@ Please select an option:
0) Exit
```

### INPUT/OUTPUT

To run the simulation, the program reads and writes 4 kinds of files:
* Cell Sample files in CSV format
* Sample Plate files in CSV format
* Graph/Data files in binary object serialization format
* Matching Results files in CSV format

These files are often generated in sequence. To save file I/O time, the most recently generated or read file of each of these four
types is cached in program memory. This is especially important for Graph/Data files,
which can be several gigabytes in size. Since some simulations may require running multiple
differently configured BiGpairSEQ matchings on the same graph, keeping the most recent graph cached drastically reduces
execution time.

A cached file won't need to be read in again until another file of that type is used or generated.
The program checks whether it needs to update its cached data by comparing filenames as entered by the user. On
encountering a new filename, the program flushes its cache and reads in the new file.

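The filename-keyed caching described above can be sketched as a single-entry cache per file type. This is a minimal sketch, not the program's code. `SingleFileCache`, its `loader` function, and the `diskReads` counter are hypothetical names for illustration. Like the text, the sketch compares filenames exactly as given.

```java
import java.util.Objects;
import java.util.function.Function;

/** Hypothetical single-entry cache for one file type, keyed by the filename as entered. */
final class SingleFileCache<T> {
    private final Function<String, T> loader; // reads and parses a file from disk
    private String cachedName;
    private T cachedValue;
    private int diskReads; // counts loader calls, to make cache hits visible

    SingleFileCache(Function<String, T> loader) {
        this.loader = Objects.requireNonNull(loader);
    }

    /** Returns the cached object when the filename matches; otherwise flushes and reloads. */
    T get(String filename) {
        if (!filename.equals(cachedName)) {
            // Flush first, so a large old object can be garbage-collected during the load
            // and a failed load never leaves a stale name paired with missing data.
            cachedName = null;
            cachedValue = null;
            cachedValue = loader.apply(filename);
            cachedName = filename;
            diskReads++;
        }
        return cachedValue;
    }

    /** Caches an object that was just generated and saved under this filename. */
    void put(String filename, T value) {
        cachedName = filename;
        cachedValue = value;
    }

    int diskReads() {
        return diskReads;
    }
}
```

Clearing the old reference before calling the loader lets the garbage collector reclaim a multi-gigabyte graph (if nothing else still references it) before the next one is read, rather than holding both in memory at once.
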
When entering filenames, it is not necessary to include the file extension (.csv or .ser). When reading or
writing files, the program will automatically add the correct extension to any filename without one.

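The sketch below implements that extension rule. It assumes "without one" means the name does not already end in the expected extension, compared case-insensitively. `FileNames.withExtension` is a hypothetical helper, not a method from the program.

```java
import java.util.Locale;

/** Hypothetical helper: appends the expected extension when the user omits it. */
final class FileNames {
    private FileNames() {}

    /**
     * Returns filename unchanged if it already ends with the expected extension
     * (compared case-insensitively); otherwise returns filename + extension.
     *
     * @param extension the expected extension including the dot, e.g. ".csv" or ".ser"
     */
    static String withExtension(String filename, String extension) {
        String lower = filename.toLowerCase(Locale.ROOT);
        if (lower.endsWith(extension.toLowerCase(Locale.ROOT))) {
            return filename;
        }
        return filename + extension;
    }
}
```

For example, `FileNames.withExtension("cells", ".csv")` gives `"cells.csv"`, while `"graph.ser"` is returned unchanged when `.ser` is expected.
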
#### Cell Sample Files
Cell Sample files consist of any number of distinct "T cells." Every cell contains
@@ -121,7 +131,7 @@ Options when making a Sample Plate file:
* Standard deviation size
* Exponential
* Lambda value
* *(Based on the slope of the graph in Figure 4C of the pairSEQ paper (Howie, et al. 2015), the distribution of the original experiment was exponential with a lambda of approximately 0.6.)*
* Total number of wells on the plate
* Number of sections on the plate
* Number of T cells per well

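The exponential option draws from an exponential distribution, and the sketch below shows one way to sample from it: inverse-transform sampling. It assumes the lambda entered here is the rate parameter, so the mean is 1/lambda. This section does not describe how the simulator turns such draws into well contents, and `ExponentialSampler` is a hypothetical class, not part of the program.

```java
import java.util.Random;

/** Hypothetical sketch: inverse-transform sampling from an exponential distribution. */
final class ExponentialSampler {
    private final double lambda; // rate parameter; the distribution's mean is 1 / lambda
    private final Random random;

    ExponentialSampler(double lambda, long seed) {
        if (!(lambda > 0)) {
            throw new IllegalArgumentException("lambda must be positive");
        }
        this.lambda = lambda;
        this.random = new Random(seed); // seeded so simulations are reproducible
    }

    /** If U ~ Uniform[0, 1), then -ln(1 - U) / lambda follows Exp(lambda). */
    double next() {
        double u = random.nextDouble();     // u is in [0, 1), so 1 - u is in (0, 1]
        return -Math.log(1.0 - u) / lambda; // finite and >= 0
    }
}
```

With `lambda = 0.6`, the value estimated from pairSEQ Figure 4C, the mean of many draws should approach 1 / 0.6, or about 1.67.
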
@@ -155,8 +165,8 @@ Structure:

---

#### Graph/Data Files
Graph/Data files are serialized binaries of a Java object containing the weighted bipartite graph representation of a
Sample Plate, along with the necessary metadata for matching and results output. Making them requires a Cell Sample file
(to construct a list of correct sequence pairs for checking the accuracy of BiGpairSEQ simulations) and a
Sample Plate file (to construct the associated occupancy graph).
@@ -164,7 +174,7 @@ Sample Plate file (to construct the associated occupancy graph).
These files can be several gigabytes in size. Writing them to a file lets us generate a graph and its metadata once,
then use it for multiple different BiGpairSEQ simulations.

Options for creating a Graph/Data file:
* The Cell Sample file to use
* The Sample Plate file to use. (This must have been generated from the selected Cell Sample file.)

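Since Graph/Data files are serialized Java objects, writing and reading one comes down to `ObjectOutputStream` and `ObjectInputStream`. The sketch below uses a hypothetical `GraphDataSketch` class whose fields only stand in for the real graph and metadata. It is not the program's actual class, and its maps must be serializable implementations such as `HashMap`.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Map;

/** Hypothetical stand-in for a Graph/Data object; field names are illustrative only. */
final class GraphDataSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    final Map<String, Map<String, Integer>> edgeWeights; // alpha -> (beta -> shared wells)
    final Map<String, String> correctPairs;              // alpha -> beta, from the Cell Sample file
    final int wellCount;

    GraphDataSketch(Map<String, Map<String, Integer>> edgeWeights,
                    Map<String, String> correctPairs, int wellCount) {
        this.edgeWeights = edgeWeights;
        this.correctPairs = correctPairs;
        this.wellCount = wellCount;
    }

    /** Writes the object with standard Java serialization through a buffered stream. */
    static void write(GraphDataSketch data, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeObject(data);
        }
    }

    /** Reads the object back; the cast fails if the file holds a different type. */
    static GraphDataSketch read(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            return (GraphDataSketch) in.readObject();
        }
    }
}
```

Declaring `serialVersionUID` explicitly keeps previously written files readable after compatible changes to the class. Buffering the file streams is a common precaution for large files.
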
@@ -175,11 +185,7 @@ portable data format may be implemented in the future. The tricky part is encodi

#### Matching Results Files
Matching Results files contain the results of a BiGpairSEQ matching simulation. Making them requires a Graph/Data
file. Matching Results files are in CSV format. Rows are sequence pairings with extra relevant data. Columns are pairing-specific details.
Metadata about the matching simulation is included as comments. Comments are preceded by `#`.

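Here is a minimal sketch of reading such a file back. Lines beginning with `#` are collected as metadata, and all other non-blank lines are split into fields. It assumes comment markers appear at the start of a line and that no field contains a quoted comma. `MatchingResultsReader` is hypothetical, and any header row of column names would appear as the first entry of `rows()`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical reader that separates '#' metadata comments from CSV data rows. */
final class MatchingResultsReader {
    private MatchingResultsReader() {}

    /** Parsed file: comment text (without the leading '#') and data rows split into fields. */
    record Parsed(List<String> metadata, List<List<String>> rows) {}

    /** Assumes plain comma-separated fields with no quoted commas. */
    static Parsed parse(Reader source) throws IOException {
        List<String> metadata = new ArrayList<>();
        List<List<String>> rows = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isBlank()) continue;
                if (line.startsWith("#")) {
                    metadata.add(line.substring(1).strip());
                } else {
                    rows.add(Arrays.asList(line.split(",", -1))); // -1 keeps empty trailing fields
                }
            }
        }
        return new Parsed(metadata, rows);
    }
}
```

Passing a `java.io.FileReader` or `Files.newBufferedReader(path)` as the source reads a results file from disk.
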
Options when running a BiGpairSEQ simulation of CDR3 alpha/beta matching: