update readme
Unlike pairSEQ, which calculates p-values for every TCR alpha/beta overlap and c…
against a null distribution, BiGpairSEQ does not do any statistical calculations
directly.

BiGpairSEQ creates a [weighted bipartite graph](https://en.wikipedia.org/wiki/Bipartite_graph) representing the sample plate.
The distinct TCRA and TCRB sequences form the two sets of vertices. Every TCRA/TCRB pair that shares a well
is connected by an edge, with the edge weight set to the number of wells in which both sequences appear.
(Sequences present in *all* wells are filtered out prior to creating the graph, as there is no signal in their occupancy pattern.)
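
The edge-weight rule described above can be sketched in Java. This is a minimal illustration, not the BiGpairSEQ implementation. The class `OccupancyGraphSketch`, its `Edge` record, the toy sequences, and the input representation are all hypothetical. The input maps each distinct sequence to the set of well indices where it was observed. Sequences found in every well are skipped, and an edge is emitted only for pairs that share at least one well.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: edge list of a weighted bipartite occupancy graph.
class OccupancyGraphSketch {

    // One weighted edge between a TCRA vertex and a TCRB vertex.
    record Edge(String alpha, String beta, int weight) {}

    // alphaWells / betaWells map each distinct sequence to the set of wells
    // (numbered 0..totalWells-1) in which it was observed.
    static List<Edge> buildEdges(Map<String, Set<Integer>> alphaWells,
                                 Map<String, Set<Integer>> betaWells,
                                 int totalWells) {
        List<Edge> edges = new ArrayList<>();
        for (Map.Entry<String, Set<Integer>> a : alphaWells.entrySet()) {
            // A sequence present in every well carries no occupancy signal.
            if (a.getValue().size() == totalWells) continue;
            for (Map.Entry<String, Set<Integer>> b : betaWells.entrySet()) {
                if (b.getValue().size() == totalWells) continue;
                // Edge weight = number of wells in which both sequences appear.
                Set<Integer> shared = new HashSet<>(a.getValue());
                shared.retainAll(b.getValue());
                if (!shared.isEmpty()) {
                    edges.add(new Edge(a.getKey(), b.getKey(), shared.size()));
                }
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        // Toy data on a hypothetical 4-well plate.
        Map<String, Set<Integer>> alpha = Map.of(
                "CAVRD", Set.of(0, 1, 2),
                "CALSE", Set.of(0, 1, 2, 3)); // in all 4 wells: filtered out
        Map<String, Set<Integer>> beta = Map.of(
                "CASSL", Set.of(1, 2),
                "CASRP", Set.of(3));
        for (Edge e : buildEdges(alpha, beta, 4)) {
            System.out.println(e);
        }
    }
}
```

Running `main` prints a single edge, between `CAVRD` and `CASSL`, with weight 2. `CALSE` occurs in all four wells and is filtered out, and `CAVRD`/`CASRP` never share a well.

The nested loop compares every alpha/beta pair, which is simple but quadratic. For large, sparse plates, an index from each well to the sequences it contains would avoid checking pairs that never co-occur.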
0) Exit
```

### INPUT/OUTPUT

To run the simulation, the program reads and writes four kinds of files:
* Cell Sample files in CSV format
* Sample Plate files in CSV format
* Graph/Data files in binary object serialization format
* Matching Results files in CSV format

These files are often generated in sequence. To save file I/O time, the most recent instance of each of these four
kinds of files, whether generated or read from disk, is cached in program memory. This is especially important for Graph/Data files,
which can be several gigabytes in size. Since some simulations may require running multiple,
differently configured BiGpairSEQ matchings on the same graph, keeping the most recent graph cached drastically reduces
execution time.

Subsequent uses of the same data file won't require reading it in again until another file of that type is used or generated.
The program checks whether it needs to update its cached data by comparing filenames as entered by the user. On
encountering a new filename, the program flushes the cached data for that file type and reads in the new file.
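
The caching rule described above can be sketched with one slot per file type. A slot is reused only when the filename string matches exactly. `FileCacheSketch`, its `FileType` enum, and the `loader` callback are hypothetical names, and the real program's cache may be organized differently. Whether it normalizes names (for example, by adding the extension) before comparing is not specified here, so the sketch compares the raw strings.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: caches the most recent file of each type, keyed by filename.
class FileCacheSketch {

    // The four kinds of files the program reads and writes.
    enum FileType { CELL_SAMPLE, SAMPLE_PLATE, GRAPH_DATA, MATCHING_RESULTS }

    // Filename (as entered by the user) and the data loaded from it.
    private record Entry(String filename, Object data) {}

    private final Map<FileType, Entry> cache = new EnumMap<>(FileType.class);

    // Returns the cached data when the filename matches the most recent file of
    // this type; otherwise replaces that slot by loading the new file.
    Object get(FileType type, String filename, Function<String, Object> loader) {
        Entry cached = cache.get(type);
        if (cached != null && cached.filename().equals(filename)) {
            return cached.data();
        }
        Object data = loader.apply(filename);
        cache.put(type, new Entry(filename, data));
        return data;
    }

    // A freshly generated file also becomes the cached instance for its type.
    void put(FileType type, String filename, Object data) {
        cache.put(type, new Entry(filename, data));
    }
}
```

Because each type has its own slot, reading a new Sample Plate file leaves a cached Graph/Data file untouched. Only a different Graph/Data filename forces the graph to be read again.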

When entering filenames, it is not necessary to include the file extension (.csv or .ser). When reading or
writing files, the program will automatically add the correct extension to any filename without one.
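
One way to implement the extension rule is sketched below. It assumes that a filename "has an extension" when it contains a `.` after its last path separator. `FilenameSketch.withExtension` is a hypothetical helper, not part of the program.

```java
// Hypothetical helper: appends a default extension when a filename has none.
class FilenameSketch {

    static String withExtension(String filename, String extension) {
        // Only a '.' after the last path separator counts as an extension,
        // so directories such as "runs.v2/" do not hide a missing extension.
        int lastSeparator = Math.max(filename.lastIndexOf('/'), filename.lastIndexOf('\\'));
        int lastDot = filename.lastIndexOf('.');
        boolean hasExtension = lastDot > lastSeparator;
        return hasExtension ? filename : filename + extension;
    }
}
```

With this helper, `withExtension("plate1", ".csv")` returns `"plate1.csv"`, and `"plate1.csv"` is returned unchanged.

Under this rule a name such as `results.v2` counts as already having an extension. Checking `filename.endsWith(".csv")` instead would append one, so which behavior is wanted is a design choice.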

* Standard deviation size
* Exponential
* Lambda value
* *(Based on the slope of the graph in Figure 4C of the pairSEQ paper (Howie et al. 2015), the distribution of the original experiment was exponential with a lambda of approximately 0.6.)*
* Total number of wells on the plate
* Number of sections on plate
* Number of T cells per well
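
The README does not say which simulated quantity is drawn from the exponential distribution. The following is therefore only a generic illustration of what the lambda parameter means. An exponential distribution with rate λ has mean 1/λ, so λ ≈ 0.6 corresponds to a mean of about 1.67. The hypothetical `ExponentialSketch` draws values by inverse-transform sampling and is not taken from the simulator.

```java
import java.util.Random;

// Generic illustration of exponential sampling with rate lambda.
class ExponentialSketch {

    // Inverse-transform sampling: if U ~ Uniform[0, 1), then -ln(1 - U) / lambda
    // is exponentially distributed with rate lambda (mean 1 / lambda).
    static double sample(Random rng, double lambda) {
        return -Math.log(1.0 - rng.nextDouble()) / lambda;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        double lambda = 0.6;
        int n = 100_000;
        double sum = 0;
        for (int i = 0; i < n; i++) {
            sum += sample(rng, lambda);
        }
        // The sample mean should land close to 1 / 0.6.
        System.out.printf("mean = %.3f%n", sum / n);
    }
}
```

Using `1.0 - rng.nextDouble()` keeps the argument of the logarithm in (0, 1], so the result is always finite and non-negative.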

---

#### Graph/Data Files
Graph/Data files are serialized binaries of a Java object containing the weighted bipartite graph representation of a
Sample Plate, along with the necessary metadata for matching and results output. Making them requires a Cell Sample file
(to construct a list of correct sequence pairs for checking the accuracy of BiGpairSEQ simulations) and a
Sample Plate file (to construct the associated occupancy graph).

These files can be several gigabytes in size. Writing them to a file lets us generate a graph and its metadata once,
then use it for multiple different BiGpairSEQ simulations.
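
Writing and reading such a file can be sketched with the standard `java.io.ObjectOutputStream` and `ObjectInputStream` classes. `GraphDataSketch` and its fields are hypothetical stand-ins, since the actual class and its contents are not described here.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the serialized Graph/Data object.
class GraphDataSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    // Edge weights keyed by "alpha|beta" (an illustrative representation only).
    final Map<String, Integer> edgeWeights = new HashMap<>();
    // Illustrative metadata: which input files produced this graph.
    String cellSampleFile;
    String samplePlateFile;

    // Serializes the whole object (and everything it references) to a binary file.
    static void write(GraphDataSketch data, Path path) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(Files.newOutputStream(path)))) {
            out.writeObject(data);
        }
    }

    // Reads the object back; the class definition must still be compatible.
    static GraphDataSketch read(Path path) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(Files.newInputStream(path)))) {
            return (GraphDataSketch) in.readObject();
        }
    }
}
```

Java serialization ties a file to the class that wrote it. If the class's fields or `serialVersionUID` change incompatibly, older files can no longer be read back. This is one reason a serialized object is less portable than a text format such as CSV.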

Options for creating a Graph/Data file:
* The Cell Sample file to use
* The Sample Plate file to use. (This must have been generated from the selected Cell Sample file.)

#### Matching Results Files
Matching results files consist of the results of a BiGpairSEQ matching simulation. Making them requires a Graph/Data
file. Matching results files are in CSV format. Rows are sequence pairings with extra relevant data. Columns are pairing-specific details.
Metadata about the matching simulation is included as comments. Comments are preceded by `#`.
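
The sketch below reads a Matching Results file back in, separating the `#` metadata comments from the CSV rows. `MatchingResultsReaderSketch` is hypothetical. Its naive `split(",")` assumes no field contains a quoted comma; a full CSV parser would be needed otherwise.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader that separates '#' metadata comments from CSV rows.
class MatchingResultsReaderSketch {

    final List<String> comments = new ArrayList<>();
    final List<String[]> rows = new ArrayList<>();

    static MatchingResultsReaderSketch parse(Reader source) throws IOException {
        MatchingResultsReaderSketch result = new MatchingResultsReaderSketch();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isBlank()) continue;
                if (line.startsWith("#")) {
                    // Simulation metadata, with the '#' marker stripped.
                    result.comments.add(line.substring(1).trim());
                } else {
                    // Naive split (no quoted commas); -1 keeps trailing empty fields.
                    result.rows.add(line.split(",", -1));
                }
            }
        }
        return result;
    }
}
```

A header row, if the file has one, ends up as the first element of `rows`.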

Options when running a BiGpairSEQ simulation of CDR3 alpha/beta matching: