update readme

eugenefischer
2022-09-29 00:00:19 -05:00
parent c30167d5ec
commit 756e5572b9

@@ -96,7 +96,7 @@ These files are often generated in sequence. When entering filenames, it is not
 (.csv or .ser). When reading or writing files, the program will automatically add the correct extension to any filename
 without one.
-To save file I/O time, the most recent instance of each of these four
+To save file I/O time when using the interactive interface, the most recent instance of each of these four
 files either generated or read from disk can be cached in program memory. When caching is active, subsequent uses of the
 same data file won't need to be read in again until another file of that type is used or generated,
 or caching is turned off for that file type. The program checks whether it needs to update its cached data by comparing
@@ -160,7 +160,7 @@ Options when making a Sample Plate file:
 * Number of sections on plate
 * Number of T cells per well
 * per section, if more than one section
-* Dropout rate
+* Sequence dropout rate
 Files are in CSV format. There are no header labels. Every row represents a well.
 Every value represents an individual cell, containing four sequences, depicted as an array string:
@@ -212,8 +212,8 @@ These files do not have a human-readable structure, and are not portable to othe
 For portability of graph data to other software, turn on [GraphML](http://graphml.graphdrawing.org/index.html) output
 in the Options menu in interactive mode, or use the `-graphml`command line argument. This will produce a .graphml file
-for the weighted graph, with vertex attributes for sequence, type, and occupancy data. This graph contains all the data
-necessary for the BiGpairSEQ matching algorithm. It does not include the data to measure pairing accuracy; for that,
+for the weighted graph, with vertex attributes for sequence, type, total occupancy, total read count, and the read count for every individual occupied well.
+This graph contains all the data necessary for the BiGpairSEQ matching algorithm. It does not include the data to measure pairing accuracy; for that,
 compare the matching results to the original Cell Sample .csv file.
 ---
@@ -365,6 +365,7 @@ roughly as though it had a constant well population equal to the plate's average
 * ~~Implement simulation of sequences being misread as other real sequence~~ DONE
 * Update matching metadata output options in CLI
 * Update performance data in this readme
+* Add section to ReadMe describing data filtering methods.
 * Re-implement CDR1 matching method
 * Refactor simulator code to collect all needed data in a single scan of the plate
 * Currently it scans once for the vertices and then again for the edge weights. This made simulating read depth awkward, and incompatible with caching of plate files.