diff --git a/readme.md b/readme.md
index 950cf4a..142b829 100644
--- a/readme.md
+++ b/readme.md
@@ -96,7 +96,7 @@ These files are often generated in sequence. When entering filenames, it is not
 (.csv or .ser). When reading or writing files, the program will automatically add the correct extension to any filename without one.
-To save file I/O time, the most recent instance of each of these four
+To save file I/O time when using the interactive interface, the most recent instance of each of these four
 files either generated or read from disk can be cached in program memory. When caching is active, subsequent uses of the same data file won't need to be read in again until another file of that type is used or generated, or caching is turned off for that file type. The program checks whether it needs to update its cached data by comparing
@@ -160,7 +160,7 @@ Options when making a Sample Plate file:
 * Number of sections on plate
 * Number of T cells per well
     * per section, if more than one section
-* Dropout rate
+* Sequence dropout rate
 Files are in CSV format. There are no header labels. Every row represents a well. Every value represents an individual cell, containing four sequences, depicted as an array string:
@@ -212,8 +212,8 @@ These files do not have a human-readable structure, and are not portable to othe
 For portability of graph data to other software, turn on [GraphML](http://graphml.graphdrawing.org/index.html) output in the Options menu in interactive mode, or use the `-graphml`command line argument. This will produce a .graphml file
-for the weighted graph, with vertex attributes for sequence, type, and occupancy data. This graph contains all the data
-necessary for the BiGpairSEQ matching algorithm. It does not include the data to measure pairing accuracy; for that,
+for the weighted graph, with vertex attributes for sequence, type, total occupancy, total read count, and the read count for every individual occupied well.
+This graph contains all the data necessary for the BiGpairSEQ matching algorithm. It does not include the data to measure pairing accuracy; for that,
 compare the matching results to the original Cell Sample .csv file.
 ---
@@ -365,6 +365,7 @@ roughly as though it had a constant well population equal to the plate's average
 * ~~Implement simulation of sequences being misread as other real sequence~~ DONE
 * Update matching metadata output options in CLI
 * Update performance data in this readme
+* Add section to ReadMe describing data filtering methods.
 * Re-implement CDR1 matching method
 * Refactor simulator code to collect all needed data in a single scan of the plate
     * Currently it scans once for the vertices and then again for the edge weights. This made simulating read depth awkward, and incompatible with caching of plate files.