Update readme to reflect new default caching behavior.

2022-02-24 15:39:15 -06:00
parent 3d9890e16a
commit ab8d98ed81
1 changed files with 12 additions and 9 deletions
--- a/readme.md
+++ b/readme.md
@@ -78,16 +78,17 @@ These files are often generated in sequence. When entering filenames, it is not
 (.csv or .ser). When reading or writing files, the program will automatically add the correct extension to any filename without one.

 To save file I/O time, the most recent instance of each of these four
-files either generated or read from disk can be cached in program memory. This is especially important for Graph/Data files,
+files either generated or read from disk can be cached in program memory. This could be important for Graph/Data files,
 which can be several gigabytes in size. Since some simulations may require running multiple, 
-differently-configured BiGpairSEQ matchings on the same graph, keeping the most recent graph cached can reduce execution time
+differently-configured BiGpairSEQ matchings on the same graph, keeping the most recent graph cached may reduce execution time.
+(Re-using a graph requires modifying it, which incurs its own performance overhead. That overhead may grow with graph
+size faster than file I/O time does; if so, caching is best suited to smaller graphs.)

-Subsequent uses of the same data file won't need to be read in again until another file of that type is used or generated,
+When caching is active, subsequent uses of the same data file won't need to be read in again until another file of that type is used or generated,
 or caching is turned off for that file type. The program checks whether it needs to update its cached data by comparing
 filenames as entered by the user. On encountering a new filename, the program flushes its cache and reads in the new file.

-The program's caching behavior can be controlled in the Options menu. By default, caching for cell sample and 
-sample plate files is OFF, and caching for graph/data files is OFF.
+The program's caching behavior can be controlled in the Options menu. By default, all caching is OFF.

 #### Cell Sample Files
 Cell Sample files consist of any number of distinct "T cells." Every cell contains 
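The caching rule described in this hunk (one cached instance per file type, turned off by default, and flushed when the user enters a different filename) can be sketched in Java roughly as follows. This is a minimal illustration, not the program's actual code. The class `FilenameCache`, its methods, and the `loader` callback are hypothetical names.

```java
import java.util.Objects;
import java.util.function.Function;

// Hypothetical single-slot cache keyed on the filename exactly as the user entered it.
// One instance would be kept per file type (cell sample, sample plate, graph/data, ...).
class FilenameCache<T> {
    private boolean enabled = false;  // all caching is OFF by default
    private String cachedName = null; // raw filename string used as the cache key
    private T cachedValue = null;

    void setEnabled(boolean enabled) {
        this.enabled = enabled;
        if (!enabled) {
            flush(); // turning caching off discards any held data
        }
    }

    void flush() {
        cachedName = null;
        cachedValue = null;
    }

    // Records a newly generated or newly read file, replacing whatever was cached before.
    void put(String filename, T value) {
        flush();
        if (enabled) {
            cachedName = filename;
            cachedValue = value;
        }
    }

    // Returns the cached data if the filename matches the last one seen;
    // otherwise reads the file via the loader (and caches it if enabled).
    T get(String filename, Function<String, T> loader) {
        if (enabled && cachedValue != null && Objects.equals(filename, cachedName)) {
            return cachedValue; // same filename as last time: skip file I/O
        }
        T value = loader.apply(filename);
        put(filename, value);
        return value;
    }
}
```

In this sketch the key is the raw string, so whether `plate1` and `plate1.csv` count as the same file depends on whether the extension is added before the lookup. Keeping one instance per file type mirrors the per-type switches in the Options menu.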
@@ -252,7 +253,8 @@ slightly less time than the simulation itself. Real elapsed time from start to f
 * ~~Try invoking GC at end of workloads to reduce paging to disk~~ DONE
 * Hold graph data in memory until another graph is read-in? ~~ABANDONED~~ ~~UNABANDONED~~ DONE
  * ~~*No, this won't work, because BiGpairSEQ simulations alter the underlying graph based on filtering constraints. Changes would cascade with multiple experiments.*~~
-  * Might have figured out a way to do it, by taking edges out and then putting them back into the graph. This may actually be possible. If so, awesome.
+  * Might have figured out a way to do it, by taking edges out and then putting them back into the graph. This may actually be possible.
+  * It is possible, though the modifications to the graph incur their own performance penalties. Need testing to see which option is best.
 * See if there's a reasonable way to reformat Sample Plate files so that wells are columns instead of rows. 
  * ~~Problem is variable number of cells in a well~~
  * ~~Apache Commons CSV library writes entries a row at a time~~ 
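The graph re-use approach noted in the to-do list above (taking edges out of a cached graph for one run and putting them back afterward) can be sketched as below. This is a hypothetical illustration that assumes a plain edge list; `ReusableGraph`, `Edge`, `applyFilter`, and `restore` are invented names, and the program's real graph structure and filtering constraints may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: filter a cached graph for one matching run, then undo the
// filtering so a differently-configured run can reuse the same graph.
class ReusableGraph {
    // Hypothetical weighted edge; a real bipartite graph would carry more data.
    record Edge(String u, String v, double weight) {}

    private final List<Edge> edges = new ArrayList<>();
    private final List<Edge> removed = new ArrayList<>(); // edges taken out by filters

    void addEdge(String u, String v, double weight) {
        edges.add(new Edge(u, v, weight));
    }

    int edgeCount() {
        return edges.size();
    }

    // Takes out every edge that fails the filter, remembering it for restore().
    void applyFilter(Predicate<Edge> keep) {
        List<Edge> kept = new ArrayList<>();
        for (Edge e : edges) {
            if (keep.test(e)) {
                kept.add(e);
            } else {
                removed.add(e);
            }
        }
        edges.clear();
        edges.addAll(kept);
    }

    // Puts all removed edges back (same edges, possibly in a different order).
    void restore() {
        edges.addAll(removed);
        removed.clear();
    }
}
```

The filter scans every edge and the restore re-inserts every removed one, which is the kind of per-run overhead the to-do item mentions; whether that beats re-reading the graph from disk likely depends on graph size.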
@@ -266,9 +268,10 @@ slightly less time than the simulation itself. Real elapsed time from start to f
 * Re-implement CDR1 matching method
 * Implement Duan and Su's maximum weight matching algorithm
  * Add controllable algorithm-type parameter?
-* Test whether pairing heap (currently used) or Fibonacci heap is more efficient for priority queue in current matching algorithm
-  * in theory Fibonacci heap should be more efficient, but complexity overhead may eliminate theoretical advantage
-  * Add controllable heap-type parameter?
+* ~~Test whether pairing heap (currently used) or Fibonacci heap is more efficient for priority queue in current matching algorithm~~ DONE
+  * ~~in theory Fibonacci heap should be more efficient, but complexity overhead may eliminate theoretical advantage~~
+  * ~~Add controllable heap-type parameter?~~
+    * Parameter implemented. For large graphs, the Fibonacci heap is more efficient, so it is now the default.