Correct misstatement of filter condition in Algorithm section

This commit is contained in:
eugenefischer
2022-09-29 18:32:42 -05:00
parent 633334a1b8
commit 19badac92b

View File

@@ -72,7 +72,7 @@ The relative time/space efficiencies of BiGpairSEQ when backed by different MWM
* Pre-filter the sequence data to minimize the size of the necessary graph. * Pre-filter the sequence data to minimize the size of the necessary graph.
* *Saturating sequence filter*: remove any sequences present in all of the wells on the sample plate, as there is no signal in the occupancy data of saturating sequences. * *Saturating sequence filter*: remove any sequences present in all of the wells on the sample plate, as there is no signal in the occupancy data of saturating sequences.
* *Non-existent sequence filter*: sequencing misreads can pollute the data from the sample plate with non-existent sequences. These can be identified by the discrepancy between their occupancy and their total read count. If a sequence is read correctly at least half the time, then the total read count of a sequence (R) should be at least half the well occupancy of that sequence (O) times the read depth of the sequencing run (D). Remove any sequences for which C < (O * D) / 2. * *Non-existent sequence filter*: sequencing misreads can pollute the data from the sample plate with non-existent sequences. These can be identified by the discrepancy between their occupancy and their total read count. If a sequence is read correctly at least half the time, then the total read count of a sequence (R) should be at least half the well occupancy of that sequence (O) times the read depth of the sequencing run (D). Remove any sequences for which C < (O * D) / 2.
* *Misidentified sequence filter*: sequencing misreads can cause one real sequence to be misidentified as a different real sequence. This should be fairly infrequent, but is a problem if it causes a sequence to seem to be in a well where it is not, in fact, present. This can be detected by looking at discrepancies in a sequence's per-well read count. On average, the read count for a sequence in an individual well (r) should be equal to its total read count (R) divided by its total well occupancy (O). Remove any sequences for which r < R / (2 * O). * *Misidentified sequence filter*: sequencing misreads can cause one real sequence to be misidentified as a different real sequence. This should be fairly infrequent, but is a problem if it causes a sequence to seem to be in a well where it is not, in fact, present. This can be detected by looking at discrepancies in a sequence's per-well read count. On average, the read count for a sequence in an individual well (r) should be equal to its total read count (R) divided by its total well occupancy (O). Remove from the list of wells occupied by a sequence any wells for which r < R / (2 * O).
* Encode the occupancy data from the sample plate as a weighted bipartite graph, where one set of vertices represent the distinct TCRAs and the other set represents distinct TCRBs. Between any TCRA and TCRB that share a well, draw an edge. Assign that edge a weight equal to the total number of wells shared by both sequences. * Encode the occupancy data from the sample plate as a weighted bipartite graph, where one set of vertices represent the distinct TCRAs and the other set represents distinct TCRBs. Between any TCRA and TCRB that share a well, draw an edge. Assign that edge a weight equal to the total number of wells shared by both sequences.
* Find a maximum weight matching of the bipartite graph, using any [MWM algorithm](https://en.wikipedia.org/wiki/Assignment_problem#Algorithms) that produces a provably optimal result. * Find a maximum weight matching of the bipartite graph, using any [MWM algorithm](https://en.wikipedia.org/wiki/Assignment_problem#Algorithms) that produces a provably optimal result.
* If desired, restrict the matching to a subset of the graph. (Example: restricting matching attempts to cases where the occupancy overlap is 3 or more wells--that is, edges with weight >= 3.0.) * If desired, restrict the matching to a subset of the graph. (Example: restricting matching attempts to cases where the occupancy overlap is 3 or more wells--that is, edges with weight >= 3.0.)