Efficient pattern matching in degenerate strings with the Burrows–Wheeler transform
Standard
Efficient pattern matching in degenerate strings with the Burrows–Wheeler transform. / Daykin, Jacqueline; Groult, Richard; Guesnet, Yannick et al.
In: Information Processing Letters, 15.03.2019.Research output: Contribution to journal › Article › peer-review
Harvard
APA
Vancouver
Author
Bibtex - Download
}
RIS (suitable for import to EndNote) - Download
TY - JOUR
T1 - Efficient pattern matching in degenerate strings with the Burrows–Wheeler transform
AU - Daykin, Jacqueline
AU - Groult, Richard
AU - Guesnet, Yannick
AU - Lecroq, Thierry
AU - Lefebvre, Arnaud
AU - Léonard, Martine
AU - Mouchard, Laurent
AU - Prieur-Gaston, Élise
AU - Watson, Bruce
PY - 2019/3/15
Y1 - 2019/3/15
N2 - A degenerate or indeterminate string on an alphabet Σ is a sequence of non-empty subsets of Σ. Given a degenerate string t of length n and its Burrows–Wheeler transform we present a new method for searching for a degenerate pattern of length m in t running in O(mn) time on a constant size alphabet Σ. Furthermore, it is a hybrid pattern matching technique that works on both regular and degenerate strings. A degenerate string is said to be conservative if its number of non-solid letters is upper-bounded by a fixed positive constant q; in this case we show that the search time complexity is O(qm2) for counting the number of occurrences andO(qm2+occ) for reporting the found occurrences where occ is the number of occurrences of the pattern in t. Experimental results show that our method performs well in practice
AB - A degenerate or indeterminate string on an alphabet Σ is a sequence of non-empty subsets of Σ. Given a degenerate string t of length n and its Burrows–Wheeler transform we present a new method for searching for a degenerate pattern of length m in t running in O(mn) time on a constant size alphabet Σ. Furthermore, it is a hybrid pattern matching technique that works on both regular and degenerate strings. A degenerate string is said to be conservative if its number of non-solid letters is upper-bounded by a fixed positive constant q; in this case we show that the search time complexity is O(qm2) for counting the number of occurrences andO(qm2+occ) for reporting the found occurrences where occ is the number of occurrences of the pattern in t. Experimental results show that our method performs well in practice
KW - algorithm
KW - Burrows-Wheeler transform
KW - conservative
KW - degenerate
KW - pattern matching
KW - string
U2 - 10.1016/j.ipl.2019.03.003
DO - 10.1016/j.ipl.2019.03.003
M3 - Article
JO - Information Processing Letters
JF - Information Processing Letters
SN - 0020-0190
ER -