Efficient pattern matching in degenerate strings with the Burrows–Wheeler transform
Type | Article |
---|---|
Original language | English |
Journal | Information Processing Letters |
Early online date | 15 Mar 2019 |
Publication status | E-pub ahead of print - 15 Mar 2019 |
Abstract
A degenerate or indeterminate string on an alphabet Σ is a sequence of non-empty subsets of Σ. Given a degenerate string t of length n and its Burrows–Wheeler transform, we present a new method for searching for a degenerate pattern of length m in t, running in O(mn) time on a constant-size alphabet Σ. Furthermore, it is a hybrid pattern matching technique that works on both regular and degenerate strings. A degenerate string is said to be conservative if its number of non-solid letters is upper-bounded by a fixed positive constant q; in this case we show that the search time complexity is O(qm²) for counting the number of occurrences and O(qm² + occ) for reporting the found occurrences, where occ is the number of occurrences of the pattern in t. Experimental results show that our method performs well in practice.
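As a small illustration of the definitions above, the following Python sketch represents a degenerate string as a list of non-empty sets, counts its non-solid letters, and finds the occurrences of a degenerate pattern. It is a brute-force O(mn) baseline that compares positions directly; it does not use the Burrows–Wheeler transform and is not the method of the article. The bracket notation (`a[bc]a`), the function names, and the matching rule (two positions match when their sets share at least one letter, the usual convention) are assumptions made for this example.

```python
from typing import List, Set

# A degenerate string: one non-empty set of letters per position.
DegenerateString = List[Set[str]]


def parse(text: str) -> DegenerateString:
    """Parse the (assumed) notation 'a[bc]a' into [{'a'}, {'b', 'c'}, {'a'}]."""
    result: DegenerateString = []
    i = 0
    while i < len(text):
        if text[i] == "[":
            j = text.index("]", i)          # raises ValueError if unclosed
            letters = set(text[i + 1:j])
            if not letters:
                raise ValueError("a degenerate letter must be a non-empty set")
            result.append(letters)
            i = j + 1
        else:
            result.append({text[i]})        # solid letter: a singleton set
            i += 1
    return result


def non_solid_count(s: DegenerateString) -> int:
    """Number of positions holding more than one letter."""
    return sum(1 for letters in s if len(letters) > 1)


def is_conservative(s: DegenerateString, q: int) -> bool:
    """True if s has at most q non-solid letters."""
    return non_solid_count(s) <= q


def find_occurrences(t: DegenerateString, p: DegenerateString) -> List[int]:
    """Brute-force search: start positions i where every p[j] intersects t[i + j].

    Assumption: two degenerate letters match if their sets share a letter.
    """
    n, m = len(t), len(p)
    return [
        i
        for i in range(n - m + 1)
        if all(t[i + j] & p[j] for j in range(m))
    ]


text = parse("a[bc]ab[ac]")
pattern = parse("[ab]b")
occurrences = find_occurrences(text, pattern)
```

Here `text` has two non-solid letters, so it is conservative for q = 2 but not for q = 1. The baseline is useful mainly as a reference against which an index-based search can be checked.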
Keywords
- algorithm, Burrows-Wheeler transform, conservative, degenerate, pattern matching, string
Documents
- Efficient pattern matching in degenerate strings with the Burrows-Wheeler transform
Accepted author manuscript, 267 KB, PDF