Efficient pattern matching in degenerate strings with the Burrows–Wheeler transform

Authors Organisations
  • Jacqueline Daykin(Author)
  • Richard Groult(Author)
    Normandy University
    University of Picardie Jules Verne
  • Yannick Guesnet(Author)
    Normandy University
  • Thierry Lecroq(Author)
    Normandy University
  • Arnaud Lefebvre(Author)
    Normandy University
  • Martine Léonard(Author)
    Normandy University
  • Laurent Mouchard(Author)
    Normandy University
  • Élise Prieur-Gaston(Author)
    Normandy University
  • Bruce Watson(Author)
    University of South Africa
    Centre for Artificial Intelligence Research
Type Article
Original languageEnglish
JournalInformation Processing Letters
Early online date15 Mar 2019
DOI
Publication statusE-pub ahead of print - 15 Mar 2019
Permanent link
Show download statistics
View graph of relations
Citation formats

Abstract

A degenerate or indeterminate string on an alphabet Σ is a sequence of non-empty subsets of Σ. Given a degenerate string t of length n and its Burrows–Wheeler transform we present a new method for searching for a degenerate pattern of length m in t running in O(mn) time on a constant size alphabet Σ. Furthermore, it is a hybrid pattern matching technique that works on both regular and degenerate strings. A degenerate string is said to be conservative if its number of non-solid letters is upper-bounded by a fixed positive constant q; in this case we show that the search time complexity is O(qm2) for counting the number of occurrences andO(qm2+occ) for reporting the found occurrences where occ is the number of occurrences of the pattern in t. Experimental results show that our method performs well in practice

Keywords

  • algorithm, Burrows-Wheeler transform, conservative, degenerate, pattern matching, string