Evolutionary Search Techniques for the Lyndon Factorization of Biosequences
Type | Paper |
---|---|
Original language | English |
Pages | 1543-1550 |
Publication status | Published - 13 Jul 2019 |
Event | Workshop on Evolutionary Computation for Permutation Problems at GECCO 2019 - Prague, Czech Republic. Duration: 13 Jul 2019 → 17 Jul 2019. Conference number: 3. http://www.sc.ehu.es/ccwbayes/gecco2019_permutations/scheduling.html |
Workshop
Workshop | Workshop on Evolutionary Computation for Permutation Problems at GECCO 2019 |
---|---|
Country/Territory | Czech Republic |
City | Prague |
Period | 13 Jul 2019 → 17 Jul 2019 |
Abstract
A non-empty string x over an ordered alphabet is said to be a Lyndon word if it is strictly smaller, in lexicographic order, than all of its other cyclic rotations. Any string can be uniquely factored into a non-increasing sequence of Lyndon words, and efficient algorithms exist to perform the factorization in linear time and constant space. Lyndon words find wide-ranging applications, including string matching and pattern inference in bioinformatics. Here we investigate the impact of permuting the alphabet ordering on the resulting factorization and demonstrate significant variations in the numbers of factors obtained. We also propose an evolutionary algorithm to find optimal orderings of the alphabet to enhance this factorization process and illustrate the impact of different operators.
The flexibility of such an approach is illustrated by our use of five fitness functions which produce different factorizations suitable for different downstream tasks.
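To make the effect of alphabet reordering concrete, the following minimal sketch factors a string with Duval's algorithm, a standard linear-time method for Lyndon factorization, under a caller-supplied alphabet order, and then counts the factors for every ordering. It is not the paper's implementation: the function names `lyndon_factorization` and `factor_counts` are hypothetical, and the exhaustive enumeration stands in for the evolutionary search, which is only practical to replace this way for small alphabets such as DNA. The sketch also precomputes a rank list for readability, so it does not preserve the constant-space property.

```python
from itertools import permutations


def lyndon_factorization(s, order):
    """Duval's algorithm under a custom alphabet order.

    `order` lists the alphabet from smallest to largest symbol.
    Returns the Lyndon factors of `s` as a list of strings.
    """
    rank = {ch: i for i, ch in enumerate(order)}
    r = [rank[ch] for ch in s]  # compare symbols by their rank in `order`
    n = len(r)
    factors = []
    i = 0
    while i < n:
        j, k = i + 1, i
        while j < n and r[k] <= r[j]:
            if r[k] < r[j]:
                k = i       # still extending a single Lyndon word
            else:
                k += 1      # inside a repetition of the current Lyndon word
            j += 1
        # Emit every full copy of the Lyndon word of length j - k.
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


def factor_counts(s, alphabet):
    """Map each ordering of `alphabet` to the number of Lyndon factors of `s`."""
    return {
        "".join(p): len(lyndon_factorization(s, p))
        for p in permutations(alphabet)
    }
```

For example, `lyndon_factorization("banana", "abn")` gives the four factors `b`, `an`, `an`, `a`, whereas the ordering n < b < a (`"nba"`) gives the three factors `ba`, `na`, `na`. An evolutionary search over orderings could use a count like the one returned by `factor_counts` as one possible fitness value; the five fitness functions used in the paper are not reproduced here.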
Keywords
- algorithm, alphabet, artificial intelligence, Burrows-Wheeler Transform, factorization, evolutionary search, Genome, Lyndon word, pattern matching, string, word
Documents
- Evolutionary Search Techniques for the Lyndon Factorization of Biosequences
Accepted author manuscript, 2.03 MB, PDF
Licence: Other