Practical Methods for Approximate String Matching
Author | : Heikki Hyyrö |
Publisher | : |
Total Pages | : 105 |
Release | : 2003 |
ISBN-10 | : 9514458184 |
ISBN-13 | : 9789514458187 |
Rating | : 4/5 (84 Downloads) |
Download or read book Practical Methods for Approximate String Matching written by Heikki Hyyrö and published by . This book was released on 2003 with total page 105 pages. Available in PDF, EPUB and Kindle. Book excerpt: Abstract: "Given a pattern string and a text, the task of approximate string matching is to find all locations in the text that are similar to the pattern. This type of search may be done for example in applications of spelling error correction or bioinformatics. Typically edit distance is used as the measure of similarity (or distance) between two strings. In this thesis we concentrate on unit-cost edit distance that defines the distance between two strings as the minimum number of edit operations that are needed in transforming one of the strings into the other. More specifically, we discuss the Levenshtein and the Damerau edit distances. Aproximate [sic] string matching algorithms can be divided into off-line and on-line algorithms depending on whether they may or may not, respectively, preprocess the text. In this thesis we propose practical algorithms for both types of approximate string matching as well as for computing edit distance. Our main contributions are a new variant of the bit-parallel approximate string matching algorithm of Myers, a method that makes it easy to modify many existing Levenshtein edit distance algorithms into using the Damerau edit distance, a bit-parallel algorithm for computing edit distance, a more error tolerant version of the ABNDM algorithm, a two-phase filtering scheme, a tuned indexed approximate string matching method for genome searching, and an improved and extended version of the hybrid index of Navarro and Baeza-Yates. To evaluate their practicality, we compare most of the proposed methods with previously existing algorithms. The test results support the claim of the title of this thesis that our proposed algorithms work well in practice."