Content-based Handwritten Document Indexing and Retrieval

Total Pages : 121
Release : 2008
OCLC : 244286945

Book Synopsis: Content-based Handwritten Document Indexing and Retrieval

This book was released in 2008 and has 121 pages. Book excerpt: Information retrieval on textual data has been well studied, and its applications (such as web searching) have become ubiquitous in our daily lives. However, content-based image retrieval on handwritten document collections remains a challenging problem. Here "content-based" means that the search analyzes the actual content of the images, instead of merely the metadata. In the context of handwritten documents, the word "content" might refer to different things, such as writing style, shape of words and characters, or the truth of the writing. Accordingly, two different types of retrieval can be performed: "query by example" and semantic (or "query by text") retrieval. While both have their own applications in the real world, the second is more intuitive and user-friendly, since it uses not only the low-level underlying computational features but also an understanding of the documents. This work explores several automatic techniques for both types of retrieval on handwritten document collections. These techniques are three-fold: (i) indexing, (ii) "query by example" retrieval and (iii) "query by text" retrieval. For indexing, we focus on the problems of word segmentation and transcript mapping. Word segmentation is the task of segmenting text line images into word images, and it is one of the most important preprocessing steps for any word-level analysis or recognition. We propose the use of a neural network with a new set of global and local features to classify gaps as inter-word or intra-word. The transcript mapping problem is an alignment problem between the handwritten document image and its transcript. It is not a trivial task, simply because the word segmentation algorithm is error prone.
A recognition-based dynamic programming algorithm is proposed to solve this problem. It is also shown to improve the accuracy of automatic word segmentation. In "query by example" retrieval, the query can be either a full-page document or a single word image. For document-level retrieval, a statistical model is learned to determine whether the writing styles of two documents are similar or not. Gamma and Gaussian distributions are used for the modeling. Word-level retrieval is performed by a feature-based similarity search algorithm. For each word image, a 1024-bit binary feature vector is extracted for this purpose. "Query by text" retrieval is a more challenging task because word-level segmentation is error prone and word recognition with a large lexicon is still an unsolved problem. The current solution to this problem is to manually annotate the collection, which is costly. Borrowing the idea of machine translation from textual information retrieval, we propose a statistical approach for word recognition and use the probabilistic annotation results to do language model retrieval on handwritten documents. The performance of all these approaches is compared empirically on several test collections. The main contributions of this work are a detailed examination of different levels of content-based image retrieval for handwritten documents, and the development of a retrieval system that allows either image or text queries. The new word segmentation method shows improved performance over a previous method and is useful in forensic document analysis. In addition, a large handwriting database of 3824 pages (about 573,600 labeled words) was created using the proposed transcript-mapping algorithm. This database was used predominantly in this dissertation, and it serves as a useful resource for future handwriting analysis and recognition research.
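The word-level "query by example" step described in the synopsis (a similarity search over 1024-bit binary feature vectors) can be illustrated with a minimal sketch. The synopsis does not say which similarity measure or feature extractor the author used, so the Hamming-based score below is an assumption, and the names `hamming_similarity`, `rank_by_similarity` and the toy vectors are hypothetical.

```python
from typing import List, Tuple

N_BITS = 1024  # vector length stated in the synopsis


def hamming_similarity(a: int, b: int, n_bits: int = N_BITS) -> float:
    """Fraction of bit positions on which two n_bits-wide vectors agree.

    Assumption: the synopsis does not name the similarity measure; a
    Hamming-based score is used here only for illustration.
    """
    differing = bin(a ^ b).count("1")  # number of positions where the bits differ
    return 1.0 - differing / n_bits


def rank_by_similarity(
    query: int, index: List[Tuple[str, int]], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Return the top_k (word_id, score) pairs, most similar first."""
    scored = [(word_id, hamming_similarity(query, vec)) for word_id, vec in index]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy data (hypothetical): each word image is stored as a Python int
# whose low 1024 bits hold its binary feature vector.
ALL_ONES = (1 << N_BITS) - 1
toy_index = [
    ("word_c", 0),                 # disagrees with the query on every bit
    ("word_b", ALL_ONES ^ 0xFFFF), # disagrees on the 16 lowest bits
    ("word_a", ALL_ONES),          # identical to the query
]
query = ALL_ONES
results = rank_by_similarity(query, toy_index)
```

Here `results` lists `word_a` first, then `word_b`, then `word_c`. A real system would extract the vectors from word images and would likely need an indexing structure for a collection of several hundred thousand words; the linear scan here is only meant to show the scoring.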


Content-based Handwritten Document Indexing and Retrieval Related Books

Content-based Handwritten Document Indexing and Retrieval
Language: en
Pages: 121
Authors:
Categories:
Type: BOOK - Published: 2008 - Publisher:

Information retrieval on textual data has been well studied and its applications (such as web searching) have become ubiquitous in our daily lives. However cont
Handwritten Historical Document Analysis, Recognition, And Retrieval - State Of The Art And Future Trends
Language: en
Pages: 269
Authors: Andreas Fischer
Categories: Computers
Type: BOOK - Published: 2020-11-11 - Publisher: World Scientific

In recent years, libraries and archives all around the world have increased their efforts to digitize historical manuscripts. To integrate the manuscripts into
Indexing and Retrieval of Low Quality Handwritten Documents
Language: en
Pages: 101
Authors: Huaigu Cao
Categories:
Type: BOOK - Published: 2008 - Publisher:

Decades of development in document analysis and recognition techniques have made it possible to convert large amounts of documents into electronic formats and
Artificial Intelligence for Maximizing Content Based Image Retrieval
Language: en
Pages: 450
Authors: Ma, Zongmin
Categories: Computers
Type: BOOK - Published: 2009-01-31 - Publisher: IGI Global

Discusses major aspects of content-based image retrieval (CBIR) using current technologies and applications within the artificial intelligence (AI) field.
Document Analysis Systems VI
Language: en
Pages: 575
Authors: Simone Marinai
Categories: Computers
Type: BOOK - Published: 2004-08-26 - Publisher: Springer Science & Business Media

This volume contains papers selected for presentation at the 6th IAPR Workshop on Document Analysis Systems (DAS 2004) held during September 8–10, 2004 at the University