Search CORE

275 research outputs found

Segmentation-free Word Spotting for Handwritten Arabic Documents

Author: Chenouni Driss
El Yacoubi Mounîm
Elfakir Youssef
Khaissidi Ghizlane
Mrabti Mostafa
Publication venue: Universidad Internacional de La Rioja
Publication date: 07/07/2021

In this paper, we present an unsupervised, segmentation-free method for spotting and retrieving query words in images of handwritten Arabic documents. Histograms of Oriented Gradients (HOGs) are used as feature vectors to represent both the query and the document images. The descriptors are then compressed with the product quantization method. Finally, a better representation of the query is obtained using Support Vector Machines (SVMs).

Re-UNIR
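The abstract above compresses HOG descriptors with product quantization (PQ). As a generic illustration of PQ, and not the authors' code, here is a minimal sketch that assumes the per-subspace codebooks have already been trained (for example with k-means). The function names `pq_encode`, `pq_decode` and `adc_distance` are hypothetical. The distance shown is the standard asymmetric distance computation (ADC), where the query stays uncompressed and only the stored descriptors are quantized.

```python
def split(vec, m):
    """Split a vector into m equal-length sub-vectors (len(vec) must be divisible by m)."""
    d = len(vec) // m
    return [vec[i * d:(i + 1) * d] for i in range(m)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pq_encode(vec, codebooks):
    """Return the PQ code: one nearest-centroid index per sub-space."""
    subs = split(vec, len(codebooks))
    return [min(range(len(cb)), key=lambda k: sq_dist(sub, cb[k]))
            for sub, cb in zip(subs, codebooks)]


def pq_decode(code, codebooks):
    """Rebuild the approximate vector by concatenating the chosen centroids."""
    out = []
    for k, cb in zip(code, codebooks):
        out.extend(cb[k])
    return out


def adc_distance(query, code, codebooks):
    """Asymmetric distance: exact query sub-vectors vs. quantized database sub-vectors."""
    subs = split(query, len(codebooks))
    # Lookup tables: distance from each query sub-vector to every centroid of its sub-space.
    tables = [[sq_dist(sub, c) for c in cb] for sub, cb in zip(subs, codebooks)]
    return sum(tables[j][k] for j, k in enumerate(code))


# Toy example: 4-D vectors, 2 sub-spaces, 2 centroids per sub-space (made-up values).
cb = [[[0, 0], [1, 1]], [[0, 0], [5, 5]]]
code = pq_encode([0.9, 1.1, 4.8, 5.2], cb)
```

Here `code` is `[1, 1]`, so each stored descriptor costs only two small integers. At search time the tables are built once per query, and each stored code is scored with a few table lookups instead of a full vector comparison.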

The impact of the image processing in the indexation system

Author: Chenouni Driss
Elfakir Youssef
Khaissidi Ghizlane
Mrabti Mostafa
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/10/2019

This paper presents an efficient word spotting system for handwritten Arabic documents, in which images are represented with bags of visual SIFT descriptors and a sliding window approach locates the regions most similar to the query, following the query-by-example paradigm. First, a pre-processing step produces a better representation of the most informative features. Second, a region-based framework represents each local region by a bag of visual SIFT descriptors. Experiments are then conducted to show how the codebook size influences the efficiency of the system, by analyzing the curse-of-dimensionality curve. Finally, the similarity score is measured with a floating distance based on the number of descriptors in each query. The experimental results demonstrate the efficiency of the proposed processing steps in the word spotting system.

ZENODO
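The bag-of-visual-words plus sliding-window idea in the word spotting entry above can be sketched as follows. This is a generic illustration under stated assumptions, not the paper's system: descriptors are plain tuples (real SIFT descriptors are 128-D), each keypoint carries only an x position along a text line, the codebook is given, and a plain L1 distance between normalized histograms stands in for the paper's floating distance. The helper names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(desc, codebook):
    """Index of the nearest visual word (codeword) for one descriptor."""
    return min(range(len(codebook)), key=lambda k: sq_dist(desc, codebook[k]))


def bovw_histogram(descs, codebook):
    """L1-normalized histogram of visual-word occurrences."""
    hist = [0.0] * len(codebook)
    for d in descs:
        hist[assign(d, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def sliding_window_search(keypoints, query_descs, codebook, width, step):
    """keypoints: list of (x, descriptor) pairs on a text line (hypothetical layout).

    Returns (window_start_x, distance) for every non-empty window,
    sorted so the most query-like window comes first.
    """
    if not keypoints:
        return []
    q = bovw_histogram(query_descs, codebook)
    max_x = max(x for x, _ in keypoints)
    results = []
    start = 0
    while start <= max_x:
        window = [d for x, d in keypoints if start <= x < start + width]
        if window:
            h = bovw_histogram(window, codebook)
            # L1 distance between histograms (a stand-in, not the paper's measure).
            results.append((start, sum(abs(a - b) for a, b in zip(q, h))))
        start += step
    return sorted(results, key=lambda r: r[1])


# Toy data: two visual words, a query made only of word 1.
codebook = [(0, 0), (10, 10)]
keypoints = [(0, (0, 0)), (1, (0, 1)), (5, (10, 10)), (6, (9, 10))]
ranked = sliding_window_search(keypoints, [(10, 9), (10, 10)], codebook, width=3, step=3)
```

The codebook size is the knob the abstract studies: more visual words make histograms more discriminative but sparser, which is where the curse of dimensionality shows up.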

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Institute of Advanced Engineering and Science

Cross-document word matching for segmentation and retrieval of Ottoman divans

Author: Arifoglu D.
Duygulu P.
Kalpakli M.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

Motivated by the need for automatic indexing and analysis of the huge number of documents in Ottoman divan poetry, and for discovering new knowledge to preserve this heritage and keep it alive, we propose in this study a novel method for segmenting and retrieving words in Ottoman divans. Documents in Ottoman are difficult to segment into words without prior knowledge of the words. Divans exist in multiple copies (versions) by different writers in different writing styles, and word segmentation may be easier to achieve in some versions than in others. Building on this idea, versions that are difficult, if not impossible, to segment with traditional techniques are segmented using information carried over from a simpler version. One version of a document is used as the source dataset and another version of the same document as the target dataset. Words in the source dataset are automatically extracted and used as queries to be spotted in the target dataset in order to detect word boundaries. We present the idea of cross-document word matching for the novel task of segmenting historical documents into words. We propose a matching scheme based on possible combinations of sequences of sub-words. We improve the performance of simple features by considering words in context. The method is applied to two versions of the Layla and Majnun divan by Fuzuli. The results show that the proposed word-matching-based segmentation method is promising for finding word boundaries and for retrieving words across documents.

Bilkent University Institutional Repository
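The Ottoman divan entry above matches a source word against combinations of consecutive sub-words in the target version. A minimal sketch of that search space follows, under loudly stated assumptions: each sub-word (connected component) is already described by a fixed-length feature vector, a run of sub-words is merged by element-wise summation, and the run closest to the query in squared Euclidean distance wins. Both the merge rule and the distance are illustrative choices, not the paper's, and `best_span` and `max_len` are hypothetical names.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def merge(subwords):
    """Aggregate consecutive sub-word feature vectors by element-wise sum (an assumption)."""
    return [sum(vals) for vals in zip(*subwords)]


def best_span(query_subwords, target_subwords, max_len=4):
    """Try every run of 1..max_len consecutive target sub-words.

    Returns (start, end, distance) for the run whose merged feature is
    closest to the merged query; end is exclusive. None if the target is empty.
    """
    q = merge(query_subwords)
    best = None
    for start in range(len(target_subwords)):
        for length in range(1, max_len + 1):
            end = start + length
            if end > len(target_subwords):
                break
            d = sq_dist(q, merge(target_subwords[start:end]))
            if best is None or d < best[2]:
                best = (start, end, d)
    return best


# Toy features: the query word splits into two sub-words, which appear
# in the target between two unrelated components.
query = [[1, 0], [0, 1]]
target = [[5, 5], [1, 0], [0, 1], [3, 3]]
span = best_span(query, target)
```

The returned `(start, end)` pair marks a candidate word boundary in the target line. Capping the run length with `max_len` keeps the search linear in the number of sub-words rather than quadratic.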

A comprehensive survey of handwritten document benchmarks: structure, usage and evaluation

Publication venue: Springer
Publication date: 24/12/2015

Springer - Publisher Connector

A line-based representation for matching words in historical manuscripts

Author: Can E. F.
Duygulu P.
Publication venue: Elsevier BV
Publication date: 01/06/2011

In this study, we propose a new method for retrieving and recognizing words in historical documents. We represent word images with a set of line segments and then provide a criterion for word matching based on matching the lines. We carry out experiments on a benchmark dataset of manuscripts by George Washington, as well as on Ottoman manuscripts. © 2011 Elsevier B.V. All rights reserved.

Bilkent University Institutional Repository
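The line-based representation in the entry above describes a word image as a set of line segments and matches words by matching their lines. The sketch below shows one plausible form of such a criterion. It is an assumption, not the paper's: each segment `(x1, y1, x2, y2)` is summarized by its midpoint, undirected orientation and length; two segments are compared with a weighted sum of the differences; and a word distance is the symmetric average of best-match costs. The weights and function names are hypothetical, and both segment sets must be non-empty.

```python
import math


def seg_features(seg):
    """Midpoint, undirected orientation (radians in [0, pi)) and length of (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = seg
    angle = math.atan2(y2 - y1, x2 - x1) % math.pi
    return ((x1 + x2) / 2, (y1 + y2) / 2, angle, math.hypot(x2 - x1, y2 - y1))


def seg_cost(a, b, w_pos=1.0, w_ang=1.0, w_len=1.0):
    """Weighted dissimilarity between two segments (hypothetical weights)."""
    ax, ay, aang, alen = seg_features(a)
    bx, by, bang, blen = seg_features(b)
    dang = abs(aang - bang)
    dang = min(dang, math.pi - dang)  # orientations wrap around at pi
    return (w_pos * math.hypot(ax - bx, ay - by)
            + w_ang * dang
            + w_len * abs(alen - blen))


def directed(src, dst):
    """Mean cost of matching each segment in src to its best partner in dst."""
    return sum(min(seg_cost(s, t) for t in dst) for s in src) / len(src)


def word_distance(q_segs, t_segs):
    """Symmetric average of best-match costs; smaller means more similar words."""
    return (directed(q_segs, t_segs) + directed(t_segs, q_segs)) / 2


query = [(0, 0, 10, 0)]
near = [(0, 1, 10, 1)]   # same stroke shifted by one pixel
far = [(0, 0, 0, 10)]    # vertical stroke of the same length
```

Averaging the two directed costs keeps the measure symmetric, so a target word with many extra strokes is penalized as well as one that misses strokes of the query.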