Search CORE

1,344 research outputs found

Text Search in Document Images Based on Hausdorff Distance Measures

Author: Andreev Andrey
Kirov Nikolay
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2008
Field of study

The Hausdorff type distances between the sets of points on the plane are the commonly used similarity measures for binary images. In this work we present several such measures in a unified manner and introduce a new, naturally arisen variant of Hausdorff distance. The matching performance of all similarity measures is compared by computer experiments, using real word images from a scanned book

Crossref

New Bulgarian University Scholar Electronic Repository

Hausdorff distances for searching in binary text images

Author: Andreev Andrey
Kirov Nikolay
Publication venue: Institute of Mathematics and Informatics, Bulgarian Academy of Sciences
Publication date: 01/01/2009
Field of study

Hausdorff distance (HD) seems the most efficient instrument for measuring how far two compact non-empty subsets of a metric space are from each other. This paper considers the possibilities provided by HD and some of its modifications used recently by many authors for resemblance between binary text images. Summarizing part of the existing word image matching methods, relied on HD, we investigate a new similar parameterized method which contains almost all of them as particular cases. Numerical experiments for searching words in binary text images are carried out with 333 pages of old Bulgarian typewritten text, 200 printed pages of Bulgarian Chrestomathy from year 1884, and 200 handwritten pages of Slavonic manuscript from year 1574. They outline how the parameters must be set in order to use the advantages of the proposed method for the purposes of word matching in scanned document images

New Bulgarian University Scholar Electronic Repository

Bulgarian Digital Mathematics Library at IMI-BAS

Word Image Matching Based on Hausdorff Distances

Author: Andreev Andrey
Kirov Nikolay
Publication venue: IEEE Computer Society
Publication date: 01/07/2009
Field of study

Hausdorff distance (HD) and its modifications provides one of the best approaches for matching of binary images. This paper proposes a formalism generalizing almost all of these HD based methods. Numerical experiments for searching words in binary text images are carried out with old Bulgarian typewritten text, printed Bulgarian Chrestomathy from 1884 and Slavonic manuscript from 1574

New Bulgarian University Scholar Electronic Repository

A scalable framework for stylometric analysis query processing

Author: Chow Dickson
Nutanong Sarana
Sarwar Raheem
Xu Peter
Yu Chenyun
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 09/10/2016
Field of study

This is an accepted manuscript of an article published by IEEE in 2016 IEEE 16th International Conference on Data Mining (ICDM) on 02/02/2017, available online: https://ieeexplore.ieee.org/document/7837960 The accepted version of the publication may differ from the final published version.Stylometry is the statistical analyses of variationsin the author's literary style. The technique has been used inmany linguistic analysis applications, such as, author profiling, authorship identification, and authorship verification. Over thepast two decades, authorship identification has been extensivelystudied by researchers in the area of natural language processing. However, these studies are generally limited to (i) a small number of candidate authors, and (ii) documents with similar lengths. In this paper, we propose a novel solution by modeling authorship attribution as a set similarity problem to overcome the two stated limitations. We conducted extensive experimental studies on a real dataset collected from an online book archive, Project Gutenberg. Experimental results show that in comparison to existing stylometry studies, our proposed solution can handlea larger number of documents of different lengths written by alarger pool of candidate authors with a high accuracy.Published versio

Wolverhampton Intellectual Repository and E-theses

A Comparison of Nature Inspired Algorithms for Multi-threshold Image Segmentation

Author: Cuevas Erik
Osuna-Enciso Valentín
Sossa Humberto
Publication venue
Publication date: 01/01/2013
Field of study

In the field of image analysis, segmentation is one of the most important preprocessing steps. One way to achieve segmentation is by mean of threshold selection, where each pixel that belongs to a determined class islabeled according to the selected threshold, giving as a result pixel groups that share visual characteristics in the image. Several methods have been proposed in order to solve threshold selectionproblems; in this work, it is used the method based on the mixture of Gaussian functions to approximate the 1D histogram of a gray level image and whose parameters are calculated using three nature inspired algorithms (Particle Swarm Optimization, Artificial Bee Colony Optimization and Differential Evolution). Each Gaussian function approximates thehistogram, representing a pixel class and therefore a threshold point. Experimental results are shown, comparing in quantitative and qualitative fashion as well as the main advantages and drawbacks of each algorithm, applied to multi-threshold problem.Comment: 16 pages, this is a draft of the final version of the article sent to the Journa

arXiv.org e-Print Archive

Red Mexicana de Repositorios Institucionales

Multiple Instance Learning: A Survey of Problem Characteristics and Applications

Author: Carbonneau Marc-André
Cheplygina Veronika
Gagnon Ghyslain
Granger Eric
Publication venue: 'Elsevier BV'
Publication date: 10/12/2016
Field of study

Multiple instance learning (MIL) is a form of weakly supervised learning where training instances are arranged in sets, called bags, and a label is provided for the entire bag. This formulation is gaining interest because it naturally fits various problems and allows to leverage weakly labeled data. Consequently, it has been used in diverse application fields such as computer vision and document classification. However, learning from bags raises important challenges that are unique to MIL. This paper provides a comprehensive survey of the characteristics which define and differentiate the types of MIL problems. Until now, these problem characteristics have not been formally identified and described. As a result, the variations in performance of MIL algorithms from one data set to another are difficult to explain. In this paper, MIL problem characteristics are grouped into four broad categories: the composition of the bags, the types of data distribution, the ambiguity of instance labels, and the task to be performed. Methods specialized to address each category are reviewed. Then, the extent to which these characteristics manifest themselves in key MIL application areas are described. Finally, experiments are conducted to compare the performance of 16 state-of-the-art MIL methods on selected problem characteristics. This paper provides insight on how the problem characteristics affect MIL algorithms, recommendations for future benchmarking and promising avenues for research

arXiv.org e-Print Archive