DSD: document sparse-based denoising algorithm
In this paper, we present a sparse-based denoising algorithm for scanned documents. The method can be applied to any kind of scanned document with satisfactory results. Unlike other approaches, it encodes noisy documents through sparse representation and visual dictionary learning, without any prior noise model. We also propose a precision parameter estimator. Experiments on several datasets demonstrate the robustness of the proposed approach compared with state-of-the-art document denoising methods.
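The abstract does not detail the algorithm, so the following is only a minimal sketch of the general idea of denoising by sparse coding: a patch is approximated with a few dictionary atoms, and whatever those atoms cannot explain is treated as noise. The fixed Haar-like dictionary `ATOMS`, the helper `matching_pursuit`, and the sample patch are assumptions made for illustration. The paper learns its dictionary and estimates a precision parameter, and neither step is reproduced here.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(signal, atoms, n_atoms):
    """Greedy sparse approximation of `signal` using `n_atoms` unit-norm atoms.

    Hypothetical helper: a stand-in for the sparse-coding step, not the
    authors' method.
    """
    residual = list(signal)
    approx = [0.0] * len(signal)
    for _ in range(n_atoms):
        # Pick the atom most correlated with what is still unexplained.
        best = max(atoms, key=lambda a: abs(dot(residual, a)))
        coef = dot(residual, best)
        approx = [x + coef * a for x, a in zip(approx, best)]
        residual = [r - coef * a for r, a in zip(residual, best)]
    return approx


# Assumed fixed orthonormal dictionary for length-4 patches (not learned).
ATOMS = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, 0.5, -0.5, -0.5],
    [0.5, -0.5, 0.5, -0.5],
    [0.5, -0.5, -0.5, 0.5],
]

clean = [3.0, 3.0, 1.0, 1.0]   # built from the first two atoms only
noisy = [3.1, 2.9, 1.1, 0.9]   # clean patch plus a small perturbation
denoised = matching_pursuit(noisy, ATOMS, n_atoms=2)
```

Keeping only two atoms discards the low-energy component carried by the third atom, which is where the perturbation lives in this toy patch. In practice the number of atoms (or an error tolerance) has to be chosen, which is presumably the role of the precision parameter mentioned in the abstract.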
A symbol spotting approach in graphic documents
This paper addresses the problem of symbol spotting for graphic documents. We propose an approach where each
graphic document is indexed as a text document by using the vector model and an inverted file structure. The method
relies on a visual vocabulary built from a shape descriptor adapted to the document level and invariant under classical
geometric transforms (rotation, scaling and translation). Regions of interest (ROI) selected with a high degree of
confidence using a voting strategy are considered as occurrences of a query symbol.
The symbol spotting problem consists in locating all instances of a symbol embedded in documents. Representing
these symbols with a shape (symbol) descriptor is not straightforward because they are not isolated from
their context. Therefore, a common strategy for symbol spotting consists in decomposing documents into components
and in applying a shape descriptor on each of them. A vectorization step is needed for most of the approaches and
usually, only symbols which satisfy some conditions are retrieved (e.g. convexity, connectivity, closure, ...). Our objective
is to tackle the problem from a point of view where neither symbol hypothesis nor vectorization step is needed. First of
all, we propose a descriptor to represent graphic symbols and its extension to document level. Then, we exploit a technique
based on the concept of visual words for indexing graphic documents and for spotting non-segmented
symbols in documents. Finally, we introduce a voting process on the detected ROI in order to locate instances of a
query symbol...
In this article, we propose a method for locating symbols in graphic documents.
Occurrences of the symbol in a document are detected through a voting process over candidate
regions. The approach relies on a visual vocabulary and, to reduce the complexity of matching one
symbol against others, we use the vector model and indexing with an inverted file. The
method builds on a descriptor defined from the concept of shape context, adapted to interest
points. This descriptor is invariant to rotation, translation and changes of scale.
Experimental results on retrieving isolated symbols and on locating non-segmented symbols
in the document are very promising.
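The indexing and voting steps described in this abstract can be sketched as follows. This is an illustrative sketch only: the data layout, the function names `build_inverted_file` and `spot`, the IDF-style vote weight, and the `min_score` threshold are all assumptions, and the visual words are taken as already computed integer labels rather than derived from the paper's shape descriptor.

```python
import math
from collections import defaultdict


def build_inverted_file(documents):
    """Map each visual word to the set of (doc_id, region_id) containing it.

    `documents` is assumed to be {doc_id: {region_id: [visual_word, ...]}}.
    """
    index = defaultdict(set)
    for doc_id, regions in documents.items():
        for region_id, words in regions.items():
            for word in words:
                index[word].add((doc_id, region_id))
    return index


def spot(query_words, index, n_regions, min_score):
    """Let each query word vote for the regions it appears in."""
    votes = defaultdict(float)
    for word in set(query_words):
        postings = index.get(word, ())
        if not postings:
            continue
        # Assumed weighting: rarer words cast stronger votes.
        weight = math.log(1 + n_regions / len(postings))
        for key in postings:
            votes[key] += weight
    hits = [(key, score) for key, score in votes.items() if score >= min_score]
    # Highest score first; ties broken by (doc_id, region_id).
    return sorted(hits, key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy collection: two documents, two candidate regions each.
documents = {
    "plan_a": {"r0": [3, 7, 7, 12], "r1": [1, 5]},
    "plan_b": {"r0": [7, 12, 20], "r1": [2, 5, 9]},
}
index = build_inverted_file(documents)
hits = spot([7, 12, 5], index, n_regions=4, min_score=2.0)
spotted_regions = [key for key, _ in hits]
```

The inverted file means a query only touches regions that share at least one visual word with it, which is what keeps matching cheap compared with comparing the query against every region. Regions that collect a single vote fall below the threshold and are discarded.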