
    DSD: document sparse-based denoising algorithm

    In this paper, we present a sparse-based denoising algorithm for scanned documents. The method can be applied to any kind of scanned document with satisfactory results. Unlike other approaches, it encodes noisy documents through sparse representation and visual dictionary learning techniques, without any prior noise model. Moreover, we propose a precision parameter estimator. Experiments on several datasets demonstrate the robustness of the proposed approach compared to state-of-the-art methods for document denoising.
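To make the idea of sparse encoding concrete, here is a minimal sketch of sparse-coding a noisy patch over a dictionary. It is not the DSD algorithm itself: the abstract does not say which solver is used, so plain matching pursuit is swapped in; the dictionary `atoms` is fixed by hand rather than learned; and treating `epsilon` as the residual tolerance that a precision parameter might control is an assumption. All names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matching_pursuit(signal, atoms, epsilon, max_atoms=10):
    """Greedy sparse coding of `signal` over unit-norm `atoms`.

    Stops once the residual norm drops to `epsilon` (assumed here to be
    the precision parameter) or after `max_atoms` iterations.
    """
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(max_atoms):
        if math.sqrt(dot(residual, residual)) <= epsilon:
            break
        # Pick the atom most correlated with what is still unexplained.
        corr = [dot(residual, a) for a in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(corr[i]))
        if corr[k] == 0.0:
            break
        coeffs[k] += corr[k]
        residual = [r - corr[k] * a for r, a in zip(residual, atoms[k])]
    return coeffs

def reconstruct(coeffs, atoms):
    """Rebuild the patch from its sparse code."""
    n = len(atoms[0])
    return [sum(c * a[i] for c, a in zip(coeffs, atoms)) for i in range(n)]

# Hand-made dictionary for 4-pixel patches (hypothetical, not learned).
atoms = [
    [0.5, 0.5, 0.5, 0.5],    # flat background
    [0.5, 0.5, -0.5, -0.5],  # step edge
    [0.5, -0.5, 0.5, -0.5],  # fine stripes
]
clean = [1.5, 1.5, 0.5, 0.5]   # 2 * flat + 1 * edge
noisy = [1.6, 1.4, 0.4, 0.6]   # clean plus a small perturbation

coeffs = matching_pursuit(noisy, atoms, epsilon=0.25)
denoised = reconstruct(coeffs, atoms)
```

In this toy case the perturbation cannot be expressed by the dictionary, so it stays in the residual and the reconstruction falls back onto the clean patch. Choosing `epsilon` too small would push the solver to fit noise, and too large would discard structure, which is why an estimator for that parameter matters.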

    A symbol spotting approach in graphic documents

    This paper addresses the problem of symbol spotting in graphic documents. We propose an approach in which each graphic document is indexed like a text document, using the vector model and an inverted file structure. The method relies on a visual vocabulary built from a shape descriptor adapted to the document level and invariant under classical geometric transforms (rotation, scaling and translation). Regions of interest (ROI) selected with a high degree of confidence by a voting strategy are considered occurrences of a query symbol. The symbol spotting problem consists of locating all instances of a symbol embedded in documents. Representing these symbols is not straightforward, even with a good shape (symbol) descriptor, because they are not isolated from their context. A common strategy for symbol spotting therefore consists of decomposing documents into components and applying a shape descriptor to each of them. Most approaches need a vectorization step, and usually only symbols that satisfy certain conditions (e.g. convexity, connectivity, closure) are retrieved. Our objective is to tackle the problem from a point of view where neither a symbol hypothesis nor a vectorization step is needed. First, we propose a descriptor to represent graphic symbols, together with its extension to the document level. Then, we exploit a technique based on the concept of visual words for indexing graphic documents and for spotting non-segmented symbols in documents. Finally, we introduce a voting process on the detected ROI in order to locate instances of a query symbol.
    In this paper, we propose a method for locating symbols in graphic documents. Occurrences of the symbol in a document are detected through a voting process over candidate regions. The approach relies on a visual vocabulary, and to reduce the cost of matching a symbol against others we use the vector model and indexing with an inverted file. The method builds on a descriptor derived from the shape context concept and adapted to interest points. This descriptor is invariant to rotation, translation and changes of scale. Experimental results on retrieving isolated symbols and on locating non-segmented symbols in documents are very promising.
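The indexing and voting steps described in this abstract can be illustrated with a small sketch. It assumes that descriptors have already been quantized into integer visual word ids; the tf-idf weighting, the length normalization and every name below are illustrative assumptions, not details taken from the paper.

```python
import math
from collections import Counter, defaultdict

def build_inverted_index(regions):
    """Map each visual word to the regions containing it.

    `regions` maps a region id to its list of visual word ids.
    Returns word -> {region_id: term frequency}.
    """
    index = defaultdict(dict)
    for region_id, words in regions.items():
        for word, tf in Counter(words).items():
            index[word][region_id] = tf
    return index

def spot(query_words, index, regions, top_k=3):
    """Vote for regions that share visual words with the query symbol."""
    n = len(regions)
    votes = defaultdict(float)
    for word, q_tf in Counter(query_words).items():
        postings = index.get(word, {})
        if not postings:
            continue
        # Rare words get more weight than common ones (tf-idf, assumed).
        idf = math.log(1 + n / len(postings))
        for region_id, tf in postings.items():
            votes[region_id] += (q_tf * idf) * (tf * idf)
    # Rough length normalization so large regions do not dominate.
    scores = {r: v / math.sqrt(len(regions[r])) for r, v in votes.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

# Hypothetical regions of interest, each a bag of visual word ids.
regions = {
    "doc1/r1": [1, 2, 3, 3],
    "doc1/r2": [7, 8, 9],
    "doc2/r1": [3, 2, 5, 6],
}
index = build_inverted_index(regions)
hits = spot([2, 3, 3], index, regions)
```

Because only the postings of the query's own words are visited, regions with no word in common with the query never receive a vote, which is what keeps matching cheap compared with scoring every region against every query.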