
    Proceedings of the 2nd Computer Science Student Workshop: Microsoft Istanbul, Turkey, April 9, 2011


    Contributions au tri automatique de documents et de courrier d'entreprises (Contributions to the automatic sorting of business documents and mail)

    This thesis deals with the development of industrial vision systems for the automatic sorting of business documents and mail. Such systems must meet demanding requirements on processing time and on the accuracy and precision of their results. Current systems are mostly built from sequential modules that need fast and efficient algorithms throughout the processing line, from low-level analysis up to high-level content recognition. The existing architectures, which we survey in the first three chapters of the thesis, show weaknesses that manifest as reading errors and rejections still too often blamed on the OCR. The modules actually responsible for these rejections and reading errors are the first to occur in the pipeline: image segmentation and the location of regions of interest. These two processes, which depend on each other, are fundamental to the performance of the system and to the efficiency of the automatic sorting lines. In this thesis, we therefore focus on the segmentation of mail images and the location of their relevant zones (such as the address block), investigating a new pyramidal modeling approach based on hierarchical graph coloring; to date, graph coloring has never been exploited in such a context. It is used in our contribution at every stage of document layout analysis as well as in the decision tasks for recognition (recognition of the kind of document to process and of the address block). The recognition stage relies on a training process built around a single graph b-coloring model.
    Our architecture is designed to carry out the layout analysis and recognition stages while guaranteeing real cooperation between the different analysis and decision modules. It is composed of three main parts: low-level segmentation (binarisation and connected-component labeling), physical layout extraction by hierarchical graph coloring, and address-block location together with document classification. The algorithms involved in the system were designed for execution speed (to meet real-time constraints), robustness, and compatibility. The experiments carried out in this context are very encouraging and also open new perspectives toward a wider diversity of document images.
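    For illustration only, the sketch below shows the general idea of coloring a region adjacency graph so that spatially adjacent layout blocks receive distinct labels; it uses a plain greedy coloring, not the thesis's hierarchical b-coloring, and the region names and adjacency are hypothetical.

```python
# Minimal greedy coloring of a hypothetical region adjacency graph.
# Nodes are layout regions (connected components / blocks); edges link
# spatially adjacent regions. This sketches the general idea only, not
# the hierarchical b-coloring model used in the thesis.

def greedy_coloring(adjacency):
    """Assign each node the smallest color unused by its already-colored neighbours."""
    colors = {}
    for node in adjacency:
        used = {colors[n] for n in adjacency[node] if n in colors}
        color = 0
        while color in used:
            color += 1
        colors[node] = color
    return colors

if __name__ == "__main__":
    # Hypothetical adjacency between blocks of a mail image.
    regions = {
        "stamp": ["header"],
        "header": ["stamp", "address_block"],
        "address_block": ["header", "body_text"],
        "body_text": ["address_block"],
    }
    print(greedy_coloring(regions))
```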

    Application of Graph Theory in Computer Science

    Mathematics plays an important role in many fields, and one of its important areas is graph theory, which is used for structural modeling in many domains. The structural arrangement of various objects or technologies leads to new inventions and to modifications of the existing environment that enhance those fields. The field of graph theory started with the Königsberg bridge problem in 1735. This paper gives an overview of the application of graph theory in heterogeneous fields to some extent, but mainly focuses on computer science applications that use graph-theoretical concepts

    A Deep Understanding of Structural and Functional Behavior of Tabular and Graphical Modules in Technical Documents

    The rapid increase in published research papers in recent years has escalated the need for automated ways to process and understand them. The successful recognition of the information contained in technical documents depends on understanding the document's individual modalities. These modalities include tables, graphics, diagrams, etc., as defined in Bourbakis' pioneering work. However, the depth of understanding is correlated with the efficiency of detection and recognition. In this work, a novel methodology is proposed for the automatic processing and understanding of table and graphics images in technical documents. Previous attempts at table and graphics understanding retrieve only superficial knowledge such as table contents and axis values. Here, in contrast, the focus is on capturing the internal associations and relations between the data extracted from each figure. The proposed methodology is divided into the following steps: 1) figure detection, 2) figure recognition, and 3) figure understanding, where by figures we mean tables, graphics and diagrams. More specifically, we evaluate different heuristic and learning methods for classifying table and graphics images as part of the detection module. Table recognition and deep understanding include the extraction of the knowledge illustrated in a table image along with the deeper associations between the table variables. The graphics recognition module follows a clustering-based approach in order to recognize middle points. Middle points are 2D points where the direction of a curve changes; they delimit the straight line segments that construct the graphics curves. We use these detected middle points in order to understand various features of each line segment and the associations between them. Additionally, we convert the extracted internal tabular associations and the captured curves' structural and functional behavior into a common and at the same time unique form of representation, namely Stochastic Petri-net (SPN) graphs. The use of SPN graphs allows for the merging of different document modalities through the functions that describe them, without any prior knowledge about what these functions are. Finally, we achieve a higher level of document understanding through the synergistic merging of the aforementioned SPN graphs extracted from the table and graphics modalities. We provide results from every step of the document modalities understanding methodologies and the synergistic merging as proof of concept for this research
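    As a simplified illustration of the middle-point idea (not the paper's clustering-based method), the following sketch marks the points of a sampled curve where the local direction changes beyond a tolerance; the curve data and threshold are hypothetical.

```python
# Illustrative sketch: detect "middle points" of a sampled curve as the points
# where the polyline direction changes beyond a tolerance. This simplifies the
# clustering-based approach described in the abstract.
import math

def middle_points(points, angle_tol_deg=10.0):
    """Return indices of points where the polyline direction changes."""
    result = []
    for i in range(1, len(points) - 1):
        x0, y0 = points[i - 1]
        x1, y1 = points[i]
        x2, y2 = points[i + 1]
        a1 = math.atan2(y1 - y0, x1 - x0)       # incoming segment direction
        a2 = math.atan2(y2 - y1, x2 - x1)       # outgoing segment direction
        diff = abs(math.degrees(a2 - a1))
        diff = min(diff, 360.0 - diff)          # wrap the angle difference
        if diff > angle_tol_deg:
            result.append(i)
    return result

if __name__ == "__main__":
    # A hypothetical piecewise-linear curve: rises, flattens, then falls.
    curve = [(0, 0), (1, 1), (2, 2), (3, 2), (4, 2), (5, 1), (6, 0)]
    print(middle_points(curve))  # indices delimiting the straight segments
```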

    Hermes: an Ontology-Based News Personalization Portal

    Nowadays, news feeds provide Web users with access to an unlimited number of news items; however, only a subset of them is relevant. Therefore, users should be able to select the most relevant concepts about which they want to retrieve news. Although keyword search engines provide users with the ability to filter news items, they lack the power to understand the domain in which the news items reside. The aim of this paper is to propose a solution that provides users with the ability to ask for news items related to specific concepts they are interested in. This is accomplished by creating an ontology, developing a classification system that populates the ontology by making use of a knowledge base, and providing an innovative graph representation of the ontology to retrieve relevant news items. A characteristic feature of our approach is the consideration of both concepts and concept relationships for the retrieval of user-relevant items.
    Keywords: semantic web; news classification; ontologies; OWL; SPARQL; decision support
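    A minimal sketch of concept-based retrieval over an OWL ontology with SPARQL, in the spirit of this approach; the ontology file, namespace, and property names below are assumptions, not the actual Hermes ontology.

```python
# Hedged sketch: query a populated news ontology for items that mention a
# concept of interest. File name, namespace and properties are hypothetical.
from rdflib import Graph

g = Graph()
g.parse("news_ontology.owl", format="xml")  # hypothetical populated OWL file

query = """
PREFIX ex: <http://example.org/news#>
SELECT ?item ?title WHERE {
    ?item ex:mentionsConcept ex:Google .
    ?item ex:title ?title .
}
"""

for row in g.query(query):
    print(row.item, row.title)
```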

    e-Counterfeit: a mobile-server platform for document counterfeit detection

    This paper presents a novel application for detecting counterfeit identity documents forged by a scan-and-print operation. Texture analysis approaches are proposed to extract validation features from the security background that is usually printed on documents such as IDs or banknotes. The main contribution of this work is the end-to-end mobile-server architecture, which provides a service for non-expert users and can therefore be used in several scenarios. The system also provides a crowdsourcing mode so that labeled images can be gathered, generating databases for incremental training of the algorithms.
    Comment: 6 pages, 5 figures
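    As one illustrative texture descriptor that could serve as a validation feature for a printed security background, the sketch below computes a uniform LBP histogram; the paper's actual features, parameters, and the image path are assumptions.

```python
# Hedged sketch: uniform Local Binary Pattern histogram of a document scan,
# a common texture descriptor. Not the paper's specific feature set.
import numpy as np
from skimage import io
from skimage.util import img_as_ubyte
from skimage.feature import local_binary_pattern

def background_texture_descriptor(path, points=8, radius=1.0):
    gray = img_as_ubyte(io.imread(path, as_gray=True))
    lbp = local_binary_pattern(gray, points, radius, method="uniform")
    # Normalised histogram of LBP codes so scans of any size are comparable.
    hist, _ = np.histogram(lbp, bins=points + 2, range=(0, points + 2), density=True)
    return hist

if __name__ == "__main__":
    descriptor = background_texture_descriptor("id_document_scan.png")  # hypothetical scan
    print(descriptor)
```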

    Extraction of Scores and Average From Algerian High-School Degree Transcripts

    A system for extracting scores and the average from Algerian High School Degree Transcripts is proposed. The system extracts the scores and the average based on the localization of the tables gathering this information, and it consists of several stages. After preprocessing, the system locates the tables using ruling-line information as well as other textual information; the adopted localization approach can therefore work even when some ruling lines are missing, erased, or discontinuous. The localized tables are then segmented into columns, and the columns into information cells. Finally, cell labeling is performed based on prior knowledge of the table structure, which allows the scores and the average to be identified. Experiments have been conducted on a local dataset in order to evaluate the performance of our system and compare it with three public systems at three levels, and the obtained results show the effectiveness of our system
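    For illustration, one common way to locate horizontal ruling lines is morphological filtering of the binarised scan; the sketch below uses this generic technique and a hypothetical file name, and is not the system's actual algorithm.

```python
# Hedged sketch: candidate horizontal ruling lines via morphological opening
# with a wide, flat structuring element. Generic technique, not the paper's method.
import cv2

def find_horizontal_rulings(path, min_len_ratio=0.3):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # Invert and binarise so that ink becomes white (255) on a black background.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # Keep only ink runs that are long in the horizontal direction.
    kernel_len = max(1, int(binary.shape[1] * min_len_ratio))
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernel_len, 1))
    lines = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
    # Bounding boxes of the surviving components are candidate rulings.
    contours, _ = cv2.findContours(lines, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours]

if __name__ == "__main__":
    print(find_horizontal_rulings("transcript_scan.png"))  # hypothetical scan
```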

    Dagstuhl Reports : Volume 1, Issue 2, February 2011

    Online Privacy: Towards Informational Self-Determination on the Internet (Dagstuhl Perspectives Workshop 11061): Simone Fischer-Hübner, Chris Hoofnagle, Kai Rannenberg, Michael Waidner, Ioannis Krontiris and Michael Marhöfer
    Self-Repairing Programs (Dagstuhl Seminar 11062): Mauro Pezzé, Martin C. Rinard, Westley Weimer and Andreas Zeller
    Theory and Applications of Graph Searching Problems (Dagstuhl Seminar 11071): Fedor V. Fomin, Pierre Fraigniaud, Stephan Kreutzer and Dimitrios M. Thilikos
    Combinatorial and Algorithmic Aspects of Sequence Processing (Dagstuhl Seminar 11081): Maxime Crochemore, Lila Kari, Mehryar Mohri and Dirk Nowotka
    Packing and Scheduling Algorithms for Information and Communication Services (Dagstuhl Seminar 11091): Klaus Jansen, Claire Mathieu, Hadas Shachnai and Neal E. Young

    A Novel Framework for Interactive Visualization and Analysis of Hyperspectral Image Data
