
    Learning Grammars for Architecture-Specific Facade Parsing

    Parsing facade images requires an optimal handcrafted grammar for a given class of buildings, typically designed manually by experts. In this paper, we present a novel framework to learn a compact grammar from a set of ground-truth images. To this end, parse trees of ground-truth annotated images are obtained by running existing inference algorithms with a simple, very general grammar. From these parse trees, repeated subtrees are sought and merged to share derivations and produce a grammar with fewer rules. Furthermore, unsupervised clustering is performed on these rules, so that rules corresponding to the same complex pattern are grouped together, leading to a rich yet compact grammar. Experimental validation and comparison with state-of-the-art grammar-based methods on four different datasets show that the learned grammar converges much faster while producing equally or more accurate parsing results than handcrafted grammars as well as grammars learned by other methods. In addition, we release a new dataset of Art-deco facade images from Paris and demonstrate the broad applicability and potential of the proposed framework.
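    The core merging step, finding subtrees that repeat across parse trees and factoring them into shared rules, can be illustrated with a minimal sketch. The tuple-based tree encoding and the function names here are hypothetical stand-ins, not the paper's implementation:

    ```python
    from collections import Counter

    def subtrees(tree):
        """Yield every subtree of a parse tree encoded as (label, child, child, ...)."""
        if isinstance(tree, tuple):
            yield tree
            for child in tree[1:]:
                yield from subtrees(child)

    def repeated_subtrees(trees, min_count=2):
        """Count subtrees across a corpus of parse trees and return those
        occurring at least min_count times -- the candidates for merging
        into a shared grammar rule."""
        counts = Counter()
        for t in trees:
            for s in subtrees(t):
                if len(s) > 1:  # skip leaves such as ("Frame",)
                    counts[s] += 1
        return {s: n for s, n in counts.items() if n >= min_count}

    # Two toy facade parse trees sharing a Window -> Frame Glass derivation.
    t1 = ("Facade", ("Floor", ("Window", ("Frame",), ("Glass",)), ("Wall",)))
    t2 = ("Facade", ("Floor", ("Window", ("Frame",), ("Glass",)), ("Door",)))
    merged = repeated_subtrees([t1, t2])
    ```

    In this toy corpus only the shared `Window` subtree repeats, so only it would be factored into a new rule; the differing `Floor` subtrees remain separate.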

    Analyse d'images de documents patrimoniaux : une approche structurelle à base de texture (Analysis of heritage document images: a structural, texture-based approach)

    Over the last few years, there has been tremendous growth in digitizing collections of cultural heritage documents, raising many challenges and open issues, such as information retrieval in digital libraries and analyzing the page content of historical books. Recently, an important need has emerged for a computer-aided characterization and categorization tool able to index or group historical digitized book pages according to several criteria, mainly the layout structure and/or the typographic/graphical characteristics of the document image content. The work conducted in this thesis presents an automatic approach for the characterization and categorization of historical book pages. The proposed approach is applicable to a large variety of ancient books and assumes no a priori knowledge of document image layout or content. It relies on texture and graph algorithms to provide a rich, holistic description of the layout and content of the analyzed pages. The categorization is based on characterizing the digitized page content with texture, shape, geometric and topological descriptors, represented as a structural signature. More precisely, the signature-based characterization consists of two main stages: the first extracts homogeneous regions; the second builds a graph-based page signature from those regions, reflecting the page's layout and content. By comparing the resulting graph-based signatures using a graph-matching paradigm, similarities in layout and/or content between digitized book pages can be deduced. Pages with similar layout and/or content can then be categorized and grouped, and a table of contents/summary of the analyzed digitized historical book can be generated automatically.
As a consequence, numerous signature-based applications (e.g. information retrieval in digital libraries according to several criteria, page categorization) can be implemented for effectively managing a corpus or collection of books. To illustrate the effectiveness of the proposed page signature, a detailed experimental evaluation has been conducted to assess two possible categorization applications: unsupervised page classification and page stream segmentation. In addition, the different steps of the proposed approach have been evaluated on a large variety of historical document images.
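The graph-signature comparison described above can be illustrated with a minimal sketch. The region encoding and the symmetric-difference distance below are simplified stand-ins for the thesis's texture/shape/topology descriptors and its graph-matching (edit-distance) paradigm:

```python
def page_signature(regions):
    """Build a toy graph signature: nodes are (region_id, texture_label)
    pairs, edges connect spatially adjacent regions. 'regions' is a list
    of (region_id, texture_label, set_of_adjacent_ids) triples."""
    nodes = {(rid, tex) for rid, tex, _ in regions}
    edges = set()
    for rid, _, adj in regions:
        for other in adj:
            edges.add(frozenset((rid, other)))  # undirected adjacency
    return nodes, edges

def signature_distance(sig_a, sig_b):
    """Symmetric-difference distance between two signatures: a crude
    proxy for graph edit distance (each differing node or edge costs 1)."""
    nodes_a, edges_a = sig_a
    nodes_b, edges_b = sig_b
    return len(nodes_a ^ nodes_b) + len(edges_a ^ edges_b)

# Two toy pages with the same layout but one differing region label.
page1 = [(0, "text", {1}), (1, "image", {0})]
page2 = [(0, "text", {1}), (1, "drop-cap", {0})]
d = signature_distance(page_signature(page1), page_signature(page2))
```

Pages whose pairwise distances fall below a threshold would then be grouped into the same layout/content category.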

    Incorporation of relational information in feature representation for online handwriting recognition of Arabic characters

    Interest in online handwriting recognition is increasing due to market demand for both improved performance and extended script support on digital devices. Robust handwriting recognition of complex patterns of arbitrary scale, orientation and location remains elusive, because reaching a target recognition rate is not trivial for most applications in this field. Cursive scripts such as Arabic and Persian, with their complex character shapes, make the recognition task even more difficult. The discrimination capability of handwriting recognition systems depends heavily on the effectiveness of the features used to represent the data, the types of classifiers deployed, and the availability of inclusive databases for learning and recognition that cover the variations in writing styles which introduce natural deformations in character shapes. This thesis aims to improve the efficiency of online recognition systems for Persian and Arabic characters by presenting new formal feature representations, algorithms, and a comprehensive database for online Arabic characters. The thesis contains the development of the first public collection of online handwritten data for the Arabic complete-shape character set. New ideas for incorporating relational information into a feature representation for this type of data are presented. The proposed techniques are computationally efficient and provide compact, yet representative, feature vectors. For the first time, a hybrid classifier is used for the recognition of online Arabic complete-shape characters, based on decomposing the input data into variables representing factors of the complete-shape characters and on the combined use of Bayesian network inference and support vector machines. We demonstrate the usefulness and practicality of the features and recognition methods with respect to conventional metrics, such as accuracy and timeliness, as well as unconventional metrics.
In particular, we evaluate a feature representation for different character class instances by its level of separation in the feature space. Our evaluation results on the available databases and on our own database of the characters' main shapes confirm a higher efficiency than previously reported techniques with respect to all metrics analyzed. For the complete-shape characters, our techniques achieved a recognition efficiency comparable with the state-of-the-art results for main-shape characters.

    Incorporating prior knowledge into deep neural networks without handcrafted features

    Deep learning (DL) is currently the largest area of research within artificial intelligence (AI). This success can largely be attributed to the data-driven nature of DL algorithms: unlike previous approaches in AI, which required handcrafting and significant human intervention, DL models can be implemented and trained with little to no human involvement. The lack of handcrafting, however, can be a double-edged sword. DL algorithms are notorious for producing uninterpretable features, generalising badly to new tasks and relying on extraordinarily large datasets for training. In this thesis, on the assumption that these shortcomings are symptoms of the under-constrained training setup of deep networks, we address the question of how to incorporate knowledge into DL algorithms without reverting to complete handcrafting, in order to train more data-efficient algorithms. We start by motivating this line of work with an example of a DL architecture which, inspired by symbolic AI, aims at extracting symbols from a simple environment and using them to quickly learn downstream tasks. Our proof-of-concept model shows that reasoning at this higher level of abstraction can address some of the data-efficiency issues and yield more interpretable representations. Our second approach to data efficiency is based on pre-training: the idea is to pre-train parts of the DL network on a different, but related, task to first learn useful feature extractors. In our experiments we pre-train the encoder of a reinforcement learning agent on a 3D scene prediction task and then use the features produced by the encoder to train a simulated robot arm on a reaching task.
Crucially, unlike previous approaches that could only learn from fixed viewpoints, we are able to train an agent using observations captured from randomly changing positions around the robot arm, without having to train a separate policy for each observation position. Lastly, we focus on how to build in prior knowledge through the choice of dataset. To this end, instead of training DL models on a single dataset, we train them on a distribution over datasets that captures the space of tasks we are interested in. This training regime produces models that can flexibly adapt to any dataset within the distribution at test time. Crucially, they only need a small number of observations in order to adapt their predictions, thus addressing the data-efficiency challenge at test time. We call this family of meta-learning models for few-shot prediction Neural Processes (NPs). In addition to successfully learning how to carry out few-shot regression and classification, NPs produce uncertainty estimates and can generate coherent samples at arbitrary resolutions.
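The Neural Process idea, encoding a small context set into an order-invariant representation and decoding predictions for new inputs conditioned on it, can be sketched roughly as follows. This is an untrained toy with hypothetical layer sizes; real NPs are trained end-to-end and also model predictive uncertainty:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(params, x):
    """Tiny two-layer MLP with tanh hidden activation."""
    (W1, b1), (W2, b2) = params
    return np.tanh(x @ W1 + b1) @ W2 + b2

def init(d_in, d_hidden, d_out):
    return [(rng.normal(size=(d_in, d_hidden)), np.zeros(d_hidden)),
            (rng.normal(size=(d_hidden, d_out)), np.zeros(d_out))]

enc = init(2, 16, 8)   # encodes each (x, y) context pair to an 8-d vector
dec = init(9, 16, 1)   # decodes (representation, x_target) to a prediction

def predict(ctx_x, ctx_y, tgt_x):
    """Encode context pairs, aggregate by the mean (order-invariant),
    then decode each target input conditioned on the aggregate."""
    pairs = np.concatenate([ctx_x, ctx_y], axis=1)   # (n_ctx, 2)
    r = mlp(enc, pairs).mean(axis=0)                 # (8,) summary of context
    r_tiled = np.tile(r, (len(tgt_x), 1))            # (n_tgt, 8)
    return mlp(dec, np.concatenate([r_tiled, tgt_x], axis=1))  # (n_tgt, 1)

ctx_x = np.linspace(-1, 1, 5).reshape(-1, 1)
ctx_y = np.sin(ctx_x)
out = predict(ctx_x, ctx_y, np.array([[0.5], [0.7]]))
```

Because the context is aggregated by a mean, the prediction is invariant to the order of the observations, which is what lets the model adapt to any small context set at test time.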

    Hierarchical feature extraction from spatiotemporal data for cyber-physical system analytics

    With the advent of ubiquitous sensing, robust communication and advanced computation, data-driven modeling is becoming increasingly popular for many engineering problems. Eliminating the difficulties of physics-based modeling and avoiding simplifying assumptions and ad hoc empirical models are significant advantages of data-driven approaches, especially for large-scale complex systems. While classical statistics and signal processing algorithms have been widely used by the engineering community, advanced machine learning techniques have not been sufficiently explored in this regard. This study summarizes various categories of machine learning tools that have been applied, or are candidates for, addressing engineering problems. While there is an increasing number of machine learning algorithms, the main steps involved in applying such techniques consist of data collection and pre-processing, feature extraction, model training, and inference for decision-making. To support decision-making in many applications, hierarchical feature extraction is key. Among various feature extraction principles, recent studies emphasize hierarchical approaches that extract salient features at multiple abstraction levels from data. In this context, the focus of the dissertation is on developing hierarchical feature extraction algorithms within the framework of machine learning in order to solve challenging cyber-physical problems in domains such as electromechanical and agricultural systems. Furthermore, the feature extraction techniques are described for the spatial, temporal and spatiotemporal data types collected from these systems. The wide applicability of such features in solving selected real-life domain problems is demonstrated throughout this study.
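    The idea of extracting salient features at multiple abstraction levels from temporal data can be illustrated with a minimal sketch; the coarsening scheme and the chosen statistics below are illustrative assumptions, not the dissertation's method:

    ```python
    import numpy as np

    def hierarchical_features(signal, levels=3):
        """Extract simple statistics (mean, std) at successively coarser
        time scales by halving the signal each level -- a minimal stand-in
        for hierarchical feature extraction from temporal data."""
        feats = []
        x = np.asarray(signal, dtype=float)
        for _ in range(levels):
            feats.extend([x.mean(), x.std()])
            # Coarsen 2x: average non-overlapping pairs of samples.
            x = x[: len(x) // 2 * 2].reshape(-1, 2).mean(axis=1)
        return np.array(feats)

    f = hierarchical_features(np.sin(np.linspace(0, 10, 64)))
    ```

    The resulting vector concatenates descriptors from fine to coarse scales, so a downstream classifier sees both local detail and global trend.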

    A Defense of Pure Connectionism

    Connectionism is an approach to neural-networks-based cognitive modeling that encompasses the recent deep learning movement in artificial intelligence. It came of age in the 1980s, with its roots in cybernetics and earlier attempts to model the brain as a system of simple parallel processors. Connectionist models center on statistical inference within neural networks with empirically learnable parameters, which can be represented as graphical models. More recent approaches focus on learning and inference within hierarchical generative models. Contra influential and ongoing critiques, I argue in this dissertation that the connectionist approach to cognitive science possesses in principle (and, as is becoming increasingly clear, in practice) the resources to model even the most rich and distinctly human cognitive capacities, such as abstract, conceptual thought and natural language comprehension and production. Consonant with much previous philosophical work on connectionism, I argue that a core principle—that proximal representations in a vector space have similar semantic values—is the key to a successful connectionist account of the systematicity and productivity of thought, language, and other core cognitive phenomena. 
My work here differs from preceding work in philosophy in several respects: (1) I compare a wide variety of connectionist responses to the systematicity challenge and isolate two main strands that are both historically important and reflected in ongoing work today: (a) vector symbolic architectures and (b) (compositional) vector space semantic models; (2) I consider very recent applications of these approaches, including their deployment on large-scale machine learning tasks such as machine translation; (3) I argue, again mostly on the basis of recent developments, for a continuity in representation and processing across natural language, image processing and other domains; (4) I explicitly link broad, abstract features of connectionist representation to recent proposals in cognitive science that are similar in spirit, such as hierarchical Bayesian and free-energy-minimization approaches, and offer a single rebuttal of criticisms of these related paradigms; (5) I critique recent alternative proposals that argue for a hybrid Classical (i.e. serial symbolic)/statistical model of mind; (6) I argue that defending the most plausible form of a connectionist cognitive architecture requires rethinking certain distinctions that have figured prominently in the history of the philosophy of mind and language, such as that between word- and phrase-level semantic content, and between inference and association.
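The core principle that proximal representations in a vector space carry similar semantic values is usually operationalized with cosine similarity; a toy sketch with hand-made (not learned) three-dimensional "embeddings":

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity: 1.0 for identical directions, 0.0 for orthogonal."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical embeddings chosen so that semantically related words
# (cat, dog) point in nearby directions and an unrelated word does not.
vec = {
    "cat":        np.array([0.9, 0.1, 0.0]),
    "dog":        np.array([0.8, 0.2, 0.1]),
    "carburetor": np.array([0.0, 0.1, 0.9]),
}
```

In a learned vector space semantic model, proximity in this sense is what grounds judgments of semantic similarity, and compositional variants build phrase vectors from word vectors.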
