
    Various Types of Learning with Types

    This paper suggests another look at already known grammatical inference approaches based on specialization strategies.

    How to Split Recursive Automata

    In this paper, we reinterpret, as operations on extended finite-state automata, some algorithms that were originally specified on categorial grammars to learn subclasses of context-free languages. The algorithms considered implement "specialization strategies". This new perspective also helps to understand how a typing approach makes it possible to control the combinatorial explosion that specialization techniques face.

    Effective Spoken Language Labeling with Deep Recurrent Neural Networks

    Understanding spoken language is a highly complex problem, which can be decomposed into several simpler tasks. In this paper, we focus on Spoken Language Understanding (SLU), the module of spoken dialog systems responsible for extracting a semantic interpretation from the user utterance. The task is treated as a labeling problem. In the past, SLU has been performed with a wide variety of probabilistic models. The rise of neural networks in the last couple of years has opened interesting new research directions in this domain. Recurrent Neural Networks (RNNs) in particular are able not only to represent several pieces of information as embeddings but also, thanks to their recurrent architecture, to encode relatively long contexts as embeddings. Such long contexts are generally out of reach for the models previously used for SLU. In this paper we propose novel RNN architectures for SLU which outperform previous ones. Starting from a published idea as a base block, we design new deep RNNs achieving state-of-the-art results on two widely used corpora for SLU: ATIS (Air Travel Information System), in English, and MEDIA (hotel information and reservation in France), in French.
    Comment: 8 pages. Rejected from IJCAI 2017, good remarks overall, but slightly off-topic according to the global meta-reviews. Recommendations: 8, 6, 6, 4. arXiv admin note: text overlap with arXiv:1706.0174
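
    A minimal sketch, assuming PyTorch, of the kind of recurrent token labeler described above (not the authors' architecture): each word is embedded, a recurrent layer encodes its context, and a linear layer scores one semantic label per token. The vocabulary size, dimensions, and the choice of a bidirectional GRU are illustrative assumptions.

    import torch
    import torch.nn as nn

    class RNNSlotLabeler(nn.Module):
        """Toy RNN sequence labeler in the spirit of SLU slot filling (illustrative only)."""
        def __init__(self, vocab_size, num_labels, emb_dim=100, hidden_dim=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
            self.out = nn.Linear(2 * hidden_dim, num_labels)

        def forward(self, token_ids):            # token_ids: (batch, seq_len)
            x = self.embed(token_ids)            # (batch, seq_len, emb_dim)
            h, _ = self.rnn(x)                   # context-aware token representations
            return self.out(h)                   # one label score vector per token

    # Usage sketch: scores = RNNSlotLabeler(5000, 20)(torch.randint(0, 5000, (2, 12)))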

    The Crotal SRL System : a Generic Tool Based on Tree-structured CRF

    We present the Crotal system, used in the CoNLL09 Shared Task. It is based on XCRF, a highly configurable CRF library which can take hierarchical relations into account. The system had never been used in such a context, so its performance is average, but we are confident that there is room for improvement.

    Adapt a Text-Oriented Chunker for Oral Data: How Much Manual Effort is Necessary?

    In this paper, we try three distinct approaches to chunking transcribed oral data with labeling tools learnt from a corpus of written texts. The purpose is to reach the best possible results with the least possible manual correction or re-learning effort.

    SuSE : Subspace Selection embedded in an EM algorithm

    Subspace clustering is an extension of traditional clustering that seeks to find clusters embedded in different subspaces within a dataset. This is a particularly important challenge with high-dimensional data, where the curse of dimensionality occurs. It also has the benefit of providing smaller descriptions of the clusters found. In this field, we show that using probabilistic models provides many advantages over other existing methods. In particular, we show that the difficult problem of setting the parameters of subspace clustering algorithms can be cast as a model selection problem in the framework of probabilistic models. This allows us to design a method that does not require any input parameter from the user. We also point out the interest of allowing clusters to overlap. Finally, we show that the approach is well suited to detecting the noise that may exist in the data, and that this helps to provide a more understandable representation of the clusters found.
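
    A minimal sketch of the general idea, not the SuSE algorithm itself: fit a diagonal-covariance Gaussian mixture with EM (here via scikit-learn, an assumed stand-in) and read each cluster's low-variance dimensions as its relevant subspace. The synthetic data, variance threshold, and number of components are illustrative assumptions.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    # Two clusters living in different 2-D subspaces of a 5-D space, plus noisy dimensions.
    a = rng.normal(0, 0.1, (200, 5)); a[:, [0, 1]] += [5, 5]; a[:, 2:] = rng.normal(0, 3, (200, 3))
    b = rng.normal(0, 0.1, (200, 5)); b[:, [3, 4]] += [5, 5]; b[:, :3] = rng.normal(0, 3, (200, 3))
    X = np.vstack([a, b])

    gmm = GaussianMixture(n_components=2, covariance_type="diag", random_state=0).fit(X)
    for k, var in enumerate(gmm.covariances_):       # per-cluster, per-dimension variances
        relevant = np.where(var < 1.0)[0]            # small variance -> informative dimension
        print(f"cluster {k}: subspace dimensions {relevant.tolist()}")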

    Champs Conditionnels Aléatoires pour l'Annotation d'Arbres (Conditional Random Fields for Tree Annotation)

    With the transformation of semi-structured XML documents in view, we address the problem of annotating such documents by statistical machine learning, from examples of already annotated documents. To model the probability of an annotation given a document, we work within the framework of conditional random fields. This model has already proven its worth for sequence annotation: here we adapt it to ordered trees of unbounded arity. We study the expressiveness of the newly introduced model by comparing it to stochastic tree automata (also known as probabilistic regular tree grammars). We also present in detail the algorithm for finding the most probable annotation and the inference algorithm for this model. These algorithms are implemented in a Tree CRF library written in Java. This is preliminary work that will subsequently allow us to study applications of the model to document transformation.
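
    A minimal sketch of the core dynamic program behind finding a most probable tree annotation (an illustration, not the Tree CRF library's API): for each node and each candidate label, a local score is combined with the best achievable scores of the children under a parent-child compatibility score. The potentials below are illustrative placeholders.

    from dataclasses import dataclass, field

    LABELS = ["A", "B"]

    @dataclass
    class Node:
        token: str
        children: list = field(default_factory=list)

    def node_score(node, label):                 # illustrative local potential
        return 1.0 if (label == "A") == node.token.islower() else 0.0

    def edge_score(parent_label, child_label):   # illustrative parent-child potential
        return 0.5 if parent_label == child_label else 0.0

    def best_annotation(node):
        """Return {label: (best score, labeled subtree)} for the subtree rooted at node."""
        child_tables = [best_annotation(c) for c in node.children]
        table = {}
        for lab in LABELS:
            total, labeled_children = node_score(node, lab), []
            for tab in child_tables:
                best = max(LABELS, key=lambda cl: tab[cl][0] + edge_score(lab, cl))
                total += tab[best][0] + edge_score(lab, best)
                labeled_children.append(tab[best][1])
            table[lab] = (total, (node.token, lab, labeled_children))
        return table

    tree = Node("root", [Node("np"), Node("VP", [Node("v")])])
    print(max(best_annotation(tree).values(), key=lambda t: t[0])[1])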

    Learnability of Pregroup Grammars

    This paper investigates the learnability, by positive examples in the sense of Gold, of Pregroup Grammars. In the first part, Pregroup Grammars are presented and a new parsing strategy is proposed. Then, theoretical learnability and non-learnability results for subclasses of Pregroup Grammars are proved. In the last two parts, we focus on learning Pregroup Grammars from a special kind of input called feature-tagged examples. A learning algorithm based on the parsing strategy presented in the first part is given. Its validity is proved and its properties are illustrated with examples.
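
    A minimal illustration of the cancellation rule underlying pregroup parsing, not the parsing strategy proposed in the paper: a simple type is a pair of a basic type and an adjoint exponent, and adjacent types p^(n) p^(n+1) contract to the empty string. The greedy left-to-right reduction below only handles straightforward cases; a complete parser requires a more careful search. The lexicon entries are illustrative assumptions.

    def contract(simple_types):
        """Greedily apply p^(n) p^(n+1) -> 1 left to right; return the residue."""
        stack = []
        for base, exp in simple_types:
            if stack and stack[-1] == (base, exp - 1):
                stack.pop()                      # cancellation with the previous type
            else:
                stack.append((base, exp))
        return stack

    # "John sleeps": John : pi, sleeps : pi^r s  (exponent +1 encodes a right adjoint)
    sentence = [("pi", 0), ("pi", 1), ("s", 0)]
    print(contract(sentence))                    # [('s', 0)] -> reduces to the sentence type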