47 research outputs found

    Topian 0.1 Reference Manual

    This document describes Topian ("Topic-based Model layer for Xapian"), a software layer intended to add support for topical models to Xapian.

    Free Software for research in Information Retrieval and Textual Clustering

    The document provides an overview of the main Free ("Open Source") software of interest for research in Information Retrieval, as well as some background on the context. It provides a guideline for choosing appropriate tools.

    Inclusion de sens dans la représentation de documents textuels : état de l'art

    This document gives an overview of the state of the art in representing meaning in textual documents.

    Large-scale extraction of brain connectivity from the neuroscientific literature

    Motivation: In neuroscience, as in many other scientific domains, the primary form of knowledge dissemination is through published articles. One challenge for modern neuroinformatics is finding methods to make the knowledge from the tremendous backlog of publications accessible for search, analysis and the integration of such data into computational models. A key example of this is metascale brain connectivity, where results are not reported in a normalized repository. Instead, these experimental results are published in natural language, scattered among individual scientific publications. This lack of normalization and centralization hinders the large-scale integration of brain connectivity results. In this article, we present text-mining models to extract and aggregate brain connectivity results from 13.2 million PubMed abstracts and 630 216 full-text publications related to neuroscience. The brain regions are identified with three different named entity recognizers (NERs) and then normalized against two atlases: the Allen Brain Atlas (ABA) and the atlas from the Brain Architecture Management System (BAMS). We then use three different extractors to assess inter-region connectivity. Results: NERs and connectivity extractors are evaluated against a manually annotated corpus. The complete in litero extraction models are also evaluated against in vivo connectivity data from ABA with an estimated precision of 78%. The resulting database contains over 4 million brain region mentions and over 100 000 (ABA) and 122 000 (BAMS) potential brain region connections. This database drastically accelerates connectivity literature review, by providing a centralized repository of connectivity data to neuroscientists. Availability and implementation: The resulting models are publicly available at github.com/BlueBrain/bluima. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

    Tool for robust stochastic parsing using optimal maximum coverage

    This report presents a robust syntactic parser that is able to return a "correct" derivation tree even if the grammar cannot generate the input sentence. The following two-step solution is proposed: the finest corresponding most probable optimal maximum coverage is generated first, then the trees from this coverage are glued into one resulting tree. We discuss the implementation of this method with the SLP toolkit and libkp library.

    INtegrating SPEech acoustic and linguistic Constraints: Baseline System Development

    In this report, we discuss the initial issues addressed in a research project aiming at the development of an advanced natural speech recognition system for the automatic processing of telephone directory requests. This multi-faceted project involves (1) text processing (labeling and tagging) of a large database of telephone-based natural voice requests (including all kinds of peculiarities), (2) development of robust acoustic models, (3) integrating advanced natural language (syntactic and semantic) constraints, (4) detecting and dealing with a large number of out-of-vocabulary words (proper names), and (5) testing of the resulting system on natural queries. All this work will be performed on the basis of a database containing prompted (read) speech and (simulated) natural requests to an information service. This report describes the initial steps that were required to set up a reasonable baseline system and a good research and evaluation framework. More specifically, a significant amount of time was devoted to proper text processing of speaker request transcriptions, in order to create the basis necessary for the lexical and linguistic modeling, as well as for the evaluation of recognition results.

    Finding instabilities in the community structure of complex networks

    The problem of finding clusters in complex networks has been extensively studied by mathematicians, computer scientists and, more recently, by physicists. Many of the existing algorithms partition a network into clear clusters, without overlap. We here introduce a method to identify the nodes lying "between clusters" and that allows for a general measure of the stability of the clusters. This is done by adding noise over the weights of the edges of the network. Our method can in principle be applied with any clustering algorithm, provided that it works on weighted networks. We present several applications on real-world networks using the Markov Clustering Algorithm (MCL). Comment: 4 pages, 5 figures

    Offline grammar-based recognition of handwritten sentences

    This paper proposes a sequential coupling of a Hidden Markov Model (HMM) recognizer for offline handwritten English sentences with a probabilistic bottom-up chart parser using Stochastic Context-Free Grammars (SCFG) extracted from a text corpus. Based on extensive experiments, we conclude that syntax analysis helps to improve recognition rates significantly.