
    Patterns versus Characters in Subword-aware Neural Language Modeling

    Full text link
    Words in some natural languages can have a composite structure. Elements of this structure include the root (which may itself be composite), and prefixes and suffixes that express various nuances and relations to other words. Thus, in order to build a proper word representation one must take its internal structure into account. From a corpus of texts we extract a set of frequent subwords, and from the latter set we select patterns, i.e. subwords which encapsulate information on character n-gram regularities. The selection is made using the pattern-based Conditional Random Field model with l_1 regularization. Further, for every word we construct a new sequence over an alphabet of patterns. The new alphabet's symbols confine a stronger local statistical context than characters, therefore they allow better representations in R^n and are better building blocks for word representation. In the task of subword-aware language modeling, pattern-based models outperform character-based analogues by 2-20 perplexity points. Also, a recurrent neural network in which a word is represented as a sum of embeddings of its patterns is on par with a competitive and significantly more sophisticated character-based convolutional architecture. Comment: 10 pages
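    The closing claim, that a word can be represented as the sum of the embeddings of its patterns, can be sketched as follows; the pattern vocabulary, segmentation, and dimensions here are invented for illustration, and the random values stand in for learned embeddings:

```python
import numpy as np

# Toy pattern vocabulary and embedding table (the real patterns come
# from a CRF-based selection over frequent subwords).
rng = np.random.default_rng(0)
emb_dim = 8
pattern_vocab = {"un": 0, "break": 1, "able": 2}
pattern_embeddings = rng.normal(size=(len(pattern_vocab), emb_dim))

def word_vector(patterns):
    """Represent a word as the sum of the embeddings of its patterns."""
    idxs = [pattern_vocab[p] for p in patterns]
    return pattern_embeddings[idxs].sum(axis=0)

# "unbreakable" under a hypothetical pattern segmentation.
v = word_vector(["un", "break", "able"])
```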

    Structured Prediction of Sequences and Trees using Infinite Contexts

    Full text link
    Linguistic structures exhibit a rich array of global phenomena; however, commonly used Markov models are unable to describe these phenomena adequately due to their strong locality assumptions. We propose a novel hierarchical model for structured prediction over sequences and trees which exploits global context by conditioning each generation decision on an unbounded context of prior decisions. This builds on the success of Markov models but without imposing a fixed bound, in order to better represent global phenomena. To facilitate learning of this large and unbounded model, we use a hierarchical Pitman-Yor process prior which provides a recursive form of smoothing. We propose prediction algorithms based on A* and Markov Chain Monte Carlo sampling. Empirical results demonstrate the potential of our model compared to baseline finite-context Markov models on part-of-speech tagging and syntactic parsing.
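    A heavily simplified sketch of the core idea, conditioning on an unbounded context with recursive Pitman-Yor-style smoothing, is shown below. It uses a single-table approximation and brute-force counting over every context suffix of a toy sequence; the paper's model and inference are far more sophisticated:

```python
from collections import defaultdict

class InfiniteContextLM:
    """Unbounded-context LM with recursive Pitman-Yor-style smoothing
    (single-table approximation; a sketch, not the paper's full model)."""

    def __init__(self, vocab, discount=0.75, strength=1.0):
        self.vocab = list(vocab)
        self.d, self.theta = discount, strength
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, seq):
        # Count each symbol under every suffix of its preceding context
        # (feasible here only because the toy sequence is short).
        for i, w in enumerate(seq):
            for k in range(i + 1):
                self.counts[tuple(seq[i - k:i])][w] += 1

    def prob(self, w, ctx):
        """P(w | ctx), backing off recursively to the shortened context."""
        ctx = tuple(ctx)
        lower = 1.0 / len(self.vocab) if not ctx else self.prob(w, ctx[1:])
        c = self.counts.get(ctx)
        if not c:
            return lower
        total, types = sum(c.values()), len(c)
        cw = c.get(w, 0)
        numer = max(cw - self.d * (cw > 0), 0.0)
        return (numer + (self.theta + self.d * types) * lower) / (self.theta + total)

lm = InfiniteContextLM("abcdr")
lm.train(list("abracadabra"))
```

The recursive back-off is what the hierarchical prior buys: an unseen long context falls through to progressively shorter ones, ending at a uniform base distribution.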

    3DQ: Compact Quantized Neural Networks for Volumetric Whole Brain Segmentation

    Full text link
    Model architectures have been dramatically increasing in size, improving performance at the cost of resource requirements. In this paper we propose 3DQ, a ternary quantization method, applied for the first time to 3D Fully Convolutional Neural Networks (F-CNNs), enabling 16x model compression while maintaining performance on par with full precision models. We extensively evaluate 3DQ on two datasets for the challenging task of whole brain segmentation. Additionally, we showcase our method's ability to generalize on two common 3D architectures, namely 3D U-Net and V-Net. Outperforming a variety of baselines, the proposed method is capable of compressing large 3D models to a few MBytes, alleviating the storage needs in space critical applications. Comment: Accepted to MICCAI 201
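    Ternary quantization itself can be illustrated with a minimal sketch: weights are mapped to {-α, 0, +α}, so each weight needs only 2 bits instead of 32, which is where a 16x compression factor comes from. The threshold rule below (0.7 times the mean absolute weight) is a common ternary-weight convention and an assumption here, not necessarily 3DQ's exact rule:

```python
import numpy as np

def ternarize(w):
    """Quantize a weight tensor to {-alpha, 0, +alpha}."""
    delta = 0.7 * np.abs(w).mean()          # pruning threshold (assumed rule)
    mask = np.abs(w) > delta                # which weights survive
    signs = np.sign(w) * mask               # -1, 0, or +1 per weight
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0  # per-tensor scale
    return alpha * signs

# 2 bits per weight instead of 32 gives the 16x compression factor.
w = np.array([0.9, -0.8, 0.01, 0.5, -0.02])
q = ternarize(w)
```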

    Label-Dependencies Aware Recurrent Neural Networks

    Full text link
    In the last few years, Recurrent Neural Networks (RNNs) have proved effective on several NLP tasks. Despite such great success, their ability to model sequence labeling is still limited. This led research toward solutions where RNNs are combined with models that have already proved effective in this domain, such as CRFs. In this work we propose a far simpler but very effective solution: an evolution of the simple Jordan RNN, where labels are re-injected as input into the network and converted into embeddings, in the same way as words. We compare this RNN variant to all the other RNN models, Elman and Jordan RNNs, LSTM and GRU, on two well-known tasks of Spoken Language Understanding (SLU). Thanks to label embeddings and their combination at the hidden layer, the proposed variant, which uses more parameters than Elman and Jordan RNNs but far fewer than LSTM and GRU, is not only more effective than the other RNNs but also outperforms sophisticated CRF models. Comment: 22 pages, 3 figures. Accepted at the CICLing 2017 conference. Best Verifiability, Reproducibility, and Working Description award
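    The label re-injection idea can be sketched as a single recurrent step in which the previous predicted label is embedded like a word and combined at the hidden layer. All dimensions and weights below are invented for illustration, and the real model is trained rather than random:

```python
import numpy as np

# Toy dimensions and random weights standing in for trained parameters.
rng = np.random.default_rng(1)
V, L, E, H = 10, 4, 6, 8          # word vocab, label vocab, embedding, hidden
Ew = rng.normal(scale=0.1, size=(V, E))   # word embeddings
El = rng.normal(scale=0.1, size=(L, E))   # label embeddings (the key idea)
Wx = rng.normal(scale=0.1, size=(E, H))   # word-to-hidden
Wy = rng.normal(scale=0.1, size=(E, H))   # previous-label-to-hidden
Wo = rng.normal(scale=0.1, size=(H, L))   # hidden-to-label scores

def step(word_id, prev_label_id):
    """One recurrent step: combine word and previous-label embeddings."""
    h = np.tanh(Ew[word_id] @ Wx + El[prev_label_id] @ Wy)
    return int((h @ Wo).argmax())

def tag(words, start_label=0):
    """Greedy left-to-right tagging, re-injecting each predicted label."""
    labels, prev = [], start_label
    for w in words:
        prev = step(w, prev)
        labels.append(prev)
    return labels
```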

    Using Regular Languages to Explore the Representational Capacity of Recurrent Neural Architectures

    Get PDF
    The presence of Long Distance Dependencies (LDDs) in sequential data poses significant challenges for computational models. Various recurrent neural architectures have been designed to mitigate this issue. In order to test these state-of-the-art architectures, there is a growing need for rich benchmarking datasets. However, one of the drawbacks of existing datasets is the lack of experimental control with regard to the presence and/or degree of LDDs. This lack of control limits the analysis of model performance in relation to the specific challenge posed by LDDs. One way to address this is to use synthetic data having the properties of subregular languages. The degree of LDDs within the generated data can be controlled through the k parameter, the length of the generated strings, and the choice of appropriate forbidden strings. In this paper, we explore the capacity of different RNN extensions to model LDDs by evaluating these models on a sequence of SPk synthesized datasets, where each subsequent dataset exhibits a greater degree of LDD. Even though SPk are simple languages, the presence of LDDs has a significant impact on the performance of recurrent neural architectures, making them prime candidates for benchmarking tasks. Comment: International Conference on Artificial Neural Networks (ICANN) 201
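    Strictly Piecewise (SPk) languages are defined by forbidden subsequences of length k, so synthetic data with controllable LDDs can be generated by only appending symbols that never complete a forbidden subsequence. A minimal sketch under that assumption (the function names are hypothetical, and the paper's generator may differ):

```python
import random

def contains_subsequence(s, sub):
    """True if sub occurs in s as a (not necessarily contiguous) subsequence."""
    it = iter(s)
    return all(ch in it for ch in sub)

def generate_sp_string(alphabet, forbidden, length, seed=0):
    """Greedily build a string avoiding every forbidden subsequence."""
    rng = random.Random(seed)
    s = ""
    while len(s) < length:
        safe = [c for c in alphabet
                if not any(contains_subsequence(s + c, f) for f in forbidden)]
        if not safe:          # no symbol can be appended safely
            break
        s += rng.choice(safe)
    return s

# SP2-style example: "ab" may never occur as a subsequence, so once an
# "a" has appeared, "b" is excluded for the rest of the string, an
# arbitrarily long-distance dependency.
s = generate_sp_string("abcd", ["ab"], length=12)
```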

    A Multi-Armed Bandit to Smartly Select a Training Set from Big Medical Data

    Full text link
    With the availability of big medical image data, the selection of an adequate training set is becoming more important to address the heterogeneity of different datasets. Simply including all the data not only incurs high processing costs but can even harm the prediction. We formulate the smart and efficient selection of a training dataset from big medical image data as a multi-armed bandit problem, solved by Thompson sampling. Our method assumes that image features are not available at the time of sample selection, and therefore relies only on meta information associated with the images. Our strategy simultaneously exploits data sources with high chances of yielding useful samples and explores new data regions. For our evaluation, we focus on the application of estimating age from a brain MRI. Our results on 7,250 subjects from 10 datasets show that our approach leads to higher accuracy while requiring only a fraction of the training data. Comment: MICCAI 2017 Proceedings
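    The selection strategy can be sketched as Beta-Bernoulli Thompson sampling over data sources, where each source is an arm and the binary reward signals whether a drawn sample turned out useful; the reward definition here is generic, not the paper's exact criterion:

```python
import random

class SourceSelector:
    """Beta-Bernoulli Thompson sampling over data sources."""

    def __init__(self, n_sources, seed=0):
        self.alpha = [1.0] * n_sources   # prior successes + 1
        self.beta = [1.0] * n_sources    # prior failures + 1
        self.rng = random.Random(seed)

    def select(self):
        """Sample a plausible usefulness rate per source; pick the best.

        Sampling (rather than taking the posterior mean) is what balances
        exploiting good sources against exploring uncertain ones."""
        draws = [self.rng.betavariate(a, b)
                 for a, b in zip(self.alpha, self.beta)]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, source, useful):
        """Update the posterior of the chosen source."""
        if useful:
            self.alpha[source] += 1
        else:
            self.beta[source] += 1

sel = SourceSelector(2, seed=42)
for _ in range(20):                 # source 1 keeps yielding useful samples
    sel.update(0, useful=False)
    sel.update(1, useful=True)
```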

    Annotating patient clinical records with syntactic chunks and named entities: the Harvey corpus

    Get PDF
    The free-text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult to process with existing natural language analysis tools, since they are highly telegraphic (omitting many words) and contain many spelling mistakes, inconsistencies in punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free-text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning.
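    The output of a shallow chunker is typically decoded from per-token tags. Assuming a BIO tagging scheme (the abstract does not name the scheme used), chunk extraction can be sketched as:

```python
def bio_to_chunks(tokens, tags):
    """Convert per-token BIO tags into (chunk_type, tokens) pairs."""
    chunks, start, ctype = [], None, None
    for i, tag in enumerate(tags):
        # Close the open chunk on O, on B-, or on an I- of a new type.
        if tag == "O" or tag.startswith("B-") or (
                tag.startswith("I-") and tag[2:] != ctype):
            if start is not None:
                chunks.append((ctype, tokens[start:i]))
                start, ctype = None, None
        # Open a new chunk on B- (or a stray I- with nothing open).
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, ctype = i, tag[2:]
    if start is not None:
        chunks.append((ctype, tokens[start:]))
    return chunks

# Telegraphic clinical text under a hypothetical NP annotation.
chunks = bio_to_chunks(["pt", "has", "chest", "pain"],
                       ["B-NP", "O", "B-NP", "I-NP"])
```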

    Test-retest reliability of temporal and spatial gait characteristics measured with an instrumented walkway system (GAITRite®)

    Get PDF
    BACKGROUND: The purpose of this study was to determine the test-retest reliability of temporal and spatial gait measurements over a one-week period as measured using an instrumented walkway system (GAITRite®). METHODS: Subjects were tested on two occasions one week apart. Measurements were made at preferred and fast walking speeds using the GAITRite® system. Measurements tested included walking speed, step length, stride length, base of support, step time, stride time, swing time, stance time, single and double support times, and toe in-toe out angle. RESULTS: Twenty-one healthy subjects participated in this study. The group consisted of 12 men and 9 women, with an average age of 34 years (range: 19-59 years). At preferred walking speed, all gait measurements had ICCs of 0.92 and higher, except base of support, which had an ICC of 0.80. At fast walking speed, all gait measurements had ICCs above 0.89 except base of support (ICC = 0.79). CONCLUSIONS: Spatial-temporal gait measurements demonstrate good to excellent test-retest reliability over a one-week time span.
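    Test-retest reliability here is quantified with intraclass correlation coefficients. As a worked illustration, ICC(2,1) (two-way random effects, absolute agreement, single measurement, a common choice for test-retest designs, though the abstract does not state which ICC form was used) can be computed from a subjects-by-sessions matrix:

```python
import numpy as np

def icc_2_1(data):
    """ICC(2,1) from a matrix with subjects as rows, sessions as columns."""
    x = np.asarray(data, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Mean squares from the two-way ANOVA decomposition.
    msr = k * ((x.mean(axis=1) - grand) ** 2).sum() / (n - 1)  # subjects
    msc = n * ((x.mean(axis=0) - grand) ** 2).sum() / (k - 1)  # sessions
    sse = ((x - grand) ** 2).sum() - msr * (n - 1) - msc * (k - 1)
    mse = sse / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Because ICC(2,1) measures absolute agreement, a constant offset between the two sessions lowers the coefficient even when the rank ordering of subjects is identical.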

    Consensus statements on the utility of defining ARDS and the utility of past and current definitions of ARDS—protocol for a Delphi study

    Get PDF
    Introduction: Acute respiratory distress syndrome (ARDS), marked by acute hypoxemia and bilateral pulmonary infiltrates, has been defined in multiple ways since its first description. This Delphi study aims to collect global opinions on the conceptual framework of ARDS, assess the usefulness of components within current and past definitions, and investigate the role of subphenotyping. The varied expertise of the panel will provide valuable insights for refining future ARDS definitions and improving clinical management. Methods: A diverse panel of 35–40 experts will be selected based on predefined criteria. Multiple-choice questions (MCQs) or 7-point Likert-scale statements will be used in the iterative Delphi rounds to achieve consensus on key aspects related to the utility of definitions and subphenotyping. The Delphi rounds will continue until stable agreement or disagreement is achieved for all statements. Analysis: Consensus will be considered reached when a choice in an MCQ or a Likert-scale statement achieves ≥80% of votes for agreement or disagreement. Stability will be checked by non-parametric χ2 tests or the Kruskal-Wallis test starting from the second round of the Delphi process. A p-value ≥0.05 will be used to define stability. Ethics and dissemination: The study will be conducted in full concordance with the principles of the Declaration of Helsinki and will be reported according to CREDES guidance. This study has been granted an ethical approval waiver by the NMC Healthcare Regional Research Ethics Committee, Dubai (NMCHC/CR/DXB/REC/APP/002), owing to the nature of the research. Informed consent will be obtained from all panellists before the start of the Delphi process. The study will be published in a peer-reviewed journal with the authorship agreed as per ICMJE requirements. Trial registration number: NCT06159465
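    The ≥80% consensus rule can be sketched directly. The mapping of 7-point Likert ratings to agreement (5-7) and disagreement (1-3) below is a common convention and an assumption here, since the protocol's exact mapping is not given in this abstract:

```python
from collections import Counter

def mcq_consensus(votes, threshold=0.80):
    """Return the winning MCQ option if it reaches the consensus
    threshold (a >=80% share of votes), else None."""
    option, n = Counter(votes).most_common(1)[0]
    return option if n / len(votes) >= threshold else None

def likert_consensus(ratings, threshold=0.80):
    """Classify a 7-point Likert item: 5-7 counts as agreement and 1-3
    as disagreement (assumed convention)."""
    agree = sum(r >= 5 for r in ratings) / len(ratings)
    disagree = sum(r <= 3 for r in ratings) / len(ratings)
    if agree >= threshold:
        return "agreement"
    if disagree >= threshold:
        return "disagreement"
    return None
```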