
    Patterns versus Characters in Subword-aware Neural Language Modeling

    Words in some natural languages can have a composite structure. Elements of this structure include the root (which may itself be composite) and the prefixes and suffixes that express various nuances and relations to other words. Thus, to build a proper word representation one must take its internal structure into account. From a corpus of texts we extract a set of frequent subwords, and from that set we select patterns, i.e. subwords which encapsulate information on character n-gram regularities. The selection is made using the pattern-based Conditional Random Field model with l1 regularization. Then, for every word we construct a new sequence over an alphabet of patterns. The new alphabet's symbols capture a stronger local statistical context than characters, so they allow better representations in R^n and are better building blocks for word representation. In the task of subword-aware language modeling, pattern-based models outperform character-based analogues by 2-20 perplexity points. Also, a recurrent neural network in which a word is represented as a sum of embeddings of its patterns is on par with a competitive and significantly more sophisticated character-based convolutional architecture. Comment: 10 pages
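The idea of representing a word as the sum of its subword-unit embeddings can be sketched as follows; the greedy longest-match segmentation, the tiny pattern set, and the random embeddings below are illustrative stand-ins, not the paper's CRF-selected patterns:

```python
import numpy as np

def segment(word, patterns):
    """Greedy longest-match segmentation of a word into subword patterns.
    Falls back to single characters when no pattern matches."""
    out, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in patterns:
                out.append(word[i:j])
                i = j
                break
        else:
            out.append(word[i])
            i += 1
    return out

rng = np.random.default_rng(0)
patterns = {"un", "break", "able", "ing"}          # hypothetical pattern set
emb = {p: rng.normal(size=8) for p in patterns}    # one embedding per pattern

def word_vector(word):
    # a word vector is the sum of the embeddings of its patterns
    parts = segment(word, patterns)
    return sum(emb.get(p, np.zeros(8)) for p in parts)

print(segment("unbreakable", patterns))  # ['un', 'break', 'able']
```

Out-of-pattern characters map to a zero vector here; a real model would give them their own embeddings.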

    Structured Prediction of Sequences and Trees using Infinite Contexts

    Linguistic structures exhibit a rich array of global phenomena; however, commonly used Markov models are unable to describe these phenomena adequately because of their strong locality assumptions. We propose a novel hierarchical model for structured prediction over sequences and trees which exploits global context by conditioning each generation decision on an unbounded context of prior decisions. This builds on the success of Markov models but, by not imposing a fixed context bound, better represents global phenomena. To facilitate learning of this large and unbounded model, we use a hierarchical Pitman-Yor process prior, which provides a recursive form of smoothing. We propose prediction algorithms based on A* search and Markov chain Monte Carlo sampling. Empirical results demonstrate the potential of our model compared to baseline finite-context Markov models on part-of-speech tagging and syntactic parsing.
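The recursive smoothing idea can be illustrated with a finite-order analogue: each context backs off to its shorter suffix with an absolute discount, which approximates the posterior mean of a hierarchical Pitman-Yor prior. This is a sketch of the recursion only, not the paper's unbounded model:

```python
from collections import defaultdict

def train_counts(tokens, order=3):
    """Count occurrences of each word after every context up to length order-1."""
    counts = defaultdict(lambda: defaultdict(int))
    padded = ["<s>"] * (order - 1) + tokens
    for i in range(order - 1, len(padded)):
        for n in range(order):
            counts[tuple(padded[i - n:i])][padded[i]] += 1
    return counts

def prob(word, ctx, counts, vocab, d=0.5):
    """Recursive absolute-discount smoothing: each context backs off to its
    shorter suffix, bottoming out at a uniform distribution."""
    ctx = tuple(ctx)
    c = counts.get(ctx, {})
    backoff = 1.0 / len(vocab) if not ctx else prob(word, ctx[1:], counts, vocab, d)
    if not c:
        return backoff
    total = sum(c.values())
    # discounted count plus mass redistributed via the shorter context
    return (max(c.get(word, 0) - d, 0) + d * len(c) * backoff) / total

tokens = "the cat sat on the mat the cat ran".split()
counts = train_counts(tokens)
vocab = set(tokens)
p = {w: prob(w, ("the",), counts, vocab) for w in vocab}
print(round(sum(p.values()), 6))  # a proper distribution sums to 1
```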

    3DQ: Compact Quantized Neural Networks for Volumetric Whole Brain Segmentation

    Model architectures have been dramatically increasing in size, improving performance at the cost of resource requirements. In this paper we propose 3DQ, a ternary quantization method, applied for the first time to 3D Fully Convolutional Neural Networks (F-CNNs), enabling 16x model compression while maintaining performance on par with full-precision models. We extensively evaluate 3DQ on two datasets for the challenging task of whole brain segmentation. Additionally, we showcase our method's ability to generalize to two common 3D architectures, namely 3D U-Net and V-Net. Outperforming a variety of baselines, the proposed method is capable of compressing large 3D models to a few megabytes, alleviating the storage needs in space-critical applications. Comment: Accepted to MICCAI 201
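A minimal sketch of ternary weight quantization, using the layer-wise threshold and scaling rule of Ternary Weight Networks as a stand-in (these rules are assumptions for illustration; 3DQ's exact scheme may differ):

```python
import numpy as np

def ternarize(w):
    """Map each weight to one of {-alpha, 0, +alpha}: small weights are
    zeroed, the rest keep their sign and share one learned-free scale."""
    delta = 0.7 * np.mean(np.abs(w))                       # layer-wise threshold
    mask = np.abs(w) > delta
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0  # shared scale
    q = np.zeros_like(w)
    q[mask] = alpha * np.sign(w[mask])
    return q

rng = np.random.default_rng(1)
w = rng.normal(size=(4, 4)).astype(np.float32)
q = ternarize(w)
# each weight now takes one of at most 3 values: 2 bits instead of 32
print(np.unique(q))
```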

    Label-Dependencies Aware Recurrent Neural Networks

    In the last few years, Recurrent Neural Networks (RNNs) have proved effective on several NLP tasks. Despite such great success, their ability to model sequence labeling is still limited. This has led research toward solutions where RNNs are combined with models that have already proved effective in this domain, such as CRFs. In this work we propose a far simpler but very effective solution: an evolution of the simple Jordan RNN in which labels are re-injected as input into the network and converted into embeddings, in the same way as words. We compare this RNN variant to the other RNN models, Elman and Jordan RNNs, LSTM and GRU, on two well-known Spoken Language Understanding (SLU) tasks. Thanks to label embeddings and their combination at the hidden layer, the proposed variant, which uses more parameters than Elman and Jordan RNNs but far fewer than LSTM and GRU, is not only more effective than the other RNNs but also outperforms sophisticated CRF models. Comment: 22 pages, 3 figures. Accepted at the CICLing 2017 conference. Best Verifiability, Reproducibility, and Working Description award
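The label re-injection idea can be sketched as a forward pass in which the previously predicted label is embedded like a word and fed into the hidden layer; all dimensions and weight matrices below are arbitrary illustrations, not the paper's trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
V, L, E, H = 50, 5, 16, 32        # vocab size, label set size, embed dim, hidden dim

Ew = rng.normal(0, 0.1, (V, E))   # word embeddings
El = rng.normal(0, 0.1, (L, E))   # label embeddings (labels treated like words)
Wx = rng.normal(0, 0.1, (E, H))   # word input weights
Wl = rng.normal(0, 0.1, (E, H))   # re-injected-label input weights
Wh = rng.normal(0, 0.1, (H, H))   # recurrent weights
Wo = rng.normal(0, 0.1, (H, L))   # output layer

def forward(words):
    h = np.zeros(H)
    prev_label = 0                # dummy start label
    preds = []
    for w in words:
        # hidden state combines the word embedding, the recurrent state, and
        # the embedding of the previously predicted label (re-injected input)
        h = np.tanh(Ew[w] @ Wx + El[prev_label] @ Wl + h @ Wh)
        prev_label = int(np.argmax(h @ Wo))
        preds.append(prev_label)
    return preds

print(forward([3, 17, 42]))
```

Greedy argmax decoding is used here for simplicity; training would backpropagate through the label embeddings as well.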

    Using Regular Languages to Explore the Representational Capacity of Recurrent Neural Architectures

    The presence of Long Distance Dependencies (LDDs) in sequential data poses significant challenges for computational models. Various recurrent neural architectures have been designed to mitigate this issue. To test these state-of-the-art architectures, there is a growing need for rich benchmarking datasets. However, one drawback of existing datasets is the lack of experimental control over the presence and/or degree of LDDs. This lack of control limits the analysis of model performance in relation to the specific challenge posed by LDDs. One way to address this is to use synthetic data with the properties of subregular languages. The degree of LDDs within the generated data can be controlled through the k parameter, the length of the generated strings, and the choice of forbidden strings. In this paper, we explore the capacity of different RNN extensions to model LDDs by evaluating them on a sequence of SPk synthesized datasets, where each subsequent dataset exhibits a greater degree of LDD. Even though SPk are simple languages, the presence of LDDs has a significant impact on the performance of recurrent neural architectures, making these languages prime candidates for benchmarking tasks. Comment: International Conference on Artificial Neural Networks (ICANN) 201
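A strictly piecewise (SP) constraint forbids certain subsequences, so satisfying it requires tracking a dependency across arbitrary distances. A minimal sketch of generating such data by rejection sampling (the alphabet and forbidden subsequence are illustrative choices):

```python
import random

def contains_subsequence(s, pat):
    """True if pat occurs in s as a (not necessarily contiguous) subsequence."""
    it = iter(s)
    return all(ch in it for ch in pat)

def generate_sp_string(alphabet, forbidden, length, rng):
    """Rejection-sample a string that avoids the forbidden subsequence; the
    longer the string, the longer-distance the dependency it encodes."""
    while True:
        s = "".join(rng.choice(alphabet) for _ in range(length))
        if not contains_subsequence(s, forbidden):
            return s

rng = random.Random(0)
s = generate_sp_string("ab", "ab", 6, rng)   # SP2 example: forbid subsequence "ab"
print(s)                                     # every 'b' must precede every 'a'
```

Rejection sampling is the simplest scheme; dedicated SPk generators sample from the language's finite-state acceptor directly and scale to longer strings.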

    A Multi-Armed Bandit to Smartly Select a Training Set from Big Medical Data

    With the availability of big medical image data, the selection of an adequate training set is becoming more important to address the heterogeneity of different datasets. Simply including all the data not only incurs high processing costs but can even harm the prediction. We formulate the smart and efficient selection of a training dataset from big medical image data as a multi-armed bandit problem, solved by Thompson sampling. Our method assumes that image features are not available at the time the samples are selected and therefore relies only on meta information associated with the images. Our strategy simultaneously exploits data sources with high chances of yielding useful samples and explores new data regions. For our evaluation, we focus on the application of estimating age from brain MRI. Our results on 7,250 subjects from 10 datasets show that our approach leads to higher accuracy while requiring only a fraction of the training data. Comment: MICCAI 2017 Proceedings
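The selection strategy can be sketched as Thompson sampling with a Beta posterior per data source over its chance of yielding a useful training sample; the source names and usefulness rates below are made up for illustration:

```python
import random

def thompson_select(sources, trials, rng):
    """Thompson sampling over data sources: each source is a bandit arm with
    a Beta(alpha, beta) posterior over its probability of usefulness."""
    alpha = {s: 1.0 for s in sources}   # prior successes + 1
    beta = {s: 1.0 for s in sources}    # prior failures + 1
    picks = []
    for _ in range(trials):
        # sample a plausible usefulness rate for every arm, pick the best
        arm = max(sources, key=lambda s: rng.betavariate(alpha[s], beta[s]))
        reward = rng.random() < sources[arm]   # simulated usefulness signal
        alpha[arm] += reward
        beta[arm] += 1 - reward
        picks.append(arm)
    return picks

rng = random.Random(0)
sources = {"dataset_A": 0.8, "dataset_B": 0.4, "dataset_C": 0.1}
picks = thompson_select(sources, 500, rng)
print(picks.count("dataset_A") / len(picks))   # the best source should dominate
```

Sampling from the posterior (rather than taking its mean) is what balances exploiting good sources against exploring under-sampled ones.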

    Annotating patient clinical records with syntactic chunks and named entities: the Harvey corpus

    The free-text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult to process with existing natural language analysis tools, since they are highly telegraphic (omitting many words) and contain many spelling mistakes, inconsistencies in punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free-text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning.
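Shallow chunks of this kind are conventionally encoded with BIO tags; a small sketch of recovering chunks from such tags (the telegraphic example sentence and its tags are invented, not drawn from the Harvey corpus):

```python
def extract_chunks(tokens, tags):
    """Collect chunks from BIO tags: B-X begins a chunk of type X,
    I-X continues it, O is outside any chunk."""
    chunks, start, kind = [], None, None
    for i, tag in enumerate(tags + ["O"]):        # sentinel flushes the last chunk
        if tag.startswith("B-") or tag == "O":
            if start is not None:
                chunks.append((kind, " ".join(tokens[start:i])))
                start, kind = None, None
        if tag.startswith("B-"):
            start, kind = i, tag[2:]
        elif tag.startswith("I-") and start is None:
            start, kind = i, tag[2:]              # tolerate I- without a B-
    return chunks

tokens = ["pt", "c/o", "chest", "pain", "since", "yesterday"]
tags = ["B-NP", "O", "B-NP", "I-NP", "B-PP", "B-NP"]
print(extract_chunks(tokens, tags))
# [('NP', 'pt'), ('NP', 'chest pain'), ('PP', 'since'), ('NP', 'yesterday')]
```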

    Test-retest reliability of temporal and spatial gait characteristics measured with an instrumented walkway system (GAITRite®)

    BACKGROUND: The purpose of this study was to determine the test-retest reliability of temporal and spatial gait measurements over a one-week period as measured using an instrumented walkway system (GAITRite®). METHODS: Subjects were tested on two occasions one week apart. Measurements were made at preferred and fast walking speeds using the GAITRite® system. Measurements tested included walking speed, step length, stride length, base of support, step time, stride time, swing time, stance time, single and double support times, and toe in-toe out angle. RESULTS: Twenty-one healthy subjects participated in this study. The group consisted of 12 men and 9 women, with an average age of 34 years (range: 19-59 years). At preferred walking speed, all gait measurements had ICCs of 0.92 and higher, except base of support, which had an ICC of 0.80. At fast walking speed, all gait measurements had ICCs above 0.89 except base of support (ICC = 0.79). CONCLUSIONS: Spatial-temporal gait measurements demonstrate good to excellent test-retest reliability over a one-week time span.
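Test-retest reliability of this kind is typically quantified with an intraclass correlation coefficient. A sketch of the two-way, single-measure ICC(3,1) on simulated two-session data (the abstract does not state which ICC model the study used, so this form is an assumption):

```python
import numpy as np

def icc_3_1(data):
    """ICC(3,1), two-way mixed, single measure: between-subject variance
    relative to residual variance, from a subjects x sessions array."""
    n, k = data.shape
    grand = data.mean()
    ss_total = ((data - grand) ** 2).sum()
    ss_subj = k * ((data.mean(axis=1) - grand) ** 2).sum()
    ss_sess = n * ((data.mean(axis=0) - grand) ** 2).sum()
    ss_err = ss_total - ss_subj - ss_sess          # residual (interaction) SS
    ms_subj = ss_subj / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_subj - ms_err) / (ms_subj + (k - 1) * ms_err)

rng = np.random.default_rng(0)
speed = rng.normal(1.3, 0.15, 21)                  # 21 subjects' true walking speeds
sessions = np.stack([speed + rng.normal(0, 0.02, 21)
                     for _ in range(2)], axis=1)   # two visits, small measurement noise
print(round(icc_3_1(sessions), 2))                 # near 1 for a reliable measure
```

With measurement noise small relative to between-subject spread, the ICC approaches 1, mirroring the ICCs above 0.89 reported in the abstract.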

    Randomized trial of bilateral versus single internal-thoracic-artery grafts

    Background: The use of bilateral internal thoracic (mammary) arteries for coronary-artery bypass grafting (CABG) may improve long-term outcomes as compared with the use of a single internal thoracic artery plus vein grafts. Methods: We randomly assigned patients scheduled for CABG to undergo single or bilateral internal-thoracic-artery grafting in 28 cardiac surgical centers in seven countries. The primary outcome was death from any cause at 10 years. The composite of death from any cause, myocardial infarction, or stroke was a secondary outcome. Interim analyses were prespecified at 5 years of follow-up. Results: A total of 3102 patients were enrolled; 1554 were randomly assigned to undergo single internal-thoracic-artery grafting (the single-graft group) and 1548 to undergo bilateral internal-thoracic-artery grafting (the bilateral-graft group). At 5 years of follow-up, the rate of death was 8.7% in the bilateral-graft group and 8.4% in the single-graft group (hazard ratio, 1.04; 95% confidence interval [CI], 0.81 to 1.32; P=0.77), and the rate of the composite of death from any cause, myocardial infarction, or stroke was 12.2% and 12.7%, respectively (hazard ratio, 0.96; 95% CI, 0.79 to 1.17; P=0.69). The rate of sternal wound complication was 3.5% in the bilateral-graft group versus 1.9% in the single-graft group (P=0.005), and the rate of sternal reconstruction was 1.9% versus 0.6% (P=0.002). Conclusions: Among patients undergoing CABG, there was no significant difference between those receiving single internal-thoracic-artery grafts and those receiving bilateral internal-thoracic-artery grafts with regard to mortality or the rates of cardiovascular events at 5 years of follow-up. There were more sternal wound complications with bilateral internal-thoracic-artery grafting than with single internal-thoracic-artery grafting. Ten-year follow-up is ongoing. (Funded by the British Heart Foundation and others; ART Current Controlled Trials number, ISRCTN46552265.)