Patterns versus Characters in Subword-aware Neural Language Modeling
Words in some natural languages can have a composite structure. Elements of
this structure include the root (which may itself be composite) and the
prefixes and suffixes that express various nuances and relations to other
words. Thus, in order to build a proper word representation one must take
into account its internal structure. From a corpus of texts we extract a set of
frequent subwords and from the latter set we select patterns, i.e. subwords
which encapsulate information on character n-gram regularities. The selection
is made using the pattern-based Conditional Random Field model with
regularization. Further, for every word we construct a new sequence over an
alphabet of patterns. The new alphabet's symbols capture a stronger local
statistical context than individual characters, and therefore serve as better
building blocks for word representation. In the task of subword-aware language modeling, pattern-based
models outperform character-based analogues by 2-20 perplexity points. Also, a
recurrent neural network in which a word is represented as a sum of embeddings
of its patterns is on par with a competitive and significantly more
sophisticated character-based convolutional architecture. Comment: 10 pages
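The sum-of-pattern-embeddings representation can be sketched as follows. The pattern inventory, greedy segmentation heuristic, and dimensions below are illustrative stand-ins; the paper selects its patterns with a pattern-based CRF, not by hand.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pattern inventory and embedding size (illustrative only).
patterns = ["un", "break", "able", "ing", "re"]
dim = 8
embeddings = {p: rng.normal(size=dim) for p in patterns}

def segment(word, inventory):
    """Greedy longest-match segmentation of a word into known patterns."""
    segs, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in inventory:
                segs.append(word[i:j])
                i = j
                break
        else:
            i += 1  # character covered by no pattern: skip it
    return segs

def word_vector(word):
    """Represent a word as the sum of the embeddings of its patterns."""
    return sum((embeddings[p] for p in segment(word, embeddings)),
               np.zeros(dim))

print(segment("unbreakable", embeddings))  # → ['un', 'break', 'able']
```

In the paper's recurrent model, the summed vector plays the role the character-CNN output plays in the convolutional baseline.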
Structured Prediction of Sequences and Trees using Infinite Contexts
Linguistic structures exhibit a rich array of global phenomena; however,
commonly used Markov models are unable to adequately describe these phenomena
due to their strong locality assumptions. We propose a novel hierarchical model
for structured prediction over sequences and trees which exploits global
context by conditioning each generation decision on an unbounded context of
prior decisions. This builds on the success of Markov models but without
imposing a fixed bound in order to better represent global phenomena. To
facilitate learning of this large and unbounded model, we use a hierarchical
Pitman-Yor process prior which provides a recursive form of smoothing. We
propose prediction algorithms based on A* and Markov Chain Monte Carlo
sampling. Empirical results demonstrate the potential of our model compared to
baseline finite-context Markov models on part-of-speech tagging and syntactic
parsing.
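The recursive smoothing idea can be sketched with a toy unbounded-context model. This uses plain absolute discounting as a crude, simplified stand-in for the hierarchical Pitman-Yor prior; all details here are assumptions for illustration, not the paper's model.

```python
from collections import defaultdict

class BackoffLM:
    """Unbounded-context model with recursive absolute-discount smoothing,
    a simplified stand-in for a hierarchical Pitman-Yor prior."""

    def __init__(self, discount=0.75):
        self.d = discount
        self.counts = defaultdict(lambda: defaultdict(int))
        self.vocab = set()

    def train(self, tokens):
        for i, w in enumerate(tokens):
            self.vocab.add(w)
            # Count w under every suffix of its history: unbounded context.
            for j in range(i + 1):
                self.counts[tuple(tokens[j:i])][w] += 1

    def prob(self, w, context):
        """P(w | context), backing off recursively to shorter contexts."""
        context = tuple(context)
        table = self.counts.get(context)
        if not table:
            if not context:
                return 1.0 / max(len(self.vocab), 1)  # uniform base case
            return self.prob(w, context[1:])
        total = sum(table.values())
        types = len(table)
        backoff = (self.prob(w, context[1:]) if context
                   else 1.0 / len(self.vocab))
        return (max(table.get(w, 0) - self.d, 0) / total
                + (self.d * types / total) * backoff)

lm = BackoffLM()
lm.train("a b a b a".split())
print(lm.prob("b", ["a"]))  # ≈ 0.775
```

The key property mirrored here is that each context's distribution is smoothed toward its one-shorter suffix context, recursively down to a uniform base, so probabilities remain well-defined for contexts never seen in training.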
3DQ: Compact Quantized Neural Networks for Volumetric Whole Brain Segmentation
Model architectures have been growing dramatically in size, improving
performance at the cost of increased resource requirements. In this paper we propose 3DQ,
a ternary quantization method, applied for the first time to 3D Fully
Convolutional Neural Networks (F-CNNs), enabling 16x model compression while
maintaining performance on par with full precision models. We extensively
evaluate 3DQ on two datasets for the challenging task of whole brain
segmentation. Additionally, we showcase our method's ability to generalize on
two common 3D architectures, namely 3D U-Net and V-Net. Outperforming a variety
of baselines, the proposed method is capable of compressing large 3D models to
a few megabytes, alleviating storage needs in space-critical applications. Comment: Accepted to MICCAI 201
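Generic ternary weight quantization can be sketched as below, following the common Ternary Weight Networks heuristic for the threshold and scale; 3DQ's exact scheme may differ, so treat this as an illustrative sketch only.

```python
import numpy as np

def ternarize(w, delta_frac=0.7):
    """Quantize a weight tensor to {-alpha, 0, +alpha}.

    Threshold and scale follow the common Ternary Weight Networks
    heuristic; 3DQ's exact scheme may differ.
    """
    delta = delta_frac * np.abs(w).mean()        # sparsity threshold
    mask = np.abs(w) > delta                     # positions kept non-zero
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0  # per-tensor scale
    return alpha * np.sign(w) * mask

w = np.array([0.9, -0.05, 0.4, -0.8, 0.02])
print(ternarize(w))  # values drawn from {-0.7, 0, +0.7}
```

Storing 2-bit codes plus a single floating-point scale per tensor, instead of 32-bit floats per weight, gives roughly the 16x compression the abstract reports.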
Label-Dependencies Aware Recurrent Neural Networks
In the last few years, Recurrent Neural Networks (RNNs) have proved effective
on several NLP tasks. Despite such great success, their ability to model
\emph{sequence labeling} is still limited. This led research toward solutions
where RNNs are combined with models which already proved effective in this
domain, such as CRFs. In this work we propose a solution far simpler but very
effective: an evolution of the simple Jordan RNN, where labels are re-injected
as input into the network, and converted into embeddings, in the same way as
words. We compare this RNN variant to other RNN models, namely Elman and
Jordan RNNs, LSTM, and GRU, on two well-known tasks of Spoken Language
Understanding (SLU). Thanks to label embeddings and their combination at the
hidden layer, the proposed variant, which uses more parameters than Elman and
Jordan RNNs but far fewer than LSTM and GRU, is not only more effective than
the other RNNs but also outperforms sophisticated CRF models. Comment: 22 pages, 3 figures. Accepted at the CICLing 2017 conference. Best
Verifiability, Reproducibility, and Working Description award
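The label re-injection mechanism can be sketched with a toy forward pass. The dimensions, random weights, and greedy decoding below are invented for illustration; in the paper these parameters are learned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions and random weights (illustrative; the paper trains these).
d_word, d_label, d_hidden, n_labels = 6, 4, 5, 3
E_label = rng.normal(scale=0.1, size=(n_labels, d_label))  # label embeddings
W_x = rng.normal(scale=0.1, size=(d_hidden, d_word))
W_y = rng.normal(scale=0.1, size=(d_hidden, d_label))      # label re-injection
W_o = rng.normal(scale=0.1, size=(n_labels, d_hidden))

def tag(word_vecs):
    """Jordan-style pass: the previously predicted label, looked up as an
    embedding (just like a word), is fed back into the hidden layer."""
    prev, labels = np.zeros(d_label), []
    for x in word_vecs:
        h = np.tanh(W_x @ x + W_y @ prev)   # combine word and label inputs
        y = int(np.argmax(W_o @ h))         # greedy label decision
        labels.append(y)
        prev = E_label[y]                   # re-inject the label embedding
    return labels

print(tag([rng.normal(size=d_word) for _ in range(4)]))
```

The point of the design is that labels share the same representational machinery as words, so label dependencies are modeled without a CRF layer.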
Using Regular Languages to Explore the Representational Capacity of Recurrent Neural Architectures
The presence of Long Distance Dependencies (LDDs) in sequential data poses
significant challenges for computational models. Various recurrent neural
architectures have been designed to mitigate this issue. In order to test these
state-of-the-art architectures, there is a growing need for rich benchmarking
datasets. However, one of the drawbacks of existing datasets is the lack of
experimental control with regard to the presence and/or degree of LDDs. This
lack of control limits the analysis of model performance in relation to the
specific challenge posed by LDDs. One way to address this is to use synthetic
data having the properties of subregular languages. The degree of LDDs within
the generated data can be controlled through the k parameter, the length of the
generated strings, and the choice of forbidden strings. In this
paper, we explore the capacity of different RNN extensions to model LDDs, by
evaluating these models on a sequence of synthesized SPk datasets, where each
subsequent dataset exhibits a greater degree of LDD. Even though SPk languages
are simple, the presence of LDDs has a significant impact on the performance
of recurrent neural architectures, making them prime candidates for
benchmarking tasks. Comment: International Conference on Artificial Neural Networks (ICANN) 201
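Membership in a strictly piecewise (SPk) language reduces to checking forbidden subsequences, which is easy to sketch; the alphabet and forbidden set below are toy choices for illustration.

```python
def has_subsequence(s, pat):
    """True iff pat occurs in s as a (not necessarily contiguous) subsequence."""
    it = iter(s)
    return all(c in it for c in pat)

def in_sp_language(s, forbidden):
    """Membership in a strictly piecewise (SP-k) language defined by a set
    of forbidden subsequences of length k."""
    return not any(has_subsequence(s, f) for f in forbidden)

# Toy SP-2 language over {a, b} forbidding the subsequence "ab"; longer
# strings stretch the distance over which the dependency must hold.
forbidden = {"ab"}
print(in_sp_language("bbbaaa", forbidden))  # → True: no 'a' precedes a 'b'
print(in_sp_language("abba", forbidden))    # → False: 'a'...'b' occurs
```

Because the forbidden pair may be separated by arbitrarily many symbols, increasing string length directly increases the dependency distance a model must track, which is what makes these languages useful LDD benchmarks.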
Dutch parallel corpus: a balanced parallel corpus for Dutch-English and Dutch-French
status: published
A Multi-Armed Bandit to Smartly Select a Training Set from Big Medical Data
With the availability of big medical image data, the selection of an adequate
training set is becoming more important to address the heterogeneity of
different datasets. Simply including all the data not only incurs high
processing costs but can even harm the prediction. We formulate the smart and
efficient selection of a training dataset from big medical image data as a
multi-armed bandit problem, solved by Thompson sampling. Our method assumes
that image features are not available at the time of the selection of the
samples, and therefore relies only on meta information associated with the
images. Our strategy simultaneously exploits data sources with high chances of
yielding useful samples and explores new data regions. For our evaluation, we
focus on the application of estimating the age from a brain MRI. Our results on
7,250 subjects from 10 datasets show that our approach leads to higher accuracy
while requiring only a fraction of the training data. Comment: MICCAI 2017 Proceedings
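A Beta-Bernoulli Thompson sampler over data sources can be sketched as below. The per-source "usefulness" probabilities are simulated stand-ins; in the paper the reward comes from meta information associated with the images, not from ground truth.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated probability that a sample drawn from each source is "useful"
# (unknown to the bandit; used here only to generate rewards).
true_usefulness = [0.2, 0.5, 0.8]
n_sources = len(true_usefulness)

alpha = np.ones(n_sources)  # Beta posterior: 1 + successes per source
beta = np.ones(n_sources)   # Beta posterior: 1 + failures per source
picks = np.zeros(n_sources, dtype=int)

for _ in range(2000):
    theta = rng.beta(alpha, beta)   # sample a plausible usefulness per source
    src = int(np.argmax(theta))     # query the source that looks best
    reward = rng.random() < true_usefulness[src]
    picks[src] += 1
    alpha[src] += reward
    beta[src] += 1 - reward

print(picks)  # the most useful source (index 2) dominates the pulls
```

Sampling from the posterior, rather than always picking the empirical best, is what lets the strategy keep exploring under-sampled data regions while exploiting sources that have already yielded useful samples.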
Annotating patient clinical records with syntactic chunks and named entities: the Harvey corpus
The free text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult to process by existing natural language analysis tools since they are highly telegraphic (omitting many words), and contain many spelling mistakes, inconsistencies in punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning.
Test-retest reliability of temporal and spatial gait characteristics measured with an instrumented walkway system (GAITRite®)
BACKGROUND: The purpose of this study was to determine the test-retest reliability of temporal and spatial gait measurements over a one-week period as measured using an instrumented walkway system (GAITRite®). METHODS: Subjects were tested on two occasions one week apart. Measurements were made at preferred and fast walking speeds using the GAITRite® system. Measurements tested included walking speed, step length, stride length, base of support, step time, stride time, swing time, stance time, single and double support times, and toe in-toe out angle. RESULTS: Twenty-one healthy subjects participated in this study. The group consisted of 12 men and 9 women, with an average age of 34 years (range: 19–59 years). At preferred walking speed, all gait measurements had ICCs of 0.92 and higher, except base of support, which had an ICC of 0.80. At fast walking speed, all gait measurements had ICCs above 0.89 except base of support (ICC = 0.79). CONCLUSIONS: Spatial-temporal gait measurements demonstrate good to excellent test-retest reliability over a one-week time span.
Randomized trial of bilateral versus single internal-thoracic-artery grafts
Background: The use of bilateral internal thoracic (mammary) arteries for coronary-artery bypass grafting (CABG) may improve long-term outcomes as compared with the use of a single internal-thoracic-artery plus vein grafts. Methods: We randomly assigned patients scheduled for CABG to undergo single or bilateral internal-thoracic-artery grafting in 28 cardiac surgical centers in seven countries. The primary outcome was death from any cause at 10 years. The composite of death from any cause, myocardial infarction, or stroke was a secondary outcome. Interim analyses were prespecified at 5 years of follow-up. Results: A total of 3102 patients were enrolled; 1554 were randomly assigned to undergo single internal-thoracic-artery grafting (the single-graft group) and 1548 to undergo bilateral internal-thoracic-artery grafting (the bilateral-graft group). At 5 years of follow-up, the rate of death was 8.7% in the bilateral-graft group and 8.4% in the single-graft group (hazard ratio, 1.04; 95% confidence interval [CI], 0.81 to 1.32; P=0.77), and the rate of the composite of death from any cause, myocardial infarction, or stroke was 12.2% and 12.7%, respectively (hazard ratio, 0.96; 95% CI, 0.79 to 1.17; P=0.69). The rate of sternal wound complication was 3.5% in the bilateral-graft group versus 1.9% in the single-graft group (P=0.005), and the rate of sternal reconstruction was 1.9% versus 0.6% (P=0.002). Conclusions: Among patients undergoing CABG, there was no significant difference between those receiving single internal-thoracic-artery grafts and those receiving bilateral internal-thoracic-artery grafts with regard to mortality or the rates of cardiovascular events at 5 years of follow-up. There were more sternal wound complications with bilateral internal-thoracic-artery grafting than with single internal-thoracic-artery grafting. Ten-year follow-up is ongoing. (Funded by the British Heart Foundation and others; ART Current Controlled Trials number, ISRCTN46552265.)