13,526 research outputs found
Towards a Universal Wordnet by Learning from Combined Evidenc
Lexical databases are invaluable sources of knowledge about words and their meanings, with numerous applications in areas like NLP, IR, and AI. We propose a methodology for the automatic construction of a large-scale multilingual lexical database where words of many languages are hierarchically organized in terms of their meanings and their semantic relations to other words. This resource is bootstrapped from WordNet, a well-known English-language resource. Our approach extends WordNet with around 1.5 million meaning links for 800,000 words in over 200 languages, drawing on evidence extracted from a variety of resources including existing (monolingual) wordnets, (mostly bilingual) translation dictionaries, and parallel corpora. Graph-based scoring functions and statistical learning techniques are used to iteratively integrate this information and build an output graph. Experiments show that this wordnet has a high level of precision and coverage, and that it can be useful in applied tasks such as cross-lingual text classification
On the Feasibility of Transfer-learning Code Smells using Deep Learning
Context: A substantial amount of work has been done to detect smells in
source code using metrics-based and heuristics-based methods. Machine learning
methods have been recently applied to detect source code smells; however, the
current practices are considered far from mature. Objective: First, explore the
feasibility of applying deep learning models to detect smells without extensive
feature engineering, just by feeding the source code in tokenized form. Second,
investigate the possibility of applying transfer-learning in the context of
deep learning models for smell detection. Method: We use existing metric-based
state-of-the-art methods for detecting three implementation smells and one
design smell in C# code. Using these results as the annotated gold standard, we
train smell detection models on three different deep learning architectures.
These architectures use Convolution Neural Networks (CNNs) of one or two
dimensions, or Recurrent Neural Networks (RNNs) as their principal hidden
layers. For the first objective of our study, we perform training and
evaluation on C# samples, whereas for the second objective, we train the models
from C# code and evaluate the models over Java code samples. We perform the
experiments with various combinations of hyper-parameters for each model.
Results: We find it feasible to detect smells using deep learning methods. Our
comparative experiments find that there is no clearly superior method between
CNN-1D and CNN-2D. We also observe that performance of the deep learning models
is smell-specific. Our transfer-learning experiments show that
transfer-learning is definitely feasible for implementation smells with
performance comparable to that of direct-learning. This work opens up a new
paradigm to detect code smells by transfer-learning especially for the
programming languages where the comprehensive code smell detection tools are
not available
Deepr: A Convolutional Net for Medical Records
Feature engineering remains a major bottleneck when creating predictive
systems from electronic medical records. At present, an important missing
element is detecting predictive regular clinical motifs from irregular episodic
records. We present Deepr (short for Deep record), a new end-to-end deep
learning system that learns to extract features from medical records and
predicts future risk automatically. Deepr transforms a record into a sequence
of discrete elements separated by coded time gaps and hospital transfers. On
top of the sequence is a convolutional neural net that detects and combines
predictive local clinical motifs to stratify the risk. Deepr permits
transparent inspection and visualization of its inner working. We validate
Deepr on hospital data to predict unplanned readmission after discharge. Deepr
achieves superior accuracy compared to traditional techniques, detects
meaningful clinical motifs, and uncovers the underlying structure of the
disease and intervention space
- …