29,250 research outputs found
Ontology of core data mining entities
In this article, we present OntoDM-core, an ontology of core data mining
entities. OntoDM-core defines themost essential datamining entities in a three-layered
ontological structure comprising of a specification, an implementation and an application
layer. It provides a representational framework for the description of mining
structured data, and in addition provides taxonomies of datasets, data mining tasks,
generalizations, data mining algorithms and constraints, based on the type of data.
OntoDM-core is designed to support a wide range of applications/use cases, such as
semantic annotation of data mining algorithms, datasets and results; annotation of
QSAR studies in the context of drug discovery investigations; and disambiguation of
terms in text mining. The ontology has been thoroughly assessed following the practices
in ontology engineering, is fully interoperable with many domain resources and
is easy to extend
Rationale in Development Chat Messages: An Exploratory Study
Chat messages of development teams play an increasingly significant role in
software development, having replaced emails in some cases. Chat messages
contain information about discussed issues, considered alternatives and
argumentation leading to the decisions made during software development. These
elements, defined as rationale, are invaluable during software evolution for
documenting and reusing development knowledge. Rationale is also essential for
coping with changes and for effective maintenance of the software system.
However, exploiting the rationale hidden in the chat messages is challenging
due to the high volume of unstructured messages covering a wide range of
topics. This work presents the results of an exploratory study examining the
frequency of rationale in chat messages, the completeness of the available
rationale and the potential of automatic techniques for rationale extraction.
For this purpose, we apply content analysis and machine learning techniques on
more than 8,700 chat messages from three software development projects. Our
results show that chat messages are a rich source of rationale and that machine
learning is a promising technique for detecting rationale and identifying
different rationale elements.Comment: 11 pages, 6 figures. The 14th International Conference on Mining
Software Repositories (MSR'17
Explicit Interaction Model towards Text Classification
Text classification is one of the fundamental tasks in natural language
processing. Recently, deep neural networks have achieved promising performance
in the text classification task compared to shallow models. Despite of the
significance of deep models, they ignore the fine-grained (matching signals
between words and classes) classification clues since their classifications
mainly rely on the text-level representations. To address this problem, we
introduce the interaction mechanism to incorporate word-level matching signals
into the text classification task. In particular, we design a novel framework,
EXplicit interAction Model (dubbed as EXAM), equipped with the interaction
mechanism. We justified the proposed approach on several benchmark datasets
including both multi-label and multi-class text classification tasks. Extensive
experimental results demonstrate the superiority of the proposed method. As a
byproduct, we have released the codes and parameter settings to facilitate
other researches.Comment: 8 page
AudioPairBank: Towards A Large-Scale Tag-Pair-Based Audio Content Analysis
Recently, sound recognition has been used to identify sounds, such as car and
river. However, sounds have nuances that may be better described by
adjective-noun pairs such as slow car, and verb-noun pairs such as flying
insects, which are under explored. Therefore, in this work we investigate the
relation between audio content and both adjective-noun pairs and verb-noun
pairs. Due to the lack of datasets with these kinds of annotations, we
collected and processed the AudioPairBank corpus consisting of a combined total
of 1,123 pairs and over 33,000 audio files. One contribution is the previously
unavailable documentation of the challenges and implications of collecting
audio recordings with these type of labels. A second contribution is to show
the degree of correlation between the audio content and the labels through
sound recognition experiments, which yielded results of 70% accuracy, hence
also providing a performance benchmark. The results and study in this paper
encourage further exploration of the nuances in audio and are meant to
complement similar research performed on images and text in multimedia
analysis.Comment: This paper is a revised version of "AudioSentibank: Large-scale
Semantic Ontology of Acoustic Concepts for Audio Content Analysis
- …