99,586 research outputs found
Ontology alignment based on word embedding and random forest classification.
Ontology alignment is crucial for integrating heterogeneous data sources and forms an important component for realising the goals of the semantic web. Accordingly, several ontology alignment techniques have been proposed and used for discovering correspondences between the concepts (or entities) of different ontologies. However, these techniques mostly depend on string-based similarities which are unable to handle the vocabulary mismatch problem. Also, determining which similarity measures to use and how to effectively combine them in alignment systems are challenges that have persisted in this area. In this work, we introduce a random forest classifier approach for ontology alignment which relies on word embedding to discover semantic similarities between concepts. Specifically, we combine string-based and semantic similarity measures to form feature vectors that are used by the classifier model to determine when concepts match. By harnessing background knowledge and relying on minimal information from the ontologies, our approach can deal with knowledge-light ontological resources. It also eliminates the need for learning the aggregation weights of multiple similarity measures. Our experiments using Ontology Alignment Evaluation Initiative (OAEI) dataset and real-world ontologies highlight the utility of our approach and show that it can outperform state-of-the-art alignment systems
Learning to select data for transfer learning with Bayesian Optimization
Domain similarity measures can be used to gauge adaptability and select
suitable data for transfer learning, but existing approaches define ad hoc
measures that are deemed suitable for respective tasks. Inspired by work on
curriculum learning, we propose to \emph{learn} data selection measures using
Bayesian Optimization and evaluate them across models, domains and tasks. Our
learned measures outperform existing domain similarity measures significantly
on three tasks: sentiment analysis, part-of-speech tagging, and parsing. We
show the importance of complementing similarity with diversity, and that
learned measures are -- to some degree -- transferable across models, domains,
and even tasks.Comment: EMNLP 2017. Code available at:
https://github.com/sebastianruder/learn-to-select-dat
Adaptive image retrieval using a graph model for semantic feature integration
The variety of features available to represent multimedia data constitutes a rich pool of information. However, the plethora of data poses a challenge in terms of feature selection and integration for effective retrieval. Moreover, to further improve effectiveness, the
retrieval model should ideally incorporate context-dependent feature representations to allow for retrieval on a higher semantic level. In this paper we present a retrieval model and learning framework for the purpose of interactive information retrieval. We describe
how semantic relations between multimedia objects based on user interaction can be learnt and then integrated with visual and textual features into a unified framework. The framework models both feature similarities and semantic relations in a single graph. Querying in this model is implemented using the theory of random walks. In addition, we present ideas to implement short-term learning from relevance feedback. Systematic experimental results validate the effectiveness of the proposed approach for image retrieval. However, the model is not restricted to the image domain and could easily be employed for retrieving multimedia data (and even a combination of different domains, eg images, audio and text documents)
Using entropy-based local weighting to improve similarity assessment
This paper enhances and analyses the power of local weighted similarity measures. The paper proposes a new entropy-based local weighting algorithm to be used in similarity assessment to improve the performance of the CBR retrieval task. It has been carried out a comparative analysis of the performance of unweighted similarity measures, global weighted similarity measures, and local weighting similarity measures. The testing has been done using several similarity measures, and some data sets from the UCI Machine Learning Database Repository and other environmental databases.Postprint (published version
A Systematic Comparison of Music Similarity Adaptation Approaches
In order to support individual user perspectives and different retrieval tasks, music similarity can no longer be considered as a static element of Music Information Retrieval (MIR) systems. Various approaches have been proposed recently that allow dynamic adaptation of music similarity measures. This paper provides a systematic comparison of algorithms for metric learning and higher-level facet distance weighting on the MagnaTagATune dataset. A crossvalidation variant taking into account clip availability is presented. Applied on user generated similarity data, its effect on adaptation performance is analyzed. Special attention is paid to the amount of training data necessary for making similarity predictions on unknown data, the number of model parameters and the amount of information available about the music itself. 1
Semantic Sort: A Supervised Approach to Personalized Semantic Relatedness
We propose and study a novel supervised approach to learning statistical
semantic relatedness models from subjectively annotated training examples. The
proposed semantic model consists of parameterized co-occurrence statistics
associated with textual units of a large background knowledge corpus. We
present an efficient algorithm for learning such semantic models from a
training sample of relatedness preferences. Our method is corpus independent
and can essentially rely on any sufficiently large (unstructured) collection of
coherent texts. Moreover, the approach facilitates the fitting of semantic
models for specific users or groups of users. We present the results of
extensive range of experiments from small to large scale, indicating that the
proposed method is effective and competitive with the state-of-the-art.Comment: 37 pages, 8 figures A short version of this paper was already
published at ECML/PKDD 201
- âŠ