Computational Approaches to Measuring the Similarity of Short Contexts: A Review of Applications and Methods
Measuring the similarity of short written contexts is a fundamental problem
in Natural Language Processing. This article provides a unifying framework by
which short context problems can be categorized both by their intended
application and proposed solution. The goal is to show that various problems
and methodologies that appear quite different on the surface are in fact very
closely related. The axes by which these categorizations are made include the
format of the contexts (headed versus headless), the way in which the contexts
are to be measured (first-order versus second-order similarity), and the
information used to represent the features in the contexts (micro versus macro
views). The unifying thread that binds together many short context applications
and methods is the fact that similarity decisions must be made between contexts
that share few (if any) words in common.
Comment: 23 pages
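The first-order versus second-order distinction named in this abstract can be illustrated with a small sketch. It is not taken from the article: the toy co-occurrence table `WORD_VECTORS`, the helper names, and the two example contexts are all hypothetical, and it assumes the common reading in which a first-order representation counts the words that occur in a context, while a second-order representation averages the co-occurrence vectors of those words.

```python
import math
from collections import Counter

# Hypothetical toy co-occurrence vectors (word -> counts of nearby words).
# In practice these would be estimated from a large corpus.
WORD_VECTORS = {
    "physician": {"hospital": 3, "patient": 2},
    "doctor": {"hospital": 2, "patient": 3},
    "banker": {"money": 3, "loan": 2},
}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def first_order(context):
    """First-order view: the context is the bag of words it contains."""
    return Counter(context.lower().split())


def second_order(context, word_vectors):
    """Second-order view: average the vectors of the words in the context.

    Words without an entry in word_vectors are skipped.
    """
    known = [w for w in context.lower().split() if w in word_vectors]
    total = Counter()
    for word in known:
        for feature, count in word_vectors[word].items():
            total[feature] += count
    return {k: v / len(known) for k, v in total.items()} if known else {}


context_a = "physician arrived"
context_b = "doctor left"

first_sim = cosine(first_order(context_a), first_order(context_b))
second_sim = cosine(
    second_order(context_a, WORD_VECTORS),
    second_order(context_b, WORD_VECTORS),
)
print(f"first-order similarity:  {first_sim:.3f}")
print(f"second-order similarity: {second_sim:.3f}")
```

The two contexts share no words, so the first-order score is zero, while the second-order score is high because "physician" and "doctor" co-occur with the same words in the toy table.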
Clustering documents with active learning using Wikipedia
Wikipedia has been applied as a background knowledge base to various text mining problems, but very few attempts have been made to utilize it for document clustering. In this paper we propose to exploit the semantic knowledge in Wikipedia for clustering, enabling the automatic grouping of documents with similar themes. Although clustering is intrinsically unsupervised, recent research has shown that incorporating supervision improves clustering performance, even when limited supervision is provided. The approach presented in this paper applies supervision using active learning. We first utilize Wikipedia to create a concept-based representation of a text document, with each concept associated with a Wikipedia article. We then exploit the semantic relatedness between Wikipedia concepts to find pair-wise instance-level constraints for supervised clustering, guiding clustering in the direction indicated by the constraints. We test our approach on three standard text document datasets. Empirical results show that our basic document representation strategy yields performance comparable to previous attempts, and that adding constraints improves clustering performance further by up to 20%.
Meaning-focused and Quantum-inspired Information Retrieval
In recent years, quantum-based methods have shown promise in complementing the
traditional procedures in information retrieval (IR) and natural language
processing (NLP). Inspired by our research on the identification and
application of quantum structures in cognition, more specifically our work on
the representation of concepts and their combinations, we put forward a
'quantum meaning based' framework for structured query retrieval in text
corpora and standardized testing corpora. This scheme for IR takes as its
basic notions (i) 'entities of meaning', e.g., concepts and their
combinations, and (ii) traces of such entities of meaning, which is how
documents are considered in this approach. The meaning content of these
'entities of meaning' is reconstructed by solving an 'inverse problem' in the
quantum formalism, consisting of reconstructing the full states of the entities
of meaning from their collapsed states identified as traces in relevant
documents. The advantages over traditional approaches, such as
Latent Semantic Analysis (LSA), are discussed by means of concrete examples.
Comment: 11 pages