1,021 research outputs found

    A Word Sense-Oriented User Interface for Interactive Multilingual Text Retrieval

    Get PDF
    In this paper we present an interface for supporting a user in an interactive cross-language search process using semantic classes. In order to enable users to access multilingual information, different problems have to be solved: disambiguating and translating the query words, as well as categorizing and presenting the results appropriately. Therefore, we first give a brief introduction to word sense disambiguation, cross-language text retrieval and document categorization and finally describe recent achievements of our research towards an interactive multilingual retrieval system. We focus especially on the problem of browsing and navigation of the different word senses in one source and possibly several target languages. In the last part of the paper, we discuss the developed user interface and its functionalities in more detail

    Evaluating the semantic web: a task-based approach

    Get PDF
    The increased availability of online knowledge has led to the design of several algorithms that solve a variety of tasks by harvesting the Semantic Web, i.e. by dynamically selecting and exploring a multitude of online ontologies. Our hypothesis is that the performance of such novel algorithms implicity provides an insight into the quality of the used ontologies and thus opens the way to a task-based evaluation of the Semantic Web. We have investigated this hypothesis by studying the lessons learnt about online ontologies when used to solve three tasks: ontology matching, folksonomy enrichment, and word sense disambiguation. Our analysis leads to a suit of conclusions about the status of the Semantic Web, which highlight a number of strengths and weaknesses of the semantic information available online and complement the findings of other analysis of the Semantic Web landscape

    Bridging the gap between folksonomies and the semantic web: an experience report

    Get PDF
    Abstract. While folksonomies allow tagging of similar resources with a variety of tags, their content retrieval mechanisms are severely hampered by being agnostic to the relations that exist between these tags. To overcome this limitation, several methods have been proposed to find groups of implicitly inter-related tags. We believe that content retrieval can be further improved by making the relations between tags explicit. In this paper we propose the semantic enrichment of folksonomy tags with explicit relations by harvesting the Semantic Web, i.e., dynamically selecting and combining relevant bits of knowledge from online ontologies. Our experimental results show that, while semantic enrichment needs to be aware of the particular characteristics of folksonomies and the Semantic Web, it is beneficial for both.

    From Word to Sense Embeddings: A Survey on Vector Representations of Meaning

    Get PDF
    Over the past years, distributed semantic representations have proved to be effective and flexible keepers of prior knowledge to be integrated into downstream applications. This survey focuses on the representation of meaning. We start from the theoretical background behind word vector space models and highlight one of their major limitations: the meaning conflation deficiency, which arises from representing a word with all its possible meanings as a single vector. Then, we explain how this deficiency can be addressed through a transition from the word level to the more fine-grained level of word senses (in its broader acceptation) as a method for modelling unambiguous lexical meaning. We present a comprehensive overview of the wide range of techniques in the two main branches of sense representation, i.e., unsupervised and knowledge-based. Finally, this survey covers the main evaluation procedures and applications for this type of representation, and provides an analysis of four of its important aspects: interpretability, sense granularity, adaptability to different domains and compositionality.Comment: 46 pages, 8 figures. Published in Journal of Artificial Intelligence Researc

    Applying Wikipedia to Interactive Information Retrieval

    Get PDF
    There are many opportunities to improve the interactivity of information retrieval systems beyond the ubiquitous search box. One idea is to use knowledge bases—e.g. controlled vocabularies, classification schemes, thesauri and ontologies—to organize, describe and navigate the information space. These resources are popular in libraries and specialist collections, but have proven too expensive and narrow to be applied to everyday webscale search. Wikipedia has the potential to bring structured knowledge into more widespread use. This online, collaboratively generated encyclopaedia is one of the largest and most consulted reference works in existence. It is broader, deeper and more agile than the knowledge bases put forward to assist retrieval in the past. Rendering this resource machine-readable is a challenging task that has captured the interest of many researchers. Many see it as a key step required to break the knowledge acquisition bottleneck that crippled previous efforts. This thesis claims that the roadblock can be sidestepped: Wikipedia can be applied effectively to open-domain information retrieval with minimal natural language processing or information extraction. The key is to focus on gathering and applying human-readable rather than machine-readable knowledge. To demonstrate this claim, the thesis tackles three separate problems: extracting knowledge from Wikipedia; connecting it to textual documents; and applying it to the retrieval process. First, we demonstrate that a large thesaurus-like structure can be obtained directly from Wikipedia, and that accurate measures of semantic relatedness can be efficiently mined from it. Second, we show that Wikipedia provides the necessary features and training data for existing data mining techniques to accurately detect and disambiguate topics when they are mentioned in plain text. Third, we provide two systems and user studies that demonstrate the utility of the Wikipedia-derived knowledge base for interactive information retrieval

    Exploratory Search on Mobile Devices

    Get PDF
    The goal of this thesis is to provide a general framework (MobEx) for exploratory search especially on mobile devices. The central part is the design, implementation, and evaluation of several core modules for on-demand unsupervised information extraction well suited for exploratory search on mobile devices and creating the MobEx framework. These core processing elements, combined with a multitouch - able user interface specially designed for two families of mobile devices, i.e. smartphones and tablets, have been finally implemented in a research prototype. The initial information request, in form of a query topic description, is issued online by a user to the system. The system then retrieves web snippets by using standard search engines. These snippets are passed through a chain of NLP components which perform an ondemand or ad-hoc interactive Query Disambiguation, Named Entity Recognition, and Relation Extraction task. By on-demand or ad-hoc we mean the components are capable to perform their operations on an unrestricted open domain within special time constraints. The result of the whole process is a topic graph containing the detected associated topics as nodes and the extracted relation ships as labelled edges between the nodes. The Topic Graph is presented to the user in different ways depending on the size of the device she is using. Various evaluations have been conducted that help us to understand the potentials and limitations of the framework and the prototype

    Clustering documents with active learning using Wikipedia

    Get PDF
    Wikipedia has been applied as a background knowledge base to various text mining problems, but very few attempts have been made to utilize it for document clustering. In this paper we propose to exploit the semantic knowledge in Wikipedia for clustering, enabling the automatic grouping of documents with similar themes. Although clustering is intrinsically unsupervised, recent research has shown that incorporating supervision improves clustering performance, even when limited supervision is provided. The approach presented in this paper applies supervision using active learning. We first utilize Wikipedia to create a concept-based representation of a text document, with each concept associated to a Wikipedia article. We then exploit the semantic relatedness between Wikipedia concepts to find pair-wise instance-level constraints for supervised clustering, guiding clustering towards the direction indicated by the constraints. We test our approach on three standard text document datasets. Empirical results show that our basic document representation strategy yields comparable performance to previous attempts; and adding constraints improves clustering performance further by up to 20%
    corecore