An effective, low-cost measure of semantic relatedness obtained from Wikipedia links
This paper describes a new technique for obtaining measures of semantic relatedness. Like other recent approaches, it uses Wikipedia to provide structured world knowledge about the terms of interest. Our approach is unique in that it does so using the hyperlink structure of Wikipedia rather than its category hierarchy or textual content. Evaluation with manually defined measures of semantic relatedness reveals this to be an effective compromise between the ease of computation of the former approach and the accuracy of the latter.
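To make the idea of link-based relatedness concrete, here is a minimal sketch, not the paper's implementation. It assumes each article is represented by the set of articles that link to it, and it scores two articles with a normalized-distance formula over those sets. The abstract does not state the formula, so that choice, the function name `link_relatedness`, and the toy link sets are all illustrative assumptions.

```python
import math


def link_relatedness(links_a, links_b, total_articles):
    """Score two articles in [0, 1] from their incoming-link sets.

    links_a, links_b: iterables of IDs of articles linking to each article.
    total_articles: size of the whole article collection (assumed known).
    """
    a, b = set(links_a), set(links_b)
    common = a & b
    if not common:
        # No shared incoming links: treat the articles as unrelated.
        return 0.0

    larger = max(len(a), len(b))
    smaller = min(len(a), len(b))
    if total_articles <= smaller:
        raise ValueError("total_articles must exceed the size of each link set")

    # Normalized distance: 0 when the link sets coincide, growing as the
    # overlap shrinks relative to the larger set.
    distance = (math.log(larger) - math.log(len(common))) / (
        math.log(total_articles) - math.log(smaller)
    )
    # Clamp so that very weak overlaps do not produce negative scores.
    return max(0.0, 1.0 - distance)


if __name__ == "__main__":
    # Hypothetical incoming-link sets for two articles in a 1,000-article wiki.
    print(link_relatedness({1, 2, 3, 4}, {3, 4, 5, 6}, total_articles=1000))
```

Only set intersections and sizes are needed, which is what makes a link-based measure cheap compared with one that processes full article text.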
Mining Domain-Specific Thesauri from Wikipedia: A case study
Domain-specific thesauri are high-cost, high-maintenance, high-value knowledge structures. We show how the classic thesaurus structure of terms and links can be mined automatically from Wikipedia. In a comparison with a professional thesaurus for agriculture, we find that Wikipedia contains a substantial proportion of its concepts and semantic relations; furthermore, it has impressive coverage of contemporary documents in the domain. Thesauri derived using our techniques capitalize on existing public efforts and tend to reflect contemporary language usage better than their costly, painstakingly constructed manual counterparts.
A competitive environment for exploratory query expansion
Most information workers query digital libraries many times a day. Yet people have little opportunity to hone their skills in a controlled environment, or compare their performance with others in an objective way. Conversely, although search engine logs record how users evolve queries, they lack crucial information about the user's intent. This paper describes an environment for exploratory query expansion that pits users against each other and lets them compete, and practice, in their own time and on their own workstation. The system captures query evolution behavior on predetermined information-seeking tasks. It is publicly available, and the code is open source so that others can set up their own competitive environments.
Extracting corpus specific knowledge bases from Wikipedia
Thesauri are useful knowledge structures for assisting information retrieval. Yet their production is labor-intensive, and few domains have comprehensive thesauri that cover domain-specific concepts and contemporary usage. One approach, which has been attempted without much success for decades, is to seek statistical natural language processing algorithms that work on free text. Instead, we propose to replace costly professional indexers with thousands of dedicated amateur volunteers, namely those who are producing Wikipedia. This vast, open encyclopedia represents a rich tapestry of topics and semantics and a huge investment of human effort and judgment. We show how this can be directly exploited to provide WikiSauri: manually defined yet inexpensive thesaurus structures that are specifically tailored to expose the topics, terminology and semantics of individual document collections. We also offer concrete evidence of the effectiveness of WikiSauri for assisting information retrieval.
Emergent charge ordering in near half doped Na$_x$CoO$_2$
We have utilized neutron powder diffraction to probe the crystal structure of layered Na$_x$CoO$_2$ near the half doping composition of $x = 0.46$ over the temperature range of 2 to 600 K. Our measurements show evidence of a dynamic transition in the motion of Na-ions at 300 K which coincides with the onset of a near zero thermal expansion in the in-plane lattice constants. The effect of the Na-ordering on the CoO$_2$ layer is reflected in the octahedral distortion of the two crystallographically inequivalent Co-sites and is evident even at high temperatures. We find evidence of a weak charge separation into stripes of Co ions in two different charge states below $T_{CO} = 150$ K. We argue that changes in the Na(1)-O bond lengths observed at the magnetic transition at $T_M = 88$ K reflect changes in the electronic state of the CoO$_2$ layer.
Comment: 7 pages, 6 figures, in press Phys. Rev.
Clustering documents with active learning using Wikipedia
Wikipedia has been applied as a background knowledge base to various text mining problems, but very few attempts have been made to utilize it for document clustering. In this paper we propose to exploit the semantic knowledge in Wikipedia for clustering, enabling the automatic grouping of documents with similar themes. Although clustering is intrinsically unsupervised, recent research has shown that incorporating supervision improves clustering performance, even when limited supervision is provided. The approach presented in this paper applies supervision using active learning. We first utilize Wikipedia to create a concept-based representation of a text document, with each concept associated with a Wikipedia article. We then exploit the semantic relatedness between Wikipedia concepts to find pair-wise instance-level constraints for supervised clustering, guiding clustering in the direction indicated by the constraints. We test our approach on three standard text document datasets. Empirical results show that our basic document representation strategy yields performance comparable to previous attempts, and that adding constraints improves clustering performance further by up to 20%.
Applying Wikipedia to Interactive Information Retrieval
There are many opportunities to improve the interactivity of information retrieval systems beyond the ubiquitous search box. One idea is to use knowledge bases—e.g. controlled vocabularies, classification schemes, thesauri and ontologies—to organize, describe and navigate the information space. These resources are popular in libraries and specialist collections, but have proven too expensive and narrow to be applied to everyday webscale search. Wikipedia has the potential to bring structured knowledge into more widespread use. This online, collaboratively generated encyclopaedia is one of the largest and most consulted reference works in existence. It is broader, deeper and more agile than the knowledge bases put forward to assist retrieval in the past. Rendering this resource machine-readable is a challenging task that has captured the interest of many researchers. Many see it as a key step required to break the knowledge acquisition bottleneck that crippled previous efforts. This thesis claims that the roadblock can be sidestepped: Wikipedia can be applied effectively to open-domain information retrieval with minimal natural language processing or information extraction. The key is to focus on gathering and applying human-readable rather than machine-readable knowledge. To demonstrate this claim, the thesis tackles three separate problems: extracting knowledge from Wikipedia; connecting it to textual documents; and applying it to the retrieval process. First, we demonstrate that a large thesaurus-like structure can be obtained directly from Wikipedia, and that accurate measures of semantic relatedness can be efficiently mined from it. Second, we show that Wikipedia provides the necessary features and training data for existing data mining techniques to accurately detect and disambiguate topics when they are mentioned in plain text. 
Third, we provide two systems and user studies that demonstrate the utility of the Wikipedia-derived knowledge base for interactive information retrieval.
Power and politics in requirements engineering: embracing the dark side?
This vision paper considers the role of power and politics in requirements engineering (RE). It offers a working definition of both terms and reviews the existing literature both in RE and related disciplines. It argues that, given the increased complexity, uncertainty and organisational embeddedness faced by RE in practice, power and politics have become increasingly relevant factors that have not been adequately considered. Building upon recent relevant research, a research agenda is proposed that presents a methodological framework which examines power and politics through the structure of power relations and the process of decision-making. This framework will require validation through empirical research as a first step to developing models of power and politics that could be of practical use for RE. Although the potential problems faced by the study of power and politics in an RE context are acknowledged, it is argued that the potential benefits could be significant.
Requirements Engineering as Creative Problem Solving: A Research Agenda for Idea Finding
This vision paper frames requirements engineering as a creative problem solving process. Its purpose is to enable requirements researchers and practitioners to recruit relevant theories, models, techniques and tools from creative problem solving to understand and support requirements processes more effectively. It uses four drivers to motivate the case for requirements engineering as a creative problem solving process. It then maps established requirements activities onto one of the longest-established creative problem solving processes, and uses these mappings to locate opportunities for the application of creative problem solving in requirements engineering. The second half of the paper describes selected creativity theories, techniques, software tools and training that can be adopted to improve requirements engineering research and practice. The focus is on support for problem and idea finding, two creative problem solving processes that our investigation revealed are poorly supported in requirements engineering. The paper ends with a research agenda to incorporate creative processes, techniques, training and tools in requirements projects.