
    Integrating multiple windows and document features for expert finding

    Expert finding is a key task in enterprise search and has recently attracted considerable attention from both the research and industry communities. Given a search topic, a prominent existing approach is to apply an information retrieval (IR) system to retrieve the top-ranking documents, which are then used to derive associations between experts and the search topic based on cooccurrences. However, we argue that expert finding is sensitive to several aspects that current expert finding systems insufficiently address, namely (a) multiple levels of associations between experts and search topics, (b) document internal structure, and (c) document authority. We propose a novel approach that integrates these three aspects, together with a query expansion technique, in a two-stage model for expert finding. A systematic evaluation is conducted on TREC collections to test the performance of our approach as well as the effects of multiple windows, document features, and query expansion. The experimental results show that query expansion can dramatically improve expert finding performance with statistical significance. For three well-known IR models, with or without query expansion, document internal structures help improve a single-window-based approach but without statistical significance, while our novel multiple-window-based approach significantly improves the performance of a single-window-based approach both with and without document internal structures.
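
    As a concrete illustration of the multiple-window idea, the sketch below scores an expert-topic association by counting how often query terms and mentions of a candidate's name fall within windows of several sizes in a retrieved document; the window sizes, weights, and combination rule are illustrative assumptions rather than the paper's exact model.

    ```python
    from typing import Sequence

    def window_cooccurrence_score(query_positions: Sequence[int],
                                  expert_positions: Sequence[int],
                                  window_sizes=(20, 50, 200),
                                  weights=(0.5, 0.3, 0.2)) -> float:
        """Count query-term / expert-mention cooccurrences at several window
        sizes (token distances) and combine them into one weighted score."""
        score = 0.0
        for window, weight in zip(window_sizes, weights):
            hits = sum(1 for q in query_positions
                       for e in expert_positions
                       if abs(q - e) <= window)
            score += weight * hits
        return score

    # Example: a query term at token 10, the candidate mentioned at tokens 15 and 300.
    print(window_cooccurrence_score([10], [15, 300]))
    ```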

    Modeling document features for expert finding

    We argue that expert finding is sensitive to multiple document features in an organization and can therefore benefit from their incorporation. We propose a unified language model that integrates multiple document features, namely multiple levels of associations, PageRank, indegree, internal document structure, and URL length. Our experiments on two TREC Enterprise Track collections, i.e., the W3C and CSIRO datasets, demonstrate that the nature of the two organizational intranets and the two types of expert finding tasks, i.e., key contact finding for CSIRO and knowledgeable person finding for W3C, influence the effectiveness of different document features. Our work provides insights into which document features work for certain types of expert finding tasks and helps design expert finding strategies that are effective for different scenarios.
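
    One plausible way to fold such features into a single candidate score is to weight each supporting document's query likelihood and expert-document association by a document prior built from PageRank, indegree, and URL length; the log-linear prior and the weights in the sketch below are assumptions for illustration, not the paper's tuned unified model.

    ```python
    import math

    # Hypothetical document prior over the features named above; the log-linear
    # form and the weights are illustrative, not values from the paper.
    def document_prior(pagerank: float, indegree: int, url_length: int,
                       w_pr: float = 1.0, w_in: float = 0.5, w_url: float = 0.3) -> float:
        return math.exp(w_pr * math.log(pagerank + 1e-9)
                        + w_in * math.log(indegree + 1)
                        - w_url * math.log(url_length + 1))

    # Candidate score: sum over supporting documents of
    # P(query | document) * P(candidate | document) * prior(document).
    def expert_score(candidate_docs: list) -> float:
        return sum(d["p_query"] * d["p_assoc"]
                   * document_prior(d["pagerank"], d["indegree"], d["url_length"])
                   for d in candidate_docs)

    docs = [{"p_query": 0.12, "p_assoc": 0.5,
             "pagerank": 0.004, "indegree": 12, "url_length": 38}]
    print(expert_score(docs))
    ```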

    Social Search with Missing Data: Which Ranking Algorithm?

    Online social networking tools are extremely popular but can miss potential discoveries latent in the social 'fabric'. Matchmaking services that perform naive profile matching with old database technology are too brittle in the absence of key data, and even modern ontological markup, though powerful, can be onerous at data-input time. In this paper, we present a system called BuddyFinder which can automatically identify buddies who best match a user's search requirements specified in a term-based query, even in the absence of stored user profiles. We deploy and compare five statistical measures, namely our own CORDER, mutual information (MI), phi-squared, improved MI, and Z score, and two TF/IDF-based baseline methods to find online users who best match the search requirements based on 'inferred profiles' of these users in the form of scavenged web pages. These measures identify statistically significant relationships between online users and a term-based query. Our user evaluation on two groups of users shows that BuddyFinder can find users highly relevant to search queries, and that CORDER achieved the best average ranking correlations among all seven algorithms and improved the performance of both baseline methods.
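
    For reference, two of the association measures named above (mutual information and phi-squared) can be computed from a 2x2 cooccurrence table of pages as in the sketch below; the exact variants, smoothing, and counts used by BuddyFinder are assumptions here, not taken from the paper.

    ```python
    import math

    def mutual_information(n11: int, n1_: int, n_1: int, n: int) -> float:
        """Pointwise MI between a user and a query term.
        n11 = pages mentioning both, n1_ = pages mentioning the user,
        n_1 = pages containing the term, n = total pages."""
        return math.log(n11 * n / (n1_ * n_1)) if n11 else float("-inf")

    def phi_squared(n11: int, n1_: int, n_1: int, n: int) -> float:
        """Phi-squared statistic for the same 2x2 contingency table."""
        n10, n01 = n1_ - n11, n_1 - n11
        n00 = n - n11 - n10 - n01
        num = (n11 * n00 - n10 * n01) ** 2
        den = n1_ * n_1 * (n10 + n00) * (n01 + n00)
        return num / den if den else 0.0

    # Example: the user appears on 40 pages, the term on 25, both together on 10,
    # out of 1,000 scavenged pages.
    print(mutual_information(10, 40, 25, 1000), phi_squared(10, 40, 25, 1000))
    ```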

    The Open University at TREC 2007 Enterprise Track

    The Multimedia and Information Systems group at the Knowledge Media Institute of the Open University participated in the Expert Search and Document Search tasks of the Enterprise Track in TREC 2007. In both the document and expert search tasks, we have studied the effect of anchor texts in addition to document contents, document authority, URL length, query expansion, and relevance feedback in improving search effectiveness. In the expert search task, we have continued using a two-stage language model consisting of a document relevance model and a cooccurrence model. The document relevance model is equivalent to our approach in the document search task. We have used our innovative multiple-window-based cooccurrence approach. The assumption is that there are multiple levels of associations between an expert and his/her expertise. Our experimental results show that the introduction of features beyond document contents has improved the retrieval effectiveness.

    The OU Linked Open Data: production and consumption

    The aim of this paper is to introduce the current efforts toward the release and exploitation of The Open University's (OU) Linked Open Data (LOD). We introduce the work that has been done within the LUCERO project in order to select, extract, and structure subsets of information contained within the OU data sources and to migrate and expose this information as part of the LOD cloud. To show the potential of such exposure, we also introduce three different prototypes that exploit this new educational resource: (1) the OU expert search system, a tool focused on finding the best experts for a certain topic within the OU staff; (2) the Buddy Study system, a tool that relies on Facebook information to identify common interests among friends and recommend potential courses within the OU that 'buddies' can study together; and (3) Linked OpenLearn, an application that enables exploring linked courses, podcasts, and tags to OpenLearn units. Its aim is to enhance the browsing experience for students by detecting relevant educational resources on the fly while reading an OpenLearn unit.

    Identifying Unclear Questions in Community Question Answering Websites

    Thousands of complex natural language questions are submitted to community question answering websites on a daily basis, making them one of the most important information sources available today. However, submitted questions are often unclear and cannot be answered without further clarification questions from expert community members. This study is the first to investigate the complex task of classifying a question as clear or unclear, i.e., whether it requires further clarification. We construct a novel dataset and propose a classification approach that is based on the notion of similar questions. This approach is compared to state-of-the-art text classification baselines. Our main finding is that the similar questions approach is a viable alternative that can be used as a stepping stone towards the development of supportive user interfaces for question formulation. Comment: Proceedings of the 41st European Conference on Information Retrieval (ECIR '19), 2019.
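
    The abstract does not spell out the classifier, so the sketch below is only a stand-in for the "similar questions" idea: label a new question as clear or unclear by a majority vote over its nearest labelled neighbours in TF-IDF space. The toy data, features, and k-NN setup are illustrative assumptions.

    ```python
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.neighbors import KNeighborsClassifier

    # Toy labelled questions; a real dataset would come from a CQA site.
    train_questions = [
        "How do I reverse a list in Python without using reverse()?",
        "It doesn't work, please help",
        "Why does my Java for-loop never terminate?",
        "Something is wrong with my code, any ideas?",
    ]
    train_labels = ["clear", "unclear", "clear", "unclear"]

    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(train_questions)

    # Cosine k-NN over TF-IDF vectors: a new question takes the majority label
    # of its most similar training questions.
    clf = KNeighborsClassifier(n_neighbors=3, metric="cosine").fit(X, train_labels)

    new_q = vectorizer.transform(["My program crashes, what should I do?"])
    print(clf.predict(new_q))
    ```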

    A CD-ROM Based Agricultural Information Retrieval System

    An information retrieval system for agricultural extension was developed using CD-ROM technology as the primary medium for information delivery. Object-oriented database techniques were used to organize the information. Conventional retrieval techniques, including hypertext, full-text searching, and relational databases, and decision support programs such as expert systems were integrated into a complete package for accessing information stored on the CD-ROM. A multimedia user interface was developed to provide a variety of capabilities including computer graphics and high-resolution digitized images. Information for the disk was gathered and entered using extension publications that were tagged using an SGML-based document markup language. The fully operational CD-ROM system has been implemented in all 67 county extension offices in Florida.

    Factors shaping the evolution of electronic documentation systems

    The main goal is to prepare the space station technical and managerial structure for likely changes in the creation, capture, transfer, and utilization of knowledge. By anticipating advances, the design of Space Station Project (SSP) information systems can be tailored to facilitate a progression of increasingly sophisticated strategies as the space station evolves. Future generations of advanced information systems will use increases in power to deliver environmentally meaningful, contextually targeted, interconnected data (knowledge). The concept of a Knowledge Base Management System emerges when the problem is framed as how information systems can perform such a conversion of raw data. Such a system would include traditional management functions for large space databases. Added artificial intelligence features might encompass co-existing knowledge representation schemes; effective control structures for deductive, plausible, and inductive reasoning; means for knowledge acquisition, refinement, and validation; explanation facilities; and dynamic human intervention. The major areas covered include: alternative knowledge representation approaches; advanced user interface capabilities; computer-supported cooperative work; the evolution of information system hardware; standardization, compatibility, and connectivity; and organizational impacts of information-intensive environments.

    The relationship of (perceived) epistemic cognition to interaction with resources on the internet

    Information seeking and processing are key literacy practices. However, they are activities that students, across a range of ages, struggle with. These information seeking processes can be viewed through the lens of epistemic cognition: beliefs regarding the source, justification, complexity, and certainty of knowledge. In the research reported in this article we build on established research in this area, which has typically used self-report psychometric and behavior data, and information seeking tasks involving closed-document sets. We take a novel approach in applying established self-report measures to a large-scale, naturalistic study environment, pointing to the potential of analysis of dialogue, web navigation (including sites visited), and other trace data to support more traditional self-report mechanisms. Our analysis suggests that prior work demonstrating relationships between self-report indicators is not paralleled in investigation of the hypothesized relationships between self-report and trace indicators. However, there are clear epistemic features of this trace data. The article thus demonstrates the potential of behavioral learning analytic data in understanding how epistemic cognition is brought to bear in rich information seeking and processing tasks.