
    An Exponentiation Method for XML Element Retrieval

    XML documents are now widely used for modelling and storing structured documents. Their structure is rich and carries important information about contents and their relationships, for example in e-commerce. XML data-centric collections require query terms that allow users to specify constraints on the document structure; mapping structural queries and assigning weights are significant for determining the set of possibly relevant documents with respect to structural conditions. In this paper, we present an extension to the MEXIR search system, which we call the Exponentiation function, that supports the combination of structural and content queries in the form of content-and-structure queries. The structural information is shown to improve the effectiveness of the search system by up to 52.60% over the BM25 baseline in terms of MAP.

    Combining relevance information in a synchronous collaborative information retrieval environment

    Traditionally, information retrieval (IR) research has focussed on a single-user interaction modality, where a user searches to satisfy an information need. Recent advances in both web technologies, such as the sociable web of Web 2.0, and computer hardware, such as tabletop interface devices, have enabled multiple users to collaborate on many computer-related tasks. Due to these advances there is an increasing need to support two or more users searching together at the same time in order to satisfy a shared information need, which we refer to as Synchronous Collaborative Information Retrieval (SCIR). SCIR represents a significant paradigmatic shift from traditional IR systems. In order to support an effective SCIR search, new techniques are required to coordinate users' activities. In this chapter we explore the effectiveness of a sharing of knowledge policy on a collaborating group. Sharing of knowledge refers to the process of passing relevance information across users: if one user finds items of relevance to the search task, then the group should benefit in the form of improved ranked lists returned to each searcher. In order to evaluate the proposed techniques we simulate two users searching together through an incremental feedback system. The simulation assumes that users decide on an initial query with which to begin the collaborative search and proceed through the search by providing relevance judgments to the system and receiving a new ranked list. In order to populate these simulations we extract data from the interaction logs of various experimental IR systems from previous Text REtrieval Conference (TREC) workshops.

    Trading Data for Discounts: An Exploration of Unstructured Data Through Machine Learning in Wearable Technology

    The development of computing sensor devices capable of tracking an individual’s activity has changed the way we live and move. The data collected and generated by wearable technology has implications for users seeking to lead a healthier, more active lifestyle; however, the potential uses of the data extend beyond the user. Significant opportunity exists in the insurance industry as it relates to discounting premiums. The purpose of this research was to provide insight as to whether insurance companies should consider offering discounts on premiums for policyholders who use wearable technology to track their personal fitness, by identifying and suggesting potential groups of consumers to target these discounts toward. Using the platform R, researchers collected and analyzed tweets about four leading wearable technology companies: Fitbit, Jawbone, Misfit, and Withings. Both unsupervised and supervised learning techniques were pursued during the study in the form of topic modeling and artificial intelligence. Through detailed analysis, researchers determined that companies may want to consider reducing premiums for wearable technology users who use the devices for weight loss, as it would benefit both policyholders and insurance companies.

    Term-Specific Eigenvector-Centrality in Multi-Relation Networks

    Fuzzy matching and ranking are two information retrieval techniques widely used in web search. Their application to structured data, however, remains an open problem. This article investigates how eigenvector-centrality can be used for approximate matching in multi-relation graphs, that is, graphs where connections of many different types may exist. Based on an extension of the PageRank matrix, eigenvectors representing the distribution of a term after propagating term weights between related data items are computed. The result is an index which takes the document structure into account and can be used with standard document retrieval techniques. As the scheme takes the shape of an index transformation, all necessary calculations are performed during index time.
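
    The propagation idea in this abstract can be illustrated with a minimal sketch. It is not the article's actual matrix extension, which the abstract does not specify; it assumes a personalized-PageRank-style power iteration in which a term's initial weights act as the teleport distribution and each relation type carries a hypothetical weight. The function name, the `relation_weights` parameter, the damping value, and the example graph are all assumptions made for illustration.

```python
def propagate_term_weights(edges, relation_weights, term_weights,
                           damping=0.85, iterations=50):
    """Spread one term's weights over a multi-relation graph.

    edges: list of (source, target, relation) triples (hypothetical format).
    relation_weights: dict relation -> non-negative weight (assumed, not from the article).
    term_weights: dict node -> initial weight of the term in that data item.
    Returns a dict node -> propagated weight; the weights sum to 1.
    """
    nodes = sorted({n for s, t, _ in edges for n in (s, t)} | set(term_weights))

    # Collapse the typed edges into one weighted adjacency structure.
    out_links = {n: {} for n in nodes}
    for source, target, relation in edges:
        weight = relation_weights.get(relation, 0.0)
        if weight > 0:
            out_links[source][target] = out_links[source].get(target, 0.0) + weight

    total = sum(term_weights.values())
    if total == 0:
        return {n: 0.0 for n in nodes}

    # The normalized term distribution doubles as the teleport vector.
    base = {n: term_weights.get(n, 0.0) / total for n in nodes}
    rank = dict(base)

    for _ in range(iterations):
        updated = {n: (1.0 - damping) * base[n] for n in nodes}
        for source in nodes:
            links = out_links[source]
            if links:
                norm = sum(links.values())
                for target, weight in links.items():
                    updated[target] += damping * rank[source] * weight / norm
            else:
                # Dangling node: hand its mass back to the term distribution.
                for n in nodes:
                    updated[n] += damping * rank[source] * base[n]
        rank = updated
    return rank


# Hypothetical example: the term occurs only in paper1, but paper2 shares an author.
edges = [("paper1", "author1", "written_by"), ("author1", "paper2", "wrote")]
scores = propagate_term_weights(
    edges, {"written_by": 1.0, "wrote": 1.0}, {"paper1": 1.0})
```

    In this toy graph, `paper2` receives a non-zero score for the term even though the term never occurs in it, which is the kind of approximate matching the abstract describes. Running the computation once per term at index time would yield a transformed index usable by a standard retrieval engine.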

    Finding the right answer: an information retrieval approach supporting knowledge sharing

    Knowledge Management can be defined as the effective strategies to get the right piece of knowledge to the right person at the right time. Having the main purpose of providing users with information items of their interest, recommender systems seem to be quite valuable for organizational knowledge management environments. Here we present KARe (Knowledgeable Agent for Recommendations), a multiagent recommender system that supports users sharing knowledge in a peer-to-peer environment. Central to this work is the assumption that social interaction is essential for the creation and dissemination of new knowledge. Supporting social interaction, KARe allows users to share knowledge through questions and answers. This paper describes KARe's agent-oriented architecture and presents its recommendation algorithm.

    TopicViz: Semantic Navigation of Document Collections

    When people explore and manage information, they think in terms of topics and themes. However, the software that supports information exploration sees text at only the surface level. In this paper we show how topic modeling -- a technique for identifying latent themes across large collections of documents -- can support semantic exploration. We present TopicViz, an interactive environment for information exploration. TopicViz combines traditional search and citation-graph functionality with a range of novel interactive visualizations, centered around a force-directed layout that links documents to the latent themes discovered by the topic model. We describe several use scenarios in which TopicViz supports rapid sensemaking on large document collections

    Initiating organizational memories using ontology-based network analysis as a bootstrapping tool

    An important problem for many kinds of knowledge systems is their initial set-up. It is difficult to choose the right information to include in such systems, and the right information is also a prerequisite for maximizing the uptake and relevance. To tackle this problem, most developers adopt heavyweight solutions and rely on a faithful continuous interaction with users to create and improve content. In this paper, we explore the use of an automatic, lightweight ontology-based solution to the bootstrapping problem, in which domain-describing ontologies are analysed to uncover significant yet implicit relationships between instances. We illustrate the approach by using such an analysis to provide content automatically for the initial set-up of an organizational memory

    Is Kelly Shifting Under Google’s Feet? New Ninth Circuit Impact on the Google Library Project Litigation

    The Google Library Project presents what many consider to be the perfect fair-use problem. The legal debate surrounding the Library Project has centered on the Ninth Circuit’s Kelly v. Arriba Soft. Yet recent case law presents new arguments for both sides of the Library Project litigation. This iBrief analyzes two Ninth Circuit district court decisions on fair use, Field v. Google, Inc. and Perfect 10 v. Google, Inc., and their impact on the Library Project litigation

    Extracting corpus specific knowledge bases from Wikipedia

    Thesauri are useful knowledge structures for assisting information retrieval. Yet their production is labor-intensive, and few domains have comprehensive thesauri that cover domain-specific concepts and contemporary usage. One approach, which has been attempted without much success for decades, is to seek statistical natural language processing algorithms that work on free text. Instead, we propose to replace costly professional indexers with thousands of dedicated amateur volunteers--namely, those that are producing Wikipedia. This vast, open encyclopedia represents a rich tapestry of topics and semantics and a huge investment of human effort and judgment. We show how this can be directly exploited to provide WikiSauri: manually-defined yet inexpensive thesaurus structures that are specifically tailored to expose the topics, terminology and semantics of individual document collections. We also offer concrete evidence of the effectiveness of WikiSauri for assisting information retrieval