1,343 research outputs found

    Continuous Top-k Queries over Real-Time Web Streams

    Get PDF
    The Web has become a large-scale real-time information system forcing us to revise both how to effectively assess relevance of information for a user and how to efficiently implement information retrieval and dissemination functionality. To increase information relevance, Real-time Web applications such as Twitter and Facebook, extend content and social-graph relevance scores with " real-time " user generated events (e.g. re-tweets, replies, likes). To accommodate high arrival rates of information items and user events we explore a pub-lish/subscribe paradigm in which we index queries and update on the fly their results each time a new item and relevant events arrive. In this setting, we need to process continuous top-k text queries combining both static and dynamic scores. To the best of our knowledge, this is the first work addressing how non-predictable, dynamic scores can be handled in a continuous top-k query setting

    Efficient clustering and document retrival by query keywords

    Get PDF
    User penchants are shown by a set of keywords. A central server monitors the document stream and continuously reports to each user the top-k documents that are most relevant to her keywords. Our unprejudiced is to backing large numbers of users and high stream rates, while energizing the top-k results almost instantly. Our clarification walks out on the customary frequency-ordered indexing approach. As an alternative, it trails an identifier-ordering paradigm that ensembles better the nature of the problem. When supplemented with a new, locally adaptive method, our method offers confirmed optimality the number of well-thought-out queries per stream event, and direction of extent shorter retort time than the contemporary state-of-the-art

    Continuous Top-k monitoring on document streams

    Get PDF

    The 1995 Science Information Management and Data Compression Workshop

    Get PDF
    This document is the proceedings from the 'Science Information Management and Data Compression Workshop,' which was held on October 26-27, 1995, at the NASA Goddard Space Flight Center, Greenbelt, Maryland. The Workshop explored promising computational approaches for handling the collection, ingestion, archival, and retrieval of large quantities of data in future Earth and space science missions. It consisted of fourteen presentations covering a range of information management and data compression approaches that are being or have been integrated into actual or prototypical Earth or space science data information systems, or that hold promise for such an application. The Workshop was organized by James C. Tilton and Robert F. Cromp of the NASA Goddard Space Flight Center

    Semantic Interaction in Web-based Retrieval Systems : Adopting Semantic Web Technologies and Social Networking Paradigms for Interacting with Semi-structured Web Data

    Get PDF
    Existing web retrieval models for exploration and interaction with web data do not take into account semantic information, nor do they allow for new forms of interaction by employing meaningful interaction and navigation metaphors in 2D/3D. This thesis researches means for introducing a semantic dimension into the search and exploration process of web content to enable a significantly positive user experience. Therefore, an inherently dynamic view beyond single concepts and models from semantic information processing, information extraction and human-machine interaction is adopted. Essential tasks for semantic interaction such as semantic annotation, semantic mediation and semantic human-computer interaction were identified and elaborated for two general application scenarios in web retrieval: Web-based Question Answering in a knowledge-based dialogue system and semantic exploration of information spaces in 2D/3D

    Entity-Oriented Search

    Get PDF
    This open access book covers all facets of entity-oriented search—where “search” can be interpreted in the broadest sense of information access—from a unified point of view, and provides a coherent and comprehensive overview of the state of the art. It represents the first synthesis of research in this broad and rapidly developing area. Selected topics are discussed in-depth, the goal being to establish fundamental techniques and methods as a basis for future research and development. Additional topics are treated at a survey level only, containing numerous pointers to the relevant literature. A roadmap for future research, based on open issues and challenges identified along the way, rounds out the book. The book is divided into three main parts, sandwiched between introductory and concluding chapters. The first two chapters introduce readers to the basic concepts, provide an overview of entity-oriented search tasks, and present the various types and sources of data that will be used throughout the book. Part I deals with the core task of entity ranking: given a textual query, possibly enriched with additional elements or structural hints, return a ranked list of entities. This core task is examined in a number of different variants, using both structured and unstructured data collections, and numerous query formulations. In turn, Part II is devoted to the role of entities in bridging unstructured and structured data. Part III explores how entities can enable search engines to understand the concepts, meaning, and intent behind the query that the user enters into the search box, and how they can provide rich and focused responses (as opposed to merely a list of documents)—a process known as semantic search. The final chapter concludes the book by discussing the limitations of current approaches, and suggesting directions for future research. Researchers and graduate students are the primary target audience of this book. A general background in information retrieval is sufficient to follow the material, including an understanding of basic probability and statistics concepts as well as a basic knowledge of machine learning concepts and supervised learning algorithms

    High Performance Computing for DNA Sequence Alignment and Assembly

    Get PDF
    Recent advances in DNA sequencing technology have dramatically increased the scale and scope of DNA sequencing. These data are used for a wide variety of important biological analyzes, including genome sequencing, comparative genomics, transcriptome analysis, and personalized medicine but are complicated by the volume and complexity of the data involved. Given the massive size of these datasets, computational biology must draw on the advances of high performance computing. Two fundamental computations in computational biology are read alignment and genome assembly. Read alignment maps short DNA sequences to a reference genome to discover conserved and polymorphic regions of the genome. Genome assembly computes the sequence of a genome from many short DNA sequences. Both computations benefit from recent advances in high performance computing to efficiently process the huge datasets involved, including using highly parallel graphics processing units (GPUs) as high performance desktop processors, and using the MapReduce framework coupled with cloud computing to parallelize computation to large compute grids. This dissertation demonstrates how these technologies can be used to accelerate these computations by orders of magnitude, and have the potential to make otherwise infeasible computations practical
    • …
    corecore