
    Execution Performance Issues in Full-Text Information Retrieval

    The task of an information retrieval system is to identify documents that will satisfy a user’s information need. Effective fulfillment of this task has long been an active area of research, leading to sophisticated retrieval models for representing information content in documents and queries and measuring similarity between the two. The maturity and proven effectiveness of these systems have resulted in demand for increased capacity, performance, scalability, and functionality, especially as information retrieval is integrated into more traditional database management environments. In this dissertation we explore a number of functionality and performance issues in information retrieval. First, we consider creation and modification of the document collection, concentrating on management of the inverted file index. An inverted file architecture based on a persistent object store is described, and experimental results are presented for inverted file creation and modification. Our architecture provides performance that scales well with document collection size, and the database features supported by the persistent object store provide many solutions to issues that arise during integration of information retrieval into more general database environments. We then turn to query evaluation speed and introduce a new optimization technique for statistical ranking retrieval systems that support structured queries. Experimental results from a variety of query sets show that execution time can be reduced by more than 50% with no noticeable impact on retrieval effectiveness, making these more complex retrieval models attractive alternatives for environments that demand high performance.

    Persistent Memory Programming Abstractions in Context of Concurrent Applications

    The advent of non-volatile memory (NVM) technologies such as PCM, STT, memristors and Fe-RAM is expected to improve system performance by narrowing the gap between memory and storage, thereby flattening the traditional memory hierarchy. This memory technology is considered to offer performance comparable to DRAM and persistence comparable to disks. It would therefore also provide significant performance benefits for big data applications by allowing in-memory processing of large data with the lowest latency to persistence. Leveraging the performance benefits of this memory-centric computing technology through traditional memory programming is not trivial, and the challenges are aggravated for parallel/concurrent applications. To this end, several programming abstractions have been proposed, such as NVthreads, Mnemosyne and Intel's NVML. However, choosing a programming abstraction that is easy to program, ensures consistency, and balances the various software and architectural trade-offs remains an open question and an active area of research for the NVM community. We study the NVthreads, Mnemosyne and NVML libraries by building a concurrent and persistent set and an open-addressed hash-table data structure application. In this process, we explore and report various trade-offs and hidden costs involved in building concurrent applications for persistence, in terms of efficiency, consistency and ease of programming with these NVM programming abstractions. Finally, we evaluate the performance of the set and hash-table data structure applications. We observe that NVML is the easiest to program with but the least efficient, while Mnemosyne offers the best performance but requires significant programming effort to build concurrent and persistent applications. Comment: Accepted in HiPC SRS 201

    Elevating commodity storage with the SALSA host translation layer

    To satisfy increasing storage demands in both capacity and performance, industry has turned to multiple storage technologies, including Flash SSDs and SMR disks. These devices employ a translation layer that conceals the idiosyncrasies of their mediums and enables random access. Device translation layers are, however, inherently constrained: resources on the drive are scarce, they cannot be adapted to application requirements, and they lack visibility across multiple devices. As a result, the performance and durability of many storage devices are severely degraded. In this paper, we present SALSA: a translation layer that executes on the host and allows unmodified applications to better utilize commodity storage. SALSA supports a wide range of single- and multi-device optimizations and, because it is implemented in software, can adapt to specific workloads. We describe SALSA's design and demonstrate its significant benefits using microbenchmarks and case studies based on three applications: MySQL, the Swift object store, and a video server. Comment: Presented at 2018 IEEE 26th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS

    On Constructing Persistent Identifiers with Persistent Resolution Targets

    Persistent Identifiers (PIDs) are the foundation for referencing digital assets in scientific publications, books, and digital repositories. In their realization, PIDs contain metadata and resolution targets in the form of URLs that point to data sets located on the network. In contrast to PIDs, the target URLs typically change over time; thus, PIDs need continuous maintenance, an effort that is increasing tremendously with the advancement of e-Science and the advent of the Internet-of-Things (IoT). Nowadays, billions of sensors and data sets are subject to PID assignment. This paper presents a new approach to embedding location-independent targets into PIDs that allows the creation of maintenance-free PIDs using content-centric network technology and overlay networks. To validate the presented approach, the Handle PID System is used in conjunction with Magnet Link access information encoding, state-of-the-art decentralized data distribution with BitTorrent, and Named Data Networking (NDN) as a location-independent data access technology for networks. In contrast to existing approaches, no green-field PID implementation or major modification of the Handle System is required to enable location-independent data dissemination with maintenance-free PIDs. Comment: Published IEEE paper of the FedCSIS 2016 (SoFAST-WS'16) conference, 11.-14. September 2016, Gdansk, Poland. Also available online: http://ieeexplore.ieee.org/document/7733372