71,468 research outputs found
Execution Performance Issues in Full-Text Information Retrieval
The task of an information retrieval system is to identify documents that will satisfy a user’s information need. Effective fulfillment of this task has long been an active area of research, leading to sophisticated retrieval models for representing information content in documents and queries and measuring similarity between the two. The maturity and proven effectiveness of these systems has resulted in demand for increased capacity, performance, scalability, and functionality, especially as information retrieval is integrated into more traditional database management environments. In this dissertation we explore a number of functionality and performance issues in information retrieval. First, we consider creation and modification of the document collection, concentrating on management of the inverted file index. An inverted file architecture based on a persistent object store is described and experimental results are presented for inverted file creation and modification. Our architecture provides performance that scales well with document collection size and the database features supported by the persistent object store provide many solutions to issues that arise during integration of information retrieval into vii more general database environments. We then turn to query evaluation speed and introduce a new optimization technique for statistical ranking retrieval systems that support structured queries. Experimental results from a variety of query sets show that execution time can be reduced by more than 50% with no noticeable impact on retrieval effectiveness, making these more complex retrieval models attractive alternatives for environments that demand high performance
Persistent Memory Programming Abstractions in Context of Concurrent Applications
The advent of non-volatile memory (NVM) technologies like PCM, STT,
memristors and Fe-RAM is believed to enhance the system performance by getting
rid of the traditional memory hierarchy by reducing the gap between memory and
storage. This memory technology is considered to have the performance like that
of DRAM and persistence like that of disks. Thus, it would also provide
significant performance benefits for big data applications by allowing
in-memory processing of large data with the lowest latency to persistence.
Leveraging the performance benefits of this memory-centric computing technology
through traditional memory programming is not trivial and the challenges
aggravate for parallel/concurrent applications. To this end, several
programming abstractions have been proposed like NVthreads, Mnemosyne and
intel's NVML. However, deciding upon a programming abstraction which is easier
to program and at the same time ensures the consistency and balances various
software and architectural trade-offs is openly debatable and active area of
research for NVM community.
We study the NVthreads, Mnemosyne and NVML libraries by building a concurrent
and persistent set and open addressed hash-table data structure application. In
this process, we explore and report various tradeoffs and hidden costs involved
in building concurrent applications for persistence in terms of achieving
efficiency, consistency and ease of programming with these NVM programming
abstractions. Eventually, we evaluate the performance of the set and hash-table
data structure applications. We observe that NVML is easiest to program with
but is least efficient and Mnemosyne is most performance friendly but involves
significant programming efforts to build concurrent and persistent
applications.Comment: Accepted in HiPC SRS 201
Elevating commodity storage with the SALSA host translation layer
To satisfy increasing storage demands in both capacity and performance,
industry has turned to multiple storage technologies, including Flash SSDs and
SMR disks. These devices employ a translation layer that conceals the
idiosyncrasies of their mediums and enables random access. Device translation
layers are, however, inherently constrained: resources on the drive are scarce,
they cannot be adapted to application requirements, and lack visibility across
multiple devices. As a result, performance and durability of many storage
devices is severely degraded.
In this paper, we present SALSA: a translation layer that executes on the
host and allows unmodified applications to better utilize commodity storage.
SALSA supports a wide range of single- and multi-device optimizations and,
because is implemented in software, can adapt to specific workloads. We
describe SALSA's design, and demonstrate its significant benefits using
microbenchmarks and case studies based on three applications: MySQL, the Swift
object store, and a video server.Comment: Presented at 2018 IEEE 26th International Symposium on Modeling,
Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS
On Constructing Persistent Identifiers with Persistent Resolution Targets
Persistent Identifiers (PID) are the foundation referencing digital assets in
scientific publications, books, and digital repositories. In its realization,
PIDs contain metadata and resolving targets in form of URLs that point to data
sets located on the network. In contrast to PIDs, the target URLs are typically
changing over time; thus, PIDs need continuous maintenance -- an effort that is
increasing tremendously with the advancement of e-Science and the advent of the
Internet-of-Things (IoT). Nowadays, billions of sensors and data sets are
subject of PID assignment. This paper presents a new approach of embedding
location independent targets into PIDs that allows the creation of
maintenance-free PIDs using content-centric network technology and overlay
networks. For proving the validity of the presented approach, the Handle PID
System is used in conjunction with Magnet Link access information encoding,
state-of-the-art decentralized data distribution with BitTorrent, and Named
Data Networking (NDN) as location-independent data access technology for
networks. Contrasting existing approaches, no green-field implementation of PID
or major modifications of the Handle System is required to enable
location-independent data dissemination with maintenance-free PIDs.Comment: Published IEEE paper of the FedCSIS 2016 (SoFAST-WS'16) conference,
11.-14. September 2016, Gdansk, Poland. Also available online:
http://ieeexplore.ieee.org/document/7733372
- …