9,144 research outputs found

    Storage Solutions for Big Data Systems: A Qualitative Study and Comparison

    Full text link
    Big data systems development is full of challenges in view of the variety of application areas and domains that this technology promises to serve. Typically, fundamental design decisions in big data systems design include choosing appropriate storage and computing infrastructures. In this age of heterogeneous systems that integrate different technologies into an optimized solution for a specific real-world problem, big data systems are no exception. As far as the storage aspect of any big data system is concerned, the primary facet is the storage infrastructure, and NoSQL appears to be the technology that fulfills its requirements. However, every big data application has different data characteristics, and thus its data fits a different data model. This paper presents a feature and use case analysis and comparison of the four main NoSQL data models, namely document-oriented, key-value, graph, and wide-column. Moreover, a feature analysis of 80 NoSQL solutions is provided, elaborating on the criteria and points that a developer must consider while making a choice. Typically, big data storage needs to communicate with the execution engine and other processing and visualization technologies to create a comprehensive solution. This brings the second facet of big data storage, big data file formats, into the picture. The second half of the paper compares the advantages, shortcomings, and possible use cases of the big data file formats available for Hadoop, which is the foundation for most big data computing technologies. Decentralized storage and blockchain are seen as the next generation of big data storage, and their challenges and future prospects are also discussed.
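    To make the four data models concrete, here is a minimal sketch of how one and the same record might be expressed in each (all names and structures are illustrative assumptions, not drawn from the paper):

```python
# Illustrative sketch: the same user record in the four NoSQL data models.

# Document-oriented (e.g. MongoDB): a self-contained, nested document.
document = {
    "_id": "user:42",
    "name": "Ada",
    "orders": [{"sku": "B-7", "qty": 2}],
}

# Key-value (e.g. Redis): an opaque value behind a single key; any
# structure inside the value is the application's responsibility.
key_value = ("user:42", '{"name": "Ada", "orders": [{"sku": "B-7", "qty": 2}]}')

# Wide-column (e.g. Cassandra, HBase): a row key with column families,
# where the set of columns can vary from row to row.
wide_column = {
    "user:42": {
        "profile": {"name": "Ada"},
        "orders": {"order:1:sku": "B-7", "order:1:qty": "2"},
    }
}

# Graph (e.g. Neo4j): entities become nodes, relationships become edges,
# which makes traversals such as "who ordered what" first-class queries.
graph = {
    "nodes": [("user:42", {"name": "Ada"}), ("sku:B-7", {"type": "product"})],
    "edges": [("user:42", "ORDERED", "sku:B-7", {"qty": 2})],
}
```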

    Icarus: Towards a Multistore Database System

    Get PDF
    Recent years have seen a vast diversification of the database market. In contrast to the "one-size-fits-all" paradigm according to which systems were designed in the past, today's database management systems (DBMSs) are tuned for particular workloads. This has led to DBMSs optimized for high-performance, high-throughput read/write workloads in online transaction processing (OLTP) and to systems optimized for complex analytical queries (OLAP). However, this approach reaches a limit when systems have to deal with mixed workloads that are neither pure OLAP nor pure OLTP. In such cases, polystores are increasingly gaining popularity. Rather than supporting a single database paradigm and addressing one particular workload, polystores encompass several DBMSs that store data in different schemas and allow requests to be routed at a per-query level to the most appropriate system. In this paper, we introduce the polystore Icarus. In our evaluation, based on a workload that combines OLTP and OLAP elements, we show that Icarus is able to speed up queries by up to a factor of 3 by properly routing them to the best underlying DBMS.
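    A minimal sketch of the per-query routing idea described here, assuming a simple keyword heuristic and two hypothetical backends (an illustration of the concept, not Icarus's actual routing logic):

```python
# Sketch: route each query to the backend best suited to its workload type.
from dataclasses import dataclass

@dataclass
class Backend:
    name: str
    def execute(self, sql: str) -> str:
        return f"[{self.name}] executed: {sql}"

ROW_STORE = Backend("row-store (OLTP)")        # fast point reads and writes
COLUMN_STORE = Backend("column-store (OLAP)")  # fast scans and aggregates

# Assumed heuristic: aggregate/join markers suggest an analytical query.
OLAP_HINTS = ("group by", "sum(", "avg(", "count(", "join")

def route(sql: str) -> Backend:
    """Send analytical-looking queries to the column store, the rest to the row store."""
    q = sql.lower()
    return COLUMN_STORE if any(h in q for h in OLAP_HINTS) else ROW_STORE

for sql in ("SELECT * FROM users WHERE id = 7",
            "SELECT region, SUM(total) FROM orders GROUP BY region"):
    print(route(sql).execute(sql))
```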

    On the expressiveness and trade-offs of large scale tuple stores

    Get PDF
    Proceedings of On the Move to Meaningful Internet Systems (OTM). Massive-scale distributed computing is a challenge at our doorstep. The current exponential growth of data calls for massive-scale storage and processing capabilities. This is being acknowledged by several major Internet players embracing the cloud computing model and offering first-generation distributed tuple stores. Having all started from similar requirements, these systems ended up providing a similar service: a simple tuple store interface that allows applications to insert, query, and remove individual elements. Furthermore, while availability is commonly assumed to be sustained by the massive scale itself, data consistency and freshness are usually severely hindered. By doing so, these services focus on a specific narrow trade-off between consistency, availability, performance, scale, and migration cost that is much less attractive to common business needs. In this paper we introduce DataDroplets, a novel tuple store that shifts the current trade-off towards the needs of common business users, providing additional consistency guarantees and higher-level data processing primitives that smooth the migration path for existing applications. We present a detailed comparison between DataDroplets and existing systems regarding their data model, architecture, and trade-offs. Preliminary results of the system's performance under a realistic workload are also presented.
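    The narrow interface the abstract refers to, inserting, querying, and removing individual elements, can be sketched as follows (an in-memory stand-in for illustration, not DataDroplets itself):

```python
# Sketch of a first-generation tuple store's minimal interface.
class TupleStore:
    def __init__(self) -> None:
        self._data: dict[str, tuple] = {}

    def insert(self, key: str, value: tuple) -> None:
        self._data[key] = value

    def query(self, key: str) -> tuple | None:
        return self._data.get(key)

    def remove(self, key: str) -> None:
        self._data.pop(key, None)

store = TupleStore()
store.insert("session:9", ("alice", "2010-08-01"))
print(store.query("session:9"))   # ('alice', '2010-08-01')
store.remove("session:9")
```

    DataDroplets' contribution, as the abstract describes it, is to layer stronger consistency guarantees and higher-level processing primitives on top of this kind of minimal interface.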

    Equivalence-based Security for Querying Encrypted Databases: Theory and Application to Privacy Policy Audits

    Full text link
    Motivated by the problem of simultaneously preserving the confidentiality and usability of data outsourced to third-party clouds, we present two database encryption schemes that largely hide data but reveal enough information to support a wide range of relational queries. We provide a security definition for database encryption that captures confidentiality based on a notion of equivalence of databases from the adversary's perspective. As a specific application, we adapt an existing algorithm for finding violations of privacy policies to run on logs encrypted under our schemes and observe low to moderate overheads. Comment: CCS 2015 paper technical report, in progress.
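    The abstract does not detail the constructions, but one standard building block for querying encrypted data, deterministic keyed tokens for equality matching, illustrates the general idea. The sketch below is not the paper's scheme; the key handling and record layout are simplified assumptions:

```python
# Sketch: deterministic tokens let an untrusted server answer equality
# queries without seeing plaintext values.
import hmac
import hashlib

KEY = b"client-side secret key"  # assumption: never leaves the data owner

def token(value: str) -> str:
    """Deterministic keyed digest: equal plaintexts yield equal tokens."""
    return hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()

# The client tokenizes before outsourcing; the server stores only tokens.
outsourced = [{"user": token("alice"), "action": "read"},
              {"user": token("bob"), "action": "write"}]

# A query for rows where user = 'alice' is rewritten client-side:
needle = token("alice")
matches = [row for row in outsourced if row["user"] == needle]
print(len(matches))  # 1 -- the server matched without learning 'alice'
```

    Deterministic tokens deliberately leak equality patterns; an equivalence-based security definition like the one described is one way of stating precisely which such patterns an adversary may learn.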

    A Review of Atrial Fibrillation Detection Methods as a Service

    Get PDF
    Atrial Fibrillation (AF) is a common heart arrhythmia that often goes undetected, and even when it is detected, managing the condition can be challenging. In this paper, we review how the RR interval and Electrocardiogram (ECG) signals, incorporated into a monitoring system, can be used to track AF events. Were such an automated system to be implemented, it could help manage AF and thereby reduce patient morbidity and mortality. The main impetus behind developing such a service is that analyzing a greater volume of data can lead to better patient outcomes. Based on the literature review presented herein, we introduce methods that can detect AF efficiently and automatically from the RR interval and ECG signals. A cardiovascular disease monitoring service that incorporates one or more of these detection methods could extend event observation to all times and could therefore establish any AF occurrence. The development of an automated and efficient method that monitors AF in real time would likely become a key component in meeting public health goals regarding the reduction of fatalities caused by the disease. Yet, at present, significant technological and regulatory obstacles remain that prevent the development of any proposed system. Establishing the scientific foundation for monitoring is important for providing an effective service to patients and healthcare professionals.
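    As a concrete illustration of RR-interval-based detection, here is a minimal sketch of one common irregularity test from the AF literature, a normalized-RMSSD threshold; the window length and threshold value are illustrative assumptions, not values from the paper:

```python
# Sketch: flag a window of RR intervals as possible AF when the beat-to-beat
# variation (RMSSD, normalized by the mean RR interval) is high.
import math

def nrmssd(rr_ms: list[float]) -> float:
    """Root mean square of successive RR differences, divided by mean RR."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return rmssd / (sum(rr_ms) / len(rr_ms))

def flag_af(rr_ms: list[float], threshold: float = 0.1) -> bool:
    """Assumed threshold for illustration: highly irregular windows are flagged."""
    return nrmssd(rr_ms) > threshold

regular = [800, 805, 798, 802, 800, 803, 799, 801]     # steady sinus rhythm
irregular = [640, 910, 720, 1050, 590, 880, 760, 990]  # erratic, AF-like
print(flag_af(regular), flag_af(irregular))  # False True
```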