500 research outputs found

    Analysis of outsourcing data to the cloud using autonomous key generation

    Cloud computing, a technology that enables users to store and manage their data at low cost and high availability, has been emerging over the past few decades because of the many services it provides. One of these services is data storage. Many users of this service are still reluctant to outsource their data because of the integrity, confidentiality, performance, and cost issues that come with it. These issues make it necessary to encrypt data before outsourcing it to the cloud. However, encrypting the data beforehand makes it infeasible to search, reducing the functionality of the cloud. Existing cloud storage schemes often prioritize security over performance and functionality, or vice versa. In this thesis, the cloud storage service is explored, and the aspects of security, performance, and functionality are analyzed in order to investigate the trade-offs of the service. DSB-SEIS, a scheme built around encryption intensity selection, an autonomous key generation algorithm that lets users control the encryption intensity of their files, is developed in order to find a balance between performance, security, and functionality. DSB-SEIS also provides deduplication, assured deletion, and searchable encryption. The effect of encryption intensity selection on encryption, decryption, and key generation is explored, and the performance and security of DSB-SEIS are evaluated. The MapReduce framework is also used to investigate the performance of DSB-SEIS on big data. The analysis demonstrates that the encryption intensity selection algorithm generates a manageable number of encryption keys based on the confidentiality of the data while adding no significant overhead to encryption or decryption. --Abstract, page iii
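    The abstract does not spell out the DSB-SEIS algorithm, but a minimal sketch conveys the idea of encryption intensity selection: deriving a per-file key whose strength scales with a user-assigned confidentiality level. Everything here is an illustrative assumption (the level-to-key-size mapping, the derivation scheme, all names), not the thesis's actual design.

```python
import hashlib
import os

# Assumed mapping: higher confidentiality -> larger key (stronger encryption).
# This mapping is hypothetical, not taken from DSB-SEIS.
KEY_BITS = {"low": 128, "medium": 192, "high": 256}

def generate_key(file_id: str, confidentiality: str, master_secret: bytes) -> bytes:
    """Derive a per-file key whose length depends on the file's confidentiality."""
    bits = KEY_BITS[confidentiality]
    # Deterministic derivation keeps the number of stored keys manageable:
    # the same (master secret, file) pair always yields the same key.
    digest = hashlib.sha256(master_secret + file_id.encode()).digest()
    return digest[: bits // 8]

master = os.urandom(32)
key = generate_key("report.pdf", "high", master)
print(len(key) * 8, "bit key")  # -> 256 bit key for a high-confidentiality file
```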

    Evidence based medical query system on large scale data

    Title from PDF of title page, viewed on July 30, 2014. Thesis advisor: Yugyung Lee. Vita. Includes bibliographical references (pages 62-65). Thesis (M.S.)--School of Computing and Engineering, University of Missouri--Kansas City, 2014.

    As huge amounts of data are created at a rapid pace, the demand for integrating and analyzing such data has been growing steadily. Retrieving relevant and accurate evidence is especially essential in healthcare and biomedical research. Although query systems based on ontologies, Medical Subject Headings (MeSH), or keyword search are available, systems that are based on evidence and retrieve data effectively from large collections of clinical data remain insufficient. This thesis proposes a novel approach to analyze big data sets collected from clinical trials research and to discover significant evidence and association patterns with respect to conditions, treatments, and medication side effects. Our approach applies machine learning techniques in the Apache Hadoop framework with support from MetaMap and RxNorm. A heuristic measure of empirical evidence was newly designed that considers the degree of association among conditions, treatments, and medication side effects as well as the percentage of people affected. The Apriori algorithm was used to discover strong positive association rules under various measures, including support and confidence. We examined a large and complex data set (12,327 study results) from clinicaltrials.gov and identified 8,291 strong association rules and 59,228 combinations covering 432,841 subjects, 1,761 conditions, 2,836 drugs, and 27 side effects. The significance of these association patterns was evaluated in terms of an impact factor representing the percentage of the population with a high rate of side effects. Using these association rules and combination strengths, an evidence-based query system was implemented to answer key questions; it also provides an interface to retrieve relevant publications from PubMed. The results from this query system are compared with those of PubMed searches based on Medical Subject Headings.

    Contents: Abstract -- Illustrations -- Tables -- Introduction -- Related work -- Evidence based medical query model -- Implementation -- Results & evaluation -- Conclusion and future work -- References
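    The abstract names Apriori with support and confidence thresholds; the following is a minimal, generic Apriori sketch. The toy transactions and thresholds are invented for illustration and do not come from the thesis's clinical data.

```python
from itertools import combinations

# Toy transactions: each set mixes a condition, a drug, and a side effect.
# These records are invented; the thesis mined 12,327 real study results.
transactions = [
    {"diabetes", "metformin", "nausea"},
    {"diabetes", "metformin", "headache"},
    {"diabetes", "insulin", "nausea"},
    {"hypertension", "lisinopril", "cough"},
]

MIN_SUPPORT = 0.5  # fraction of transactions that must contain an itemset

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

# Apriori: grow frequent itemsets level by level, pruning infrequent candidates.
items = {frozenset([i]) for t in transactions for i in t}
frequent = [{s for s in items if support(s) >= MIN_SUPPORT}]
while frequent[-1]:
    candidates = {a | b for a in frequent[-1] for b in frequent[-1]
                  if len(a | b) == len(a) + 1}
    frequent.append({c for c in candidates if support(c) >= MIN_SUPPORT})

# Rules X -> Y with confidence = support(X ∪ Y) / support(X).
for itemset in (s for level in frequent for s in level if len(s) > 1):
    for r in range(1, len(itemset)):
        for lhs in map(frozenset, combinations(itemset, r)):
            conf = support(itemset) / support(lhs)
            if conf >= 0.8:
                print(set(lhs), "->", set(itemset - lhs), f"conf={conf:.2f}")
```

    On this toy data the sketch finds, for example, {metformin} -> {diabetes} with confidence 1.0; the thesis applies the same support/confidence machinery at scale on Hadoop.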

    The Family of MapReduce and Large Scale Data Processing Systems

    In the last two decades, the continuous increase in computational power has produced an overwhelming flow of data, which has called for a paradigm shift in computing architectures and large-scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables the easy development of scalable parallel applications that process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program, such as data distribution, scheduling, and fault tolerance. However, the original implementation of the MapReduce framework had some limitations that have since been tackled by many research efforts in follow-up works. This article provides a comprehensive survey of a family of approaches and mechanisms for large-scale data processing that are based on the original idea of the MapReduce framework and are currently gaining momentum in both the research and industrial communities. We also cover systems that have been implemented to provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large-scale data processing systems that resemble some of the ideas of MapReduce but serve different purposes and application scenarios. Finally, we discuss some future research directions for implementing the next generation of MapReduce-like solutions.

    Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other authors
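    As a concrete reminder of the programming model this survey covers, here is the canonical word-count example: the user supplies only a map and a reduce function, and the framework handles the shuffle. The runner below is a local simulation of that data flow, assumed for illustration; it is not the API of Hadoop or any surveyed system, which would distribute these phases across a cluster with scheduling and fault tolerance.

```python
from collections import defaultdict

def map_fn(_, line):                 # map: emit (word, 1) per occurrence
    for word in line.split():
        yield word.lower(), 1

def reduce_fn(word, counts):         # reduce: sum all counts for one key
    yield word, sum(counts)

def run_mapreduce(records, map_fn, reduce_fn):
    groups = defaultdict(list)       # shuffle: group intermediate values by key
    for key, value in records:
        for k, v in map_fn(key, value):
            groups[k].append(v)
    return [out for k in sorted(groups) for out in reduce_fn(k, groups[k])]

docs = [(0, "the map step emits pairs"), (1, "the reduce step sums the pairs")]
print(run_mapreduce(docs, map_fn, reduce_fn))
# -> [('emits', 1), ('map', 1), ('pairs', 2), ('reduce', 1), ...]
```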