
    QUERY PERFORMANCE EVALUATION OVER HEALTH DATA

    In recent years, the number and variety of application scenarios studied under e-health has increased significantly. Each application generates immense amounts of data that grow constantly, and in this context it becomes an important challenge to store and analyze the data efficiently and economically with conventional database management tools. Traditional relational database systems may not meet the requirements imposed by the increased variety, volume, velocity, and dynamic structure of the new datasets. Effective healthcare data management, and its transformation into information and knowledge, is therefore a challenging issue, so organizations that deal with immense data, especially hospitals and medical centers, either have to purchase new systems or re-tool what they already have. The new so-called NoSQL data models and the Hadoop Distributed File System are replacing RDBMSs, especially in real-time healthcare data analytics. Performing complex reporting in these applications becomes a real challenge as the size of the data grows exponentially, while customers demand complex analysis and reporting on those data. Compared to traditional databases, the Hadoop framework is designed to process large volumes of data. In this study, we examine the query performance of traditional databases and Big Data platforms on healthcare data, and we explore whether it is really necessary to invest in a big data environment to run queries on high-volume data, or whether this can also be done with current relational database management systems and their supporting hardware infrastructure. We present our experience and a comprehensive performance evaluation of data management systems in the context of application performance.
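    The kind of query-performance measurement this abstract describes can be sketched with a minimal timing harness. This is an illustrative example only: the table, columns, row counts, and query below are invented stand-ins for the study's actual healthcare workloads, and SQLite stands in for a production RDBMS.

```python
import sqlite3
import time

# Hypothetical in-memory "health data" table; schema and data are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE admissions (patient_id INTEGER, ward TEXT, cost REAL)")
rows = [(i, "cardiology" if i % 3 == 0 else "oncology", float(i % 500))
        for i in range(100_000)]
conn.executemany("INSERT INTO admissions VALUES (?, ?, ?)", rows)

# Time a simple aggregation query, the kind of reporting workload the
# study compares across relational and Big Data platforms.
start = time.perf_counter()
result = conn.execute(
    "SELECT ward, COUNT(*), AVG(cost) FROM admissions GROUP BY ward"
).fetchall()
latency = time.perf_counter() - start
print(f"{len(result)} groups in {latency * 1000:.1f} ms")
```

    A real evaluation would run such queries at several data volumes on each platform and compare the latency curves rather than a single measurement.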

    Pemanfaatan Database Kependudukan Terdistribusi pada Ragam Aplikasi Sistem Informasi di Pemerintah Kabupaten/Kota (Utilization of a Distributed Population Database across Information System Applications in Regency/City Governments)

    The Government of Indonesia, through Kemendagri (the Ministry of Home Affairs), is currently implementing an electronic ID card (e-KTP) program built around a Single Identification Number (SIN). The program is expected to generate an accurate national population database. The availability of a better population database will provide maximum benefit if it can be utilized in various information system applications at the regency/city level. A distributed population database is a potential foundation for better e-Government, through the development of various primary and secondary/derivative information systems with an integrated database, middleware, and applications built with web service technologies. Continuous evaluation of system performance also needs to be carried out as part of the information systems life cycle.


    Performance Analysis of Scalable SQL and NoSQL Databases: A Quantitative Approach

    Benchmarking is a common method for evaluating and choosing a NoSQL database, and many benchmarking reports are already available on the internet and in research papers. Most of these reports measure database performance only by overall throughput and latency. This is an adequate performance analysis, but it need not be the end: we define some new perspectives that also need to be considered during NoSQL performance analysis. We demonstrate this approach by benchmarking HBase, MongoDB, and sharded MySQL using YCSB. Based on the results, we observe that NoSQL databases do not consider the capability of the data nodes while assigning data to them; their performance is seriously affected by bottleneck nodes, and the databases do not attempt to resolve this bottleneck situation automatically.
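    The bottleneck effect this abstract observes can be illustrated with a small simulation. The node names, capacities, and key counts below are invented for illustration; the point is that hash-based placement ignores node capability, so a slow node sets the overall completion time.

```python
import hashlib

# Hypothetical cluster: three data nodes with unequal capacity (ops/sec).
capacities = {"node0": 10_000, "node1": 10_000, "node2": 2_000}
nodes = sorted(capacities)

# Capacity-oblivious placement, as the abstract observes: keys are assigned
# purely by hash, ignoring how fast each node actually is.
loads = {n: 0 for n in nodes}
for i in range(30_000):
    key = f"user{i}".encode()
    node = nodes[int(hashlib.md5(key).hexdigest(), 16) % len(nodes)]
    loads[node] += 1

# With a roughly balanced key distribution, the slow node becomes the
# bottleneck: completion time is set by the worst load/capacity ratio.
finish_times = {n: loads[n] / capacities[n] for n in nodes}
bottleneck = max(finish_times, key=finish_times.get)
print(bottleneck, round(finish_times[bottleneck], 2))
```

    A capacity-aware placement policy would instead weight assignments by each node's throughput, which is the kind of perspective the paper argues benchmarks should expose.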

    Big Data in the Cloud: A Survey

    Big Data has become a hot topic across several business areas requiring the storage and processing of huge volumes of data. Cloud computing leverages Big Data by providing high storage and processing capabilities, and it enables corporations to consume resources in a pay-as-you-go model, making clouds the optimal environment for storing and processing huge quantities of data. By using virtualized resources, a Cloud can scale very easily, be highly available, and provide massive storage capacity and processing power. This paper surveys existing database models for storing and processing Big Data within a Cloud environment. In particular, we detail the following NoSQL databases: BigTable, Cassandra, DynamoDB, HBase, Hypertable, and MongoDB. The MapReduce framework and its developments Apache Spark, HaLoop, and Twister, as well as other alternatives such as Apache Giraph, GraphLab, Pregel, and MapD (a novel platform that uses GPU processing to accelerate Big Data processing), are also analyzed. Finally, we present two case studies that demonstrate the successful use of Big Data within Cloud environments and the challenges that must be addressed in the future.
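    The MapReduce model the survey analyzes can be sketched in a few lines as a single-process word count; the documents and phase names below are illustrative, and a real framework would distribute each phase across machines.

```python
from collections import defaultdict
from itertools import chain

# Minimal single-process sketch of the MapReduce model:
# map emits (key, value) pairs, shuffle groups them by key, reduce aggregates.
def map_phase(doc):
    return [(word, 1) for word in doc.lower().split()]

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data in the cloud", "cloud computing leverages big data"]
counts = reduce_phase(shuffle(chain.from_iterable(map_phase(d) for d in docs)))
print(counts["big"], counts["cloud"])
```

    Systems such as Spark keep these intermediate results in memory across iterations, which is the main development over classic disk-based MapReduce that the survey discusses.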

    SparkIR: a Scalable Distributed Information Retrieval Engine over Spark

    Search engines have to deal with a huge amount of data (e.g., billions of documents in the case of the Web) and must find scalable and efficient ways to produce effective search results. In this thesis, we propose to use the Spark framework, an in-memory distributed big data processing framework, and leverage its powerful capabilities for handling large amounts of data to build an efficient and scalable experimental search engine over textual documents. The proposed system, SparkIR, can serve as a research framework for conducting information retrieval (IR) experiments. SparkIR supports two indexing schemes, document-based partitioning and term-based partitioning, to adopt document-at-a-time (DAAT) and term-at-a-time (TAAT) query evaluation methods. Moreover, it offers static and dynamic pruning to improve retrieval efficiency: for static pruning, it employs champion lists and tiering, while for dynamic pruning, it uses MaxScore top-k retrieval. We evaluated the performance of SparkIR using the ClueWeb12-B13 collection, which contains about 50M English Web pages. Experiments over different subsets of the collection, compared against an Elasticsearch baseline, show that SparkIR exhibits reasonable efficiency and scalability overall for both indexing and retrieval. Implemented as an open-source library over Spark, users of SparkIR can also benefit from other Spark libraries (e.g., MLlib and GraphX), which therefore eliminates the need of usin
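    Champion lists, one of the static pruning techniques the abstract names, can be sketched on a toy corpus. Everything below is illustrative, not SparkIR's implementation: the documents, the raw term-frequency scoring (real systems use BM25 or similar), and the tiny list size r are all invented.

```python
from collections import defaultdict

# Toy corpus; contents are invented for illustration.
docs = {
    0: "spark is a distributed data processing framework",
    1: "spark supports in memory data processing",
    2: "information retrieval over distributed data",
    3: "search engines index textual documents",
}

# Full inverted index with term-frequency scores.
index = defaultdict(dict)
for doc_id, text in docs.items():
    for term in text.split():
        index[term][doc_id] = index[term].get(doc_id, 0) + 1

# Champion list: for each term, keep only the r highest-scoring postings,
# so query evaluation touches at most r documents per term.
r = 2
champions = {
    term: dict(sorted(postings.items(), key=lambda p: -p[1])[:r])
    for term, postings in index.items()
}

def query(terms, lists):
    """Sum term scores per document over the given posting lists (TAAT-style)."""
    scores = defaultdict(int)
    for term in terms:
        for doc_id, score in lists.get(term, {}).items():
            scores[doc_id] += score
    return sorted(scores.items(), key=lambda p: -p[1])

print(query(["spark", "data"], champions))
```

    The trade-off is the usual one for static pruning: shorter lists make queries cheaper, but a document outside every query term's champion list can no longer be retrieved, which is why such systems pair pruned tiers with a fallback to fuller index tiers.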