306 research outputs found

    Storage Solutions for Big Data Systems: A Qualitative Study and Comparison

    Big data systems development is full of challenges, given the variety of application areas and domains that this technology promises to serve. Fundamental design decisions in big data systems typically include choosing appropriate storage and computing infrastructures. In this age of heterogeneous systems that integrate different technologies into an optimized solution for a specific real-world problem, big data systems are no exception. As far as storage is concerned, the primary facet is the storage infrastructure, and NoSQL seems to be the technology that fulfills its requirements. However, every big data application has different data characteristics, and so its data fits a different data model. This paper presents a feature and use-case analysis and comparison of the four main data models: document-oriented, key-value, graph, and wide-column. Moreover, a feature analysis of 80 NoSQL solutions is provided, elaborating on the criteria and points that a developer must consider when making a choice. Typically, big data storage needs to communicate with the execution engine and with other processing and visualization technologies to create a comprehensive solution. This brings the second facet of big data storage, big data file formats, into the picture. The second half of the paper compares the advantages, shortcomings, and possible use cases of the available big data file formats for Hadoop, which is the foundation for most big data computing technologies. Decentralized storage and blockchain are seen as the next generation of big data storage, and their challenges and future prospects are also discussed.

    An Enhanced Query Optimization Implemented in Hadoop using Bio-Inspired Algorithm with HDFS Technique

    This work presents a more effective method for massive data query optimization using HDFS and a bio-inspired algorithm. Big Data configuration and query optimization are the two phases of the process. To remove redundant data, the input data is first preprocessed using HDFS. Then, using entropy calculation, features such as closed frequent patterns, support, and confidence are extracted and managed. The bio-inspired Horse Herd approach is used to group pertinent information based on this outcome. In the second phase, the same features are obtained from the Big Data queries. The optimized query is then located using the bio-inspired technique, and the similarity assessment procedure is run. According to this research, the proposed algorithm outperforms the other algorithms in use. It is difficult to judge this claim without more information about the experimental setup and the precise measures used to assess the algorithm's effectiveness. Furthermore, it is unknown how the proposed algorithm compares with other state-of-the-art query optimization methods. Finally, the efficiency of the method is assessed, more optimized queries are obtained, and a comparative analysis is presented.
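The support and confidence features named in this abstract are standard association-rule measures. The sketch below shows how they are usually computed; it assumes transactions are modeled as sets of items, and the function names and sample data are illustrative rather than taken from the paper.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(
    transactions: List[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> float:
    """support(A and C together) / support(A); 0.0 if A never occurs."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    s_a = support(transactions, a)
    return support(transactions, both) / s_a if s_a else 0.0


# Hypothetical sample data, for illustration only.
TRANSACTIONS = [
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"a", "b", "c"}),
    frozenset({"b"}),
]
```

With this sample, `support(TRANSACTIONS, {"a", "b"})` is 0.5 (two of four transactions), and `confidence(TRANSACTIONS, {"a"}, {"b"})` is two thirds, since "a" occurs in three transactions and "b" accompanies it in two.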

    Creation of column-oriented NoSQL databases automatically in Big Data environments and its impact on energy consumption

    This study investigates the automatic creation of column-oriented NoSQL databases in Big Data environments and their impact on energy consumption. Traditional row-oriented databases face limitations in handling large volumes of data, resulting in slower query response times and energy inefficiencies. In contrast, column-oriented NoSQL databases store data in columns, enabling efficient compression, retrieval, and query processing. Innovative techniques are employed to automatically create these databases, optimizing performance and minimizing manual intervention. Storing data in a columnar format reduces storage requirements and power consumption while improving data locality and reducing I/O operations. This study emphasizes the benefits of adopting column-oriented NoSQL databases, including improved performance, scalability, and energy efficiency in Big Data environments
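The claim that columnar storage enables efficient compression can be illustrated with a small sketch. It assumes a simple run-length encoding, which is one common columnar compression scheme and not necessarily the one used in the study; the record fields and helper names are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row records into one list per column (keys taken from the first row)."""
    if not rows:
        return {}
    return {key: [row[key] for row in rows] for key in rows[0]}


def run_length_encode(values: List[Any]) -> List[Tuple[Any, int]]:
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs: List[Tuple[Any, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


# Hypothetical records, for illustration only.
ROWS = [
    {"id": 1, "country": "ID", "status": "ok"},
    {"id": 2, "country": "ID", "status": "ok"},
    {"id": 3, "country": "ID", "status": "fail"},
    {"id": 4, "country": "US", "status": "fail"},
]
COLUMNS = to_columns(ROWS)
```

Because values of the same column sit next to each other, `run_length_encode(COLUMNS["country"])` shrinks four values to two runs, and a query that reads only `country` never touches the `id` or `status` data.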

    Benchmarking Big Data SQL Frameworks


    ACTiCLOUD: Enabling the Next Generation of Cloud Applications

    Despite their proliferation as a dominant computing paradigm, cloud computing systems lack effective mechanisms to manage their vast amounts of resources efficiently. Resources are stranded and fragmented, ultimately limiting cloud systems' applicability to large classes of critical applications that pose non-moderate resource demands. Eliminating current technological barriers of actual fluidity and scalability of cloud resources is essential to strengthen cloud computing's role as a critical cornerstone for the digital economy. ACTiCLOUD proposes a novel cloud architecture that breaks the existing scale-up and share-nothing barriers and enables the holistic management of physical resources both at the local cloud site and at distributed levels. Specifically, it makes advancements in the cloud resource management stacks by extending state-of-the-art hypervisor technology beyond the physical server boundary and localized cloud management system to provide a holistic resource management within a rack, within a site, and across distributed cloud sites. On top of this, ACTiCLOUD will adapt and optimize system libraries and runtimes (e.g., JVM) as well as ACTiCLOUD-native applications, which are extremely demanding, and critical classes of applications that currently face severe difficulties in matching their resource requirements to state-of-the-art cloud offerings

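    Analysis of Antenatal Care (ANC) in Maternal and Child Health Surveillance Using NoSQL Aggregation Pipeline Stages [ANALISIS ANTENATAL CARE (ANC) PADA SURVEILANS KESEHATAN IBU DAN ANAK DENGAN TAHAPAN AGREGASI PIPELINE NOSQL]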

    In Indonesia, 30.8 percent of children under five are stunted. Bantul, a district in the Province of D.I. Yogyakarta, Indonesia, is a locus of stunting. Ten villages in Bantul are listed: Patalan Jetis Village, Canden Jetis Village, Terong Dlingo Village, Argodadi Sedayu Village, Triharjo Pandak Village, Triwidadi Pajangan Village, Jatimulyo Dlingo Village, Datangharjo Sewon Village, Sendangsari Pajangan Village, and Trimulyo Jetis Village. The research focuses on the village of Argodadi Sedayu, where the Antenatal Care (ANC) research was conducted. ANC is a pregnancy check-up by a doctor or midwife. ANC analysis is therefore needed to determine whether diet, parenting, and sanitation are well programmed. The ANC research framework was a method improvement model consisting of indicators, proposed methods, objectives, and measurements. The indicators consist of monitoring instruments and health visits. The proposed method uses aggregation pipeline stages, in which the data, obtained from a time-series surveillance dataset, were processed. The research objective was to analyze the research results accurately according to the proposed method. The indicator analysis was measured with a dashboard application that serves as a performance indicator for the research results. In practice, it is hoped that the health office and related institutions can use the research results to reduce stunting, or even to elevate Argodadi Sedayu Village in Yogyakarta to a non-locus of stunting, through massive monitoring of well-programmed diet, parenting, and sanitation.
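The aggregation pipeline idea mentioned in this abstract, where documents flow through a sequence of stages, can be sketched in plain Python. This is a minimal illustration under stated assumptions: the abstract does not name the database or the stages used, so the `match` and `group_count` stages, the field names, and the sample records below are all hypothetical.

```python
from collections import defaultdict
from functools import reduce
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]
Stage = Callable[[List[Doc]], List[Doc]]


def match(predicate: Callable[[Doc], bool]) -> Stage:
    """Stage that keeps only the documents satisfying `predicate`."""
    def stage(docs: List[Doc]) -> List[Doc]:
        return [d for d in docs if predicate(d)]
    return stage


def group_count(key: str) -> Stage:
    """Stage that counts documents per distinct value of `key`."""
    def stage(docs: List[Doc]) -> List[Doc]:
        counts: Dict[Any, int] = defaultdict(int)
        for d in docs:
            counts[d[key]] += 1
        return [{"_id": k, "count": n} for k, n in sorted(counts.items())]
    return stage


def run_pipeline(docs: List[Doc], stages: List[Stage]) -> List[Doc]:
    """Feed the output of each stage into the next one, in order."""
    return reduce(lambda acc, stage: stage(acc), stages, docs)


# Made-up visit records, for illustration only.
VISITS = [
    {"village": "Argodadi Sedayu", "month": "2021-01"},
    {"village": "Argodadi Sedayu", "month": "2021-01"},
    {"village": "Argodadi Sedayu", "month": "2021-02"},
    {"village": "Patalan Jetis", "month": "2021-01"},
]

MONTHLY = run_pipeline(
    VISITS,
    [match(lambda d: d["village"] == "Argodadi Sedayu"), group_count("month")],
)
```

Each stage sees only the output of the previous one, so filtering first keeps the grouping step small; the per-month counts in `MONTHLY` are the kind of time-series summary a dashboard could plot.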

    On the security of NoSQL cloud database services

    Processing the vast volume of data generated by web, mobile, and Internet-enabled devices necessitates a scalable and flexible data management system. Database-as-a-Service (DBaaS) is a new cloud computing paradigm that promises cost-effective, scalable, fully managed database functionality meeting the requirements of online data processing. Although DBaaS offers many benefits, it also introduces new threats and vulnerabilities. While many traditional data processing threats remain, DBaaS introduces new challenges, such as confidentiality violation and information leakage in the presence of privileged malicious insiders, and adds a new dimension to data security. We address the problem of building a secure DBaaS for a public cloud infrastructure where the Cloud Service Provider (CSP) is not completely trusted by the data owner. We present a high-level description of several architectures that combine modern cryptographic primitives to achieve this goal. A novel searchable security scheme is proposed to support secure query processing in the presence of a malicious cloud insider without disclosing sensitive information. A holistic database security scheme comprising data confidentiality and information leakage prevention is proposed in this dissertation. The main contributions of our work are: (i) a searchable security scheme for non-relational databases of the cloud DBaaS; (ii) leakage minimization in the untrusted cloud. The analysis of experiments that employ a set of established cryptographic techniques to protect databases and minimize information leakage shows that the performance of the proposed solution is bounded by communication cost rather than by cryptographic computational effort.

    Scalable Architecture for Integrated Batch and Streaming Analysis of Big Data

    Thesis (Ph.D.) - Indiana University, Computer Sciences, 2015. As Big Data processing problems evolve, many modern applications demonstrate special characteristics. Data exists in the form of both large historical datasets and high-speed real-time streams, and many analysis pipelines require integrated parallel batch processing and stream processing. Despite the large size of the whole dataset, most analyses focus on specific subsets according to certain criteria. Correspondingly, integrated support for efficient queries and post-query analysis is required. To address the system-level requirements brought by such characteristics, this dissertation proposes a scalable architecture for integrated queries, batch analysis, and streaming analysis of Big Data in the cloud. We verify its effectiveness using a representative application domain, social media data analysis, and tackle related research challenges emerging from each module of the architecture by integrating and extending multiple state-of-the-art Big Data storage and processing systems. In the storage layer, we reveal that existing text indexing techniques do not work well for the unique queries of social data, which put constraints on both textual content and social context. To address this issue, we propose a flexible indexing framework over NoSQL databases to support fully customizable index structures, which can embed the social context information necessary for efficient queries. The batch analysis module demonstrates that analysis workflows consist of multiple algorithms with different computation and communication patterns, which are suitable for different processing frameworks. To achieve efficient workflows, we build an integrated analysis stack based on YARN, and make novel use of customized indices in developing sophisticated analysis algorithms.
    In the streaming analysis module, the high-dimensional data representation of social media streams poses special challenges to the problem of parallel stream clustering. Due to the sparsity of the high-dimensional data, the traditional synchronization method becomes expensive and severely impacts the scalability of the algorithm. Therefore, we design a novel strategy that broadcasts the incremental changes rather than the whole centroids of the clusters to achieve scalable parallel stream clustering algorithms. Performance tests using real applications show that our solutions for parallel data loading/indexing, queries, analysis tasks, and stream clustering all significantly outperform implementations using current state-of-the-art technologies.
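The idea of broadcasting incremental changes instead of whole centroids can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not describe the actual protocol, so sparse vectors are modeled here as dictionaries from dimension name to weight, and `sparse_delta` and `apply_delta` are hypothetical helper names.

```python
from typing import Dict

SparseVec = Dict[str, float]


def sparse_delta(old: SparseVec, new: SparseVec) -> SparseVec:
    """Return only the dimensions whose value changed between `old` and `new`."""
    delta: SparseVec = {}
    for dim in old.keys() | new.keys():
        diff = new.get(dim, 0.0) - old.get(dim, 0.0)
        if diff != 0.0:
            delta[dim] = diff
    return delta


def apply_delta(centroid: SparseVec, delta: SparseVec) -> SparseVec:
    """Apply a received delta to a local copy of the centroid."""
    updated = dict(centroid)
    for dim, diff in delta.items():
        value = updated.get(dim, 0.0) + diff
        if value == 0.0:
            updated.pop(dim, None)  # keep the vector sparse
        else:
            updated[dim] = value
    return updated


# Hypothetical centroid before and after a local update.
OLD = {"a": 1.0, "b": 2.0, "c": 3.0}
NEW = {"a": 1.0, "b": 2.5, "d": 1.0}
DELTA = sparse_delta(OLD, NEW)
```

A worker would send `DELTA` to its peers, and each peer would call `apply_delta` on its own copy. When a centroid has many non-zero dimensions but each update touches only a few, the message size is proportional to the change rather than to the centroid.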

    A software architecture for electro-mobility services: a milestone for sustainable remote vehicle capabilities

    To face tough competition and changing markets and technologies in the automotive industry, automakers have to be highly innovative. In previous decades, innovations were driven by electronics and IT, which exponentially increased the complexity of the vehicle's internal network. Furthermore, the growing expectations and preferences of customers oblige these manufacturers to adapt their business models and to also offer mobility-based services. On the other hand, there is increasing pressure from regulators to significantly reduce the environmental footprint of transportation and mobility, down to zero in the foreseeable future. This dissertation investigates an architecture for communication and data exchange within a complex and heterogeneous ecosystem. This communication takes place between various third-party entities on one side, and between these entities and the infrastructure on the other. The proposed solution considerably reduces the complexity of vehicle communication and of communication among the parties involved in the ODX life cycle. In such a heterogeneous environment, particular attention is paid to the protection of confidential and private data. Confidential data here refers to the OEM’s know-how, which is enclosed in vehicle projects. The data delivered by a car during a vehicle communication session might contain private data from customers. Our solution ensures that every entity of this ecosystem has access only to the data it is entitled to. We designed our solution so that it is not coupled to any particular technology and can be implemented on any platform, benefiting from the environment best suited to each task. We also proposed a data model for vehicle projects, which improves query time during a vehicle diagnostic session. Scalability and backwards compatibility were also taken into account during the design phase of our solution.
    We proposed the necessary algorithms and the workflow to perform an efficient vehicle diagnostic with considerably lower latency and substantially better time and space complexity than current solutions. To prove the practicality of our design, we presented a prototypical implementation. Then, we analyzed the results of a series of tests we performed on several vehicle models and projects. We also evaluated the prototype against quality attributes in software engineering.