6,284 research outputs found

    Options in Scan Processing for Shared-Disk Parallel Database Systems

    Get PDF
    Shared-disk database systems offer a high degree of freedom in the allocation of workload compared to shared-nothing architectures. This creates a great potential for load balancing but also introduces additional complexity into the process of query scheduling. This report surveys the problems and opportunities faced in scan processing in a shared-disk environment. We list the parameters to tune and the decisions to make, as well as some known solutions and commonsense considerations, in order to identify the most promising areas of future research

    Multidimensional Range Queries on Modern Hardware

    Full text link
    Range queries over multidimensional data are an important part of database workloads in many applications. Their execution may be accelerated by using multidimensional index structures (MDIS), such as kd-trees or R-trees. As for most index structures, the usefulness of this approach depends on the selectivity of the queries, and common wisdom told that a simple scan beats MDIS for queries accessing more than 15%-20% of a dataset. However, this wisdom is largely based on evaluations that are almost two decades old, performed on data being held on disks, applying IO-optimized data structures, and using single-core systems. The question is whether this rule of thumb still holds when multidimensional range queries (MDRQ) are performed on modern architectures with large main memories holding all data, multi-core CPUs and data-parallel instruction sets. In this paper, we study the question whether and how much modern hardware influences the performance ratio between index structures and scans for MDRQ. To this end, we conservatively adapted three popular MDIS, namely the R*-tree, the kd-tree, and the VA-file, to exploit features of modern servers and compared their performance to different flavors of parallel scans using multiple (synthetic and real-world) analytical workloads over multiple (synthetic and real-world) datasets of varying size, dimensionality, and skew. We find that all approaches benefit considerably from using main memory and parallelization, yet to varying degrees. Our evaluation indicates that, on current machines, scanning should be favored over parallel versions of classical MDIS even for very selective queries

    A Framework for Fast Classification Algorithms

    Get PDF
    Today, due to globalization of the world the size of data set is increasing, it is necessary to discover the knowledge. The discovery of knowledge can be typically in the form of association rules, classification rules, clustering, discovery of frequent episodes and deviation detection. Fast and accurate classifiers for large databases are an important task in data mining. There is growing evidence that integrating classification and association rules mining, classification approaches based on heuristic, greedy search like decision tree induction. Emerging associative classification algorithms have shown good promises on producing accurate classifiers. In this paper we focus on performance of associative classification and present a parallel model for classifier building. For classifier building some parallel-distributed algorithms have been proposed for decision tree induction but so far no such work has been reported for associative classification

    CEP-DTHP : A Complex Event Processing using the Dual-Tier Hybrid Paradigm Over the Stream Mining Process

    Get PDF
    CEP is a widely used technique for the reliability and recognition of arbitrarily complex patterns in enormous data streams with great performance in real time. Real-time detection of crucial events and rapid response to them are the key goals of sophisticated event processing.  The performance of event processing systems can be improved by parallelizing CEP evaluation procedures. Utilizing CEP in parallel while deploying a multi-core or distributed environment is one of the most popular and widely recognized tackles to accomplish the goal. This paper demonstrates the ability to use an unusual parallelization strategy to effectively process complicated events over streams of data. This method depends on a dual-tier hybrid paradigm that combines several parallelism levels. Thread-level or task-level parallelism (TLP) and Data-level parallelism (DLP) were combined in this research. Many threads or instruction sequences from a comparable application can run concurrently under the TLP paradigm. In the DLP paradigm, instruc-tions from a single stream operate on several data streams at the same time. In our suggested model, there are four major stages: data mining, pre-processing, load shedding, and optimization. The first phase is online data mining, following which the data is materialized into a publicly available solution that combines a CEP engine with a library. Next, data pre-processing encompasses the efficient adaptation of the content or format of raw data from many, perhaps diverse sources. Finally, parallelization approaches have been created to reduce CEP processing time. By providing this two-type parallelism, our proposed solution combines the benefits of DLP and TLP while addressing their constraints. The JAVA tool will be used to assess the suggested technique. The performance of the suggested technique is compared to that of other current ways for determining the efficacy and efficiency of the proposed algorithm

    Enhancing Energy Production with Exascale HPC Methods

    Get PDF
    High Performance Computing (HPC) resources have become the key actor for achieving more ambitious challenges in many disciplines. In this step beyond, an explosion on the available parallelism and the use of special purpose processors are crucial. With such a goal, the HPC4E project applies new exascale HPC techniques to energy industry simulations, customizing them if necessary, and going beyond the state-of-the-art in the required HPC exascale simulations for different energy sources. In this paper, a general overview of these methods is presented as well as some specific preliminary results.The research leading to these results has received funding from the European Union's Horizon 2020 Programme (2014-2020) under the HPC4E Project (www.hpc4e.eu), grant agreement n° 689772, the Spanish Ministry of Economy and Competitiveness under the CODEC2 project (TIN2015-63562-R), and from the Brazilian Ministry of Science, Technology and Innovation through Rede Nacional de Pesquisa (RNP). Computer time on Endeavour cluster is provided by the Intel Corporation, which enabled us to obtain the presented experimental results in uncertainty quantification in seismic imagingPostprint (author's final draft

    Intelligent services for big data science

    Get PDF
    Cities are areas where Big Data is having a real impact. Town planners and administration bodies just need the right tools at their fingertips to consume all the data points that a town or city generates and then be able to turn that into actions that improve peoples' lives. In this case, Big Data is definitely a phenomenon that has a direct impact on the quality of life for those of us that choose to live in a town or city. Smart Cities of tomorrow will rely not only on sensors within the city infrastructure, but also on a large number of devices that will willingly sense and integrate their data into technological platforms used for introspection into the habits and situations of individuals and city-large communities. Predictions say that cities will generate over 4.1 terabytes per day per square kilometer of urbanized land area by 2016. Handling efficiently such amounts of data is already a challenge. In this paper we present our solutions designed to support next-generation Big Data applications. We first present CAPIM, a platform designed to automate the process of collecting and aggregating context information on a large scale. It integrates services designed to collect context data (location, user's profile and characteristics, as well as the environment). Later on, we present a concrete implementation of an Intelligent Transportation System designed on top of CAPIM. The application is designed to assist users and city officials better understand traffic problems in large cities. Finally, we present a solution to handle efficient storage of context data on a large scale. The combination of these services provides support for intelligent Smart City applications, for actively and autonomously adaptation and smart provision of services and content, using the advantages of contextual information.Peer ReviewedPostprint (author's final draft

    Bio-inspired computation for big data fusion, storage, processing, learning and visualization: state of the art and future directions

    Get PDF
    This overview gravitates on research achievements that have recently emerged from the confluence between Big Data technologies and bio-inspired computation. A manifold of reasons can be identified for the profitable synergy between these two paradigms, all rooted on the adaptability, intelligence and robustness that biologically inspired principles can provide to technologies aimed to manage, retrieve, fuse and process Big Data efficiently. We delve into this research field by first analyzing in depth the existing literature, with a focus on advances reported in the last few years. This prior literature analysis is complemented by an identification of the new trends and open challenges in Big Data that remain unsolved to date, and that can be effectively addressed by bio-inspired algorithms. As a second contribution, this work elaborates on how bio-inspired algorithms need to be adapted for their use in a Big Data context, in which data fusion becomes crucial as a previous step to allow processing and mining several and potentially heterogeneous data sources. This analysis allows exploring and comparing the scope and efficiency of existing approaches across different problems and domains, with the purpose of identifying new potential applications and research niches. Finally, this survey highlights open issues that remain unsolved to date in this research avenue, alongside a prescription of recommendations for future research.This work has received funding support from the Basque Government (Eusko Jaurlaritza) through the Consolidated Research Group MATHMODE (IT1294-19), EMAITEK and ELK ARTEK programs. D. Camacho also acknowledges support from the Spanish Ministry of Science and Education under PID2020-117263GB-100 grant (FightDIS), the Comunidad Autonoma de Madrid under S2018/TCS-4566 grant (CYNAMON), and the CHIST ERA 2017 BDSI PACMEL Project (PCI2019-103623, Spain)
    • …
    corecore