
    Enhancing the scalability of a genetic algorithm to discover quantitative association rules in large-scale datasets

    Association rule mining is a well-known methodology for discovering significant and apparently hidden relations among attributes in a subspace of instances from datasets. Genetic algorithms have been used extensively to find interesting association rules, but their rule-matching step usually imposes high computational and memory costs. Efficient computational techniques have become a matter of the utmost importance given the high volume of data generated nowadays. Hence, this paper aims to improve the scalability of quantitative association rule mining techniques based on genetic algorithms so that they can handle large-scale datasets without losing quality in the results. For this purpose, a new representation of individuals, new genetic operators, and a windowing-based learning scheme are proposed. Specifically, the proposed techniques are integrated into the multi-objective evolutionary algorithm QARGA-M to assess their performance. Both the standard and the enhanced versions of QARGA-M have been tested on several datasets with different numbers of attributes and instances. Furthermore, the proposed methodologies have been integrated into other existing genetic-algorithm-based techniques for discovering quantitative association rules. The comparative analysis shows significant improvements in the computational cost of QARGA-M and other genetic algorithms, without losing quality in the results, when the proposed techniques are applied. Funding: Ministerio de Ciencia y Tecnología TIN2011-28956-C02-02; Junta de Andalucía TIC-7528; Junta de Andalucía P12-TIC-1728; Universidad Pablo de Olavide APPB81309.
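
    As a rough illustration of the windowing scheme mentioned above, the Python sketch below (all names invented; it reproduces neither QARGA-M's operators nor its individual representation) evaluates a quantitative rule's support and confidence on a random window of instances rather than the full dataset, which is where the scalability gain in the fitness step comes from:

    ```python
    import random

    # A quantitative rule: antecedent/consequent are {attribute: (low, high)}
    # interval conditions over numeric attributes.
    def covers(conditions, row):
        return all(low <= row[attr] <= high
                   for attr, (low, high) in conditions.items())

    def windowed_fitness(rule, dataset, window_size, rng):
        """Estimate support/confidence on a random window, not the full data."""
        window = rng.sample(dataset, min(window_size, len(dataset)))
        ante = [row for row in window if covers(rule["antecedent"], row)]
        both = [row for row in ante if covers(rule["consequent"], row)]
        support = len(both) / len(window)
        confidence = len(both) / len(ante) if ante else 0.0
        return support, confidence

    rng = random.Random(42)
    data = [[rng.uniform(0, 10) for _ in range(3)] for _ in range(100_000)]
    rule = {"antecedent": {0: (2.0, 6.0)}, "consequent": {2: (1.0, 8.0)}}
    print(windowed_fitness(rule, data, window_size=5_000, rng=rng))
    ```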

    GASSER: An Auto-Tunable System for General Sliding-Window Streaming Operators on GPUs

    Today's stream processing systems handle high-volume data streams efficiently. To achieve this goal, they are designed to scale out on large clusters of commodity machines. However, despite their efficient use of distributed architectures, they lack support for co-processors such as graphics processing units (GPUs) that are ready to accelerate data-parallel tasks. The main reason for this lack of integration is that GPU processing and the streaming paradigm have different processing models: GPUs need a bulk of data present at once, while the streaming paradigm advocates tuple-at-a-time processing. This paper contributes to filling this gap by proposing Gasser, a system for offloading the execution of sliding-window operators onto GPUs. The system focuses on completely general functions by targeting the parallel processing of non-incremental queries, which are not supported by the few existing GPU-based streaming prototypes. Furthermore, Gasser provides an auto-tuning approach able to automatically find the optimal values of the configuration parameters (i.e., batch length and degree of parallelism) needed to optimize throughput and latency for a given query and data stream. The experimental part assesses the performance of Gasser by comparing its peak throughput and latency against Apache Flink, a popular and scalable streaming system. Furthermore, we evaluate the penalty induced by supporting completely general queries against the performance achieved by the state-of-the-art solution specifically optimized for incremental queries. Finally, we show the speed and accuracy of the auto-tuning approach adopted by Gasser, which is able to self-configure the system by finding the right configuration parameters without manual tuning by the user.
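
    The minimal Python sketch below (hypothetical names; Gasser itself runs on GPUs, and Python here only models the control flow) illustrates the batching idea at the heart of such a system: tuple-at-a-time input is grouped into batches of complete sliding windows, and the batch length becomes the tunable knob trading throughput against latency:

    ```python
    from collections import deque

    def window_batches(stream, win_len, slide, batch_len):
        """Group tuple-at-a-time input into batches of complete sliding
        windows -- the unit of work a GPU kernel would consume at once."""
        buf, batch = deque(), []
        for tup in stream:
            buf.append(tup)
            if len(buf) == win_len:          # a window is complete
                batch.append(list(buf))
                for _ in range(slide):       # slide the window forward
                    buf.popleft()
                if len(batch) == batch_len:  # offload point: one bulk
                    yield batch              # transfer + kernel launch
                    batch = []
        if batch:                            # flush the final partial batch
            yield batch

    for batch in window_batches(range(12), win_len=4, slide=2, batch_len=2):
        print(batch)
    ```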

    WaveScript: A Case-Study in Applying a Distributed Stream-Processing Language

    Applications that combine live data streams with embedded, parallel, and distributed processing are becoming more commonplace. WaveScript is a domain-specific language that brings high-level, type-safe, garbage-collected programming to these domains. This is made possible by three primary implementation techniques. First, we employ a novel evaluation strategy that uses a combination of interpretation and reification to partially evaluate programs into stream dataflow graphs. Second, we use profile-driven compilation to enable many optimizations that are normally only available in the synchronous (rather than asynchronous) dataflow domain. Finally, we incorporate an extensible system for rewrite rules to capture algebraic properties in specific domains (such as signal processing). We have used our language to build and deploy a sensor-network for the acoustic localization of wild animals, in particular, the yellow-bellied marmot. We evaluate WaveScript's performance on this application, showing that it yields good performance on both embedded and desktop-class machines, including distributed execution and substantial parallel speedups. Our language allowed us to implement the application rapidly, while outperforming a previous C implementation by over 35%, using fewer than half the lines of code. We evaluate the contribution of our optimizations to this success.
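
    A toy Python sketch of the interpretation-plus-reification idea (illustrative only; WaveScript is its own language with a far richer evaluator): the "program" is executed at compile time and evaluates to a stream dataflow graph rather than to stream values:

    ```python
    class Op:
        """A node in a reified stream dataflow graph."""
        def __init__(self, name, fn=None, inputs=()):
            self.name, self.fn, self.inputs = name, fn, list(inputs)

    def source(name):            return Op(name)
    def smap(fn, upstream):      return Op("map", fn, [upstream])
    def sfilter(pred, upstream): return Op("filter", pred, [upstream])

    def topo(node, seen=None):
        """Linearize the graph, inputs first, for printing or compilation."""
        seen = [] if seen is None else seen
        for i in node.inputs:
            topo(i, seen)
        if node not in seen:
            seen.append(node)
        return seen

    # Running this "program" builds a graph; no stream data flows yet.
    graph = sfilter(lambda x: x > 0, smap(lambda x: x * 2, source("sensor")))
    print(" -> ".join(op.name for op in topo(graph)))  # sensor -> map -> filter
    ```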

    Neuroevolutionary learning in nonstationary environments

    This work presents a new neuro-evolutionary model, called NEVE (Neuroevolutionary Ensemble), based on an ensemble of Multi-Layer Perceptron (MLP) neural networks for learning in nonstationary environments. NEVE uses quantum-inspired evolutionary models to automatically configure the ensemble members and combine their outputs. The quantum-inspired evolutionary models identify the most appropriate topology for each MLP network, select the most relevant input variables, determine the neural network weights, and calculate the voting weight of each ensemble member. Four different versions of NEVE are developed, varying the mechanism for detecting and treating concept drifts, including proactive drift detection approaches. The proposed models were evaluated on real and artificial datasets, comparing the results with other established models from the literature. The results show that the accuracy of NEVE is higher in most cases and that the best configurations use some mechanism for drift detection. These results reinforce that the neuroevolutionary ensemble approach is a robust choice for situations in which datasets are subject to sudden changes in behaviour.
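
    To make the ensemble mechanics concrete, here is a minimal Python sketch (names invented; NEVE's members are evolved MLPs, not lambdas, and its drift detectors are more sophisticated) of weighted voting plus a simple error-rate drift check of the kind that would trigger reconfiguration:

    ```python
    class WeightedEnsemble:
        """Weighted-vote ensemble; members/weights stand in for NEVE's
        evolved MLPs and their quantum-inspired voting weights."""
        def __init__(self, members, weights):
            self.members, self.weights = members, weights

        def predict(self, x):
            votes = {}
            for model, w in zip(self.members, self.weights):
                label = model(x)
                votes[label] = votes.get(label, 0.0) + w
            return max(votes, key=votes.get)

    def drift_detected(recent_errors, threshold=0.5):
        """Flag a concept drift when the recent error rate crosses a
        threshold -- the cue to reconfigure or retrain members."""
        return sum(recent_errors) / len(recent_errors) > threshold

    ens = WeightedEnsemble([lambda x: int(x > 0), lambda x: int(x > 1)],
                           [0.7, 0.3])
    print(ens.predict(0.5))                 # -> 1: the heavier member wins
    print(drift_detected([1, 1, 0, 1, 1]))  # -> True: error rate is 0.8
    ```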

    Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training

    Training deep neural networks (DNNs) is becoming increasingly resource- and energy-intensive every year. Unfortunately, existing works primarily focus on optimizing DNN training for faster completion, often without considering the impact on energy efficiency. In this paper, we observe that common practices to improve training performance can often lead to inefficient energy usage. More importantly, we demonstrate that there is a tradeoff between energy consumption and performance optimization. To this end, we propose Zeus, an optimization framework to navigate this tradeoff by automatically finding optimal job- and GPU-level configurations for recurring DNN training jobs. Zeus uses an online exploration-exploitation approach in conjunction with just-in-time energy profiling, averting the need for expensive offline measurements while adapting to data drift over time. Our evaluation shows that Zeus can improve the energy efficiency of DNN training by 15.3%-75.8% for diverse workloads. Comment: NSDI 2023 | Homepage: https://ml.energy/zeus
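
    A hedged sketch of the kind of objective such a framework navigates (the formula and constants here are illustrative assumptions, not lifted from the paper): a knob eta blends energy against time, with time scaled by the GPU's maximum power so both terms share units of Joules:

    ```python
    def energy_time_cost(energy_j, time_s, max_power_w, eta=0.5):
        """Blend energy and time: eta=1 optimizes energy alone,
        eta=0 training time alone (illustrative objective)."""
        return eta * energy_j + (1 - eta) * max_power_w * time_s

    # Compare two hypothetical (power limit, batch size) configurations.
    fast_but_hot = energy_time_cost(energy_j=3.6e6, time_s=3_000, max_power_w=300)
    slow_but_cool = energy_time_cost(energy_j=2.9e6, time_s=3_600, max_power_w=300)
    print(fast_but_hot, slow_but_cool)  # the cheaper configuration wins the trial
    ```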

    Leveraging the entity matching performance through adaptive indexing and efficient parallelization

    Entity Matching (EM), i.e., the task of identifying all entities that refer to the same real-world object, is an important and difficult task for data source integration and cleansing. A major difficulty for this task, in the Big Data era, is the quadratic nature of its execution. To minimize the workload while maintaining high matching quality, for both single and multiple data sources, indexing (blocking) methods have been proposed. Such methods partition the input data into blocks of similar entities, according to an entity attribute, or a combination of them, commonly called a "blocking key", and restrict the EM process to entities that share the same blocking key (i.e., belong to the same block). Although indexing promotes a considerable decrease in the number of comparisons executed, it can still generate large numbers of comparisons, depending on the size of the data sources involved and/or the number of entities per index (or block). Thus, to further reduce the execution time, the EM task can be performed in parallel using programming models such as MapReduce and Spark. However, the effectiveness and scalability of MapReduce- and Spark-based implementations for data-intensive tasks depend on the data assignment made from map to reduce tasks, in the case of MapReduce, and on the data assignment between transformation operations, in the case of Spark. The robustness of this assignment strategy is crucial for handling skewed data (large sets of data can cause memory bottlenecks) and balancing the workload distribution among all nodes of the distributed infrastructure. Since approaches that efficiently execute adaptive indexing EM methods, in batch or real-time modes, in the context of parallel computing remain an open gap in the literature, this work proposes a set of parallel approaches capable of performing efficient adaptive indexing EM using MapReduce and Spark in batch or real-time modes. The proposed approaches are compared to state-of-the-art ones in terms of performance using real cluster infrastructures and data sources. The results obtained so far show that the proposed approaches significantly increase the performance of the distributed EM task, reducing the overall runtime while preserving the quality of similar-entity detection.
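
    The core blocking idea reduces to a few lines of Python (a single-machine toy, not the adaptive, MapReduce/Spark-based approaches proposed here): entities are partitioned by a blocking key, and only intra-block pairs are ever compared:

    ```python
    from itertools import combinations

    def block(entities, key_fn):
        """Partition entities by blocking key; comparisons stay intra-block."""
        blocks = {}
        for e in entities:
            blocks.setdefault(key_fn(e), []).append(e)
        return blocks

    def candidate_pairs(blocks):
        for members in blocks.values():
            yield from combinations(members, 2)

    people = [{"name": "Ana Silva"}, {"name": "Ana S."}, {"name": "Bruno Costa"}]
    blocks = block(people, key_fn=lambda e: e["name"].split()[0].lower())
    print(list(candidate_pairs(blocks)))  # only the two "Ana" records are paired
    ```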

    High-Performance Computing Algorithms for Constructing Inverted Files on Emerging Multicore Processors

    Current trends in processor architectures increasingly include more cores on a single chip and more complex memory hierarchies, and this trend is likely to continue in the foreseeable future. These processors offer unprecedented opportunities for speeding up demanding computations if the available resources can be effectively utilized. Meanwhile, parallel programming models such as OpenMP and MPI have been commonly used on clusters of multicore CPUs, while newer frameworks such as OpenCL and CUDA have been widely adopted on heterogeneous systems and GPUs, respectively. The main goal of this dissertation is to develop techniques and methodologies for exploiting these emerging parallel architectures and programming models to solve large-scale irregular applications such as the construction of inverted files. The extraction of inverted files from large collections of documents forms a critical component of all information retrieval systems, including web search engines. In this problem, disk I/O throughput is the major performance bottleneck, especially when intermediate results are written to disk. In addition to the I/O bottleneck, a number of synchronization and consistency issues must be resolved in order to build the dictionary and postings lists efficiently. To address these issues, we introduce a dictionary data structure using a hybrid of trie and B-trees and a high-throughput pipeline strategy that completely avoids the use of disks as temporary storage for intermediate results, while ensuring the consumption of the input data at a high rate. The pipelined strategy produces parallel parsed streams that are consumed at the same rate by parallel indexers, and is implemented on a single multicore CPU as well as on a cluster of such nodes. We were able to achieve a throughput of more than 262 MB/s on the ClueWeb09 dataset on a single node. On a cluster of 32 nodes, our experimental results show scalable performance using different metrics, significantly improving on prior published results. In addition, we develop a new approach for handling time-evolving documents using additional small temporal indexing structures. The lifetime of the collection is partitioned into multiple time windows, which guarantees a very fast temporal query response time at a small space overhead relative to the non-temporal case. Extensive experimental results indicate that the overhead in both indexing and querying is small in this more complicated case, and that query performance can indeed be improved using finer temporal partitioning of the collection. Finally, we employ GPUs to accelerate the indexing process for building inverted files and to develop a very fast algorithm for the highly irregular list-ranking problem. For the indexing problem, the workload is split between CPUs and GPUs in such a way that the strengths of both architectures are exploited. For the list-ranking problem involved in the decompression of inverted files, an optimized GPU algorithm is introduced by reducing the problem to a large number of fine-grain computations in such a way that the processing cost per element is shown to be close to the best possible.
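
    For reference, the data structure at the center of this work reduces, in toy single-threaded form, to a term-to-postings map; the Python sketch below omits everything the dissertation actually contributes (the trie/B-tree dictionary, the pipelined parsing/indexing, and the GPU offload):

    ```python
    from collections import defaultdict

    def build_inverted_file(docs):
        """Minimal in-memory inverted file: term -> postings list of
        (doc_id, term_frequency) pairs, ordered by doc_id."""
        postings = defaultdict(list)
        for doc_id, text in enumerate(docs):
            counts = defaultdict(int)
            for term in text.lower().split():
                counts[term] += 1
            for term, tf in counts.items():
                postings[term].append((doc_id, tf))
        return dict(postings)

    docs = ["the quick brown fox", "the lazy dog", "quick quick dog"]
    print(build_inverted_file(docs)["quick"])  # [(0, 1), (2, 2)]
    ```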

    Scalable and fault-tolerant data stream processing on multi-core architectures

    With increasing data volumes and velocity, many applications are shifting from the classical “process-after-store” paradigm to a stream processing model: data is produced and consumed as continuous streams. Stream processing captures latency-sensitive applications as diverse as credit card fraud detection and high-frequency trading. These applications are expressed as queries of algebraic operations (e.g., aggregation) over the most recent data using windows, i.e., finite evolving views over the input streams. To guarantee correct results, streaming applications require precise window semantics (e.g., temporal ordering) for operations that maintain state. While high processing throughput and low latency are performance desiderata for stateful streaming applications, achieving both poses challenges. Computing the state of overlapping windows causes redundant aggregation operations: incremental execution (i.e., reusing previous results) reduces latency but prevents parallelization; at the same time, parallelizing window execution for stateful operations with precise semantics demands ordering guarantees and state access coordination. Finally, streams and state must be recovered to produce consistent and repeatable results in the event of failures. Given the rise of shared-memory multi-core CPU architectures and high-speed networking, we argue that it is possible to address these challenges in a single node without compromising window semantics, performance, or fault-tolerance. In this thesis, we analyze, design, and implement stream processing engines (SPEs) that achieve high performance on multi-core architectures. To this end, we introduce new approaches for in-memory processing that address the previous challenges: (i) for overlapping windows, we provide a family of window aggregation techniques that enable computation sharing based on the algebraic properties of aggregation functions; (ii) for parallel window execution, we balance parallelism and incremental execution by developing abstractions for both and combining them into a novel design; and (iii) for reliable single-node execution, we enable strong fault-tolerance guarantees without sacrificing performance by reducing the required disk I/O bandwidth using a novel persistence model. We combine the above to implement an SPE that processes hundreds of millions of tuples per second with sub-second latencies. These results reveal the opportunity to reduce resource and maintenance footprint by replacing cluster-based SPEs with single-node deployments.
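
    One classic instance of the computation sharing described in (i) is pane-based aggregation: because an associative function such as sum composes from partial results, overlapping windows can reuse fixed-size pane aggregates instead of rescanning their tuples. The Python sketch below is a minimal illustration of that principle, not the thesis's implementation:

    ```python
    def pane_sums(stream, pane_len):
        """Pre-aggregate the stream into fixed-size panes; associativity
        lets every overlapping window reuse these partial sums."""
        panes, acc, n = [], 0, 0
        for x in stream:
            acc, n = acc + x, n + 1
            if n == pane_len:
                panes.append(acc)
                acc, n = 0, 0
        return panes

    def window_sums(panes, panes_per_window, panes_per_slide):
        for i in range(0, len(panes) - panes_per_window + 1, panes_per_slide):
            yield sum(panes[i:i + panes_per_window])

    panes = pane_sums(range(16), pane_len=2)  # 8 partial sums
    print(list(window_sums(panes, panes_per_window=4, panes_per_slide=1)))
    # -> [28, 44, 60, 76, 92]: 8-element windows sliding by 2, no rescans
    ```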

    Point spread function estimation of solar surface images with a cooperative particle swarm optimization on GPUs

    Advisor: Prof. Dr. Daniel Weingaertner. Master's thesis, Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defense: Curitiba, 21/02/2013. Bibliography: pp. 81-86. Abstract: We present a method for estimating the point spread function (PSF) of solar surface images acquired by ground telescopes and degraded by the atmosphere. The estimation is done by retrieving the wavefront phase using a set of short exposures, the speckle reconstruction of the observed object, and a PSF model parametrized by Zernike polynomials. Estimates of the wavefront phase and the PSF are computed by minimizing an error function with a cooperative particle swarm optimization (CPSO) method, implemented in OpenCL to take advantage of highly parallel graphics processing units (GPUs). A calibration method is presented to adjust the algorithm parameters for low-cost results, providing solid estimates for both low-frequency and high-frequency images. Results show that the method converges quickly and is robust to noise degradation. Experiments run on an NVidia Tesla C2050 computed 100 PSFs with 50 Zernike polynomials in approximately 36 minutes. Increasing the number of Zernike coefficients tenfold, from 50 to 500, increased the execution time by only 17%, showing that the proposed algorithm is only slightly affected by the number of Zernikes used.
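
    A bare-bones particle swarm in Python/NumPy conveys the optimization loop (a stand-in only: the thesis uses a cooperative, GPU-resident CPSO in OpenCL, and the real error function involves the wavefront phase and speckle reconstruction, not the toy quadratic used here):

    ```python
    import numpy as np

    def pso_minimize(err_fn, dim, swarm=30, iters=200, seed=0):
        """Minimize err_fn over `dim` coefficients with a basic PSO."""
        rng = np.random.default_rng(seed)
        x = rng.uniform(-1.0, 1.0, (swarm, dim))   # particle positions
        v = np.zeros_like(x)                       # particle velocities
        pbest = x.copy()                           # per-particle best
        pbest_val = np.apply_along_axis(err_fn, 1, x)
        gbest = pbest[pbest_val.argmin()]          # swarm-wide best
        for _ in range(iters):
            r1, r2 = rng.random((2, swarm, dim))
            v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
            x = x + v
            vals = np.apply_along_axis(err_fn, 1, x)
            improved = vals < pbest_val
            pbest[improved], pbest_val[improved] = x[improved], vals[improved]
            gbest = pbest[pbest_val.argmin()]
        return gbest, pbest_val.min()

    # Toy "error": squared distance to a fixed coefficient vector.
    target = np.linspace(0.1, 0.5, 5)
    coeffs, err = pso_minimize(lambda c: np.sum((c - target) ** 2), dim=5)
    print(coeffs.round(2), err)
    ```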