229 research outputs found

    Evaluation and optimization of Big Data Processing on High Performance Computing Systems

    Programa Oficial de Doutoramento en Investigación en Tecnoloxías da Información. 524V01

    [Abstract] Nowadays, Big Data technologies are used by many organizations to extract valuable information from large-scale datasets. As the size of these datasets increases, meeting the huge performance requirements of data processing applications becomes more challenging. This Thesis focuses on evaluating and optimizing these applications by proposing two new tools, namely BDEv and Flame-MR. On the one hand, BDEv allows thorough assessment of the behavior of widespread Big Data processing frameworks such as Hadoop, Spark and Flink. It manages the configuration and deployment of the frameworks, generating the input datasets and launching the workloads specified by the user. During each workload, it automatically extracts several evaluation metrics that include performance, resource utilization, energy efficiency and microarchitectural behavior. On the other hand, Flame-MR optimizes the performance of existing Hadoop MapReduce applications. Its overall design is based on an event-driven architecture that improves the efficiency of the system resources by pipelining data movements and computation. Moreover, it avoids redundant memory copies present in Hadoop, while also using efficient sort and merge algorithms for data processing. Flame-MR replaces the underlying MapReduce data processing engine in a transparent way, so the source code of existing applications does not need to be modified. The performance benefits provided by Flame-MR have been thoroughly evaluated on cluster and cloud systems by using both standard benchmarks and real-world applications, showing reductions in execution time that range from 40% to 90%. This Thesis provides Big Data users with powerful tools to analyze and understand the behavior of data processing frameworks and reduce the execution time of applications without requiring expert knowledge.
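The core idea of overlapping data movement with computation can be sketched in miniature. The Python below is purely illustrative: Flame-MR's actual engine is not shown in the abstract, and none of the names here come from its code.

```python
import threading
import queue

def pipelined_wordcount(chunks):
    """Toy sketch of pipelining: a reader thread stages data chunks into a
    bounded queue (standing in for data movement) while the main thread
    counts words (the overlapped computation)."""
    buf = queue.Queue(maxsize=2)   # bounded buffer provides back-pressure
    SENTINEL = None

    def reader():
        for chunk in chunks:       # stands in for fetching map output
            buf.put(chunk)
        buf.put(SENTINEL)

    threading.Thread(target=reader, daemon=True).start()

    counts = {}
    while True:
        chunk = buf.get()
        if chunk is SENTINEL:
            break
        for word in chunk.split():  # computation proceeds while the
            counts[word] = counts.get(word, 0) + 1  # reader stages more data
    return counts

print(pipelined_wordcount(["a b a", "b c"]))  # {'a': 2, 'b': 2, 'c': 1}
```

The bounded queue is the key design point: it lets the producer run ahead just enough to hide transfer latency without unbounded buffering.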

    BDEv 3.0: energy efficiency and microarchitectural characterization of Big Data processing frameworks

    This is a post-peer-review, pre-copyedit version of an article published in Future Generation Computer Systems. The final authenticated version is available online at: https://doi.org/10.1016/j.future.2018.04.030

    [Abstract] As the size of Big Data workloads keeps increasing, the evaluation of distributed frameworks becomes a crucial task in order to identify potential performance bottlenecks that may delay the processing of large datasets. While most existing works generally focus only on execution time and resource utilization, analyzing other important metrics is key to fully understanding the behavior of these frameworks. For example, microarchitecture-level events can bring meaningful insights to characterize the interaction between frameworks and hardware. Moreover, energy consumption is also gaining increasing attention as systems scale to thousands of cores. This work discusses the current state of the art in evaluating distributed processing frameworks, while extending our Big Data Evaluator tool (BDEv) to extract energy efficiency and microarchitecture-level metrics from the execution of representative Big Data workloads. An experimental evaluation using BDEv demonstrates its usefulness in extracting meaningful information from popular frameworks such as Hadoop, Spark and Flink.

    Ministerio de Economía, Industria y Competitividad; TIN2016-75845-P
    Ministerio de Educación; FPU14/02805
    Ministerio de Educación; FPU15/0338
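One common way to collect such microarchitecture-level and energy metrics on Linux is to wrap the workload with `perf stat`. The helper below is a hypothetical sketch of that approach, not BDEv's actual collection mechanism; the event names are standard `perf` events but the function and its defaults are invented for illustration.

```python
import shlex

def perf_stat_command(workload, events=("cycles", "instructions",
                                        "cache-misses", "power/energy-pkg/")):
    """Return the argv list for `perf stat -e <events> -- <workload>`.
    `power/energy-pkg/` is an RAPL-based energy event available on many
    Intel CPUs; availability depends on the machine and kernel."""
    return (["perf", "stat", "-e", ",".join(events), "--"]
            + shlex.split(workload))

cmd = perf_stat_command("hadoop jar wordcount.jar input output")
print(" ".join(cmd))
```

Running the returned command (e.g. via `subprocess.run`) produces per-event totals on stderr that a harness can parse after each workload execution.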

    Performance Evaluation and Benchmarking of Event Processing Systems

    Tese de Doutoramento em Ciências e Tecnologias da Informação apresentada à Faculdade de Ciências e Tecnologia da Universidade de Coimbra.

    [Abstract] This thesis aims at studying, comparing, and improving the performance and scalability of event processing (EP) systems. In the last 15 years, event processing systems have gained increased attention from academia and industry, having found application in a number of mission-critical scenarios and motivated the onset of several research projects and specialized startups. Nonetheless, there has been a general lack of information, evaluation methodologies and tools concerning the performance of EP platforms. Until recently, it was not clear which factors impact their performance most, whether the systems would scale well and adapt to changes in load conditions, or whether they had any serious limitations. Moreover, the lack of standardized benchmarks hindered any objective comparison among the diverse platforms. In this thesis, we tackle these problems on several fronts. First, we developed FINCoS, a set of benchmarking tools for load generation and performance measurement of event processing systems. The framework has been designed to be independent of any particular workload or product so that it can be reused in multiple performance studies and benchmark kits. FINCoS has been made publicly available under the terms of the GNU General Public License and is currently hosted at the Standard Performance Evaluation Corporation (SPEC) repository of peer-reviewed tools for quantitative system evaluation and analysis. We then defined a set of microbenchmarks and used them to conduct an extensive performance study on three EP systems. This analysis helped identify critical factors affecting the performance of event processing platforms and exposed important limitations of the products, such as poor utilization of resources, thrashing or failures in the presence of memory shortages, and absent or incipient query plan sharing capabilities. With these results in hand, we moved our focus to performance enhancement. To improve resource utilization, we proposed novel algorithms and evaluated alternative data organization schemes that not only substantially reduce memory consumption, but are also significantly more efficient at the microarchitectural level. Our experimental evaluation corroborated the efficacy of the proposed optimizations: together they provided a 6-fold reduction in memory usage and an order-of-magnitude increase in query throughput. In addition, we addressed the problem of memory-constrained applications by introducing SlideM, an optimal buffer management algorithm that selectively offloads sliding-window state to disk when main memory becomes insufficient. We also developed a strategy based on SlideM to share computational resources when processing multiple aggregation queries over overlapping sliding windows. Our experimental results demonstrate that, contrary to common belief, storing window data on disk can be appropriate even for applications with very high event arrival rates. We concluded this thesis by proposing the Pairs benchmark. Pairs was designed to assess the ability of EP platforms to process increasingly larger numbers of simultaneous queries and event arrival rates while providing quick answers. The benchmark workload exercises several common features that appear repeatedly in most event processing applications, including event filtering, aggregation, correlation and pattern detection. Furthermore, differently from previous proposals in related areas, Pairs allows evaluating important aspects of event processing systems such as adaptivity and query scalability. In general, we expect that the findings and proposals presented in this thesis serve to broaden the understanding of the performance of event processing platforms and open avenues for additional improvements in the current generation of EP systems.

    FCT Nº 45121/200
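The sliding-window aggregation that SlideM manages and the Pairs workload exercises can be sketched as a minimal continuous query. This Python class is illustrative only; it is not taken from FINCoS, SlideM, or Pairs, and the names are invented.

```python
from collections import deque

class SlidingWindowAvg:
    """Minimal time-based sliding-window average: the canonical continuous
    aggregation query in event processing. State (the event deque) is what
    an algorithm like SlideM would spill to disk under memory pressure."""
    def __init__(self, window_size):
        self.window = window_size
        self.events = deque()         # (timestamp, value) pairs in window
        self.total = 0.0              # incremental sum -> O(1) per event

    def on_event(self, ts, value):
        self.events.append((ts, value))
        self.total += value
        # evict events that slid out of the interval (ts - window, ts]
        while self.events and self.events[0][0] <= ts - self.window:
            _, old = self.events.popleft()
            self.total -= old
        return self.total / len(self.events)

q = SlidingWindowAvg(window_size=10)
for ts, v in [(1, 4.0), (5, 8.0), (12, 6.0)]:
    avg = q.on_event(ts, v)
print(avg)  # at ts=12 the event from ts=1 has expired -> (8+6)/2 = 7.0
```

Maintaining the running sum incrementally, rather than rescanning the window on every event, is what keeps per-event cost constant as arrival rates grow.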

    Assessing malware detection using hardware performance counters

    Despite the use of modern anti-virus (AV) software, malware is a prevailing threat to today's computing systems. AV software cannot cope with the increasing number of evasive malware samples, calling for more robust malware detection techniques. Among the many proposed methods for malware detection, researchers have suggested microarchitecture-based mechanisms for detecting malicious software in a system. For example, Intel embeds a shadow stack in its modern architectures that maintains the integrity between function calls and their returns by tracking each function's return address. Any malicious program that exploits an application to overwrite return addresses can be restrained using the shadow stack. Researchers have also proposed the use of Hardware Performance Counters (HPCs). HPCs are counters embedded in modern computing architectures that count the occurrence of architectural events, such as cache hits, clock cycles, and integer instructions. Malware detectors that leverage HPCs create a profile of an application by reading the counter values periodically. Subsequently, researchers use supervised machine learning (ML) classification techniques to differentiate malicious profiles from benign ones. It is important to note that HPCs count the occurrence of microarchitectural events during the execution of a program, whereas whether a program is malicious or benign is determined by its high-level behavior. Since HPCs do not surveil the high-level behavior of an application, we hypothesize that the counters may fail to capture the difference in the behavioral semantics of malicious and benign software. To investigate whether HPCs capture the behavioral semantics of a program, we recreate the experimental setup of previously proposed systems. To this end, we leverage HPCs to profile applications such as MS-Office and Chrome as benign applications and known malware binaries as malicious applications.
    Standard ML classifiers demand a normally distributed dataset, where the variance is independent of the mean of the data points. To transform the profiles into a more normal-like distribution and to avoid over-fitting the machine learning models, we apply a power transform to the profiles of the applications. Moreover, HPCs can monitor a broad range of hardware-based events, so we use Principal Component Analysis (PCA) to select the performance events that show maximum variation in the fewest features across all the applications profiled. Finally, we train twelve supervised machine learning classifiers, such as Support Vector Machines (SVMs) and Multi-Layer Perceptrons (MLPs), on the profiles of the applications. We model each classifier as a binary classifier, where the two classes are 'Benignware' and 'Malware.' Our results show that for the 'Malware' class, the average recall and F2-score across the twelve classifiers are 0.22 and 0.70, respectively. The low recall score shows that the ML classifiers tag malware as benignware. Even though we exercise a statistical approach for selecting our features, the classifiers are not able to distinguish between malware and benignware based on the hardware-based events monitored by the HPCs. The inability of HPC profiles to capture the behavioral characteristics of an application forces us to question the use of HPCs as malware detectors.
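The evaluation pipeline described above (power transform, PCA for event selection, then a binary classifier) can be sketched with scikit-learn. The data below is a synthetic stand-in for real HPC profiles; the counter dimensionality, labels, and parameters are invented for illustration, not taken from the study.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer
from sklearn.decomposition import PCA
from sklearn.svm import SVC

# Synthetic stand-in: 200 profiles of 8 counter events each, with random
# benign/malware labels (real profiles would come from periodic HPC reads).
rng = np.random.default_rng(0)
X = rng.poisson(lam=50, size=(200, 8)).astype(float)
y = rng.integers(0, 2, size=200)          # 0 = benignware, 1 = malware

clf = make_pipeline(PowerTransformer(),   # normalize skewed counter values
                    PCA(n_components=4),  # keep highest-variance components
                    SVC())                # one of many possible classifiers
clf.fit(X[:150], y[:150])
pred = clf.predict(X[150:])
print("predicted labels:", pred[:10])
```

With random labels the predictions are of course meaningless; the point is only the shape of the pipeline, in which each stage mirrors one preprocessing step named in the abstract.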

    New Techniques for On-line Testing and Fault Mitigation in GPUs

    The abstract is in the attachment.

    Approaches to multiprocessor error recovery using an on-chip interconnect subsystem

    For future multicores, a dedicated interconnect subsystem for on-chip monitors was found to be highly beneficial in terms of scalability, performance and area. In this thesis, such a monitor network (MNoC) is used for multicores to support selective error identification and recovery and to maintain target chip reliability in the context of dynamic voltage and frequency scaling (DVFS). A selective shared-memory multiprocessor recovery is performed using MNoC in which, when an error is detected, only the group of processors sharing an application with the affected processors is recovered. Although the use of DVFS in contemporary multicores provides significant protection from unpredictable thermal events, a potential side effect can be increased processor exposure to soft errors. To address this issue, a flexible fault prevention and recovery mechanism has been developed to selectively enable a small amount of per-core dual modular redundancy (DMR) in response to increased vulnerability, as measured by the processor's architectural vulnerability factor (AVF). Our new algorithm for DMR deployment aims to provide a stable effective soft error rate (SER) by using DMR in response to DVFS caused by thermal events. The algorithm is implemented in real time on the multicore using MNoC and a controller that evaluates thermal information and multicore performance statistics in addition to error information. DVFS experiments with a multicore simulator using standard benchmarks show an average 6% improvement in overall power consumption and a stable SER when using selective DMR versus continuous DMR deployment.
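The selective-DMR policy can be illustrated with a toy decision function: when DVFS lowers a core's frequency after a thermal event and its measured vulnerability rises, per-core DMR is switched on to keep the effective SER stable. The scaling model and thresholds below are invented for illustration, not the thesis's actual algorithm.

```python
def dmr_decision(avf, freq_ghz, avf_threshold=0.3, nominal_freq=2.0):
    """Toy per-core DMR policy: enable redundancy when the frequency-scaled
    vulnerability exceeds a threshold. A lower clock lengthens the exposure
    window per instruction, so AVF is scaled up accordingly (illustrative
    model only)."""
    effective_avf = avf * (nominal_freq / freq_ghz)
    return effective_avf > avf_threshold   # True -> enable DMR on this core

print(dmr_decision(avf=0.25, freq_ghz=1.0))  # throttled core -> True
print(dmr_decision(avf=0.25, freq_ghz=2.0))  # nominal core  -> False
```

In the thesis's setting, such a controller would read AVF, thermal, and frequency data over the MNoC and apply the decision per core at run time.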

    TimeWeaver: A Tool for Hybrid Worst-Case Execution Time Analysis

    Many embedded control applications have real-time requirements. If the application is safety-relevant, worst-case execution time bounds have to be determined in order to demonstrate deadline adherence. For high-performance multi-core architectures with degraded timing predictability, WCET bounds can be computed by hybrid WCET analysis, which combines static analysis with timing measurements. This article focuses on a novel tool for hybrid WCET analysis based on non-intrusive instruction-level real-time tracing.
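The hybrid idea can be sketched as follows: take per-segment times from hardware traces, keep the maximum observed per segment, and bound the program by the longest path through the control-flow graph. This is a generic illustration of hybrid WCET analysis, not TimeWeaver's implementation; the graph and segment names are invented.

```python
def hybrid_wcet(traces, cfg, entry, exit_):
    """Combine measured segment times (maximum observed per segment) with a
    static longest-path search over an acyclic control-flow graph."""
    seg_max = {}
    for run in traces:                      # run: {segment: cycles observed}
        for seg, t in run.items():
            seg_max[seg] = max(seg_max.get(seg, 0), t)

    def longest(node):                      # DFS over the DAG
        if node == exit_:
            return seg_max.get(node, 0)
        return seg_max.get(node, 0) + max(longest(s) for s in cfg[node])

    return longest(entry)

cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}          # invented CFG
traces = [{"A": 5, "B": 9, "D": 2}, {"A": 6, "C": 4, "D": 3}]
print(hybrid_wcet(traces, cfg, "A", "D"))  # 6 + max(9, 4) + 3 = 18
```

The measurements supply realistic per-segment costs that pure static analysis of a complex pipeline could not predict, while the path search restores a bound over paths no single measurement run exercised.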

    Performance and Microarchitectural Analysis for Image Quality Assessment

    This thesis presents a performance analysis of five mature Image Quality Assessment (IQA) algorithms: VSNR, MAD, MSSIM, BLIINDS, and VIF, using the VTune ... from Intel. The main performance parameter considered is execution time. First, we conduct hotspot analysis to find the most time-consuming sections of the five algorithms. Second, we perform microarchitectural analysis to study the behavior of the algorithms on Intel's Sandy Bridge microarchitecture and find architectural bottlenecks. Existing research on improving the performance of IQA algorithms is based on advanced signal processing techniques. Our research focuses on the interaction of IQA algorithms with the underlying hardware and architectural resources. We propose techniques to improve performance using coding techniques that exploit the hardware resources and consequently improve execution time and computational performance. Along with software tuning methods, we also propose a generic custom IQA hardware engine based on the microarchitectural analysis and the behavior of these five IQA algorithms with the underlying microarchitectural resources.

    School of Electrical & Computer Engineering
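The hotspot-analysis step can be sketched with Python's built-in profiler: run a computation under the profiler and rank functions by cumulative time, much as VTune's hotspot view does. The toy metric below is a stand-in, not one of the five IQA algorithms.

```python
import cProfile
import pstats
import io

def toy_metric(pixels):
    """Stand-in for an IQA computation: per-image pixel variance."""
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)

pr = cProfile.Profile()
pr.enable()
result = toy_metric(list(range(10000)))
pr.disable()

# Rank the profiled functions by cumulative time, keeping the top three.
out = io.StringIO()
pstats.Stats(pr, stream=out).sort_stats("cumulative").print_stats(3)
print(out.getvalue())
```

For real IQA workloads the same report would point at the dominant kernels (filtering, transforms), which are then the candidates for microarchitectural analysis.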

    Assessing the security of hardware-assisted isolation techniques
