
    Resampling Methods and Visualization Tools for Computer Performance Comparisons in the Presence of Performance Variation

    Performance variability, stemming from non-deterministic hardware and software behaviors or from deterministic behaviors such as measurement bias, is a well-known phenomenon of computer systems. It increases the difficulty of comparing computer performance metrics and is set to become even more of a concern as interest in Big Data analytics increases. Conventional methods use various measures (such as the geometric mean) to quantify the performance of different benchmarks and compare computers without considering this variability, which may lead to wrong conclusions. In this paper, we propose three resampling methods for performance evaluation and comparison: a randomization test for a general performance comparison between two computers, bootstrapping confidence estimation, and an empirical distribution with a five-number summary for performance evaluation. The results show that, for both PARSEC and the high-variance BigDataBench benchmarks: 1) the randomization test substantially improves our chance of identifying a difference between two performance comparisons when the difference is not large; 2) bootstrapping confidence estimation provides an accurate confidence interval for the performance comparison measure (e.g., the ratio of geometric means); and 3) when the difference is very small, a single test is often not enough to reveal the nature of the computer performance because of the variability of computer systems. We further propose using an empirical distribution to evaluate computer performance and a five-number summary to summarize it. We use published SPEC 2006 results to investigate the sources of performance variation by predicting performance and relative variation for 8,236 machines, achieving a correlation of 0.992 for predicted performance and a correlation of 0.5 between predicted and measured relative variation. Finally, we propose the use of a novel biplot technique to visualize the effectiveness of benchmarks and to cluster machines by behavior. We illustrate the results and conclusions through detailed Monte Carlo simulation studies and real examples.
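
    To make the first two resampling ideas concrete, here is a minimal sketch, not the paper's implementation, of a label-swapping randomization test and a percentile bootstrap confidence interval for the ratio of geometric means. It assumes one execution-time measurement per benchmark for each of two machines; the array names and parameters are illustrative.

```python
# Hedged sketch: randomization test and bootstrap CI for the ratio of
# geometric means of two machines' per-benchmark execution times.
import numpy as np

rng = np.random.default_rng(0)

def geo_mean_ratio(a, b):
    """Ratio of geometric means of per-benchmark measurements a and b."""
    return np.exp(np.mean(np.log(a)) - np.mean(np.log(b)))

def randomization_test(a, b, n_perm=10_000):
    """Two-sided p-value: how often does randomly swapping machine labels
    within each benchmark give a ratio at least as extreme as observed?"""
    observed = geo_mean_ratio(a, b)
    count = 0
    for _ in range(n_perm):
        swap = rng.random(a.size) < 0.5          # swap labels per benchmark
        a_p = np.where(swap, b, a)
        b_p = np.where(swap, a, b)
        if abs(np.log(geo_mean_ratio(a_p, b_p))) >= abs(np.log(observed)):
            count += 1
    return observed, count / n_perm

def bootstrap_ci(a, b, n_boot=10_000, level=0.95):
    """Percentile bootstrap CI for the geometric-mean ratio (paired resampling
    of benchmarks)."""
    idx = rng.integers(0, a.size, size=(n_boot, a.size))
    ratios = np.exp(np.mean(np.log(a[idx]), axis=1) - np.mean(np.log(b[idx]), axis=1))
    return np.quantile(ratios, [(1 - level) / 2, 1 - (1 - level) / 2])
```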

    Degradation-based reliability in outdoor environments

    Traditionally, the field of reliability has been concerned with failure-time data. As a result, degradation-based reliability methods are not as well developed, especially for the analysis of degradation data arising in highly variable environments. This dissertation, comprising three papers, proposes two simulation-based methods to estimate reliability metrics for materials or products that degrade through exposure to outdoor weather. In the first paper, time series modeling is used to estimate the probability distribution of cumulative degradation after x years and the probability distribution of failure time; a procedure for constructing approximate confidence intervals for the metrics of interest is also given. The second paper extends the first to the case where there is additional uncertainty due to unit-to-unit variability. It discusses the reliability quantities of interest induced by the presence of two sources of variability and techniques to estimate them; Bayesian methods are used to estimate the distribution of the population of units, and an approximation technique to overcome computational difficulties is described. The third paper uses a model-free block bootstrap scheme to estimate reliability quantities from periodic data, where the degradation data have periodic structure due to the seasonality of the outdoor environment. The paper also proposes two methods for choosing the block size, an important issue in implementing a block bootstrap scheme. A comparison is also made between the results from time series modeling and those from the block bootstrap.
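
    As a rough illustration of the block bootstrap idea for seasonal degradation data (not the dissertation's code), the sketch below resamples contiguous blocks of a degradation-increment series, concatenates them, and accumulates them into bootstrap paths of cumulative degradation; the block size, data file, and failure threshold are hypothetical.

```python
# Hedged sketch: block bootstrap of a periodic degradation-increment series,
# producing bootstrap paths of cumulative degradation.
import numpy as np

rng = np.random.default_rng(1)

def block_bootstrap_paths(increments, block_size, n_paths=1000):
    n = len(increments)
    n_blocks = int(np.ceil(n / block_size))
    starts = np.arange(n - block_size + 1)           # allowed block starts
    paths = np.empty((n_paths, n))
    for i in range(n_paths):
        picks = rng.choice(starts, size=n_blocks, replace=True)
        resampled = np.concatenate(
            [increments[s:s + block_size] for s in picks])[:n]
        paths[i] = np.cumsum(resampled)              # cumulative degradation
    return paths

# Illustrative use (hypothetical file, threshold D_f, ~1-year blocks):
# increments = np.loadtxt("weekly_degradation.txt")
# paths = block_bootstrap_paths(increments, block_size=52)
# p_fail = np.mean(paths[:, -1] >= D_f)   # P(failure within the window)
```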

    Contributions and developments on nonintrusive load monitoring

    Energy efficiency is a key subject on the present world agenda, not only because of greenhouse-gas emissions, which contribute to global warming, but also because of possible supply interruptions. In Brazil, energy wastage in the residential market is estimated at around 15%. Previous studies have indicated that the largest savings were achieved with appliance-specific electricity consumption feedback, which caused behavioral changes and encouraged consumers to pursue energy conservation. Nonintrusive load monitoring (NILM) is a relatively new term; it aims to disaggregate global consumption to the appliance level using only a single point of measurement. Various methods have been suggested to infer when appliances are turned on and off from the analysis of aggregated current and voltage waveforms. Within this context, we aim to provide a NILM methodology that determines which sets of electrical features and feature extraction rates, obtained from aggregated household data, are essential to preserve equivalent levels of accuracy, thus reducing the amount of data that needs to be transferred to, and stored on, cloud servers. Different machine learning techniques are applied to characterize and solve the problem. As an addendum to this thesis, a Brazilian appliance dataset, sampled from real appliances, was developed for future NILM developments and research. In addition, a low-cost NILM smart meter was developed to encourage consumers to change their habits toward more sustainable consumption.
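
    As a hedged sketch of what "electrical features at a configurable extraction rate" might look like (not the thesis's pipeline; the sampling rate, feature set, and names are illustrative), the code below reduces aggregated voltage/current waveforms to a few per-window features, so only features, rather than raw waveforms, need to leave the meter.

```python
# Illustrative feature extraction from aggregated voltage/current samples:
# one feature row per 1/feature_rate seconds of waveform data.
import numpy as np

def extract_features(voltage, current, fs=15_360, feature_rate=1.0):
    window = int(fs / feature_rate)            # samples per feature window
    n_windows = len(voltage) // window
    rows = []
    for k in range(n_windows):
        v = voltage[k * window:(k + 1) * window]
        i = current[k * window:(k + 1) * window]
        v_rms = np.sqrt(np.mean(v ** 2))
        i_rms = np.sqrt(np.mean(i ** 2))
        p_active = np.mean(v * i)              # active power (W)
        s_apparent = v_rms * i_rms             # apparent power (VA)
        pf = p_active / s_apparent if s_apparent > 0 else 0.0
        rows.append((v_rms, i_rms, p_active, s_apparent, pf))
    return np.array(rows)
```

    Lowering feature_rate trades temporal resolution for less data to transmit and store, which is the trade-off the thesis studies.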

    Resampling to accelerate cross-correlation searches for continuous gravitational waves from binary systems

    Searches for continuous-wave (CW) gravitational waves (GWs) are computationally intensive: low signal-to-noise-ratio signals require templated searches with long coherent integration times and hence fine parameter-space resolution, while longer integration increases sensitivity. Low-mass X-ray binaries (LMXBs) such as Scorpius X-1 (Sco X-1) may emit accretion-driven CWs at strains within reach of current ground-based observatories, with the binary orbital parameters inducing phase modulation. This paper describes how resampling corrects for binary and detector motion, yielding source-frame time series used for cross-correlation. Compared to the previous detector-frame, templated cross-correlation method, used for Sco X-1 on data from the first Advanced LIGO observing run (O1), resampling is about 20x faster in the costliest, most sensitive frequency bands; the speed-up factor depends on integration time and search setup. Reinvesting the saved cost into longer integration yields a forecast median sensitivity gain of approximately 51% from 20 to 125 Hz, or 11% from 20 to 250 Hz, at the same per-band cost and setup. The timing model presented here enables future setup optimization. Resampling scales well with longer integration and, at 10x the unoptimized cost, could reach median sensitivities of 2.83x and 2.75x respectively, limited by spin wandering. An O1 search could then yield a marginalized-polarization upper limit reaching the torque-balance level at 100 Hz. Frequencies from 40 to 140 Hz might be probed in equal observing time with 2x improved detectors.
    Comment: 28 pages, 7 figures, 3 tables
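
    Conceptually, resampling here means interpolating the detector-frame data onto evenly spaced source-frame times so that a fixed-frequency source appears (nearly) monochromatic. The sketch below is only an illustration of that idea, not the search code: delay_model stands in for the full detector-motion plus binary-orbit timing model, and the circular-orbit delay and its sign convention are simplified assumptions.

```python
# Conceptual sketch of source-frame resampling (illustrative only).
import numpy as np

def resample_to_source_frame(t_det, x_det, delay_model, fs_out):
    """Interpolate detector-frame samples onto uniform source-frame times."""
    t_src = t_det + delay_model(t_det)          # source-frame arrival times
    t_uniform = np.arange(t_src[0], t_src[-1], 1.0 / fs_out)
    return t_uniform, np.interp(t_uniform, t_src, x_det)

def make_binary_delay(a_p, P, phi0):
    """Toy circular-orbit delay: projected semi-major axis a_p (light-seconds),
    orbital period P (s), phase phi0. A stand-in, not the full timing model."""
    return lambda t: -a_p * np.sin(2.0 * np.pi * t / P + phi0)
```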

    Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks

    Over the last decade, Convolutional Neural Network (CNN) models have been highly successful in solving complex vision problems. However, these deep models are perceived as "black box" methods given the limited understanding of their internal functioning. There has been significant recent interest in developing explainable deep learning models, and this paper is an effort in that direction. Building on the recently proposed Grad-CAM method, we propose a generalized method called Grad-CAM++ that provides better visual explanations of CNN model predictions than the state of the art, in terms of better object localization as well as explaining occurrences of multiple object instances in a single image. We provide a mathematical derivation for the proposed method, which uses a weighted combination of the positive partial derivatives of the last convolutional layer's feature maps with respect to a specific class score as weights to generate a visual explanation for the corresponding class label. Our extensive subjective and objective experiments and evaluations on standard datasets show that Grad-CAM++ provides promising human-interpretable visual explanations for a given CNN architecture across multiple tasks, including classification, image caption generation, and 3D action recognition, as well as in new settings such as knowledge distillation.
    Comment: 17 pages, 15 figures, 11 tables. Accepted in the proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV 2018). An extended version is under review at IEEE Transactions on Pattern Analysis and Machine Intelligence.
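
    The weighting scheme described above can be sketched as follows. This is not the authors' code: it assumes the last convolutional layer's feature maps A[k, i, j] and the gradients dY^c/dA[k, i, j] for one class have already been extracted (e.g., via framework hooks), and it uses the commonly quoted closed-form approximation for the alpha coefficients based on powers of the gradients (assuming an exponential of the class score).

```python
# Hedged sketch of a Grad-CAM++-style saliency map from precomputed arrays.
import numpy as np

def grad_cam_pp(A, grad, eps=1e-8):
    """A, grad: arrays of shape (K, H, W) -- feature maps and dY^c/dA."""
    grad_2, grad_3 = grad ** 2, grad ** 3
    sum_A = A.sum(axis=(1, 2), keepdims=True)            # per-map spatial sum
    alpha = grad_2 / (2.0 * grad_2 + sum_A * grad_3 + eps)
    # Weights: alpha-weighted sum of the *positive* partial derivatives.
    weights = (alpha * np.maximum(grad, 0.0)).sum(axis=(1, 2))
    cam = np.maximum((weights[:, None, None] * A).sum(axis=0), 0.0)
    if cam.max() > 0:
        cam = cam / cam.max()                             # normalize to [0, 1]
    return cam   # upsample to the input resolution for visualization
```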

    Algebraic Comparison of Partial Lists in Bioinformatics

    The outcome of a functional genomics pipeline is usually a partial list of genomic features, ranked by their relevance in modelling the biological phenotype in terms of a classification or regression model. Because of resampling protocols, or simply within a meta-analysis comparison, one often obtains not a single list but a set of alternative feature lists, possibly of different lengths. Here we introduce a method, based on the algebraic theory of symmetric groups, for studying the variability between lists ("list stability") in the case of lists of unequal length. We provide algorithms that evaluate stability for lists embedded in the full feature set or restricted to the features occurring in the partial lists. The method is demonstrated first on synthetic data in a gene filtering task and then for finding gene profiles on a recent prostate cancer dataset.
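
    To give a feel for what a list-stability score does (this is an illustrative rank-distance construction, not the paper's symmetric-group machinery), the sketch below maps each partial list to a rank vector over the union of features, assigns unranked features a rank just past the end of that list, and averages a normalized pairwise rank distance.

```python
# Illustrative stability score for a set of partial ranked feature lists.
import numpy as np
from itertools import combinations

def rank_vector(partial_list, all_features):
    ranks = {f: r + 1 for r, f in enumerate(partial_list)}
    fill = len(partial_list) + 1                 # rank for absent features
    return np.array([ranks.get(f, fill) for f in all_features], dtype=float)

def list_stability(lists):
    universe = sorted(set().union(*lists))
    vecs = [rank_vector(l, universe) for l in lists]
    # Canberra-style normalized rank distance, averaged over all pairs.
    dists = [np.mean(np.abs(u - v) / (u + v)) for u, v in combinations(vecs, 2)]
    return 1.0 - np.mean(dists)                  # higher = more stable

# Example: three alternative gene lists of unequal length.
# print(list_stability([["g1", "g2", "g3"], ["g2", "g1"], ["g1", "g3", "g4"]]))
```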

    Anomaly Detection of Smart Meter Data

    Households and buildings presently account for almost one-third of total energy consumption across all power consumption sectors, and this share continues to rise as more buildings install smart meter sensors and connect to the Smart Grid. The Smart Grid uses sensors and ICT technologies to achieve an uninterrupted power supply and minimize power wastage. Abnormalities in sensors and faults lead to power wastage; in addition, studying a building's consumption pattern can lead to a substantial reduction in power wastage, saving millions of dollars. According to studies, 20% of the energy consumed by buildings is wasted due to these factors. In this work, we propose an anomaly detection approach for the power consumption of smart meter data from an open dataset of 10 houses from Ausgrid Corporation Australia. Since power consumption may be affected by various factors, such as weather conditions during the year, it was necessary to find a way to discover anomalies while accounting for seasonal periods such as weather seasons, day/night, and holidays. Consequently, the first part of this thesis identifies the outliers and produces data with labels (normal or anomalous): we use the Facebook Prophet algorithm along with domain knowledge of power consumption to detect anomalies in two years of half-hourly sampled data. After generating the dataset with anomaly labels, we propose a method to classify future power consumption as anomalous or normal, using four different machine learning approaches, and we also measure the run-time of the different classification algorithms. We achieve a G-mean score of 97 percent.
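
    A minimal sketch of the Prophet-based labelling step is shown below. It is illustrative rather than the thesis's code: it simply flags half-hourly readings that fall outside Prophet's prediction interval, and the additional domain-knowledge rules and the downstream classifiers are omitted. The dataframe columns 'ds' (timestamp) and 'y' (consumption) follow Prophet's convention.

```python
# Hedged sketch: label smart-meter readings outside Prophet's prediction
# interval as anomalous.
import pandas as pd
from prophet import Prophet   # older releases ship the same API as `fbprophet`

def label_anomalies(df, interval_width=0.99):
    m = Prophet(interval_width=interval_width, daily_seasonality=True,
                weekly_seasonality=True, yearly_seasonality=True)
    m.fit(df)                                   # df has columns 'ds' and 'y'
    forecast = m.predict(df[["ds"]])
    out = df.merge(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]], on="ds")
    out["anomaly"] = (out["y"] < out["yhat_lower"]) | (out["y"] > out["yhat_upper"])
    return out
```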