2,172 research outputs found

    Coz: Finding Code that Counts with Causal Profiling

    Improving performance is a central concern for software developers. To locate optimization opportunities, developers rely on software profilers. However, these profilers only report where programs spent their time: optimizing that code may have no impact on performance. Past profilers thus both waste developer time and make it difficult for them to uncover significant optimization opportunities. This paper introduces causal profiling. Unlike past profiling approaches, causal profiling indicates exactly where programmers should focus their optimization efforts, and quantifies their potential impact. Causal profiling works by running performance experiments during program execution. Each experiment calculates the impact of any potential optimization by virtually speeding up code: inserting pauses that slow down all other code running concurrently. The key insight is that this slowdown has the same relative effect as running that line faster, thus "virtually" speeding it up. We present Coz, a causal profiler, which we evaluate on a range of highly-tuned applications: Memcached, SQLite, and the PARSEC benchmark suite. Coz identifies previously unknown optimization opportunities that are both significant and targeted. Guided by Coz, we improve the performance of Memcached by 9%, SQLite by 25%, and accelerate six PARSEC applications by as much as 68%; in most cases, these optimizations involve modifying under 10 lines of code. Comment: Published at SOSP 2015 (Best Paper Award).
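The virtual-speedup idea can be made concrete with a small model. The sketch below is an illustration of the principle only, not Coz's implementation: the two-thread workload, iteration counts, and timings are hypothetical. Instead of actually making the target region faster, every concurrent thread is paused in proportion to the target's execution time, and the known total inserted delay is then subtracted out to predict the runtime the real optimization would achieve.

```python
# Minimal sketch of the virtual-speedup insight behind causal profiling.
# The two-thread model and all numbers below are illustrative, not Coz itself.

def actual_speedup_runtime(n_a, t_a, n_b, t_b, f):
    """Runtime if the target region (thread A's loop body) really ran f faster."""
    return max(n_a * t_a * (1.0 - f), n_b * t_b)

def virtual_speedup_runtime(n_a, t_a, n_b, t_b, f):
    """Runtime when, instead, the other thread pauses for f*t_a per A-iteration."""
    inserted_delay = n_a * f * t_a
    runtime = max(n_a * t_a, n_b * t_b + inserted_delay)
    return runtime, inserted_delay

if __name__ == "__main__":
    n_a, t_a = 1000, 0.002   # target region: 1000 iterations of 2 ms
    n_b, t_b = 1500, 0.001   # concurrent work on another thread
    for f in (0.1, 0.25, 0.5):
        real = actual_speedup_runtime(n_a, t_a, n_b, t_b, f)
        virt, delay = virtual_speedup_runtime(n_a, t_a, n_b, t_b, f)
        # Subtracting the known inserted delay from the virtual-speedup run
        # predicts the runtime of the real optimization.
        print(f"f={f:.2f}  predicted={virt - delay:.3f}s  actual={real:.3f}s")
```

In this simplified model the prediction matches the real speedup exactly; the point of the paper is that the same relative effect can be measured on a running program without knowing an analytical model of it.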

    Low-cost online contact resistance measurement of power connectors to ease predictive maintenance

    With the increasing use of sensors and wireless communication systems, predictive maintenance is acquiring more and more importance in assessing the condition of in-service equipment. Predictive maintenance promises significant cost savings, as it minimizes unscheduled power system faults, which can have very costly and catastrophic consequences. Early detection of power system failures requires acquiring, monitoring, and periodically analyzing the condition of the elements involved, such as high-voltage power connectors, since they are critical devices that are often located at key points of power systems. This paper proposes a low-cost online system to determine the contact resistance of high-voltage direct current (dc) and alternating current (ac) power connectors, in order to assess their health condition and apply a predictive maintenance plan. The contact resistance is considered a reliable indicator of the connector's health condition. However, it cannot be measured directly, and the applied strategy differs between dc and ac power systems. The experimental results show a maximum uncertainty of 4.5%, which proves the accuracy and feasibility of the presented approach, since the proposed limit of acceptable resistance increase is 20%. This approach can also be applied to many other power system elements.
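As a minimal sketch of how such a measurement would feed a maintenance plan: derive the contact resistance from the voltage drop across the connector and the load current, and flag the connector once the resistance rises more than the 20% limit quoted in the abstract over its commissioning baseline. The R = V/I model, the variable names, and the example values are illustrative assumptions, not the paper's exact dc/ac measurement procedures.

```python
# Hedged sketch of a connector health check based on contact resistance.
# The 20% threshold comes from the abstract; the simple R = V/I model and
# all example values are illustrative assumptions.

def contact_resistance(voltage_drop_v: float, current_a: float) -> float:
    """Contact resistance in ohms from the drop measured across the connector."""
    if current_a <= 0:
        raise ValueError("current must be positive for a valid measurement")
    return voltage_drop_v / current_a

def connector_degraded(r_measured: float, r_baseline: float,
                       max_increase: float = 0.20) -> bool:
    """Flag the connector once resistance rises more than 20% over its baseline."""
    return r_measured > r_baseline * (1.0 + max_increase)

if __name__ == "__main__":
    r0 = contact_resistance(voltage_drop_v=12e-6, current_a=400.0)  # commissioning
    r1 = contact_resistance(voltage_drop_v=16e-6, current_a=400.0)  # later in service
    print(f"baseline={r0*1e9:.0f} nOhm, measured={r1*1e9:.0f} nOhm, "
          f"degraded={connector_degraded(r1, r0)}")
```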

    Improving efficiency and resilience in large-scale computing systems through analytics and data-driven management

    Applications running in large-scale computing systems such as high performance computing (HPC) or cloud data centers are essential to many aspects of modern society, from weather forecasting to financial services. As the number and size of data centers increase with the growing computing demand, scalable and efficient management becomes crucial. However, data center management is a challenging task due to the complex interactions between applications, middleware, and hardware layers such as processors, network, and cooling units. This thesis claims that to improve robustness and efficiency of large-scale computing systems, significantly higher levels of automated support than what is available in today's systems are needed, and this automation should leverage the data continuously collected from various system layers. Towards this claim, we propose novel methodologies to automatically diagnose the root causes of performance and configuration problems and to improve efficiency through data-driven system management. We first propose a framework to diagnose software and hardware anomalies that cause undesired performance variations in large-scale computing systems. We show that by training machine learning models on resource usage and performance data collected from servers, our approach successfully diagnoses 98% of the injected anomalies at runtime in real-world HPC clusters with negligible computational overhead. We then introduce an analytics framework to address another major source of performance anomalies in cloud data centers: software misconfigurations. Our framework discovers and extracts configuration information from cloud instances such as containers or virtual machines. This is the first framework to provide comprehensive visibility into software configurations in multi-tenant cloud platforms, enabling systematic analysis for validating the correctness of software configurations. This thesis also contributes to the design of robust and efficient system management methods that leverage continuously monitored resource usage data. To improve performance under power constraints, we propose a workload- and cooling-aware power budgeting algorithm that distributes the available power among servers and cooling units in a data center, achieving up to 21% improvement in throughput per Watt compared to the state-of-the-art. Additionally, we design a network- and communication-aware HPC workload placement policy that reduces communication overhead by up to 30% in terms of hop-bytes compared to existing policies.
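As an illustration of the data-driven diagnosis step, the following sketch trains a classifier on per-node resource-usage windows and labels each window as healthy or as one of two anomaly types. The feature set, the synthetic anomaly classes, and the use of scikit-learn's RandomForestClassifier are assumptions made for illustration; they are not the thesis's actual framework, features, or data.

```python
# Minimal sketch of anomaly diagnosis from resource-usage telemetry, in the
# spirit of the framework described above. Features and labels are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical per-node windows: [cpu_util, mem_util, net_mbps, ipc]
healthy = rng.normal([0.60, 0.50, 200.0, 1.2], [0.10, 0.10, 40.0, 0.10], size=(500, 4))
memleak = rng.normal([0.60, 0.90, 200.0, 0.9], [0.10, 0.05, 40.0, 0.10], size=(100, 4))
cpuhog  = rng.normal([0.95, 0.50, 200.0, 0.7], [0.03, 0.10, 40.0, 0.10], size=(100, 4))

X = np.vstack([healthy, memleak, cpuhog])
y = np.array(["healthy"] * 500 + ["memleak"] * 100 + ["cpuhog"] * 100)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("held-out diagnosis accuracy:", clf.score(X_te, y_te))
```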

    Identifying and diagnosing video streaming performance issues

    On-line video streaming is an ever-evolving ecosystem of services and technologies, where content providers are in a constant race to satisfy users' demand for richer content and higher-bitrate streams, updated feature sets, and cross-platform compatibility. At the same time, network operators are required to ensure that the requested video streams are delivered through the network with satisfactory quality, in accordance with the existing Service Level Agreements (SLAs). However, tracking and maintaining satisfactory video Quality of Experience (QoE) has become a greater challenge for operators than ever before. With the growing popularity of content consumption on handheld devices and over wireless connections, new points of failure have been added to the list of issues that can affect video quality. Moreover, the adoption of end-to-end encryption by major streaming services has rendered previously used QoE diagnosis methods obsolete. In this thesis, we identify the current challenges in identifying and diagnosing video streaming issues and propose novel approaches to address them. More specifically, the thesis initially presents methods and tools to identify a wide array of QoE problems and the severity with which they affect the users' experience. The next part of the thesis investigates methods to locate under-performing parts of the network that lead to a drop in the delivered quality of service. In this context, we propose a data-driven methodology for detecting under-performing areas of a cellular network with sub-optimal Quality of Service (QoS) and video QoE. Moreover, we develop and evaluate a multi-vantage-point framework that is capable of diagnosing the underlying faults that cause the disruption of the user's experience. The last part of this work further explores the detection of network performance anomalies and introduces a novel method for detecting such issues using contextual information. This approach provides higher accuracy when detecting network faults in the presence of high variation and can help providers detect anomalies early, before they result in QoE issues.
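The contextual idea in the last part can be sketched as follows: a QoS sample is compared only against measurements from the same context, rather than against a global threshold, so that normal variation between contexts is not mistaken for a fault. The grouping key (a hypothetical cell and hour-of-day pair), the throughput metric, and the robust z-score rule are illustrative assumptions, not the thesis's actual method.

```python
# Hedged sketch of contextual anomaly detection for a network QoS metric.
from statistics import median

def robust_z(value, population):
    """Robust z-score using the median absolute deviation of the context."""
    med = median(population)
    mad = median(abs(x - med) for x in population) or 1e-9
    return (value - med) / (1.4826 * mad)

# context (cell, hour-of-day) -> historical throughput samples (Mbps); made up
history = {
    ("cell_A", 20): [4.8, 5.1, 5.0, 4.7, 5.3, 4.9, 5.2],   # busy evening hour
    ("cell_A", 3):  [18.0, 17.5, 19.2, 18.8, 17.9, 18.4],  # quiet night hour
}

def is_anomalous(context, value, threshold=3.5):
    """Flag `value` only if it deviates strongly from its own context's history."""
    samples = history.get(context)
    return bool(samples) and abs(robust_z(value, samples)) > threshold

# 5 Mbps is normal at peak hour in cell_A but anomalous at 3 a.m.
print(is_anomalous(("cell_A", 20), 5.0))   # False
print(is_anomalous(("cell_A", 3), 5.0))    # True
```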

    Implementation of a wireless monitoring system for a centrifugal pump

    The objective of this report is to select, configure, and determine the installation points of a fully wireless vibration acquisition system for the online monitoring of a centrifugal pump located in an Aigües de Barcelona station. Wireless monitoring of a centrifugal pump minimizes installation costs and offers greater flexibility in data collection, reducing the risk of damage and of electromagnetic interference. Detailed explanations of vibration analysis applied to centrifugal pumps are provided, including the equipment necessary for proper monitoring. The research behind the instrumentation selection, the configuration procedure, and the choice of installation points are the main purposes of this document. Furthermore, a comparative analysis between different Aigües de Barcelona terminals has been carried out in order to facilitate the future diagnosis and maintenance of the centrifugal pump targeted by the project.
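A minimal sketch of the kind of processing such a wireless vibration node enables is shown below: the RMS level and the dominant spectral line of one acceleration record. The sampling rate, the synthetic 1x running-speed signal, and the alarm threshold are illustrative assumptions, not values from the Aigües de Barcelona installation.

```python
# Hedged sketch: RMS level and dominant spectral line from one vibration record.
import numpy as np

FS = 3200.0                      # assumed accelerometer sampling rate, Hz
t = np.arange(0, 1.0, 1.0 / FS)

# Synthetic record: 1x running-speed component at 25 Hz (1500 rpm pump)
# plus broadband noise, standing in for a real wireless measurement.
signal = (2.0 * np.sin(2 * np.pi * 25.0 * t)
          + 0.3 * np.random.default_rng(1).normal(size=t.size))

rms = np.sqrt(np.mean(signal ** 2))

spectrum = np.abs(np.fft.rfft(signal * np.hanning(signal.size)))
freqs = np.fft.rfftfreq(signal.size, d=1.0 / FS)
dominant_hz = freqs[np.argmax(spectrum[1:]) + 1]     # skip the DC bin

ALARM_RMS = 2.8                  # hypothetical alarm level
print(f"RMS = {rms:.2f} m/s^2, dominant component = {dominant_hz:.1f} Hz, "
      f"alarm = {rms > ALARM_RMS}")
```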

    Predictive Maintenance Support System in Industry 4.0 Scenario

    The fourth industrial revolution now being witnessed, also known as Industry 4.0, is heavily related to the digitization of manufacturing systems and the integration of different technologies to optimize manufacturing. By combining data acquisition using specific sensors with machine learning algorithms that analyze this data and predict a failure before it happens, predictive maintenance is a critical tool for reducing downtime due to unpredicted stoppages caused by malfunctions. Based on the reality of the Commercial Specialty Tires factory at Continental Mabor - Indústria de Pneus, S.A., the present work describes several problems faced in equipment maintenance. Taking advantage of the information gathered from studying the processes in the factory, a solution model is designed for applying predictive maintenance to these processes. The model is divided into two primary layers: hardware and software. Concerning hardware, the sensors and their respective applications are delineated. In terms of software, data analysis techniques, namely machine learning algorithms, are described so that the collected data can be studied to detect possible failures.
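One common way the software layer can be framed, sketched below, is to label sensor windows that fall within a fixed horizon before a recorded breakdown as "pre-failure" and train a classifier on them. The horizon, the features, and the use of scikit-learn's LogisticRegression are assumptions for illustration, not the model described for the Continental Mabor factory.

```python
# Hedged sketch of a failure-prediction software layer with horizon labelling.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
HORIZON = 24            # hours before a breakdown that count as "pre-failure"

# Hypothetical hourly windows: [vibration_rms, motor_temp_C, current_A]
hours_to_failure = rng.integers(1, 200, size=1000)
X = np.column_stack([
    1.0 + 2.0 * np.exp(-hours_to_failure / 30.0) + rng.normal(0, 0.2, 1000),
    60.0 + 15.0 * np.exp(-hours_to_failure / 50.0) + rng.normal(0, 2.0, 1000),
    30.0 + rng.normal(0, 1.0, 1000),
])
y = (hours_to_failure <= HORIZON).astype(int)

model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)
# Probability that a machine showing elevated vibration and temperature
# will fail within the next HORIZON hours.
print(model.predict_proba([[2.9, 74.0, 30.5]])[0, 1])
```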

    Passive available bandwidth: Applying self -induced congestion analysis of application-generated traffic

    Monitoring end-to-end available bandwidth is critical in helping applications and users efficiently use network resources. Because the performance of distributed systems is intrinsically linked to the performance of the network, applications that have knowledge of the available bandwidth can adapt to changing network conditions and optimize their performance. A well-designed available bandwidth tool should be easily deployable and non-intrusive. While several tools have been created to actively measure the end-to-end available bandwidth of a network path, they require instrumentation at both ends of the path, and the traffic injected by these tools may affect the performance of other applications on the path.
    We propose a new passive monitoring system that accurately measures available bandwidth by applying self-induced congestion analysis to traces of application-generated traffic. The Watching Resources from the Edge of the Network (Wren) system transparently provides available bandwidth information to applications without having to modify the applications to make the measurements and with negligible impact on their performance. Wren produces a series of real-time available bandwidth measurements that can be used by applications to adapt their runtime behavior to optimize performance, or that can be sent to a central monitoring system for use by other or future applications.
    Most active bandwidth tools rely on adjustments to the sending rate of packets to infer the available bandwidth. The major obstacle to using passive kernel-level traces of TCP traffic is that we have no control over the traffic pattern. We demonstrate that there is enough natural variability in the sending rates of TCP traffic that techniques used by active tools can be applied to traces of application-generated traffic to yield accurate available bandwidth measurements.
    Wren uses kernel-level instrumentation to collect traces of application traffic and analyzes the traces at user level to achieve the necessary accuracy and avoid intrusiveness. We introduce new passive bandwidth algorithms based on the principles of the active tools, investigate the effectiveness of these new algorithms, implement a real-time system capable of efficiently monitoring available bandwidth, and demonstrate that applications can use Wren measurements to adapt their runtime decisions.
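The self-induced congestion principle behind this analysis can be sketched as follows: when the sending rate exceeds the available bandwidth, queues build up and one-way delays trend upward, so the lowest rate at which delays consistently rise estimates the available bandwidth. The trace format, the rate bins, and the simple trend test below are illustrative assumptions, not Wren's actual algorithms.

```python
# Hedged sketch of self-induced congestion analysis over a passive trace.

def delay_trend(delays):
    """Fraction of consecutive delay pairs that increase (1.0 = strictly rising)."""
    pairs = list(zip(delays, delays[1:]))
    return sum(b > a for a, b in pairs) / len(pairs)

def estimate_avail_bw(bins, rising_threshold=0.6):
    """`bins` maps a sending rate (Mbps) to the one-way delays (ms) observed
    while the application happened to send at that rate."""
    for rate in sorted(bins):
        if delay_trend(bins[rate]) >= rising_threshold:
            return rate          # lowest rate that self-induces congestion
    return None                  # path never saturated by this traffic

# Hypothetical trace: below ~80 Mbps delays hover, above it they climb.
trace = {
    40:  [10.1, 10.0, 10.2, 10.0, 10.1],
    60:  [10.0, 10.3, 10.1, 10.2, 10.0],
    80:  [10.2, 10.6, 11.1, 11.8, 12.6],
    100: [10.5, 11.9, 13.4, 15.2, 17.3],
}
print("estimated available bandwidth ~", estimate_avail_bw(trace), "Mbps")
```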

    Exploring the role of system operation modes in failure analysis in the context of first generation cyber-physical systems

    Typically, emerging system failures have a strong impact on the performance of industrial systems as well as on the efficiency of their operational and servicing processes. Aware of this, maintenance and repair researchers have developed multiple failure detection and diagnosis techniques that allow early recognition of system or component failures and maintain continuous system operation in a cost-effective way. However, these techniques have many deficiencies in the case of self-tuning first-generation cyber-physical systems (1G-CPSs). The reason is that these systems compensate for the effects of emerging failures until their resources are exhausted, and the compensatory actions not only mask the failures but also make their recognition difficult. Late recognition of failures, however, conflicts with the principles of preventive maintenance. Therefore, the research presented here concentrated on recognizing and forecasting failures under the dynamic and adaptive behavior of 1G-CPSs. CPSs are able to compensate for failure symptoms by changing their system operation modes (SOMs). It was also observed that transitions of SOMs reduce the reliability of signal-based failure diagnosis. It was hypothesized that the frequency and duration of changes in the operational states of a 1G-CPS may be strong indicators of emerging failures, and that investigation of SOMs facilitates early detection of failures. Therefore, the completed exploratory studies aimed at exploring how the frequency and duration of SOM transitions can be correlated with specific types of failures, and how they can be computed as measures of failure occurrence. The obtained results revealed that system failures tend to induce unusual system operation modes that can be used as a basis for failure characterization, and even for failure forecasting. The empirical research made use of a cyber-physical greenhouse testbed to obtain experimental data and was completed by the development of a computational model. A failure injection strategy was implemented in order to induce failures in a controlled manner. The proposed approach can be applied as a basis for forecasting system failures of 1G-CPSs, but additional research seems to be necessary.
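The SOM-based indicators explored above can be sketched as follows: from a timestamped log of system operation modes, compute how often the system switches modes and how long it dwells in each one, then compare the transition rate against a healthy baseline. The log format, the mode names, and the thresholds are illustrative assumptions, not the greenhouse testbed's actual instrumentation or computational model.

```python
# Hedged sketch of SOM transition-frequency and dwell-time indicators.
from collections import Counter

def som_indicators(log):
    """`log` is a list of (timestamp_s, mode) entries, ordered in time."""
    transitions = sum(1 for (_, a), (_, b) in zip(log, log[1:]) if a != b)
    span_h = (log[-1][0] - log[0][0]) / 3600.0
    dwell = Counter()
    for (t0, mode), (t1, _) in zip(log, log[1:]):
        dwell[mode] += t1 - t0
    return transitions / span_h, dwell   # transitions per hour, seconds per mode

def failure_suspected(rate_per_h, baseline_rate_per_h, factor=3.0):
    """Flag unusually frequent mode changes relative to healthy operation."""
    return rate_per_h > factor * baseline_rate_per_h

# Hypothetical one-hour log from a greenhouse-like testbed.
log = [(0, "normal"), (600, "vent_open"), (900, "normal"), (1200, "heat_boost"),
       (1500, "normal"), (1800, "vent_open"), (2100, "normal"), (3600, "normal")]
rate, dwell = som_indicators(log)
print(f"{rate:.1f} transitions/h, dwell times (s): {dict(dwell)}")
print("failure suspected:", failure_suspected(rate, baseline_rate_per_h=0.5))
```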

    Development of a Culturally Adaptable Educational Program on Iron-Deficiency Anemia for use in Resource-Limited Communities.

    Iron deficiency anemia (IDA) is a significant global health issue that disproportionately impacts individuals living in resource-limited countries. Access to professional healthcare in these regions is limited, and rural communities often rely on specific community members for their basic health care needs. These community members, commonly referred to as community health promoters (CHPs), are seen as knowledgeable in informal or traditional healing practices and are often sought out when illnesses are present. This is a dynamic that could be cultivated and promoted to prevent and treat early stages of IDA. With culturally tailored support and education, advancing the healthcare knowledge of local CHPs might result in increased awareness of IDA, with the ultimate goal of empowering them to educate community members about prevention and early intervention. This scholarly project developed a culturally malleable, simple, low-cost educational program about IDA prevention and early intervention designed for CHPs in rural, resource-limited regions. The educational program was based on evidence from an extensive literature review and informed by a panel of experts using the Delphi technique. The evidence-based practice model by Rosswurm and Larrabee (1999) underpinned the project as a whole.