6,749 research outputs found

    Interactive visualization of cross-layer performance anomalies in dynamic task-parallel applications and systems

    Get PDF
    International audienceThis paper studies the interactive visualization and post-mortem analysis of execution traces generated by task-parallel programs. We focus on the detection of performance anomalies inaccessible to state-of-the-art performance analysis techniques, including anomalies deriving from the interaction of multiple levels of software abstractions, anomalies associated with the hardware, and anomalies resulting from interferences between optimizations in the application and run-time system. Building on our practical experience with the performance debugging of representative task-parallel applications and run-time systems for dynamic dependent task graphs, we designed a new tool called Aftermath. This tool enables the visualization of intricate anomalies involving multiple layers and components in the system. It also supports filtering, aggregation and joint visualization of key metrics and performance indicators, such as task duration, run-time state, hardware performance counters and data transfers. The tool also relates this information to the machine's topology. While not specifically designed for non-uniform memory access (NUMA) architectures, Aftermath takes advantage of the explicit memory regions and dependence information in dependent task models to precisely capture long-distance and inter-core effects. Aftermath supports traces of up to several gigabytes, with fast and intuitive navigation and the on-line configuration of new derived metrics. As it has proven invaluable to optimize both run-time environments and applications, we illustrate Aftermath on genuine cases encountered in the OpenStream project

    TaskInsight: Understanding Task Schedules Effects on Memory and Performance

    Get PDF
    Recent scheduling heuristics for task-based applications have managed to improve their by taking into account memory-related properties such as data locality and cache sharing. However, there is still a general lack of tools that can provide insights into why, and where, different schedulers improve memory behavior, and how this is related to the applications' performance. To address this, we present TaskInsight, a technique to characterize the memory behavior of different task schedulers through the analysis of data reuse between tasks. TaskInsight provides high-level, quantitative information that can be correlated with tasks' performance variation over time to understand data reuse through the caches due to scheduling choices. TaskInsight is useful to diagnose and identify which scheduling decisions affected performance, when were they taken, and why the performance changed, both in single and multi-threaded executions. We demonstrate how TaskInsight can diagnose examples where poor scheduling caused over 10% difference in performance for tasks of the same type, due to changes in the tasks' data reuse through the private and shared caches, in single and multi-threaded executions of the same application. This flexible insight is key for optimization in many contexts, including data locality, throughput, memory footprint or even energy efficiency.We thank the reviewers for their feedback. This work was supported by the Swedish Research Council, the Swedish Foundation for Strategic Research project FFL12-0051 and carried out within the Linnaeus Centre of Excellence UPMARC, Uppsala Programming for Multicore Architectures Research Center. This paper was also published with the support of the HiPEAC network that received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 687698.Peer ReviewedPostprint (published version

    Visualizing the Motion Flow of Crowds

    Get PDF
    In modern cities, massive population causes problems, like congestion, accident, violence and crime everywhere. Video surveillance system such as closed-circuit television cameras is widely used by security guards to monitor human behaviors and activities to manage, direct, or protect people. With the quantity and prolonged duration of the recorded videos, it requires a huge amount of human resources to examine these video recordings and keep track of activities and events. In recent years, new techniques in computer vision field reduce the barrier of entry, allowing developers to experiment more with intelligent surveillance video system. Different from previous research, this dissertation does not address any algorithm design concerns related to object detection or object tracking. This study will put efforts on the technological side and executing methodologies in data visualization to find the model of detecting anomalies. It would like to provide an understanding of how to detect the behavior of the pedestrians in the video and find out anomalies or abnormal cases by using techniques of data visualization

    Language-Centric Performance Analysis of OpenMP Programs with Aftermath

    Get PDF
    International audienceWe present a new set of tools for the language-centric performance analysis and debugging of OpenMP programs that allows programmers to relate dynamic information from parallel execution to OpenMP constructs. Users can visualize execution traces, examine aggregate met-rics on parallel loops and tasks, such as load imbalance or synchronization overhead, and obtain detailed information on specific events, such as the partitioning of a loop's iteration space, its distribution to workers according to the scheduling policy and fine-grain synchronization. Our work is based on the Aftermath performance analysis tool and a ready-to-use, instrumented version of the LLVM/clang OpenMP run-time with negligible overhead for tracing. By analyzing the performance of the MG application of the NPB suite, we show that language-centric performance analysis in general and our tools in particular can help improve the performance of large-scale OpenMP applications significantly

    Making intelligent systems team players: Case studies and design issues. Volume 1: Human-computer interaction design

    Get PDF
    Initial results are reported from a multi-year, interdisciplinary effort to provide guidance and assistance for designers of intelligent systems and their user interfaces. The objective is to achieve more effective human-computer interaction (HCI) for systems with real time fault management capabilities. Intelligent fault management systems within the NASA were evaluated for insight into the design of systems with complex HCI. Preliminary results include: (1) a description of real time fault management in aerospace domains; (2) recommendations and examples for improving intelligent systems design and user interface design; (3) identification of issues requiring further research; and (4) recommendations for a development methodology integrating HCI design into intelligent system design

    Role based behavior analysis

    Get PDF
    Tese de mestrado, Segurança Informática, Universidade de Lisboa, Faculdade de Ciências, 2009Nos nossos dias, o sucesso de uma empresa depende da sua agilidade e capacidade de se adaptar a condições que se alteram rapidamente. Dois requisitos para esse sucesso são trabalhadores proactivos e uma infra-estrutura ágil de Tecnologias de Informacão/Sistemas de Informação (TI/SI) que os consiga suportar. No entanto, isto nem sempre sucede. Os requisitos dos utilizadores ao nível da rede podem nao ser completamente conhecidos, o que causa atrasos nas mudanças de local e reorganizações. Além disso, se não houver um conhecimento preciso dos requisitos, a infraestrutura de TI/SI poderá ser utilizada de forma ineficiente, com excessos em algumas áreas e deficiências noutras. Finalmente, incentivar a proactividade não implica acesso completo e sem restrições, uma vez que pode deixar os sistemas vulneráveis a ameaças externas e internas. O objectivo do trabalho descrito nesta tese é desenvolver um sistema que consiga caracterizar o comportamento dos utilizadores do ponto de vista da rede. Propomos uma arquitectura de sistema modular para extrair informação de fluxos de rede etiquetados. O processo é iniciado com a criação de perfis de utilizador a partir da sua informação de fluxos de rede. Depois, perfis com características semelhantes são agrupados automaticamente, originando perfis de grupo. Finalmente, os perfis individuais são comprados com os perfis de grupo, e os que diferem significativamente são marcados como anomalias para análise detalhada posterior. Considerando esta arquitectura, propomos um modelo para descrever o comportamento de rede dos utilizadores e dos grupos. Propomos ainda métodos de visualização que permitem inspeccionar rapidamente toda a informação contida no modelo. O sistema e modelo foram avaliados utilizando um conjunto de dados reais obtidos de um operador de telecomunicações. Os resultados confirmam que os grupos projectam com precisão comportamento semelhante. Além disso, as anomalias foram as esperadas, considerando a população subjacente. Com a informação que este sistema consegue extrair dos dados em bruto, as necessidades de rede dos utilizadores podem sem supridas mais eficazmente, os utilizadores suspeitos são assinalados para posterior análise, conferindo uma vantagem competitiva a qualquer empresa que use este sistema.In our days, the success of a corporation hinges on its agility and ability to adapt to fast changing conditions. Proactive workers and an agile IT/IS infrastructure that can support them is a requirement for this success. Unfortunately, this is not always the case. The user’s network requirements may not be fully understood, which slows down relocation and reorganization. Also, if there is no grasp on the real requirements, the IT/IS infrastructure may not be efficiently used, with waste in some areas and deficiencies in others. Finally, enabling proactivity does not mean full unrestricted access, since this may leave the systems vulnerable to outsider and insider threats. The purpose of the work described on this thesis is to develop a system that can characterize user network behavior. We propose a modular system architecture to extract information from tagged network flows. The system process begins by creating user profiles from their network flows’ information. Then, similar profiles are automatically grouped into clusters, creating role profiles. Finally, the individual profiles are compared against the roles, and the ones that differ significantly are flagged as anomalies for further inspection. Considering this architecture, we propose a model to describe user and role network behavior. We also propose visualization methods to quickly inspect all the information contained in the model. The system and model were evaluated using a real dataset from a large telecommunications operator. The results confirm that the roles accurately map similar behavior. The anomaly results were also expected, considering the underlying population. With the knowledge that the system can extract from the raw data, the users network needs can be better fulfilled, the anomalous users flagged for inspection, giving an edge in agility for any company that uses it

    AIOps for a Cloud Object Storage Service

    Full text link
    With the growing reliance on the ubiquitous availability of IT systems and services, these systems become more global, scaled, and complex to operate. To maintain business viability, IT service providers must put in place reliable and cost efficient operations support. Artificial Intelligence for IT Operations (AIOps) is a promising technology for alleviating operational complexity of IT systems and services. AIOps platforms utilize big data, machine learning and other advanced analytics technologies to enhance IT operations with proactive actionable dynamic insight. In this paper we share our experience applying the AIOps approach to a production cloud object storage service to get actionable insights into system's behavior and health. We describe a real-life production cloud scale service and its operational data, present the AIOps platform we have created, and show how it has helped us resolving operational pain points.Comment: 5 page
    • …
    corecore