
    Workflow Behavior Auditing for Mission Centric Collaboration

    Successful mission-centric collaboration depends on situational awareness in an increasingly complex mission environment. To support timely and reliable high-level mission decisions, auditing tools need real-time data for effective assessment and optimization of mission behaviors. In the context of a battle rhythm, mission health can be measured from workflow-generated activities. Though battle rhythm collaboration is dynamic and global, a potential enabling technology for workflow behavior auditing exists in process mining. However, process mining is not adequate to provide mission situational awareness in the battle rhythm environment, since event logs may contain dynamic mission states, noise, and timestamp inaccuracy. Therefore, we address a few key near-term issues. In sequences of activities parsed from network traffic streams, we identify mission state changes with the workflow shift detection algorithm. In segments of unstructured event logs that contain both noise and relevant workflow data, we extract and rank workflow instances for the process analyst. When confronted with timestamp inaccuracy in event logs from semi-automated, distributed workflows, we develop the flower chain network and discovery algorithm to improve behavioral conformance. For long-term adoption of process mining in mission-centric collaboration, we develop and demonstrate an experimental framework for logging uncertainty testing. We show that it is highly feasible to employ process mining techniques in environments with dynamic mission states and logging uncertainty. Future workflow behavior auditing technology will benefit from continued algorithmic development, new data sources, and system prototypes to propel next-generation mission situational awareness, giving commanders new tools to assess and optimize workflows, computer systems, and missions in the battlespace environment.
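
    The abstract names a workflow shift detection algorithm without giving its details. The sketch below is only a generic change-detection illustration under stated assumptions (activities have already been parsed from traffic into a label sequence, and a mission state change shows up as an abrupt change in the activity mix), not the thesis's algorithm.

```python
# Generic drift-detection sketch (not the thesis's workflow shift detection
# algorithm): compare activity-frequency distributions in the windows before
# and after each position and flag a shift when their L1 distance is large.
from collections import Counter

def detect_shifts(activities, window=20, threshold=0.6):
    """Return indices where the activity mix changes abruptly between the
    window preceding the index and the window following it."""
    shifts = []
    for i in range(window, len(activities) - window):
        left = Counter(activities[i - window:i])
        right = Counter(activities[i:i + window])
        labels = set(left) | set(right)
        distance = sum(abs(left[a] - right[a]) / window for a in labels)
        if distance > threshold:
            shifts.append(i)
    return shifts

# Toy trace: a workflow that switches from a plan/brief cycle to an
# execute/report cycle at position 80.
trace = ["plan", "brief"] * 40 + ["execute", "report"] * 40
print(detect_shifts(trace))   # indices clustered around the change point at 80
```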

    Self-learning Anomaly Detection in Industrial Production


    LogBERT: Log Anomaly Detection via BERT

    When systems break down, administrators usually check the produced logs to diagnose the failures. Nowadays, systems grow larger and more complicated, and it is labor-intensive to manually detect abnormal behaviors in logs. Therefore, it is necessary to develop automated anomaly detection on system logs. Automated anomaly detection not only identifies malicious patterns promptly but also requires no prior domain knowledge. Many existing log anomaly detection approaches apply natural language models such as the Recurrent Neural Network (RNN) to log analysis, since logs, like natural language, are sequential data. The proposed model, LogBERT, a BERT-based neural network, can capture the contextual information in log sequences. LogBERT is trained on normal log data only, considering the scarcity of labeled abnormal data in reality. Intuitively, LogBERT learns the normal patterns in the training data and flags test data that deviate from its predictions as anomalies. We compare LogBERT with four traditional machine learning models and two deep learning models in terms of precision, recall, and F1 score on three public datasets: HDFS, BGL, and Thunderbird. Overall, LogBERT outperforms the state-of-the-art models for log anomaly detection.
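
    As a rough illustration of the deviation-flagging idea described above, the sketch below assumes that log lines have already been parsed into integer log-key sequences and that `model` is some trained masked-log-key predictor returning per-position logits; the function and its interface are hypothetical, and LogBERT's full training procedure is not shown.

```python
# Minimal sketch of deviation-based anomaly scoring over log-key sequences
# (assumptions: `model` is any PyTorch module mapping a (1, seq_len) batch of
# log-key ids to (1, seq_len, vocab_size) logits; `mask_id` is the id of the
# mask token used during training).
import torch

def anomaly_ratio(model, log_keys, mask_id, top_k=5, num_masks=4):
    """Mask a few positions, ask the model for candidate keys, and report the
    fraction of masked positions whose true key is outside the top-k
    predictions (higher means more anomalous)."""
    seq = torch.tensor(log_keys)
    positions = torch.randperm(len(seq))[:num_masks]
    misses = 0
    for pos in positions:
        masked = seq.clone()
        true_key = int(masked[pos])
        masked[pos] = mask_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0))        # (1, seq_len, vocab)
        candidates = logits[0, pos].topk(top_k).indices.tolist()
        if true_key not in candidates:
            misses += 1
    return misses / len(positions)

# A sequence would be flagged as anomalous when anomaly_ratio(...) exceeds a
# threshold tuned on normal validation data.
```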

    Feature-Trace: an approach to generate operational profile and to support regression testing from BDD features

    Master's dissertation, Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Ciência da Computação, 2020. Operational Profiles provide quantitative information about how the software will be used, which supports highlighting the software components most sensitive to reliability based on their usage profile. However, the generation of Operational Profiles usually requires considerable team effort to liaise the requirements specification with its reification into expected software artifacts. In this sense, the ability to seamlessly and efficiently perform traceability from requirement to code becomes paramount in the software life cycle, embracing the testing process as a means to ensure that the requirements are satisfactorily covered and addressed. In this work, we propose the Feature-Trace approach, which merges the advantages of the Operational Profile and the benefits of the requirements-to-code traceability present in the BDD (Behavior-Driven Development) approach. The primary goal of our work is to use the BDD approach as an information source for the semi-automated generation of the Operational Profile, but it also aims to extract several other metrics related to the process of prioritizing and selecting test cases, such as the Program Spectrum and Code Complexity metrics.
    The proposed approach was evaluated on the Diaspora software, an open-source project available on GitHub, which contains 68 BDD features specified in 267 scenarios, ≈ 72 KLOC, and more than 2,900 forks and counting. The case study revealed that the Feature-Trace approach is capable of extracting the operational profile seamlessly from the specified Diaspora BDD features, as well as obtaining and presenting vital information to guide the process of test case prioritization. The approach was also assessed based on feedback from 18 developers who had access to the approach and the tool proposed in this work, making evident the usefulness of Feature-Trace for the activities of “Prioritization and Selection of Test Cases”, “Evaluation of the Quality of Test Cases”, and “Maintenance and Software Evolution”.
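
    As a minimal illustration of what an operational profile is (not of the Feature-Trace tooling itself), the sketch below turns per-feature usage counts into occurrence probabilities; the feature names and counts are hypothetical, not Diaspora data.

```python
# Minimal sketch of an operational profile: relative usage frequencies per
# feature, assumed to come from counts of scenario executions observed in
# production (the counts below are illustrative only).
usage_counts = {
    "post_status_message": 5200,
    "comment_on_post": 3100,
    "follow_tag": 900,
    "export_account_data": 40,
}

total = sum(usage_counts.values())
operational_profile = {feature: count / total for feature, count in usage_counts.items()}

# Features with the highest occurrence probability are the most
# reliability-sensitive and the strongest candidates for regression testing.
for feature, prob in sorted(operational_profile.items(), key=lambda kv: -kv[1]):
    print(f"{feature:25s} {prob:.3f}")
```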

    Scalable and fault-tolerant data stream processing on multi-core architectures

    With increasing data volumes and velocity, many applications are shifting from the classical “process-after-store” paradigm to a stream processing model: data is produced and consumed as continuous streams. Stream processing captures latency-sensitive applications as diverse as credit card fraud detection and high-frequency trading. These applications are expressed as queries of algebraic operations (e.g., aggregation) over the most recent data using windows, i.e., finite evolving views over the input streams. To guarantee correct results, streaming applications require precise window semantics (e.g., temporal ordering) for operations that maintain state. While high processing throughput and low latency are performance desiderata for stateful streaming applications, achieving both poses challenges. Computing the state of overlapping windows causes redundant aggregation operations: incremental execution (i.e., reusing previous results) reduces latency but prevents parallelization; at the same time, parallelizing window execution for stateful operations with precise semantics demands ordering guarantees and state access coordination. Finally, streams and state must be recovered to produce consistent and repeatable results in the event of failures. Given the rise of shared-memory multi-core CPU architectures and high-speed networking, we argue that it is possible to address these challenges in a single node without compromising window semantics, performance, or fault-tolerance. In this thesis, we analyze, design, and implement stream processing engines (SPEs) that achieve high performance on multi-core architectures. To this end, we introduce new approaches for in-memory processing that address the previous challenges: (i) for overlapping windows, we provide a family of window aggregation techniques that enable computation sharing based on the algebraic properties of aggregation functions; (ii) for parallel window execution, we balance parallelism and incremental execution by developing abstractions for both and combining them to a novel design; and (iii) for reliable single-node execution, we enable strong fault-tolerance guarantees without sacrificing performance by reducing the required disk I/O bandwidth using a novel persistence model. We combine the above to implement an SPE that processes hundreds of millions of tuples per second with sub-second latencies. These results reveal the opportunity to reduce resource and maintenance footprint by replacing cluster-based SPEs with single-node deployments.
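
    As a minimal illustration of why algebraic properties enable computation sharing (a simplification, not the thesis's window-aggregation techniques), the sketch below maintains a sliding-window sum incrementally: because sum is invertible, an evicted tuple can be subtracted instead of re-aggregating the whole window.

```python
# Generic incremental sliding-window aggregation (subtract-on-evict), valid
# only for invertible aggregation functions such as sum or count.
from collections import deque

class SlidingSum:
    def __init__(self, window_size):
        self.window_size = window_size
        self.buffer = deque()
        self.total = 0.0

    def insert(self, value):
        """Add a new tuple; once the window is full, evict the oldest tuple
        and subtract it in O(1) instead of re-summing the window."""
        self.buffer.append(value)
        self.total += value
        if len(self.buffer) > self.window_size:
            self.total -= self.buffer.popleft()
        return self.total

stream = [3, 1, 4, 1, 5, 9, 2, 6]
win = SlidingSum(window_size=3)
print([win.insert(x) for x in stream])   # [3, 4, 8, 6, 10, 15, 16, 17]
```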

    Logram: Efficient Log Parsing Using n-Gram Dictionaries

    Software systems usually record important runtime information in their logs. Logs help practitioners understand system runtime behaviors and diagnose field failures. As logs are usually very large in size, automated log analysis is needed to assist practitioners in their software operation and maintenance efforts. Typically, the first step of automated log analysis is log parsing, i.e., converting unstructured raw logs into structured data. However, log parsing is challenging, because logs are produced by static templates in the source code (i.e., logging statements), yet the templates are usually inaccessible when parsing logs. Prior work proposed automated log parsing approaches that have achieved high accuracy. However, as the volume of logs grows rapidly in the era of cloud computing, efficiency becomes a major concern in log parsing. In this work, we propose an automated log parsing approach, Logram, which leverages n-gram dictionaries to achieve efficient log parsing. We evaluated Logram on 16 public log datasets and compared Logram with five state-of-the-art log parsing approaches. We found that Logram achieves a similar parsing accuracy to the best existing approaches while outperforming these approaches in efficiency (i.e., 1.8 to 5.1 times faster than the second-fastest approaches). Furthermore, we deployed Logram on Spark and found that Logram scales out efficiently with the number of Spark nodes (e.g., with near-linear scalability) without sacrificing parsing accuracy. In addition, we demonstrated that Logram can support effective online parsing of logs, achieving parsing results and efficiency similar to the offline mode.
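
    As a rough sketch of the dictionary-based idea (an illustration only, not Logram's published algorithm or thresholds), the code below builds 1-gram and 2-gram frequency dictionaries over tokenized logs and marks tokens whose n-grams are all rare as dynamic parameters.

```python
# Simplified sketch of dictionary-based log parsing: frequent n-grams indicate
# static template text, rare ones indicate dynamic parameters (<*>).
from collections import Counter

def build_dictionaries(log_lines):
    unigrams, bigrams = Counter(), Counter()
    for line in log_lines:
        tokens = line.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def parse(line, unigrams, bigrams, min_count=2):
    tokens = line.split()
    template = []
    for i, tok in enumerate(tokens):
        left = bigrams[(tokens[i - 1], tok)] if i > 0 else 0
        right = bigrams[(tok, tokens[i + 1])] if i + 1 < len(tokens) else 0
        # A token that is itself rare and only occurs in rare bigrams is
        # treated as a dynamic parameter.
        frequent = unigrams[tok] >= min_count or max(left, right) >= min_count
        template.append(tok if frequent else "<*>")
    return " ".join(template)

logs = [
    "Connection from 10.0.0.1 closed",
    "Connection from 10.0.0.2 closed",
    "Connection from 172.16.3.9 closed",
]
uni, bi = build_dictionaries(logs)
print(parse(logs[0], uni, bi))   # Connection from <*> closed
```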

    Process Mining Handbook

    This is an open access book. It comprises all the courses given as part of the First Summer School on Process Mining, PMSS 2022, which was held in Aachen, Germany, during July 4-8, 2022. The volume contains 17 chapters organized into the following topical sections: introduction; process discovery; conformance checking; data preprocessing; process enhancement and monitoring; assorted process mining topics; industrial perspective and applications; and closing.

    Logram: Efficient Log Parsing Using n-Gram Dictionaries

    Software systems usually record important runtime information in their logs. Logs help practitioners understand system runtime behaviors and diagnose field failures. As logs are usually very large in size, automated log analysis is needed to assist practitioners in their software operation and maintenance efforts. Typically, the first step of automated log analysis is log parsing, i.e., converting unstructured raw logs into structured data. However, log parsing is challenging, because logs are produced by static templates in the source code (i.e., logging statements) yet the templates are usually inaccessible when parsing logs. Prior work proposed automated log parsing approaches that have achieved high accuracy. However, as the volume of logs grows rapidly in the era of cloud computing, efficiency becomes a major concern in log parsing. In this work, we propose an automated log parsing approach, Logram, which leverages n-gram dictionaries to achieve efficient log parsing. We evaluated Logram on 16 public log datasets and compared Logram with five state-of-the-art log parsing approaches. We found that Logram achieves a higher parsing accuracy than the best existing approaches (i.e., at least 10% higher, on average) and also outperforms these approaches in efficiency (i.e., 1.8 to 5.1 times faster than the second-fastest approaches in terms of end-to-end parsing time). Furthermore, we deployed Logram on Spark and we found that Logram scales out efficiently with the number of Spark nodes (e.g., with near-linear scalability for some logs) without sacrificing parsing accuracy. In addition, we demonstrated that Logram can support effective online parsing of logs, achieving similar parsing results and efficiency to the offline mode.