
    Workflow Provenance: from Modeling to Reporting

    Workflow provenance is a crucial part of a workflow system, as it enables data lineage analysis, error tracking, workflow monitoring, usage pattern discovery, and more. Integrating provenance into a workflow system, or modifying a workflow system to capture or analyze different provenance information, is burdensome and requires extensive development, because provenance mechanisms depend heavily on the modeling, architecture, and design of the workflow system. Various tools and technologies exist for logging events in a software system. Unfortunately, these logging tools and technologies are not designed for capturing and analyzing provenance information: workflow provenance is not only about logging, but also about retrieving workflow-related information from logs. In this work, we propose a taxonomy of provenance questions and, guided by these questions, create a workflow programming model, ProvMod, with a supporting run-time library that provides automated provenance and log analysis for any workflow system. The design and provenance mechanism of ProvMod follow recommendations from prominent research and are easy to integrate into any workflow system. ProvMod offers Neo4j graph database support to manage semi-structured, heterogeneous JSON logs; the log structure is adaptable to any NoSQL technology. For each provenance question in our taxonomy, ProvMod provides the answer with data visualization using Neo4j and the ELK Stack. Besides analyzing performance from various angles, we demonstrate the ease of integration by integrating ProvMod with Apache Taverna and evaluate ProvMod's usability with users. Finally, we present two Software Engineering research cases (clone detection and architecture extraction) where ProvMod and the provenance question taxonomy can be applied to discover meaningful insights.
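
    As a minimal sketch of the idea (the decorator name, log fields, and file path below are hypothetical illustrations, not ProvMod's actual API), a run-time library can capture provenance by wrapping workflow steps and emitting one JSON event per invocation, which a graph database such as Neo4j or an ELK pipeline can then ingest:

    import json
    import time
    import uuid
    from functools import wraps

    def provenance(activity):
        """Hypothetical decorator: records each workflow step as a JSON log event."""
        def decorator(fn):
            @wraps(fn)
            def wrapper(*args, **kwargs):
                event = {
                    "id": str(uuid.uuid4()),
                    "activity": activity,
                    "inputs": [repr(a) for a in args],
                    "started": time.time(),
                }
                result = fn(*args, **kwargs)
                event["ended"] = time.time()
                event["output"] = repr(result)
                # One JSON object per line: semi-structured logs that Neo4j,
                # the ELK Stack, or any NoSQL store can ingest.
                with open("provenance.log", "a") as log:
                    log.write(json.dumps(event) + "\n")
                return result
            return wrapper
        return decorator

    @provenance("normalize")
    def normalize(values):
        top = max(values)
        return [v / top for v in values]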

    Develop a generic Rules Engine to quality control a CV database

    This bachelor’s thesis presents a software solution to enhance Bouvet’s quality control process for employee CVs. By implementing a generic rule engine with extended functionalities, we identified that 90% of the CVs at Bouvet did not meet the company’s business standards. Using Scrum with Extreme Programming as our project management system, we developed a scalable and maintainable pilot employing Microservices, Event-Driven, and Command and Query Responsibility Segregation architecture. Our pilot allows for future modifications using create, read, update, and delete operations. The software solution presented in this thesis can be extended to a production-ready state by implementing Role-based access control and an API Gateway. When the event bus project by another group at Bouvet is completed, our implementation will be able to notify employees about their CVs’ status, further improving the quality control process. Overall, our results demonstrate the effectiveness of our software solution and project management system in enhancing the quality control of employee CVs at Bouvet.
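
    To illustrate what a generic rule engine over CV records can look like (the field names and the three rules below are hypothetical examples, not Bouvet's actual business standards), each rule is a named predicate, and quality control reports every rule a CV violates:

    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Rule:
        name: str
        check: Callable[[dict], bool]  # returns True if the CV passes

    # Hypothetical rules; in a pilot like this they would be managed via CRUD operations.
    RULES = [
        Rule("has_summary", lambda cv: bool(cv.get("summary", "").strip())),
        Rule("recent_update", lambda cv: cv.get("months_since_update", 999) <= 6),
        Rule("min_skills", lambda cv: len(cv.get("skills", [])) >= 3),
    ]

    def evaluate(cv: dict) -> list[str]:
        """Return the names of all rules the CV violates."""
        return [rule.name for rule in RULES if not rule.check(cv)]

    cv = {"summary": "Backend developer", "months_since_update": 12, "skills": ["Java"]}
    print(evaluate(cv))  # ['recent_update', 'min_skills']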

    Process Mining Concepts for Discovering User Behavioral Patterns in Instrumented Software

    Process Mining is a technique for discovering “in-use” processes from traces emitted to event logs. Researchers have recently explored applying this technique to documenting processes discovered in software applications. However, the requirements for emitting events to support Process Mining against software applications have not been well documented, and the linking of end-user intentional behavior to software quality, as demonstrated in the discovered processes, has not been well articulated. After evaluating the literature, this thesis suggested focusing on user goals and actual, in-use processes as inputs to an Agile software development life cycle in order to improve software quality. It also provided suggestions for instrumenting software applications to support Process Mining techniques.
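
    A minimal sketch of such instrumentation (the schema and function names are illustrative assumptions): each emitted event carries the three fields process mining minimally requires, a case identifier, an activity name, and a timestamp, so a discovery tool can later reconstruct the in-use process from the traces:

    import csv
    import datetime

    LOG_PATH = "event_log.csv"  # columns: case_id, activity, timestamp

    def emit(case_id: str, activity: str) -> None:
        """Append one event; a case groups all events of one user session."""
        with open(LOG_PATH, "a", newline="") as f:
            csv.writer(f).writerow(
                [case_id, activity, datetime.datetime.now().isoformat()]
            )

    # The instrumented application emits an event at each user-visible step.
    emit("session-42", "open_report")
    emit("session-42", "filter_by_date")
    emit("session-42", "export_pdf")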

    Online Analysis of Dynamic Streaming Data

    This work on "Online Analysis of Dynamic Streaming Data" addresses distance measurement over dynamic, semi-structured data in continuous data streams, in order to enable analyses of these data structures already at runtime. To this end, a formalization of distance computation for static and dynamic trees is introduced and extended with an explicit treatment of the dynamics of attributes of individual tree nodes. The real-time analysis based on this distance measurement is complemented by density-based clustering, demonstrating applications in clustering, classification, and anomaly detection. The results of this work rest on a theoretical analysis of the introduced formalization of distance measures for dynamic trees. These analyses are supported by empirical measurements on monitoring data of batch jobs from the batch system of the GridKa data and computing center. The evaluation of the proposed formalization and of the real-time analysis methods built on it demonstrates the efficiency and scalability of the approach. It is further shown that considering attributes and attribute statistics is of particular importance for the quality of analysis results on dynamic, semi-structured data. The evaluation also shows that result quality can be further improved by an independent combination of several distances. In particular, the results of this work enable the analysis of data that change over time.
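
    A rough illustration of the core idea (the thesis's actual formalization is not reproduced here; the recursive structural measure, the L1 attribute measure, and the weighting below are assumptions for the sketch): a distance over dynamic trees can combine a structural component with an attribute component, and several such distances can then be combined independently:

    def structural_distance(a: dict, b: dict) -> int:
        """Naive recursion over trees given as {child_name: subtree} dicts."""
        keys_a, keys_b = set(a), set(b)
        dist = len(keys_a ^ keys_b)  # children present in only one tree
        for k in keys_a & keys_b:
            dist += structural_distance(a[k], b[k])
        return dist

    def attribute_distance(attrs_a: dict, attrs_b: dict) -> float:
        """L1 distance over shared numeric node attributes."""
        shared = set(attrs_a) & set(attrs_b)
        return sum(abs(attrs_a[k] - attrs_b[k]) for k in shared)

    def combined(a, b, attrs_a, attrs_b, w: float = 0.5) -> float:
        # Independent combination of two distances, weighted by w.
        return w * structural_distance(a, b) + (1 - w) * attribute_distance(attrs_a, attrs_b)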

    Coordinated Caching for High Performance Calibration using Z -> µµ Events of the CMS Experiment

    Calibration of the detectors is a prerequisite for almost all physics analyses conducted as part of the LHC experiments; as such, both speed and precision are critical. As part of this thesis, a high performance analysis infrastructure using coordinated caching has been developed and used to conduct the first calibration of jets using Z -> µµ events recorded during the second LHC run at the CMS experiment.
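
    The Z -> µµ selection underlying such a calibration rests on the dimuon invariant mass; a minimal sketch, using the standard massless-muon approximation m² = 2·pT1·pT2·(cosh Δη − cos Δφ) and an illustrative mass window around the Z peak (the cut values are assumptions, not those of the thesis):

    import math

    def invariant_mass(pt1, eta1, phi1, pt2, eta2, phi2):
        """Dimuon invariant mass (GeV), neglecting the muon mass."""
        return math.sqrt(2 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2)))

    def is_z_candidate(mu1, mu2, window=(71.0, 111.0)):
        """mu1, mu2: (pt, eta, phi) tuples; window brackets m_Z ≈ 91.2 GeV."""
        return window[0] <= invariant_mass(*mu1, *mu2) <= window[1]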

    Node-Oriented Workflow (NOW): A Command Template Workflow Management Tool for High Throughput Data Analysis Pipelines

    Next generation sequencing (NGS) systems produce vast quantities of data that require substantial computational resources for typical analysis tasks. In addition, data generated by different NGS systems are not homogeneous, and there is an overwhelming number of tools available for performing typical tasks. Managing NGS workflows involves writing custom scripts that quickly grow in complexity, often resulting in unwieldy workflows that underutilize typical high performance compute resources and increase the demands on the staff managing them. We present Node-Oriented Workflow (NOW), a dynamic command template workflow engine for high performance distributed computing (HPC) systems. Our system provides a simple-to-use browser-based front end for designing and managing complex workflows: workflows are configured through the browser interface and managed by the integrated job engine, which initializes nodes, monitors node status, and processes the results of individual jobs across nodes in an HPC configuration. We reduce excessive messaging across nodes by placing the burden on nodes to start tasks in a workflow when their dependencies are met, i.e., node-oriented workflow. Our system was designed for NGS processing in the clinical research setting, emphasizing user simplicity, tool scalability, and minimization of redundancy in workflows while maximizing throughput in an HPC environment. Furthermore, NOW is not restricted to NGS pipeline management, but can be used to manage any computational pipeline.
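
    A sketch of the node-oriented idea (the pipeline and data structures below are hypothetical): instead of a central scheduler messaging each node, a node itself starts any task whose dependencies are satisfied as soon as one of its tasks completes:

    # Hypothetical NGS-style pipeline: task -> (dependencies, action).
    WORKFLOW = {
        "align":  (set(), lambda: print("aligning reads")),
        "sort":   ({"align"}, lambda: print("sorting alignments")),
        "call":   ({"sort"}, lambda: print("calling variants")),
        "report": ({"align", "call"}, lambda: print("writing report")),
    }

    completed: set[str] = set()
    started: set[str] = set()

    def start_ready() -> None:
        """Node-side loop: launch every task whose dependencies are met."""
        for name, (deps, action) in WORKFLOW.items():
            if name not in started and deps <= completed:
                started.add(name)
                action()             # in a real system: submit to the local job engine
                completed.add(name)  # sketch: assume immediate success
                start_ready()        # dependents of this task may now be ready

    start_ready()  # runs align, sort, call, report in dependency order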