908 research outputs found

    Autonomic care platform for optimizing query performance

    Get PDF
    Background: As the amount of information in electronic health care systems increases, data operations get more complicated and time-consuming. Intensive Care platforms require a timely processing of data retrievals to guarantee the continuous display of recent data of patients. Physicians and nurses rely on this data for their decision making. Manual optimization of query executions has become difficult to handle due to the increased amount of queries across multiple sources. Hence, a more automated management is necessary to increase the performance of database queries. The autonomic computing paradigm promises an approach in which the system adapts itself and acts as self-managing entity, thereby limiting human interventions and taking actions. Despite the usage of autonomic control loops in network and software systems, this approach has not been applied so far for health information systems. Methods: We extend the COSARA architecture, an infection surveillance and antibiotic management service platform for the Intensive Care Unit (ICU), with self-managed components to increase the performance of data retrievals. We used real-life ICU COSARA queries to analyse slow performance and measure the impact of optimizations. Each day more than 2 million COSARA queries are executed. Three control loops, which monitor the executions and take action, have been proposed: reactive, deliberative and reflective control loops. We focus on improvements of the execution time of microbiology queries directly related to the visual displays of patients' data on the bedside screens. Results: The results show that autonomic control loops are beneficial for the optimizations in the data executions in the ICU. The application of reactive control loop results in a reduction of 8.61% of the average execution time of microbiology results. The combined application of the reactive and deliberative control loop results in an average query time reduction of 10.92% and the combination of reactive, deliberative and reflective control loops provides a reduction of 13.04%. Conclusions: We found that by controlled reduction of queries' executions the performance for the end-user can be improved. The implementation of autonomic control loops in an existing health platform, COSARA, has a positive effect on the timely data visualization for the physician and nurse

    Effective testing for concurrency bugs

    Get PDF
    In the current multi-core era, concurrency bugs are a serious threat to software reliability. As hardware becomes more parallel, concurrent programming will become increasingly pervasive. However, correct concurrent programming is known to be extremely challenging for developers and can easily lead to the introduction of concurrency bugs. This dissertation addresses this challenge by proposing novel techniques to help developers expose and detect concurrency bugs. We conducted a bug study to better understand the external and internal effects of real-world concurrency bugs. Our study revealed that a significant fraction of concurrency bugs qualify as semantic or latent bugs, which are two particularly challenging classes of concurrency bugs. Based on the insights from the study, we propose a concurrency bug detector, PIKE that analyzes the behavior of program executions to infer whether concurrency bugs have been triggered during a concurrent execution. In addition, we present the design of a testing tool, SKI, that allows developers to test operating system kernels for concurrency bugs in a practical manner. SKI bridges the gap between user-mode testing and kernel-mode testing by enabling the systematic exploration of the kernel thread interleaving space. Our evaluation shows that both PIKE and SKI are effective at finding concurrency bugs.Im gegenwärtigen Multicore-Zeitalter sind Fehler aufgrund von Nebenläufigkeit eine ernsthafte Bedrohung der Zuverlässigkeit von Software. Mit der wachsenden Parallelisierung von Hardware wird nebenläufiges Programmieren nach und nach allgegenwärtig. Diese Art von Programmieren ist jedoch als äußerst schwierig bekannt und kann leicht zu Programmierfehlern führen. Die vorliegende Dissertation nimmt sich dieser Herausforderung an indem sie neuartige Techniken vorschlägt, die Entwicklern beim Aufdecken von Nebenläufigkeitsfehlern helfen. Wir führen eine Studie von Fehlern durch, um die externen und internen Effekte von in der Praxis vorkommenden Nebenläufigkeitsfehlern besser zu verstehen. Diese ergibt, dass ein bedeutender Anteil von solchen Fehlern als semantisch bzw. latent zu charakterisieren ist -- zwei besonders herausfordernde Klassen von Nebenläufigkeitsfehlern. Basierend auf den Erkenntnissen der Studie entwickeln wir einen Detektor (PIKE), der Programmausführungen daraufhin analysiert, ob Nebenläufigkeitsfehler aufgetreten sind. Weiterhin präsentieren wir das Design eines Testtools (SKI), das es Entwicklern ermöglicht, Betriebssystemkerne praktikabel auf Nebenläufigkeitsfehler zu überprüfen. SKI füllt die Lücke zwischen Testen im Benutzermodus und Testen im Kernelmodus, indem es die systematische Erkundung der Kernel-Thread-Verschachtelungen erlaubt. Unsere Auswertung zeigt, dass sowohl PIKE als auch SKI effektiv Nebenläufigkeitsfehler finden

    Event-Oriented Dynamic Adaptation of Workflows: Model, Architecture and Implementation

    Get PDF
    Workflow management is widely accepted as a core technology to support long-term business processes in heterogeneous and distributed environments. However, conventional workflow management systems do not provide sufficient flexibility support to cope with the broad range of failure situations that may occur during workflow execution. In particular, most systems do not allow to dynamically adapt a workflow due to a failure situation, e.g., to dynamically drop or insert execution steps. As a contribution to overcome these limitations, this dissertation introduces the agent-based workflow management system AgentWork. AgentWork supports the definition, the execution and, as its main contribution, the event-oriented and semi-automated dynamic adaptation of workflows. Two strategies for automatic workflow adaptation are provided. Predictive adaptation adapts workflow parts affected by a failure in advance (predictively), typically as soon as the failure is detected. This is advantageous in many situations and gives enough time to meet organizational constraints for adapted workflow parts. Reactive adaptation is typically performed when predictive adaptation is not possible. In this case, adaptation is performed when the affected workflow part is to be executed, e.g., before an activity is executed it is checked whether it is subject to a workflow adaptation such as dropping, postponement or replacement. In particular, the following contributions are provided by AgentWork: A Formal Model for Workflow Definition, Execution, and Estimation: In this context, AgentWork first provides an object-oriented workflow definition language. This language allows for the definition of a workflow\u92s control and data flow. Furthermore, a workflow\u92s cooperation with other workflows or workflow systems can be specified. Second, AgentWork provides a precise workflow execution model. This is necessary, as a running workflow usually is a complex collection of concurrent activities and data flow processes, and as failure situations and dynamic adaptations affect running workflows. Furthermore, mechanisms for the estimation of a workflow\u92s future execution behavior are provided. These mechanisms are of particular importance for predictive adaptation. Mechanisms for Determining and Processing Failure Events and Failure Actions: AgentWork provides mechanisms to decide whether an event constitutes a failure situation and what has to be done to cope with this failure. This is formally achieved by evaluating event-condition-action rules where the event-condition part describes under which condition an event has to be viewed as a failure event. The action part represents the necessary actions needed to cope with the failure. To support the temporal dimension of events and actions, this dissertation provides a novel event-condition-action model based on a temporal object-oriented logic. Mechanisms for the Adaptation of Affected Workflows: In case of failure situations it has to be decided how an affected workflow has to be dynamically adapted on the node and edge level. AgentWork provides a novel approach that combines the two principal strategies reactive adaptation and predictive adaptation. Depending on the context of the failure, the appropriate strategy is selected. Furthermore, control flow adaptation operators are provided which translate failure actions into structural control flow adaptations. Data flow operators adapt the data flow after a control flow adaptation, if necessary. Mechanisms for the Handling of Inter-Workflow Implications of Failure Situations: AgentWork provides novel mechanisms to decide whether a failure situation occurring to a workflow affects other workflows that communicate and cooperate with this workflow. In particular, AgentWork derives the temporal implications of a dynamic adaptation by estimating the duration that will be needed to process the changed workflow definition (in comparison with the original definition). Furthermore, qualitative implications of the dynamic change are determined. For this purpose, so-called quality measuring objects are introduced. All mechanisms provided by AgentWork include that users may interact during the failure handling process. In particular, the user has the possibility to reject or modify suggested workflow adaptations. A Prototypical Implementation: Finally, a prototypical Corba-based implementation of AgentWork is described. This implementation supports the integration of AgentWork into the distributed and heterogeneous environments of real-world organizations such as hospitals or insurance business enterprises

    Computing at massive scale: Scalability and dependability challenges

    Get PDF
    Large-scale Cloud systems and big data analytics frameworks are now widely used for practical services and applications. However, with the increase of data volume, together with the heterogeneity of workloads and resources, and the dynamic nature of massive user requests, the uncertainties and complexity of resource management and service provisioning increase dramatically, often resulting in poor resource utilization, vulnerable system dependability, and user-perceived performance degradations. In this paper we report our latest understanding of the current and future challenges in this particular area, and discuss both existing and potential solutions to the problems, especially those concerned with system efficiency, scalability and dependability. We first introduce a data-driven analysis methodology for characterizing the resource and workload patterns and tracing performance bottlenecks in a massive-scale distributed computing environment. We then examine and analyze several fundamental challenges and the solutions we are developing to tackle them, including for example incremental but decentralized resource scheduling, incremental messaging communication, rapid system failover, and request handling parallelism. We integrate these solutions with our data analysis methodology in order to establish an engineering approach that facilitates the optimization, tuning and verification of massive-scale distributed systems. We aim to develop and offer innovative methods and mechanisms for future computing platforms that will provide strong support for new big data and IoE (Internet of Everything) applications

    A Machine Learning Enhanced Scheme for Intelligent Network Management

    Get PDF
    The versatile networking services bring about huge influence on daily living styles while the amount and diversity of services cause high complexity of network systems. The network scale and complexity grow with the increasing infrastructure apparatuses, networking function, networking slices, and underlying architecture evolution. The conventional way is manual administration to maintain the large and complex platform, which makes effective and insightful management troublesome. A feasible and promising scheme is to extract insightful information from largely produced network data. The goal of this thesis is to use learning-based algorithms inspired by machine learning communities to discover valuable knowledge from substantial network data, which directly promotes intelligent management and maintenance. In the thesis, the management and maintenance focus on two schemes: network anomalies detection and root causes localization; critical traffic resource control and optimization. Firstly, the abundant network data wrap up informative messages but its heterogeneity and perplexity make diagnosis challenging. For unstructured logs, abstract and formatted log templates are extracted to regulate log records. An in-depth analysis framework based on heterogeneous data is proposed in order to detect the occurrence of faults and anomalies. It employs representation learning methods to map unstructured data into numerical features, and fuses the extracted feature for network anomaly and fault detection. The representation learning makes use of word2vec-based embedding technologies for semantic expression. Next, the fault and anomaly detection solely unveils the occurrence of events while failing to figure out the root causes for useful administration so that the fault localization opens a gate to narrow down the source of systematic anomalies. The extracted features are formed as the anomaly degree coupled with an importance ranking method to highlight the locations of anomalies in network systems. Two types of ranking modes are instantiated by PageRank and operation errors for jointly highlighting latent issue of locations. Besides the fault and anomaly detection, network traffic engineering deals with network communication and computation resource to optimize data traffic transferring efficiency. Especially when network traffic are constrained with communication conditions, a pro-active path planning scheme is helpful for efficient traffic controlling actions. Then a learning-based traffic planning algorithm is proposed based on sequence-to-sequence model to discover hidden reasonable paths from abundant traffic history data over the Software Defined Network architecture. Finally, traffic engineering merely based on empirical data is likely to result in stale and sub-optimal solutions, even ending up with worse situations. A resilient mechanism is required to adapt network flows based on context into a dynamic environment. Thus, a reinforcement learning-based scheme is put forward for dynamic data forwarding considering network resource status, which explicitly presents a promising performance improvement. In the end, the proposed anomaly processing framework strengthens the analysis and diagnosis for network system administrators through synthesized fault detection and root cause localization. The learning-based traffic engineering stimulates networking flow management via experienced data and further shows a promising direction of flexible traffic adjustment for ever-changing environments
    • …
    corecore