
    On the Use of Software Tracing and Boolean Combination of Ensemble Classifiers to Support Software Reliability and Security Tasks

    In this thesis, we propose an approach that relies on Boolean combination of multiple one-class classification methods based on Hidden Markov Models (HMMs), which are pruned using the weighted Kappa coefficient to select and combine accurate and diverse classifiers. Our approach, called WPIBC (Weighted Pruning Iterative Boolean Combination), works in three phases. The first phase selects a subset of the available base diverse soft classifiers by pruning all the redundant soft classifiers based on a weighted version of Cohen’s kappa measure of agreement. The second phase selects a subset of diverse and accurate crisp classifiers from the base soft classifiers (selected in Phase 1) based on the unweighted kappa measure. The selected complementary crisp classifiers are then combined in the final phase using Boolean combinations. We apply the proposed approach to two important problems in software security and reliability: the detection of system anomalies and the prediction of the reassignment of bug report fields. Detecting system anomalies at run-time is a critical component of system reliability and security. Studies in this area focus mainly on the effectiveness of the proposed approaches: the ability to detect anomalies with high accuracy. Less attention has been given to false alarm rates and efficiency. Although ensemble approaches for the detection of anomalies that use Boolean combination of classifier decisions have been shown to be useful in reducing the false alarm rate over that of a single classifier, existing methods rely on an exponential number of combinations, making them impractical even for a small number of classifiers. Our approach not only maintains and even improves the accuracy of existing Boolean combination techniques, but also significantly reduces the combination time and the number of classifiers selected for combination. The second application domain of our approach is the prediction of the reassignment of bug report fields.
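The kappa-based pruning in the first phase can be sketched as follows. This is a minimal illustration, not the thesis implementation: the quantization into deciles, the greedy keep/discard loop, and the agreement threshold are illustrative assumptions, and only the linear-weighted Cohen's kappa formula itself is standard.

```python
import numpy as np

def weighted_kappa(a, b, k):
    """Linear-weighted Cohen's kappa between two ratings in {0, ..., k-1}."""
    o = np.zeros((k, k))                          # observed co-occurrence matrix
    for i, j in zip(a, b):
        o[i, j] += 1
    o /= o.sum()
    e = np.outer(o.sum(axis=1), o.sum(axis=0))    # expected under independence
    w = np.abs(np.subtract.outer(np.arange(k), np.arange(k))) / (k - 1)
    return 1.0 - (w * o).sum() / (w * e).sum()

def prune_redundant(scores, k=10, threshold=0.8):
    """Greedily keep classifiers whose quantized soft scores do not agree
    too strongly (kappa >= threshold) with any already-kept classifier."""
    ranks = [np.digitize(s, np.quantile(s, np.linspace(0, 1, k + 1)[1:-1]))
             for s in scores]
    kept = []
    for idx, r in enumerate(ranks):
        if all(weighted_kappa(r, ranks[j], k) < threshold for j in kept):
            kept.append(idx)
    return kept
```

Quantizing the soft scores into ordered bins is what makes the *weighted* kappa meaningful here: near-miss disagreements between adjacent bins are penalized less than disagreements between distant bins.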
Bug reports contain a wealth of information that is used by triaging and development teams to understand the causes of bugs in order to provide fixes. The problem is that, for various reasons, it is common to have bug reports with missing or incorrect information, hindering the bug resolution process. To address this problem, researchers have turned to machine learning techniques. The common practice is to build models that leverage historical bug reports to automatically predict when a given bug report field should be reassigned. Existing approaches have mainly relied upon classifiers that make use of natural language in the title and description of the bug reports. They fail to take advantage of the richly detailed sequential information that is present in stack traces included in bug reports. To address this, we propose an approach called EnHMM which uses WPIBC and stack traces to predict the reassignment of bug report fields. Another contribution of this thesis is an approach to improve the efficiency of WPIBC by leveraging the Hadoop framework and the MapReduce programming model. We also show how WPIBC can be extended to support heterogeneous classifiers.
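The final combination phase above can be illustrated with a toy sketch of combining two crisp decision vectors. The function name, the set of operators, and the use of balanced accuracy as the selection criterion are illustrative assumptions; the thesis's iterative procedure selects combinations differently (e.g. in ROC space).

```python
from statistics import mean

def best_pairwise_combination(d1, d2, y):
    """Try basic Boolean operators on two crisp (0/1) decision vectors and
    keep the one scoring best against labels y. Balanced accuracy is used
    here as a simple stand-in for a ROC-based selection criterion."""
    ops = {
        "AND": [a & b for a, b in zip(d1, d2)],
        "OR": [a | b for a, b in zip(d1, d2)],
        "XOR": [a ^ b for a, b in zip(d1, d2)],
        "AND NOT": [a & (1 - b) for a, b in zip(d1, d2)],
    }
    def balanced_accuracy(pred):
        tpr = mean(p for p, t in zip(pred, y) if t == 1)       # true-positive rate
        tnr = mean(1 - p for p, t in zip(pred, y) if t == 0)   # true-negative rate
        return (tpr + tnr) / 2
    return max(ops.items(), key=lambda kv: balanced_accuracy(kv[1]))
```

Applied iteratively over the pruned pool, each round merges the current ensemble decision with one more classifier, which is what keeps the search linear rather than exponential in the number of classifiers.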

    Investigating the performance of transport infrastructure using real-time data and a scalable multi-modal agent based model

    The idea of including more information in more dynamic and iterative ways is central to the promise of the big data paradigm. The hope is that, via new data sources such as remote sensors and mobile phones, the reliance on heavily simplified generalised functions for model inputs will be erased. This shift from idealised to actual empirical data can be matched with dynamic models that consider complexity at a fundamental level, inherently mirroring the systems they attempt to replicate. Cloud computing brings the possibility of doing all of this in less time than the simplified macro models of the past, thus enabling better answers at critical decision-making junctures. This research was task-driven: the question of high speed rail versus aviation led to an investigation into the simplifications and assumptions that back up many of the commonly held beliefs on the sustainability of different modes of transport. The literature ultimately highlighted the need for context-specific information: actual load factors, actual journey times considering traffic and engineering works, and so on. Thus, rather than being explicitly an exercise in answering a specific question, a specific question was used to drive the development of a tool which may hold promise for answering a range of transportation-related questions. The original contributions of this work are, firstly, the use of real-time data sources to quantify temporally and spatially dynamic network performance metrics (e.g. journey times on different transport modes), and secondly, the organisation of these data sources in a framework which can handle the volume and type of the data, structured so that it is useful for the dynamic agent-based modelling of future scenarios.
    EPSRC iCASE Studentship with Ove Arup & Partners
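The first contribution, quantifying temporally dynamic performance metrics from real-time observations, can be sketched in miniature. The record shape (origin, destination, departure hour, journey minutes) and the function name are illustrative assumptions about how a real-time feed might be aggregated, not the thesis's actual data model.

```python
from collections import defaultdict
from statistics import mean

def hourly_journey_times(records):
    """Aggregate observed trips into a temporally and spatially dynamic
    performance metric: mean journey time per origin-destination pair
    per departure hour."""
    buckets = defaultdict(list)
    for origin, dest, hour, minutes in records:
        buckets[(origin, dest, hour)].append(minutes)
    return {key: mean(vals) for key, vals in buckets.items()}
```

Metrics of this shape, keyed by place and time rather than fixed averages, are what an agent-based model can consume in place of the generalised functions the abstract criticises.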