25 research outputs found

    Toward Non-security Failures as a Predictor of Security Faults and Failures

    Abstract. In the search for metrics that can predict the presence of vulnerabilities early in the software life cycle, there may be some benefit to choosing metrics from the non-security realm. We analyzed non-security and security failure data reported in 2007 for a Cisco software system. We used non-security failure reports as input variables to a classification and regression tree (CART) model to determine the probability that a component will have at least one vulnerability. Using CART, we ranked all of the system components in descending order of their probabilities and found that 57% of the vulnerable components were in the top nine percent of the total component ranking, but with a 48% false positive rate. The results indicate that non-security failures can be used as one of the input variables for security-related prediction models.
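
    The CART-based ranking described in the abstract can be sketched roughly as follows. This is a minimal illustration assuming scikit-learn and pandas; the file name and column names (component_failures.csv, non_security_failures, has_vulnerability, component) are hypothetical placeholders, not artifacts of the study.

        # Rank components by predicted probability of containing at least one
        # vulnerability, using non-security failure counts as the input variable.
        import pandas as pd
        from sklearn.tree import DecisionTreeClassifier

        df = pd.read_csv("component_failures.csv")   # hypothetical per-component data
        X = df[["non_security_failures"]]            # non-security failure reports
        y = df["has_vulnerability"]                  # 1 if the component had >= 1 vulnerability

        # CART corresponds to a decision tree grown with the Gini criterion.
        cart = DecisionTreeClassifier(criterion="gini", max_depth=4, random_state=0)
        cart.fit(X, y)

        # Descending ranking of components by predicted vulnerability probability.
        df["p_vulnerable"] = cart.predict_proba(X)[:, 1]
        ranking = df.sort_values("p_vulnerable", ascending=False)
        print(ranking[["component", "p_vulnerable"]].head(10))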

    Disagreement-based co-training

    Recently, semi-supervised learning algorithms such as co-training have been used in many domains. In co-training, two classifiers based on different subsets of the features or on different learning algorithms are trained in parallel, and unlabeled data that are classified differently by the classifiers, but for which one classifier has high confidence, are labeled and used as training data for the other. In this paper, a new form of co-training, called Ensemble-Co-Training, is proposed that uses an ensemble of different learning algorithms. Based on a theorem by Angluin and Laird that relates noise in the data to the error of hypotheses learned from these data, we propose a criterion for finding a subset of high-confidence predictions and an error rate for a classifier in each iteration of the training process. Experiments show that the new method gives better results than state-of-the-art methods in almost all domains.
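
    A minimal sketch of the disagreement-based pseudo-labeling step described above, assuming scikit-learn and numpy-array inputs; the two feature views, the choice of base learners, the confidence threshold and the number of rounds are illustrative assumptions rather than details from the paper.

        # Disagreement-based co-training: two classifiers on separate feature views;
        # unlabeled points on which the views disagree, but where at least one view
        # is confident, are pseudo-labeled and added to the shared training data.
        import numpy as np
        from sklearn.naive_bayes import GaussianNB
        from sklearn.tree import DecisionTreeClassifier

        def co_train(X1_l, X2_l, y_l, X1_u, X2_u, rounds=10, conf=0.95):
            clf1, clf2 = GaussianNB(), DecisionTreeClassifier(random_state=0)
            for _ in range(rounds):
                clf1.fit(X1_l, y_l)
                clf2.fit(X2_l, y_l)
                if len(X1_u) == 0:
                    break
                p1, p2 = clf1.predict_proba(X1_u), clf2.predict_proba(X2_u)
                lab1 = clf1.classes_[p1.argmax(axis=1)]
                lab2 = clf2.classes_[p2.argmax(axis=1)]
                # Select points where the views disagree but one view is confident.
                pick = (lab1 != lab2) & ((p1.max(axis=1) >= conf) | (p2.max(axis=1) >= conf))
                if not pick.any():
                    break
                # Simplification: the more confident view supplies the pseudo-label,
                # which is then used to retrain both views in the next round.
                new_y = np.where(p1.max(axis=1) >= p2.max(axis=1), lab1, lab2)[pick]
                X1_l = np.vstack([X1_l, X1_u[pick]])
                X2_l = np.vstack([X2_l, X2_u[pick]])
                y_l = np.concatenate([y_l, new_y])
                X1_u, X2_u = X1_u[~pick], X2_u[~pick]
            return clf1, clf2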

    A classification scheme for studies on fault-prone components

    Various approaches are presented in the literature to identify fault-prone components. The approaches represent a wide range of characteristics and capabilities, but they are not comparable, since different aspects are compared and different data sets are used. In order to enable a consistent and fair comparison, we propose a classification scheme with two parts: 1) a characterisation scheme, which captures information on input, output and model characteristics, and 2) an evaluation scheme, which is designed for comparing different models' capabilities. The schemes and the rationale for their elements are presented in the paper. Important capabilities to evaluate when comparing different models are the rate of misclassification, classification efficiency and total classification cost. Further, the schemes are applied in an example study to illustrate their use. Applying these schemes is expected to help researchers compare different approaches and thereby build a more consistent knowledge base in software engineering. It is also expected to help practitioners choose a suitable prediction approach for a specific environment by filling out the characterisation scheme and making an evaluation in their own environment.
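
    As a rough illustration of the evaluation side of such a scheme, two of the capabilities named above (rate of misclassification and total classification cost) could be computed from a confusion matrix as below; this is a minimal sketch assuming scikit-learn, with placeholder error costs rather than figures from the paper.

        # Evaluation quantities for a binary fault-prone prediction:
        # misclassification rate and a cost-weighted total classification cost.
        from sklearn.metrics import confusion_matrix

        def evaluate(y_true, y_pred, cost_fp=1.0, cost_fn=10.0):
            tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
            misclassification_rate = (fp + fn) / (tn + fp + fn + tp)
            # Missing a fault-prone component (fn) is typically costlier than a
            # false alarm (fp); the weights here are illustrative only.
            total_cost = cost_fp * fp + cost_fn * fn
            return misclassification_rate, total_cost

        # 1 = fault-prone, 0 = not fault-prone.
        rate, cost = evaluate([1, 0, 1, 0, 1], [1, 0, 0, 1, 1])
        print(rate, cost)  # 0.4 and 11.0 for this toy example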

    Schema Independent Reduction of Streaming Log Data
