Toward Non-security Failures as a Predictor of Security Faults and Failures
Abstract. In the search for metrics that can predict the presence of vulnerabilities early in the software life cycle, there may be some benefit to choosing metrics from the non-security realm. We analyzed non-security and security failure data reported for the year 2007 of a Cisco software system. We used non-security failure reports as input variables to a classification and regression tree (CART) model to determine the probability that a component will have at least one vulnerability. Using CART, we ranked all of the system components in descending order of their probabilities and found that 57% of the vulnerable components were in the top nine percent of the total component ranking, but with a 48% false positive rate. The results indicate that non-security failures can be used as one of the input variables for security-related prediction models.
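The ranking-based evaluation described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the function name evaluate_ranking and the toy data are hypothetical, the probabilities are assumed to come from some already-trained model (CART in the paper), and the "false share" computed here (the fraction of flagged components that are not vulnerable) is only one possible reading of the false positive rate the abstract reports.

```python
def evaluate_ranking(components, top_fraction):
    """Rank components by predicted probability and score the top slice.

    components: list of (name, predicted_probability, is_vulnerable) tuples.
    top_fraction: share of the ranking to flag, e.g. 0.09 for the top 9%.
    Returns (recall, false_share), where recall is the share of all vulnerable
    components that land in the flagged slice, and false_share is the share of
    flagged components that are not vulnerable (an assumed definition).
    """
    # Sort in descending order of predicted probability.
    ranked = sorted(components, key=lambda c: c[1], reverse=True)
    # Flag at least one component, even for very small fractions.
    cutoff = max(1, round(len(ranked) * top_fraction))
    flagged = ranked[:cutoff]

    total_vulnerable = sum(1 for c in ranked if c[2])
    hits = sum(1 for c in flagged if c[2])

    recall = hits / total_vulnerable if total_vulnerable else 0.0
    false_share = (len(flagged) - hits) / len(flagged)
    return recall, false_share


# Toy data (hypothetical): ten components, three of them vulnerable.
toy = [
    ("c1", 0.9, True), ("c2", 0.8, False), ("c3", 0.7, True),
    ("c4", 0.6, False), ("c5", 0.5, True), ("c6", 0.4, False),
    ("c7", 0.3, False), ("c8", 0.2, False), ("c9", 0.1, False),
    ("c10", 0.05, False),
]
recall, false_share = evaluate_ranking(toy, top_fraction=0.3)
```

With real data, the same function could be called with top_fraction=0.09 to inspect the top nine percent of the ranking.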
Message from the program committee chairs
On behalf of the Program Committee of the 2005 IEEE International Conference on Information Reuse and Integration (IEEE IRI-2005), it is an honor to welcome you to IRI 2005 in Las Vegas, Nevada. It has been our pleasure as Program Committee Co-Chairs to help organize this year's impressive scientific and technical program and the technical proceedings. The proceedings contain the papers selected for presentation at IRI-2005. We hope these proceedings will serve as a valuable reference for the research community. This year's conference theme is "Knowledge Acquisition and Management." We received an overwhelming 168 submissions from 28 countries. We are very pleased with this level of international participation, and we hope the trend continues to grow. From these 168 submissions the program committee selected 100 papers to be presented at the conference. Each of them was reviewed by two or more referees. The authors were asked to address each and every comment made by the referees for improving the quality of their papers. We have organized a scientific and educational program, coupled with what we hope will be some entertaining social events, during which new collaborations will be made, old ones renewed, and current ones strengthened.
Disagreement-based co-training
Recently, semi-supervised learning algorithms such as co-training have been used in many domains. In co-training, two classifiers, based on different subsets of the features or on different learning algorithms, are trained in parallel. Unlabeled data that the two classifiers label differently, but for which one classifier has high confidence, are labeled and used as training data for the other. In this paper, a new form of co-training, called Ensemble-Co-Training, is proposed that uses an ensemble of different learning algorithms. Based on a theorem by Angluin and Laird that relates noise in the data to the error of hypotheses learned from these data, we propose a criterion for finding a subset of high-confidence predictions and an error rate for a classifier in each iteration of the training process. Experiments show that the new method gives better results than the state-of-the-art methods in almost all domains.
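The disagreement-based selection step that the abstract describes for plain co-training can be sketched as below. This is an illustration under stated assumptions, not the Ensemble-Co-Training method itself: the function name select_for_other is hypothetical, the fixed confidence threshold stands in for the Angluin-Laird-based criterion (which the abstract does not spell out), and skipping examples where both classifiers are confident yet disagree is a design choice made here.

```python
def select_for_other(preds_a, preds_b, threshold=0.9):
    """Pick unlabeled examples that one classifier can teach the other.

    preds_a, preds_b: per-example (label, confidence) pairs produced by
    classifiers A and B on the same unlabeled examples, in the same order.
    threshold: assumed fixed confidence cutoff (hypothetical parameter).
    Returns (new_for_a, new_for_b), each a list of (index, label) pairs to
    add to the training set of A and B respectively.
    """
    new_for_a, new_for_b = [], []
    for i, ((label_a, conf_a), (label_b, conf_b)) in enumerate(zip(preds_a, preds_b)):
        if label_a == label_b:
            continue  # no disagreement, nothing new to learn from this example
        if conf_a >= threshold and conf_b < threshold:
            new_for_b.append((i, label_a))  # A is confident: A teaches B
        elif conf_b >= threshold and conf_a < threshold:
            new_for_a.append((i, label_b))  # B is confident: B teaches A
        # Both confident or both unsure: skipped in this sketch (assumption).
    return new_for_a, new_for_b


# Toy predictions (hypothetical) on four unlabeled examples.
preds_a = [("pos", 0.95), ("neg", 0.60), ("pos", 0.97), ("neg", 0.55)]
preds_b = [("neg", 0.60), ("pos", 0.92), ("pos", 0.90), ("pos", 0.95)]
new_for_a, new_for_b = select_for_other(preds_a, preds_b)
```

In a full training loop, each classifier would be retrained on its enlarged training set and the selection repeated until no further examples qualify.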
A classification scheme for studies on fault-prone components
Various approaches are presented in the literature to identify fault-prone components. The approaches represent a wide range of characteristics and capabilities, but they are not comparable, since different aspects are compared and different data sets are used. To enable a consistent and fair comparison, we propose a classification scheme with two parts: 1) a characterisation scheme, which captures information on input, output and model characteristics, and 2) an evaluation scheme, which is designed for comparing different models' capabilities. The schemes and the rationale for their elements are presented in the paper. Important capabilities to evaluate when comparing different models are rate of misclassification, classification efficiency and total classification cost. Further, the schemes are applied in an example study to illustrate their use. Applying these schemes is expected to help researchers compare different approaches and thereby build a more consistent knowledge base in software engineering. In addition, it is expected to help practitioners choose a suitable prediction approach for a specific environment by filling out the characterisation scheme and making an evaluation in their own environment.
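Two of the evaluation capabilities named in this abstract can be illustrated with a small sketch. The definitions below are common textbook readings and are assumptions, not the paper's own: the abstract does not define its measures, the per-error costs cost_fn and cost_fp are hypothetical parameters, and classification efficiency is left out because the abstract gives no definition for it.

```python
def misclassification_rate(actual, predicted):
    """Share of components whose predicted class differs from the actual one."""
    errors = sum(1 for a, p in zip(actual, predicted) if a != p)
    return errors / len(actual)


def total_classification_cost(actual, predicted, cost_fn, cost_fp):
    """Weighted sum of both error types (assumed cost model).

    cost_fn: cost of missing a fault-prone component (false negative).
    cost_fp: cost of flagging a component that is not fault-prone (false positive).
    """
    false_negatives = sum(1 for a, p in zip(actual, predicted) if a and not p)
    false_positives = sum(1 for a, p in zip(actual, predicted) if p and not a)
    return false_negatives * cost_fn + false_positives * cost_fp


# Toy labels (hypothetical): True means fault-prone.
actual = [True, True, False, False, False]
predicted = [True, False, True, False, False]
rate = misclassification_rate(actual, predicted)
cost = total_classification_cost(actual, predicted, cost_fn=10, cost_fp=1)
```

Running two candidate models through the same functions on the same data set gives the kind of like-for-like comparison the evaluation scheme aims at.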