    Mining Bug Databases for Unidentified Software Vulnerabilities

    Identifying software vulnerabilities is becoming more important as critical and sensitive systems increasingly rely on complex software. Previous work has suggested that some bugs are identified as vulnerabilities only long after the bug has been made public. These are known as hidden impact vulnerabilities. This paper discusses the feasibility and necessity of mining common, publicly available bug databases for vulnerabilities that are yet to be identified. We present a bug database analysis of two well-known and frequently used software packages, the Linux kernel and MySQL. For both Linux and MySQL, a significant portion of the vulnerabilities discovered between January 2006 and April 2011 were hidden impact vulnerabilities. The percentage of hidden impact vulnerabilities has also increased in the last two years for both software packages. We then propose an improved hidden impact vulnerability identification methodology based on text mining bug databases, and conclude by discussing a few potential problems faced by such a classifier.

    Automated Analysis of Source Code Patches using Machine Learning Algorithms

    An updated version of a tool for automated analysis of source code patches and branch differences is presented. The upgrade applies machine learning techniques to source code, comments, and messages. It aims to help analysts, code reviewers, and auditors perform repetitive tasks continuously. The environment is designed to encourage collaborative work, and it systematizes certain tasks in reviewing and auditing processes. Currently, the scope of the automated test is limited. Current work aims to increase the volume of source code analyzed per unit of time, letting users focus on automatically generated alerts. The tool is distributed as open source software. This work also provides arguments in support of using this type of tool. A brief overview of security problems in open source software is presented, and it is argued that these problems were, or could have been, discovered by reviewing patches and branch differences released before the vulnerability was disclosed.
    Venue: IV Workshop de Seguridad Informática (WSI), Red de Universidades con Carreras en Informática (RedUNCI).

    Empirical Notes on the Interaction Between Continuous Kernel Fuzzing and Development

    Fuzzing has been studied and applied since the 1990s. Automated and continuous fuzzing has recently also been applied to open source software projects, including the Linux and BSD kernels. This paper concentrates on the practical aspects of continuous kernel fuzzing in four open source kernels. According to the results, over 800 unresolved crashes have been reported for the four kernels by the syzkaller/syzbot framework. Many of these were reported a relatively long time ago. Interestingly, fuzzing-induced bugs have been resolved more rapidly in the BSD kernels. Furthermore, assertions and debug checks, use-after-frees, and general protection faults account for the majority of bug types in the Linux kernel. About 23% of the fixed bugs in the Linux kernel have gone through either code review or additional testing. Finally, only code churn provides a weak statistical signal for explaining the associated bug fixing times in the Linux kernel.
    Comment: The 4th IEEE International Workshop on Reliability and Security Data Analysis (RSDA), 2019 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), Berlin, IEEE

    Information gain based dimensionality selection for classifying text documents

    Selecting the optimal dimensions for various knowledge extraction applications is an essential component of data mining. Dimensionality selection techniques are used in classification applications to increase classification accuracy and reduce computational complexity. In text classification, where the dimensionality of the dataset is extremely high, dimensionality selection is even more important. This paper presents a novel genetic algorithm based methodology for dimensionality selection in text mining applications that utilizes information gain. The methodology uses the information gain of each dimension to change the mutation probability of chromosomes dynamically. Since the information gain is calculated a priori, the computational complexity is not affected. The method was tested on a specific text classification problem and compared with conventional genetic algorithm based dimensionality selection. The results show an improvement of 3% in the true positives and 1.6% in the true negatives over conventional dimensionality selection methods.
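
    As a rough illustration of the idea in this abstract, the sketch below computes the information gain of binary term-presence features once, up front, and then maps it to a per-dimension mutation probability for a bit-mask chromosome. The abstract does not state the mapping, so the linear rule, its direction (low-gain dimensions mutate more often), the bounds p_min and p_max, and all function names here are assumptions for illustration, not the authors' method.

```python
import math
import random


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(column, labels):
    """Information gain of one binary feature column w.r.t. the labels."""
    n = len(labels)
    present = [y for x, y in zip(column, labels) if x]
    absent = [y for x, y in zip(column, labels) if not x]
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def mutation_probabilities(gains, p_min=0.01, p_max=0.10):
    """Assumed rule: scale gains to [0, 1]; higher gain -> lower probability."""
    top = max(gains)
    probs = []
    for g in gains:
        normalised = g / top if top > 0 else 0.0
        probs.append(p_min + (p_max - p_min) * (1.0 - normalised))
    return probs


def mutate(chromosome, probs, rng):
    """Flip each bit of the selection mask with its own probability."""
    return [1 - bit if rng.random() < p else bit
            for bit, p in zip(chromosome, probs)]


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 does not.
    labels = [1, 1, 0, 0]
    columns = [[1, 1, 0, 0], [1, 0, 1, 0]]
    gains = [information_gain(col, labels) for col in columns]
    probs = mutation_probabilities(gains)
    print(gains, probs)
    print(mutate([1, 1], probs, random.Random(0)))
```

    Because the gains are computed before the genetic algorithm runs, each mutation step only looks up a precomputed probability, which matches the abstract's remark that the computational complexity is not affected.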

    Omission of quality software development practices : a systematic literature review

    Software deficiencies are minimized by using recommended software development and quality assurance practices. However, these recommended practices (i.e., quality practices) become ineffective if software professionals purposefully ignore them. Conducting a systematic literature review (n = 4,838), we discovered that only a small number of previous studies in the software engineering and information systems literature have investigated the omission of quality practices. These studies explain the omission mainly as a result of organizational decisions and trade-offs made under resource constraints or market pressure. However, our study indicates that different aspects of this phenomenon deserve further research. In particular, future research must investigate the conditions that trigger the omission of quality practices and the processes through which it occurs. Since software development is a human-centric phenomenon, the psychological and behavioral aspects of this process especially deserve in-depth empirical investigation. In addition, future research must clarify the social, organizational, and economic consequences of ignoring quality practices. An in-depth, theoretically sound, and empirically grounded understanding of the different aspects of this phenomenon would enable research and practice to suggest interventions to overcome this issue.
    Peer reviewed.