5 research outputs found

    Automated Static Warning Identification via Path-based Semantic Representation

    Despite their ability to aid developers in detecting potential defects early in the software development life cycle, static analysis tools often suffer from precision issues (i.e., high false positive rates of reported alarms). To improve the availability of these tools, many automated warning identification techniques have been proposed to assist developers in classifying false positive alarms. However, existing approaches mainly focus on using hand-engineered features or statement-level abstract syntax tree token sequences to represent the defective code, failing to capture semantics from the reported alarms. To overcome the limitations of traditional approaches, this paper employs deep neural networks' powerful feature extraction and representation abilities to generate code semantics from control flow graph paths for warning identification. The control flow graph abstractly represents the execution process of a given program. Thus, the generated path sequences of the control flow graph can guide the deep neural networks to learn semantic information about the potential defect more accurately. In this paper, we fine-tune the pre-trained language model to encode the path sequences and capture the semantic representations for model building. Finally, this paper conducts extensive experiments on eight open-source projects to verify the effectiveness of the proposed approach by comparing it with the state-of-the-art baselines. Comment: 17 pages, in Chinese language, 9 figures
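
To make the idea of "path sequences of the control flow graph" concrete, here is a minimal sketch that enumerates acyclic entry-to-exit paths of a small graph. It is an illustration only: the abstract does not say how the paper extracts or bounds its paths, and the function name `enumerate_paths`, the adjacency-list encoding, and the example graph are all assumptions made for this sketch.

```python
from typing import Dict, List


def enumerate_paths(cfg: Dict[str, List[str]], entry: str, exit_node: str) -> List[List[str]]:
    """Return every acyclic entry-to-exit path of a CFG given as an adjacency list."""
    paths: List[List[str]] = []

    def dfs(node: str, path: List[str]) -> None:
        path.append(node)
        if node == exit_node:
            paths.append(list(path))
        else:
            for succ in cfg.get(node, []):
                # Skip nodes already on the current path so cycles cannot recurse forever.
                if succ not in path:
                    dfs(succ, path)
        path.pop()

    dfs(entry, [])
    return paths


# Hypothetical CFG for: if (x != null) { use(x); } else { log(); } return;
cfg = {
    "entry": ["cond"],
    "cond": ["use", "log"],
    "use": ["exit"],
    "log": ["exit"],
    "exit": [],
}

paths = enumerate_paths(cfg, "entry", "exit")
# Flatten each path into a token sequence that a sequence encoder could consume.
sequences = [" -> ".join(p) for p in paths]
```

Each flattened sequence stands for one possible execution route through the code around a warning; in a real pipeline the node labels would be the statements themselves rather than these placeholder names.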

    Mining Fix Patterns for FindBugs Violations

    In this paper, we first collect and track a large number of fixed and unfixed violations across revisions of software. The empirical analyses reveal that there are discrepancies in the distributions of violations that are detected and those that are fixed, in terms of occurrences, spread and categories, which can provide insights into prioritizing violations. To automatically identify patterns in violations and their fixes, we propose an approach that utilizes convolutional neural networks to learn features and clustering to regroup similar instances. We then evaluate the usefulness of the identified fix patterns by applying them to unfixed violations. The results show that developers will accept and merge a majority (69/116) of fixes generated from the inferred fix patterns. It is also noteworthy that the yielded patterns are applicable to four real bugs in the Defects4J major benchmark for software testing and automated repair. Comment: Accepted for IEEE Transactions on Software Engineering

    Feature Set Selection for Improved Classification of Static Analysis Alerts

    With the extreme growth in third party cloud applications, increased exposure of applications to the internet, and the impact of successful breaches, improving the security of software being produced is imperative. Static analysis tools can alert to quality and security vulnerabilities of an application; however, they present developers and analysts with a high rate of false positives and unactionable alerts. This problem may lead to the loss of confidence in the scanning tools, possibly resulting in the tools not being used. The discontinued use of these tools may increase the likelihood of insecure software being released into production. Insecure software can be successfully attacked, resulting in the compromise of one or several information security principles such as confidentiality, availability, and integrity. Feature selection methods have the potential to improve the classification of static analysis alerts and thereby reduce the false positive rates. Thus, the goal of this research effort was to improve the classification of static analysis alerts by proposing and testing a novel method leveraging feature selection. The proposed model was developed and subsequently tested on three open source PHP applications spanning several years. The results were compared to a classification model utilizing all features to gauge the classification improvement of the feature selection model. The model presented improved classification accuracy and reduced the false positive rate on a reduced feature set. This work contributes a real-world static analysis dataset based upon three open source PHP applications. It also enhanced an existing data set generation framework to include additional predictive software features. However, the main contribution is a feature selection methodology that may be used to discover optimal feature sets that increase the classification accuracy of static analysis alerts.

    Static Analysis of Software (original title: Análisis estático de software)

    One of the most important aspects to mitigate when developing high-quality systems is the number of defects present in the source code, because these may surface as failures in phases after coding. For this reason, many software engineering practices have been designed to uncover as many of these defects as possible, with Automated Static Analysis (ASA) among the most promising. However, the tools that carry out this practice have a major drawback: they report a large number of potential defects whose relevance to the correct functioning of the system is negligible (unactionable alerts), so inspecting each one consumes a great deal of time. This research therefore uses machine learning to build an Actionable Alert Identification Technique (AAIT) as a way of incorporating ASA into the Software Development Process (SDP). For two unrelated software projects, multiple static analysis alert reports were generated and transformed into sets of vectors of 46 Alert Characteristics (AC), which are used to build and evaluate different alert classification models, with the aim of increasing the number of relevant defects discovered (actionable alerts) after the coding phase ends and before the testing phase begins. The results show that within-project and cross-project models (that is, models built from one project's alerts and run on the alerts of the same project or of another one) offer an average performance (accuracy, precision, and recall) of 96.4% and 71.1%, respectively.
Additionally, the analysis of the impact of running our best cross-project model predicts a defect removal efficiency of 90.0%, at the cost of a 47.16% increase in the total time spent fixing those defects relative to an SDP that does not incorporate ASA, while increasing the number of relevant defects discovered by 42.9% and reducing the number of irrelevant alerts by 56.0%.
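
The three performance measures named in this abstract (accuracy, precision, and recall) can be computed from confusion-matrix counts, as in the minimal sketch below. The function name `classification_metrics` and the counts are hypothetical and chosen only for illustration; they are not data from the study, and the abstract does not say how its averages were aggregated.

```python
from typing import Tuple


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float, float]:
    """Return (accuracy, precision, recall) for a binary alert classifier.

    tp: actionable alerts predicted actionable
    fp: unactionable alerts predicted actionable
    tn: unactionable alerts predicted unactionable
    fn: actionable alerts predicted unactionable
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts for 100 classified alerts (not taken from the paper).
acc, prec, rec = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

Precision measures how many of the alerts flagged as actionable really are, so it tracks wasted inspection time; recall measures how many truly actionable alerts the model surfaces, so it tracks missed defects.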