20 research outputs found
A Software Vulnerability Prediction Model Using Traceable Code Patterns And Software Metrics
Software security is an important aspect of ensuring software quality. The goal of this study is to help developers evaluate software security at the early stages of development using traceable patterns and software metrics. The concept of traceable patterns is similar to design patterns, but they can be automatically recognized and extracted from source code. If these patterns can predict vulnerable code better than traditional software metrics, they can be used in developing a vulnerability prediction model to classify code as vulnerable or not. By analyzing and comparing the performance of traceable patterns with metrics, we propose a vulnerability prediction model. Objective: This study explores the performance of code patterns in vulnerability prediction and compares them with traditional software metrics. We used the findings to build an effective vulnerability prediction model. Method: We designed and conducted experiments on the security vulnerabilities reported for Apache Tomcat (Releases 6, 7, and 8), Apache CXF, and three stand-alone Java web applications from Stanford Securibench. We used machine learning and statistical techniques to predict vulnerabilities in these systems using traceable patterns and metrics as features. Result: We found that patterns have a lower false negative rate and higher recall in detecting vulnerable code than traditional software metrics. We also identified a set of patterns and metrics that shows higher recall in vulnerability prediction. Conclusion: Based on the results of the experiments, we proposed a prediction model using patterns and metrics to better predict vulnerable code with a higher recall rate. We evaluated the model on the systems under study and also evaluated its performance in cross-dataset validation.
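The classification setup this abstract describes can be sketched in a few lines: each code unit becomes a feature vector of pattern indicators and metric values, and a classifier labels it vulnerable or not. This is an illustrative sketch only; the feature names and data below are invented, and a simple 1-nearest-neighbor rule stands in for the study's actual machine learning techniques.

```python
# Sketch: patterns + metrics as features for binary vulnerability
# prediction. Feature names and values are invented for illustration.
def predict_1nn(train_X, train_y, x):
    """Label x with the class of its nearest training vector."""
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    best = min(range(len(train_X)), key=lambda i: dist(train_X[i], x))
    return train_y[best]

# [has_pattern_A, has_pattern_B, lines_of_code, cyclomatic_complexity]
train_X = [
    [1, 0, 120, 14],  # vulnerable file
    [1, 1, 300, 25],  # vulnerable file
    [0, 1,  45,  3],  # clean file
    [0, 0,  60,  5],  # clean file
]
train_y = [1, 1, 0, 0]  # 1 = vulnerable

print(predict_1nn(train_X, train_y, [1, 0, 150, 18]))  # nearest is first row -> 1
print(predict_1nn(train_X, train_y, [0, 0, 55, 4]))    # nearest is last row -> 0
```

In practice the study compares pattern-based and metric-based feature sets by their recall and false negative rates rather than by raw accuracy.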
Towards a Software Vulnerability Prediction Model using Traceable Code Patterns and Software Metrics
Software security is an important aspect of ensuring software quality. The goal of this study is to help developers evaluate software security during development using traceable patterns and software metrics. The concept of traceable patterns is similar to design patterns, but they can be automatically recognized and extracted from source code. If these patterns can predict vulnerable code better than traditional software metrics, they can be used in developing a vulnerability prediction model to classify code as vulnerable or not. By analyzing and comparing the performance of traceable patterns with metrics, we propose a vulnerability prediction model. This study explores the performance of some code patterns in vulnerability prediction and compares them with traditional software metrics. We use the findings to build an effective vulnerability prediction model. We evaluate security vulnerabilities reported for Apache Tomcat, Apache CXF, and three stand-alone Java web applications. We use machine learning and statistical techniques to predict vulnerabilities using traceable patterns and metrics as features. We found that patterns have a lower false negative rate and higher recall in detecting vulnerable code than traditional software metrics.
Identifying Evolution of Software Metrics by Analyzing Vulnerability History in Open Source Projects
Software developers mostly focus on functional code while developing their software, paying little attention to software security issues. Nowadays, security is given priority not only during the development phase but also during the other phases of the software development life cycle, from requirement specification through maintenance. To that end, research has expanded toward dealing with security issues in the various phases. Current research has mostly focused on developing different prediction models, most of which are based on software metrics. The metrics-based models showed high precision but poor recall in prediction. Moreover, they did not separately analyze the role of individual software metrics in the occurrence of vulnerabilities. In this paper, we aim to track the evolution of metrics over the life cycle of a vulnerability, from the version in which it was born, through the last affected version, to the version in which it was fixed. In particular, we studied a total of 250 files from three major releases of Apache Tomcat (8, 9, and 10). We found that four metrics: AvgCyclomatic, AvgCyclomaticStrict, CountDeclMethod, and CountLineCodeExe show significant changes over the vulnerability history of Tomcat. In addition, we discovered that the Tomcat team prioritizes fixing threatening vulnerabilities, such as Denial of Service, over less severe vulnerabilities. The results of our research will potentially motivate further research on building more accurate vulnerability prediction models based on appropriate software metrics. They will also help assess developers' mindset about fixing different types of vulnerabilities in open source projects.
Evaluating Micro Patterns and Software Metrics in Vulnerability Prediction
Software security is an important aspect of ensuring software quality. Early detection of vulnerable code during development is essential for developers to make software testing cost- and time-effective. Traditional software metrics are used for early detection of software vulnerabilities, but they are not directly related to code constructs and do not specify any particular granularity level. The goal of this study is to help developers evaluate software security using class-level traceable patterns called micro patterns to reduce security risks. The concept of micro patterns is similar to design patterns, but they can be automatically recognized and mined from source code. If micro patterns can predict vulnerable classes better than traditional software metrics, they can be used in developing a vulnerability prediction model. This study explores the performance of class-level patterns in vulnerability prediction and compares them with traditional class-level software metrics. We studied security vulnerabilities as reported for one major release of Apache Tomcat, Apache Camel, and three stand-alone Java web applications. We used machine learning techniques to predict vulnerabilities using micro patterns and class-level metrics as features. We found that micro patterns have higher recall in detecting vulnerable classes than the software metrics.
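Recall and false negative rate, the two measures these vulnerability-prediction abstracts emphasize, are complementary: on the vulnerable class they always sum to one. A minimal sketch with invented predictions:

```python
# Recall and false negative rate (FNR) over the vulnerable class (label 1).
# The true labels and predictions below are invented for illustration.
def recall_and_fnr(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    positives = tp + fn
    recall = tp / positives if positives else 0.0
    fnr = fn / positives if positives else 0.0
    return recall, fnr

y_true = [1, 1, 1, 0, 0, 1]  # 1 = vulnerable class
y_pred = [1, 0, 1, 0, 1, 1]
recall, fnr = recall_and_fnr(y_true, y_pred)
print(recall, fnr)  # 0.75 0.25
```

A high-recall, low-FNR predictor misses few vulnerable classes, which is the property the studies optimize for, since a missed vulnerability is costlier than a false alarm.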
A Comprehensive Tool for Text Categorization and Text Summarization in Bioinformatics
This work focuses on integrating text categorization and text summarization based on existing algorithms. We primarily apply the method to the bioinformatics literature, categorizing documents into relevant domains of bioinformatics and then producing a summarized overview of each document in a domain. For text categorization we have chosen three core domains of bioinformatics: Protein-Protein Interaction, Disease-Drug Relevance, and Pathway-Process Involvement. The method uses TF-IDF for the categorization task and, after categorization, summarizes the key contents of each document using existing features. The system plays an important role in automatically reducing the review space for researchers, as they do not need to manually select relevant texts. It also saves time by providing ranked and highly relevant sentences from the documents. Our method outperforms other existing summarization tools in that it optimizes summarization by first categorizing the documents using TF-IDF and then avoids redundant information by properly ranking the sentences using existing scores.
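The TF-IDF categorization step described above can be sketched as follows. This is a toy illustration: the three category names come from the abstract, but the documents and keyword profiles are invented, and the paper's actual corpus and scoring details are not reproduced here.

```python
import math
from collections import Counter

# Invented toy corpus; real inputs would be bioinformatics abstracts.
docs = {
    "d1": "protein binds protein interaction network",
    "d2": "drug treats disease drug response",
    "d3": "pathway regulates process signaling pathway",
}

def tfidf_scores(docs):
    """Per-document TF-IDF weight for every term."""
    tokenized = {d: text.split() for d, text in docs.items()}
    n = len(tokenized)
    df = Counter(w for words in tokenized.values() for w in set(words))
    return {
        d: {w: (tf / len(words)) * math.log(n / df[w])
            for w, tf in Counter(words).items()}
        for d, words in tokenized.items()
    }

# Hypothetical keyword profiles for the three domains from the abstract.
categories = {
    "Protein-Protein Interaction": {"protein", "interaction"},
    "Disease-Drug Relevance": {"disease", "drug"},
    "Pathway-Process Involvement": {"pathway", "process"},
}

scores = tfidf_scores(docs)
assigned = {
    d: max(categories,
           key=lambda c: sum(scores[d].get(w, 0.0) for w in categories[c]))
    for d in docs
}
print(assigned)
```

After assignment, the summarization stage would rank sentences within each categorized document by similar term-weight scores and drop near-duplicates.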
Querying KEGG Pathways in Logic
Understanding the interaction patterns among biological entities in a pathway can potentially reveal the role of those entities in biological systems. Although considerable effort has been devoted to this direction, querying biological pathways has remained relatively unexplored. Querying is principally different in that we retrieve pathways satisfying a given property of their topology or constituents. One such property is subnetwork matching using various constituent parameters. In this paper, we introduce a logic-based framework for querying biological pathways using a novel and generic subgraph isomorphism computation technique. We developed a graphical interface called IsoKEGG to facilitate flexible querying of KEGG pathways based on isomorphic pathway topologies as well as matching any combination of node names, types, and edges. It allows editing KGML-represented query pathways and returns all isomorphic patterns in KEGG pathways satisfying a given query condition for further analysis.
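The core operation here is checking whether a small query pattern occurs as a label-matching subgraph of a pathway graph. A brute-force sketch under invented toy data follows; real KEGG pathways are far larger and the paper's actual logic-based technique is not reproduced here.

```python
from itertools import permutations

def subgraph_match(query_nodes, query_edges, target_nodes, target_edges):
    """Brute-force labeled subgraph isomorphism (fine for tiny queries).
    *_nodes: {node_id: label}; *_edges: set of (u, v) directed pairs."""
    q_ids = list(query_nodes)
    for candidate in permutations(target_nodes, len(q_ids)):
        mapping = dict(zip(q_ids, candidate))
        if all(query_nodes[q] == target_nodes[mapping[q]] for q in q_ids) and \
           all((mapping[u], mapping[v]) in target_edges for u, v in query_edges):
            return mapping  # first isomorphic embedding found
    return None

# Query: a gene with an edge to a compound (toy labels).
query = ({"a": "gene", "b": "compound"}, {("a", "b")})
# Toy target pathway with three labeled nodes.
target = ({"g1": "gene", "c1": "compound", "g2": "gene"},
          {("g1", "c1"), ("g2", "g1")})

print(subgraph_match(*query, *target))
```

A practical system would enumerate all embeddings rather than the first, and prune by node type and name constraints before attempting the exponential matching step.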
New Constraints on Generation of Uniform Random Samples from Evolutionary Trees
In this paper, we introduce new algorithms for selecting taxon samples from large evolutionary trees, maintaining uniformity and randomness, under certain new constraints on the taxa. The algorithms are efficient, as their runtimes and space complexities are polynomial. They have direct applications to the evolution of phylogenetic trees and to efficient supertree construction using biologically curated data. We also present new lower bounds for the problem of constructing an evolutionary tree from experiments under some previously stated constraints. All the algorithms have been implemented.
Dynamic and Parallel Approaches to Optimal Evolutionary Tree Construction
Phylogenetic trees are commonly reconstructed by solving hard optimization problems such as maximum parsimony (MP) and maximum likelihood (ML). Conventional MP heuristics produce good solutions within reasonable time on small datasets (up to a few thousand sequences), while ML heuristics are limited to even smaller datasets (up to a few hundred sequences). However, since MP and ML are NP-hard, such approaches do not scale to large datasets. In this paper, we present a promising divide-and-conquer technique, the TAZ method, for constructing an evolutionary tree. The algorithm has been implemented and tested on five large biological datasets ranging from 5,000 to 7,000 sequences, obtaining dramatic speedup with significant improvement in accuracy (better than 94%) in comparison to existing approaches. Thus, high-quality reconstructions can be obtained for large datasets using this approach. Moreover, we present another approach that constructs the tree dynamically, as sequences arrive incrementally with partial information. Finally, combining the two approaches, we present parallel approaches to construct the tree when sequences are generated or obtained dynamically.
Epics: A System for Genome-Wide Epistasis and Genetic Variation Analysis using Protein-Protein Interactions
Epistasis contributes to many well-known diseases, making the traits more complex and harder to study. The interactions between multiple genes and their alleles at different loci often mask the effect of a single gene at a particular locus, resulting in a complex trait. Thus, the analysis of epistasis uncovers facts about the mechanisms and pathways involved in a disease by analyzing biological interactions between implicated proteins. As existing tools mainly focus on single or pairwise variation analysis, a comprehensive tool capable of analyzing interactions among multiple variations located at different chromosomal loci is of growing importance for genome-wide association studies. In this paper, we focus on exploring all the protein-protein interactions coded by the genes in the regions of variation of the human genome. We introduce a tool called EpICS that helps explore the epistatic effects of genes by analyzing protein-protein interactions within regions of different types of genetic variations. It accepts variation IDs, variation types (Insertion-Deletion, Copy Number Variation, or Single Nucleotide Polymorphism), PubMed identifiers, or a region of a chromosome as input, and then enumerates the variations of the user-specified types as well as the interactions of the proteins coded by the genes in the region. It also provides details necessary for further study of the results. EpICS is available at http://integra.cs.wayne.edu:8080/epics for general use.