15 research outputs found

    A Comprehensive Analysis of Proportional Intensity-based Software Reliability Models with Covariates (New Developments on Mathematical Decision Making Under Uncertainty)

    The black-box approach based on stochastic software reliability models is a simple methodology that uses only software fault data to describe the temporal behavior of fault-detection processes, but it fails to incorporate significant development metrics data observed during the development process. In this paper we develop proportional intensity-based software reliability models with time-dependent metrics, and propose a statistical framework to assess software reliability using the time-dependent covariates as well as the software fault data. The resulting models are similar to the usual proportional hazards model, but possess a covariate structure somewhat different from the existing one. We compare these metrics-based software reliability models with eleven well-known non-homogeneous Poisson process models, which are special cases of our models, and quantitatively evaluate their goodness-of-fit and predictive performance. An important result is that the accuracy of reliability assessment strongly depends on the kind of software metrics used for analysis and can be improved by incorporating time-dependent metrics data into the modeling.
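    A schematic of the proportional intensity form described above, in the Cox-style notation commonly used for such models (the baseline intensity \lambda_0, the coefficient vector \beta and the metrics vector x(t) are generic symbols, not necessarily the paper's own notation):

```latex
% Proportional intensity NHPP with a time-dependent covariate (illustrative notation):
% \lambda_0(t) is the baseline fault-detection intensity of an NHPP model,
% x(t) the vector of development metrics observed up to time t,
% and \beta the vector of regression coefficients.
\lambda(t \mid x) = \lambda_0(t)\,\exp\{\beta^{\top} x(t)\},
\qquad
\Lambda(t \mid x) = \int_0^t \lambda_0(s)\,\exp\{\beta^{\top} x(s)\}\,\mathrm{d}s
```

    Here \Lambda(t \mid x) is the mean value function, i.e. the expected cumulative number of faults detected by time t; taking \beta = 0 recovers the underlying NHPP model, consistent with the NHPP models being special cases.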

    Reliability in open source software

    Open Source Software (OSS) is a component or an application whose source code is freely accessible and changeable by the users, subject to constraints expressed in a number of licensing modes. It implies a global alliance for developing quality software with quick bug fixing along with quick evolution of the software features. In recent years the tendency toward adoption of OSS in industrial projects has increased swiftly. Many commercial products use OSS in various fields such as embedded systems, web management systems, and mobile software; in addition, many OSS components are modified and adopted in software products. According to the Netcraft survey, more than 58% of web servers use an open source web server, Apache. The swift increase in the adoption of open source technology is due to its availability and affordability. Recent empirical research published by Forrester highlighted that although many European software companies have a clear OSS adoption strategy, there are fears and questions about the adoption. All these fears and concerns can be traced back to the quality and reliability of OSS.
Reliability is one of the more important characteristics of software quality when considered for commercial use. It is defined as the probability of failure-free operation of software for a specified period of time in a specified environment (IEEE Std. 1633-2008). While open source projects routinely provide information about community activity, the number of developers, and the number of users or downloads, this is not enough to convey information about reliability. Software reliability growth models (SRGM) are frequently used in the literature for the characterization of reliability in industrial software. These models assume that reliability grows after a defect has been detected and fixed. SRGM are a prominent class of software reliability models (SRM). An SRM is a mathematical expression that specifies the general form of the software failure process as a function of factors such as fault introduction, fault removal, and the operational environment. Due to defect identification and removal, the failure rate (failures per unit of time) of a software system generally decreases over time. Software reliability modeling is done to estimate the form of the curve of the failure rate by statistically estimating the parameters associated with the selected model. The purpose of this measure is twofold: 1) to estimate the extra test time required to meet a specified reliability objective and 2) to identify the expected reliability of the software after release (IEEE Std. 1633-2008). SRGM can be applied to guide the test board in deciding whether to stop or continue testing.
These models are grouped into concave and S-shaped models on the basis of their assumption about the cumulative failure occurrence pattern. The S-shaped models assume that the occurrence pattern of the cumulative number of failures is S-shaped: initially the testers are not familiar with the product, then they become more familiar and hence there is a slow increase in fault removal; as the testers' skills improve, the rate of uncovering defects increases quickly and then levels off as the residual errors become more difficult to remove. In the concave models the increase in failure intensity reaches a peak before a decrease in the failure pattern is observed; therefore the concave models indicate that the failure intensity is expected to decrease exponentially after a peak is reached.
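    As a concrete illustration of the two families (generic textbook forms, not tied to the specific datasets studied here), the canonical mean value functions of one concave and one S-shaped SRGM can be written as follows, with a the expected total number of faults and b the fault-detection rate:

```latex
% Goel-Okumoto (concave): the detection rate is proportional to the remaining faults.
m_{GO}(t) = a\,\bigl(1 - e^{-bt}\bigr)
% Delayed S-shaped (Yamada): slow start, inflection point, then saturation.
m_{DS}(t) = a\,\bigl(1 - (1 + bt)\,e^{-bt}\bigr)
```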
From an exhaustive study of the literature I identified three research gaps: SRGM have been widely used for reliability characterization of closed source software (CSS), but 1) there is no universally applicable model that can be applied in all cases, 2) the applicability of SRGM to OSS is unclear, and 3) there is no agreement on how to select the best model among several alternative models, and no specific empirical methodologies have been proposed, especially for OSS. My PhD work mainly focuses on these three research gaps. In the first step, focusing on the first research gap, I comparatively analyzed eight SRGM, including Musa-Okumoto, Inflection S-shaped, Goel-Okumoto, Delayed S-shaped, Logistic, Gompertz and Generalized Goel, in terms of their fitting and prediction capabilities. These models were selected due to their widespread use and because they are the most representative in their categories. For this study 38 failure datasets of 38 projects were used. Among the 38 projects, 6 were OSS and 32 were CSS. Of the 32 CSS datasets, 22 were from the testing phase and the remaining 10 were from the operational phase (i.e., the field). The outcomes show that Musa-Okumoto remains the best for CSS projects while Inflection S-shaped and Gompertz remain the best for OSS projects. Apart from that, we observe that concave models perform better for CSS projects and S-shaped models perform better for OSS projects. In the second step, focusing on the second research gap, the reliability growth of OSS projects was compared with that of CSS projects. For this purpose 25 OSS and 22 CSS projects were selected together with their defect data. Eight SRGM were fitted to the defect data of the selected projects and the reliability growth was analyzed with respect to the fitted models. I found that all of the selected models fitted the defect data of OSS projects in the same manner as that of CSS projects, which confirms that the reliability of OSS projects grows similarly to that of CSS projects. However, I observed that S-shaped models perform better for OSS and concave models perform better for CSS. To address the third research gap I proposed a method that selects the best SRGM among several alternative models for predicting the residual defects of an OSS. The method helps practitioners decide whether or not to adopt an OSS component in a project. We tested the method empirically by applying it to twenty-one different releases of seven OSS projects. The validation results show that the method selects the best model 17 times out of 21; in the remaining four cases it selects the second-best model.
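    A minimal sketch of the general fit-and-compare idea behind this kind of model selection (illustrative only, not the proposed method; the defect data, the two candidate models, the starting values and the holdout split are all assumptions):

```python
# Fit two candidate SRGM mean value functions to cumulative defect counts and
# keep the one with the lower prediction error on held-out data.
import numpy as np
from scipy.optimize import curve_fit

def goel_okumoto(t, a, b):        # concave: m(t) = a(1 - exp(-bt))
    return a * (1.0 - np.exp(-b * t))

def delayed_s_shaped(t, a, b):    # S-shaped: m(t) = a(1 - (1 + bt)exp(-bt))
    return a * (1.0 - (1.0 + b * t) * np.exp(-b * t))

# Hypothetical weekly cumulative defect counts.
t = np.arange(1, 21, dtype=float)
m = np.array([5, 11, 18, 27, 38, 50, 61, 70, 78, 84,
              89, 93, 96, 98, 100, 101, 102, 103, 103, 104], dtype=float)

split = 15                        # fit on the first 15 weeks, predict the last 5
candidates = {"Goel-Okumoto": goel_okumoto, "Delayed S-shaped": delayed_s_shaped}

best_name, best_rmse = None, np.inf
for name, model in candidates.items():
    params, _ = curve_fit(model, t[:split], m[:split], p0=[m[-1], 0.1], maxfev=10000)
    rmse = np.sqrt(np.mean((model(t[split:], *params) - m[split:]) ** 2))
    if rmse < best_rmse:
        best_name, best_rmse = name, rmse

print(f"selected model: {best_name} (holdout RMSE = {best_rmse:.2f})")
```

    A real selection method would compare more candidate models and more criteria (goodness-of-fit as well as prediction stability), but the structure is the same: fit each candidate, score it on data it has not seen, and pick the winner.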

    Two-Dimensional Software Defect Models with Test Execution History


    Mathematics in Software Reliability and Quality Assurance

    This monograph concerns the mathematical aspects of software reliability and quality assurance and consists of 11 technical papers in this emerging area. Included are the latest research results related to formal methods and design, automatic software testing, software verification and validation, coalgebra theory, automata theory, hybrid systems, and software reliability modeling and assessment.

    Dynamic learning with neural networks and support vector machines

    The neural network approach has proven to be a universal approximator for nonlinear continuous functions with arbitrary accuracy, and it has been very successful for various learning and prediction tasks. However, supervised learning using neural networks has some limitations because of the black-box nature of its solutions, experimental selection of network parameters, the danger of overfitting, and convergence to local minima instead of global minima. In certain applications, fixed neural network structures do not address the effect on prediction performance as the number of available data increases. Three new approaches are proposed with respect to these limitations of supervised learning using neural networks in order to improve prediction accuracy. (1) A dynamic learning model using an evolutionary connectionist approach: in certain applications the number of available data increases over time; the optimization process determines the number of input neurons and the number of neurons in the hidden layer, and the corresponding globally optimized neural network structure is iteratively and dynamically reconfigured and updated as new data arrive, to improve the prediction accuracy. (2) Improving generalization capability using recurrent neural networks and Bayesian regularization: a recurrent neural network has the inherent capability of developing an internal memory, which may naturally extend beyond the externally provided lag spaces; moreover, by adding a penalty term on the sum of the connection weights, Bayesian regularization is applied to the network training scheme to improve the generalization performance and lower the susceptibility to overfitting. (3) An adaptive prediction model using support vector machines: the learning process of support vector machines focuses on minimizing an upper bound of the generalization error that includes the sum of the empirical training error and a regularized confidence interval, which eventually results in better generalization performance; further, this learning process is iteratively and dynamically updated after every occurrence of new data in order to capture the most current features hidden inside the data sequence. All of the proposed approaches have been successfully applied and validated on applications related to software reliability prediction and electric power load forecasting. Quantitative results show that the proposed approaches achieve better prediction accuracy than existing approaches.
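    A rough sketch of the third idea, retraining a support vector regressor every time a new observation arrives so that the model tracks the most recent data (illustrative only; the synthetic data, lag length, kernel and hyperparameters are assumptions, not the author's settings):

```python
# Adaptive one-step-ahead prediction: refit an SVR on all data observed so far
# and predict the next inter-failure time from the most recent lag window.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
# Hypothetical inter-failure times whose mean grows as reliability improves.
interfailure = rng.exponential(np.linspace(1.0, 5.0, 60))

lag = 5                                  # past observations used as features
start = lag + 10                         # warm-up before the first prediction
predictions = []
for t in range(start, len(interfailure)):
    # Lagged training set built from everything observed up to time t.
    X = np.array([interfailure[i - lag:i] for i in range(lag, t)])
    y = interfailure[lag:t]
    model = SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(X, y)
    predictions.append(model.predict(interfailure[t - lag:t].reshape(1, -1))[0])

errors = np.abs(np.array(predictions) - interfailure[start:])
print(f"mean absolute one-step-ahead error: {errors.mean():.3f}")
```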

    Software reliability modeling and release time determination

    Ph.D. (Doctor of Philosophy)

    Ordenaciones estocásticas e inferencia bayesiana en fiabilidad de software (Stochastic orderings and Bayesian inference in software reliability)

    Within the last decade of the 20th century and the first few years of the 21st century, the demand for complex software systems has increased, and therefore the reliability of software systems has become a major concern for our modern society. Software reliability is defined as the probability of failure-free software operation for a specified period of time in a specified environment. Many current software reliability techniques and practices are detailed by Lyu and Pham. From a statistical point of view, the random variables that characterize software reliability are the epoch times at which a failure of the software takes place or the times between failures. Most of the well-known models for software reliability are centered around the interfailure times or the point processes that they generate. A software reliability model specifies the general form of the dependence of the failure process on the principal factors that affect it: fault introduction, fault removal, and the operational environment. The purpose of this thesis is threefold: (1) to study stochastic properties of times between failures for independent but not identically distributed random variables; (2) to investigate properties of the epoch times of nonhomogeneous pure birth processes as an extension of the nonhomogeneous Poisson processes used in the software reliability modelling literature; and (3) to develop a software reliability model based on the use of covariate information such as software metrics. Firstly, properties of statistics based on heterogeneous samples are investigated with the aid of stochastic orders. The stochastic ordering of probability distributions is a widely studied concept; there are several kinds of stochastic orders that are used to compare different aspects of probability distributions, such as location, variability, skewness and dependence. Secondly, ageing notions and stochastic orderings of the epoch times of nonhomogeneous pure birth processes are studied. Ageing notions are another important concept in reliability theory; many classes of life distributions are characterized or defined in the literature according to their ageing properties. Finally, we exhibit a non-parametric model based on Gaussian processes to predict the number of software failures and the times between failures. Gaussian processes are a flexible and attractive method for a wide variety of supervised learning problems, such as regression and classification in machine learning. This thesis is organized as follows. In Chapter 1, we present some basic software reliability measures. After providing a brief review of stochastic point processes and models of ordered random variables, the chapter discusses the relationship between these kinds of models and types of failure data. This is followed by a brief review of some stochastic orderings and ageing notions. The chapter concludes with a review of some well-known software reliability models. The results of Chapter 2 concern stochastic orders for spacings of the order statistics of independent exponential random variables with different scale parameters. These results on stochastic orderings and spacings are based on the relation between the spacings and the times between successive software failures. Due to the complicated expression of the distribution in the non-iid case, only limited results are found in the literature.
In the first part of this chapter, we investigate the hazard rate ordering of simple spacings and normalized spacings of a sample of heterogeneous exponential random variables. In the second part, we study the two-sample problem: specifically, we compare both simple spacings and normalized spacings from two samples of heterogeneous exponential random variables according to the likelihood ratio ordering, and we show applications of these results to multiple-outlier models. In Chapter 3, motivated by the equality in distribution between sequential order statistics and the first n epoch times of a nonhomogeneous pure birth process, we consider the problem of comparing the components of sequential k-out-of-n systems according to magnitude and location orders. In particular, this chapter discusses conditions on the underlying distribution functions on which the sequential order statistics are based that yield ageing notions and stochastic comparisons of sequential order statistics. We also present a nonhomogeneous pure birth process approach to software reliability modelling. A large number of models have been proposed in the literature to predict software failures, but few incorporate the significant metrics data observed during software testing. In Chapter 4, we develop a new procedure to predict both interfailure times and numbers of software failures using metrics information, from a Bayesian perspective. In particular, we develop a hierarchical non-parametric regression model based on exponential interfailure times or Poisson failure counts, where the rates are modeled as Gaussian processes with software metrics data as inputs (a schematic form of this hierarchy is sketched after this abstract), together with some illustrative concrete examples. In Chapter 5 we present some general conclusions and describe the most significant contributions of this thesis.
-----
In the last decade of the 20th century and the first years of the 21st century, the demand for computer systems has increased considerably, as shown by their presence in space satellites, aircraft and automated assembly lines, and they are ever closer to our daily lives, in cars, household appliances and mobile phones. A computer system consists of two types of components: hardware and software. The main difference between them is that software does not wear out; a program could run years later with the same correctness as on its first day, without any modification. In general, the quality of a product can be assessed from several points of view, and software is no exception, so there are different approaches to assessing its quality. Here we focus on one of these approaches: reliability. Reliability is understood as the probability of failure-free operation of a software product. Different statistical techniques exist for measuring the reliability of a program, some of which are detailed by Lyu and Pham. From a statistical point of view, the random variables that characterize software reliability are the instants at which a software failure occurs, as well as the times between failures. One of the main objectives of this thesis is to model the behaviour of these random variables.
It is of interest to study the stochastic behaviour of these variables since, in this way, we can learn properties related to their survival functions or their hazard rate functions. In this sense, in Chapter 2 we present results on stochastic orderings of the times between software failures, for independent, non-identically distributed random variables. These results are based on the relation linking those times with spacings. Both order statistics and spacings are of great interest in the context of survival analysis as well as in reliability theory. Most existing work assumes that the variables involved are independent and identically distributed (iid); owing to the analytical complexity of relaxing either of these two hypotheses, there are few references for the case in which the variables are not iid. Kochar and Korwar showed that, when three exponentials are considered, the normalized spacings satisfy the hazard rate ordering, and conjectured the same for the general case of n heterogeneous exponential random variables. Section 2.2 presents advances related to this conjecture, as well as results on the hazard rate ordering of non-normalized spacings. This chapter also studies problems associated with spacings obtained from random samples from two populations; in particular, we obtain sufficient conditions under which the likelihood ratio ordering holds between spacings from two samples of heterogeneous exponentials. We have also worked with sequential order statistics, since they encompass a large number of models of ordered random variables. Moreover, this type of order statistic is interesting because it is linked to the epoch times of nonhomogeneous pure birth processes. It should be noted that these variables are dependent and not identically distributed, which increases the complexity of the problem. Our objective here is to study the conditions that must be satisfied by the underlying distributions from which the sequential order statistics are defined so that the latter obey some type of stochastic ordering; the results obtained in this direction are presented in Chapter 3. In that chapter we also study another important concept in reliability, the notion of ageing. The different ageing concepts describe how a component or a system improves or deteriorates with age. In this sense, positive ageing means that components tend to deteriorate due to wear, which is exactly what happens to hardware, whereas when a system passes certain tests and improves we say that the ageing is negative, as happens with software. In this same chapter of the thesis, we study conditions under which certain ageing properties satisfied by the underlying distributions, from which the sequential order statistics are defined, also hold for the sequential order statistics themselves.
Although a large number of software reliability models have been developed over the last forty years, most of them do not take into account the information provided by covariates. Another contribution of this thesis, found in Chapter 4, consists in using software metrics as independent variables to predict either the number of failures of a program or the times between successive software failures. A software metric serves to measure the complexity and quality of a program, as well as the productivity of the programmers with respect to their efficiency and competence. In this thesis we use metrics that measure the complexity of a program through its volume, counting the number of lines of code. In the literature there are some linear models for predicting software failure data by classical inference methods; however, we choose to use Gaussian processes, which relax the linearity assumption and have been widely used in machine learning problems, both in regression and in classification. Finally, in Chapter 5, we summarize the main contributions of this thesis.
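    A schematic of the hierarchical Gaussian-process model described for Chapter 4, in generic notation (the symbols N_i, x_i, m and k are illustrative and not necessarily the thesis's own notation):

```latex
% Hierarchical non-parametric regression for failure counts (illustrative notation):
% N_i is the number of failures observed in period i, x_i the software metrics
% for that period, and the log-rate receives a Gaussian process prior with mean
% function m and covariance kernel k.
N_i \mid \lambda(x_i) \sim \mathrm{Poisson}\bigl(\lambda(x_i)\bigr),
\qquad
\log \lambda(\cdot) \sim \mathcal{GP}\bigl(m(\cdot),\, k(\cdot,\cdot)\bigr)
```

    The interfailure-time version replaces the Poisson likelihood with an exponential one, T_i \mid \lambda(x_i) \sim \mathrm{Exp}(\lambda(x_i)), with the same Gaussian process prior on the rate.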

    Research reports: 1991 NASA/ASEE Summer Faculty Fellowship Program

    The basic objectives of the programs, which are in the 28th year of operation nationally, are: (1) to further the professional knowledge of qualified engineering and science faculty members; (2) to stimulate an exchange of ideas between participants and NASA; (3) to enrich and refresh the research and teaching activities of the participants' institutions; and (4) to contribute to the research objectives of the NASA Centers. The faculty fellows spent 10 weeks at MSFC engaged in a research project compatible with their interests and background and worked in collaboration with a NASA/MSFC colleague. This is a compilation of their research reports for summer 1991

    Some Guidelines for Risk Assessment of Vulnerability Discovery Processes

    Software vulnerabilities can be defined as software faults that can be exploited as a result of security attacks. Security researchers have used data from vulnerability databases to study trends in the discovery of new vulnerabilities, and to propose models for fitting the discovery times and for predicting when new vulnerabilities may be discovered. Estimating the discovery times for new vulnerabilities is useful both for vendors and for end-users, as it can help with resource allocation strategies over time. Among the research conducted on vulnerability modeling, only a few studies have tried to provide guidelines about which model should be used in a given situation. In other words, assuming the vulnerability data for a software product are given, the research questions are the following: Is there any feature in the vulnerability data that could be used for identifying the most appropriate models for that dataset? Which models are more accurate for modeling the vulnerability discovery process? Can the total number of publicly known exploited vulnerabilities be predicted using all vulnerabilities reported for a given software product? To answer these questions, we propose to characterize the vulnerability discovery process using several common software reliability/vulnerability discovery models, also known as Software Reliability Models (SRMs)/Vulnerability Discovery Models (VDMs). We plan to consider different aspects of vulnerability modeling, including curve fitting and prediction. Some existing SRMs/VDMs lack accuracy in the prediction phase. To remedy the situation, three strategies are considered: (1) finding a new approach for analyzing vulnerability data using common models, i.e., examining the effect of data manipulation techniques (clustering, grouping) on vulnerability data and investigating whether they lead to more accurate predictions; (2) developing a new model that has better curve-fitting and prediction capabilities than current models; (3) developing a new method to predict the total number of publicly known exploited vulnerabilities using all vulnerabilities reported for a given software product. The dissertation is intended to contribute to the science of software reliability analysis and presents some guidelines for vulnerability risk assessment that could be integrated into security tools, such as Security Information and Event Management (SIEM) systems.
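    A small sketch of what curve fitting for a vulnerability discovery process can look like, using a logistic VDM of the Alhazmi-Malaiya type (illustrative only, not the dissertation's method; the quarterly counts and starting values are assumptions):

```python
# Fit a logistic vulnerability discovery model to cumulative vulnerability counts:
#   Omega(t) = B / (1 + C * exp(-A * B * t))
# where B is the eventual saturation level and A controls the discovery rate.
import numpy as np
from scipy.optimize import curve_fit

def logistic_vdm(t, A, B, C):
    return B / (1.0 + C * np.exp(-A * B * t))

# Hypothetical cumulative vulnerabilities disclosed per quarter.
t = np.arange(1, 17, dtype=float)
omega = np.array([2, 5, 9, 16, 25, 37, 50, 63, 74, 82,
                  88, 92, 95, 97, 98, 99], dtype=float)

params, _ = curve_fit(logistic_vdm, t, omega, p0=[0.01, 100.0, 50.0], maxfev=10000)
A, B, C = params
print(f"estimated saturation level B = {B:.1f} vulnerabilities")
print(f"predicted cumulative count at quarter 20: {logistic_vdm(20.0, A, B, C):.1f}")
```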