
    Some new results on convolutions of heterogeneous gamma random variables

    Convolutions of independent random variables arise naturally in many applied areas. In this paper, we study various stochastic orderings of convolutions of heterogeneous gamma random variables in terms of the majorization order (and the p-larger and reciprocal majorization orders) of parameter vectors, and establish the likelihood ratio order (as well as the dispersive, hazard rate, star, right spread, and mean residual life orders) between convolutions of two sets of heterogeneous gamma variables having both differing scale parameters and differing shape parameters. The results established in this paper strengthen and generalize those known in the literature.
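    To make the kind of comparison involved concrete, the following sketch (our illustration, not code from the paper; the parameter vectors are hypothetical) simulates two convolutions of independent gamma variables whose rate vectors have equal totals but one majorizes the other, and inspects their empirical survival functions. The likelihood ratio and hazard rate orders studied in the paper both imply the plain stochastic comparison checked here.

```python
# Simulation sketch: convolutions of independent heterogeneous gamma variables
# under two rate vectors, (1, 5) and (2, 4); the first majorizes the second.
# We compare empirical survival functions on a common grid.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

shapes = np.array([1.5, 2.0])     # differing shape parameters (shared here)
rates_a = np.array([1.0, 5.0])    # more dispersed rates: majorizes rates_b
rates_b = np.array([2.0, 4.0])    # less dispersed rates, same total

def convolution_sample(shapes, rates, size):
    """Draw from the sum of independent Gamma(shape_i, rate_i) variables."""
    return sum(rng.gamma(k, 1.0 / r, size) for k, r in zip(shapes, rates))

sa = convolution_sample(shapes, rates_a, n)
sb = convolution_sample(shapes, rates_b, n)

# Results of this type suggest the convolution under the more dispersed rate
# vector is stochastically larger; the printout illustrates, it does not prove.
for t in np.linspace(0.5, 8.0, 6):
    print(f"t={t:4.1f}  S_a(t)={np.mean(sa > t):.4f}  S_b(t)={np.mean(sb > t):.4f}")
```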

    Markovian and stochastic differential equation based approaches to computer virus propagation dynamics and some models for survival distributions

    This dissertation is divided into two parts. The first part explores probabilistic modeling of the propagation of computer 'malware' (generally referred to as 'viruses') across a network of computers, and investigates the modeling improvements achieved by introducing a random latency period during which an infected computer in the network is unable to infect others. In the second part, two approaches for modeling life distributions in univariate and bivariate setups are developed.

    In Part I, homogeneous and non-homogeneous stochastic susceptible-exposed-infectious-recovered (SEIR) models are explored for the propagation of computer viruses over the Internet, borrowing ideas from mathematical epidemiology. Large computer networks such as the Internet have become essential in today's technological societies and even critical to the financial viability of the national and global economy. However, the easy access and widespread use of the Internet make it a prime target for malicious activities, such as the introduction of computer viruses, which pose a major threat to large computer networks. Since an understanding of the underlying dynamics of their propagation is essential in efforts to control them, a fair amount of research attention has been devoted to modeling the propagation of computer viruses, starting from basic deterministic models with ordinary differential equations (ODEs) through stochastic models of increasing realism. In the spirit of exploring more realistic probability models that seek to explain the time-dependent transient behavior of computer virus propagation by exploiting the essentially stochastic nature of contacts and communications among computers, the present study considers the suitability and use of the stochastic SEIR model of mathematical epidemiology in the context of computer virus propagation. We adapt the stochastic SEIR model to the study of computer virus prevalence by incorporating a latent period during which a computer is in an 'exposed state', in the sense that it is infected but cannot yet infect other computers until the latency is over. The transition parameters of the SEIR model are estimated using real computer virus data. We develop maximum likelihood (MLE) and Bayesian estimators for the SEIR model parameters and apply them to the 'Code Red worm' data. Since network structure can be an important factor in virus propagation, multi-group stochastic SEIR models for the spread of computer viruses in heterogeneous networks are explored next. For the multi-group stochastic SEIR model using the Markovian approach, the method of maximum likelihood estimation for the model parameters of interest is derived. The method of least squares is used to estimate the parameters of the multi-group stochastic SEIR-SDE model, based on stochastic differential equations. The models and methodologies are applied to the Code Red worm data. Simulations based on the different models proposed in this dissertation and on deterministic/stochastic models available in the literature are conducted and compared. Based on such comparisons, we conclude that (i) stochastic models using the SEIR framework appear to be markedly superior to previous models of computer virus propagation, even up to its saturation level, and (ii) there is no appreciable difference between homogeneous and heterogeneous (multi-group) models.
    The 'no difference' finding may of course be influenced by the criterion used to assign computers in the overall network to different groups. In our study, the grouping of computers into subgroups, or clusters, was based on geographical location only, since no other grouping criterion was available in the Code Red worm data.

    Part II covers two approaches for modeling life distributions in univariate and bivariate setups. In the univariate case, a new partial order based on the idea of 'star-shaped functions' is introduced and explored. In the bivariate context, a class of models for joint lifetime distributions is proposed that extends the idea of univariate proportional hazards to the bivariate case in a suitable way. The expectation-maximization (EM) method is used to estimate the model parameters of interest. For illustration, the bivariate proportional hazards model and the parameter estimation method are applied to two real data sets.
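    To make the stochastic SEIR mechanism concrete, here is a minimal continuous-time simulation sketch (a Gillespie-style construction under assumed parameter values; the dissertation instead estimates the transition parameters from the Code Red worm data, and all names and values below are illustrative only).

```python
# Minimal Gillespie-style simulation of a homogeneous stochastic SEIR model:
# S -> E on contact with an infectious machine, E -> I when the latency ends,
# I -> R on recovery (patching). Parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(1)

def seir_gillespie(N, beta, sigma, gamma, i0=1, t_max=200.0):
    """One sample path; beta = contact rate, sigma = 1/mean latency,
    gamma = recovery rate. Returns a list of (t, S, E, I, R) states."""
    S, E, I, R = N - i0, 0, i0, 0
    t, path = 0.0, [(0.0, S, E, I, R)]
    while t < t_max and (E > 0 or I > 0):
        rates = np.array([beta * S * I / N,  # infection: S -> E
                          sigma * E,         # end of latency: E -> I
                          gamma * I])        # recovery: I -> R
        total = rates.sum()
        t += rng.exponential(1.0 / total)    # exponential holding time
        event = rng.choice(3, p=rates / total)
        if event == 0:
            S, E = S - 1, E + 1
        elif event == 1:
            E, I = E - 1, I + 1
        else:
            I, R = I - 1, R + 1
        path.append((t, S, E, I, R))
    return path

path = seir_gillespie(N=10_000, beta=0.8, sigma=0.5, gamma=0.2)
print("final state (t, S, E, I, R):", path[-1])
```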

    Stochastic orderings and Bayesian inference in software reliability

    Within the last decade of the 20th century and the first few years of the 21st, the demand for complex software systems has increased, and the reliability of software systems has therefore become a major concern for modern society. Software reliability is defined as the probability of failure-free software operation for a specified period of time in a specified environment. Many current software reliability techniques and practices are detailed by Lyu and Pham. From a statistical point of view, the random variables that characterize software reliability are the epoch times at which software failures take place, or the times between failures. Most well-known models for software reliability are centered on the interfailure times or the point processes they generate. A software reliability model specifies the general form of the dependence of the failure process on the principal factors that affect it: fault introduction, fault removal, and the operational environment.

    The purpose of this thesis is threefold: (1) to study stochastic properties of times between failures for independent but not identically distributed random variables; (2) to investigate properties of the epoch times of nonhomogeneous pure birth processes, as an extension of the nonhomogeneous Poisson processes used in the software reliability literature; and (3) to develop a software reliability model based on covariate information such as software metrics. First, properties of statistics based on heterogeneous samples are investigated with the aid of stochastic orders. Stochastic orders between probability distributions are widely studied; several kinds of stochastic orders are used to compare different aspects of probability distributions, such as location, variability, skewness, and dependence. Second, ageing notions and stochastic orderings of the epoch times of nonhomogeneous pure birth processes are studied. Ageing notions are another important concept in reliability theory; many classes of life distributions in the literature are characterized or defined by their ageing properties. Finally, we exhibit a non-parametric model based on Gaussian processes to predict the number of software failures and the times between failures. Gaussian processes are a flexible and attractive method for a wide variety of supervised learning problems, such as regression and classification in machine learning.

    This thesis is organized as follows. Chapter 1 presents some basic software reliability measures. After a brief review of stochastic point processes and models of ordered random variables, it discusses the relationship between these kinds of models and types of failure data, followed by a brief review of some stochastic orderings and ageing notions. The chapter concludes with a review of some well-known software reliability models. The results of Chapter 2 concern stochastic orders for spacings of the order statistics of independent exponential random variables with different scale parameters; these results rest on the relation between spacings and the times between successive software failures. Due to the complicated expression of the distribution in the non-iid case, only limited results are found in the literature. In the first part of the chapter, we investigate the hazard rate ordering of simple spacings and normalized spacings of a sample of heterogeneous exponential random variables. In the second part, we study the two-sample problem: specifically, we compare both simple spacings and normalized spacings from two samples of heterogeneous exponential random variables according to the likelihood ratio ordering, and we show applications of these results to multiple-outlier models. In Chapter 3, motivated by the equality in distribution between sequential order statistics and the first n epoch times of a nonhomogeneous pure birth process, we consider the problem of comparing the components of sequential k-out-of-n systems according to magnitude and location orders. In particular, this chapter discusses conditions on the underlying distribution functions on which the sequential order statistics are based that yield ageing notions and stochastic comparisons of sequential order statistics. We also present a nonhomogeneous pure birth process approach to software reliability modelling. A large number of models have been proposed in the literature to predict software failures, but few incorporate the significant metrics data observed in software testing. In Chapter 4, we develop a new procedure to predict both interfailure times and numbers of software failures using metrics information, from a Bayesian perspective. In particular, we develop a hierarchical non-parametric regression model based on exponential interfailure times or Poisson failure counts, where the rates are modeled as Gaussian processes with software metrics data as inputs, together with some illustrative concrete examples. In Chapter 5, we draw general conclusions and describe the most significant contributions of this thesis.
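    As a small numerical illustration of the objects compared in Chapter 2 (our sketch with hypothetical rate vectors, not code from the thesis), the following simulates normalized spacings D_k = (n - k + 1)(X_(k) - X_(k-1)) from a multiple-outlier exponential sample and from a homogeneous sample with the same total rate. The hazard rate and likelihood ratio orderings studied in the thesis would imply the plain stochastic comparison printed here.

```python
# Simulation sketch: normalized spacings of heterogeneous exponential samples.
# D_k = (n - k + 1) * (X_(k) - X_(k-1)), with X_(0) = 0. Rate vectors are
# hypothetical; the heterogeneous vector majorizes the homogeneous one.
import numpy as np

rng = np.random.default_rng(2)

def normalized_spacings(rates, n_rep):
    """Sample normalized spacings from independent Exp(rate_i) variables."""
    rates = np.asarray(rates, dtype=float)
    n = len(rates)
    x = np.sort(rng.exponential(1.0 / rates, size=(n_rep, n)), axis=1)
    gaps = np.diff(x, axis=1, prepend=0.0)
    return gaps * np.arange(n, 0, -1)      # k-th gap scaled by (n - k + 1)

d_het = normalized_spacings([1.0, 1.0, 1.0, 4.0], 100_000)   # multiple-outlier
d_hom = normalized_spacings([1.75] * 4, 100_000)             # same total rate

# Empirical survival of each spacing at t = 1; the stronger (hr, lr) orders
# would imply this plain stochastic comparison.
t = 1.0
for k in range(4):
    print(f"k={k + 1}:  P(D_het > {t}) = {np.mean(d_het[:, k] > t):.4f}   "
          f"P(D_hom > {t}) = {np.mean(d_hom[:, k] > t):.4f}")
```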
    In the last decade of the 20th century and the first years of the 21st, the demand for computer systems grew considerably; they are present in space satellites, aircraft, and automated assembly lines, and ever closer to everyday life, in cars, household appliances, and mobile phones. A computer system consists of two types of components, hardware and software, and the main difference between them is that software does not wear out: a program could run years later as correctly as it did on its first day, without any modification. In general, the quality of a product can be assessed from several points of view, and software is no exception; there are accordingly different approaches to assessing its quality. Here we focus on one of them: reliability. Reliability is understood as the probability of failure-free operation of a software product. Different statistical techniques exist for measuring the reliability of a program, some of which are detailed in Lyu and Pham. From a statistical point of view, the random variables that characterize software reliability are the instants at which a software failure occurs, as well as the times between failures. One of the main objectives of this thesis is to model the behaviour of these random variables.

    It is interesting to study the stochastic behaviour of these variables, since in this way we can learn properties related to their survival functions or their failure (hazard) rate functions. In this vein, Chapter 2 presents results on stochastic orderings of the times between software failures, for independent, non-identically distributed random variables. These results rest on the relation linking those times to spacings. Both order statistics and spacings are of great interest in survival analysis and in reliability theory. Most existing work assumes the variables involved are independent and identically distributed (iid); owing to the analytical complexity of relaxing either hypothesis, there are few references for the non-iid case. Kochar and Korwar showed that, when three exponential variables are considered, the normalized spacings satisfy the hazard rate ordering, and conjectured the same for the general case of n heterogeneous exponential random variables. Section 2.2 presents progress on this conjecture, as well as results on the hazard rate ordering of non-normalized spacings. Problems involving spacings obtained from random samples of two populations are also studied in that chapter; in particular, we obtain sufficient conditions for the likelihood ratio ordering between spacings from two samples of heterogeneous exponentials. We also work with sequential order statistics, since they encompass a large number of models of ordered random variables. These order statistics are of further interest because they are linked to the failure epochs of nonhomogeneous pure birth processes. Notably, variables of this type are dependent and non-identically distributed, which increases the difficulty of the problem. Our aim here is to determine what conditions the underlying distributions defining the sequential order statistics must satisfy for the statistics to obey some stochastic ordering; the results obtained in this direction are presented in Chapter 3. That chapter also studies another important concept in reliability, the notion of ageing. Ageing concepts describe how a component or system improves or deteriorates with age: positive ageing means components tend to deteriorate through wear, which is exactly what happens to hardware, while negative ageing means a system improves after passing certain tests, as happens with software. In the same chapter, we study conditions under which ageing properties satisfied by the underlying distributions defining the sequential order statistics also hold for the sequential order statistics themselves.

    While a great many software reliability models have been developed over the last forty years, most do not take into account the information provided by covariates. Another contribution of this thesis, found in Chapter 4, is the use of software metrics as independent variables to predict either the number of failures of a program or the times between successive software failures. A software metric measures the complexity and quality of a program, as well as programmer productivity with respect to efficiency and competence. In this thesis, we use metrics that measure a program's complexity through its volume, counting the number of lines of code. Some linear models exist in the literature for predicting software failure data by classical inference methods; we instead use Gaussian processes, which relax the linearity assumption and have been widely used in machine learning problems, both regression and classification. Finally, Chapter 5 summarizes the main contributions of this thesis.
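    To indicate the flavor of the Chapter 4 approach, here is a minimal, self-contained Gaussian-process regression sketch (our illustration with synthetic data and a Gaussian approximation on the log scale; the thesis's actual model is a hierarchical Bayesian one with exponential interfailure times or Poisson counts whose rates follow a Gaussian process, so everything below is an assumption for demonstration).

```python
# Hand-rolled GP regression sketch: an RBF-kernel GP maps a code-size metric
# (KLOC) to log mean time between failures. Data and hyperparameters are
# synthetic and illustrative only.
import numpy as np

def rbf(a, b, ell=5.0, var=1.0):
    """Squared-exponential kernel k(a, b) = var * exp(-(a - b)^2 / (2 ell^2))."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / ell) ** 2)

# Synthetic training data: module size in KLOC vs observed log-MTBF (made up).
x = np.array([2.0, 5.0, 9.0, 14.0, 20.0, 30.0])
y = np.log(np.array([40.0, 25.0, 18.0, 10.0, 7.0, 3.5]))   # hours

noise = 0.05
K = rbf(x, x) + noise * np.eye(len(x))
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # alpha = K^{-1} y

# Posterior mean of log-MTBF at new metric values, back-transformed to hours.
x_new = np.array([4.0, 12.0, 25.0])
mean = rbf(x_new, x) @ alpha
print("predicted MTBF (hours):", np.exp(mean).round(1))
```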