13 research outputs found

    Bayesian inference for queueing networks and modeling of internet services

    Modern Internet services, such as those at Google, Yahoo!, and Amazon, handle billions of requests per day on clusters of thousands of computers. Because these services operate under strict performance requirements, a statistical understanding of their performance is of great practical interest. Such services are modeled by networks of queues, where each queue models one of the computers in the system. A key challenge is that the data are incomplete, because recording detailed information about every request to a heavily used system can require unacceptable overhead. In this paper we develop a Bayesian perspective on queueing models in which the arrival and departure times that are not observed are treated as latent variables. Underlying this viewpoint is the observation that a queueing model defines a deterministic transformation between the data and a set of independent variables called the service times. With this viewpoint in hand, we sample from the posterior distribution over missing data and model parameters using Markov chain Monte Carlo. We evaluate our framework on data from a benchmark Web application. We also present a simple technique for selection among nested queueing models. We are unaware of any previous work that considers inference in networks of queues in the presence of missing data. Comment: Published at http://dx.doi.org/10.1214/10-AOAS392 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
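
    The "deterministic transformation" mentioned in the abstract can be illustrated with a toy FIFO single-server queue: given arrival times, the service times determine the departure times through a Lindley-type recursion, and the map can be inverted to recover the latent service times from observed arrivals and departures. The sketch below is only an illustration of that viewpoint, assuming sorted NumPy arrays of arrival and service times; it is not the paper's inference procedure.

    import numpy as np

    def departures_from_services(arrivals, services):
        """FIFO single-server queue: departures determined by arrivals and service times."""
        departures = np.empty_like(arrivals, dtype=float)
        prev_departure = 0.0
        for i, (a, s) in enumerate(zip(arrivals, services)):
            start = max(a, prev_departure)      # service starts when the server frees up
            departures[i] = start + s
            prev_departure = departures[i]
        return departures

    def services_from_departures(arrivals, departures):
        """Invert the map: recover the latent service times from arrivals and departures."""
        previous = np.concatenate(([0.0], departures[:-1]))
        return departures - np.maximum(arrivals, previous)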

    Maximum likelihood estimation by Monte Carlo simulation: Toward data-driven stochastic modeling

    We propose a gradient-based simulated maximum likelihood method to estimate unknown parameters in a stochastic model without assuming that the likelihood function of the observations is available in closed form. A key element is to develop Monte Carlo-based estimators for the density and its derivatives of the output process, using only knowledge of the dynamics of the model. We present the theory of these estimators and demonstrate how our approach can handle various types of model structures. We also support our findings and illustrate the merits of our approach with numerical results.
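
    In the same spirit, though cruder than the derivative estimators the abstract describes, a simulated likelihood can be built by drawing outputs from the model, smoothing them with a kernel density estimate, and climbing a finite-difference gradient. A minimal sketch, assuming a scalar parameter and a hypothetical user-supplied simulator simulate(theta, n, rng):

    import numpy as np
    from scipy.stats import gaussian_kde

    def simulated_loglik(theta, data, simulate, n_sims=5000, seed=0):
        """Monte Carlo log-likelihood: a KDE of simulated outputs evaluated at the data."""
        rng = np.random.default_rng(seed)
        sims = simulate(theta, n_sims, rng)           # draws from the model at theta
        kde = gaussian_kde(sims)
        return np.sum(np.log(kde(data) + 1e-300))     # guard against log(0)

    def fit(theta0, data, simulate, step=1e-2, eps=1e-3, iters=200):
        """Crude gradient ascent with a finite-difference gradient and common random numbers."""
        theta = float(theta0)
        for t in range(iters):
            up = simulated_loglik(theta + eps, data, simulate, seed=t)
            down = simulated_loglik(theta - eps, data, simulate, seed=t)
            theta += step * (up - down) / (2 * eps)   # same seed above and below reduces noise
        return theta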

    Filtering recursions for calculating likelihoods for queues based on inter-departure time data.

    We consider inference for queues based on inter-departure time data. Calculating the likelihood for such models is difficult, as the likelihood involves summing over the (exponentially large) space of realisations of the arrival process. We demonstrate how a likelihood recursion can be used to calculate this likelihood efficiently for the specific cases of M/G/1 and Er/G/1 queues. We compare the sampling properties of the MLEs to those of estimators, based on indirect inference, which have previously been suggested for this problem.
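
    The general idea behind such a recursion (not the paper's M/G/1-specific construction) is the standard forward filter for a discrete latent state: rather than summing over all K^T latent paths, the likelihood is accumulated one observation at a time at O(T K^2) cost. A generic sketch, assuming a known K x K transition matrix P, an initial distribution init, and a function obs_dens(y_t) returning the length-K vector of observation densities:

    import numpy as np

    def forward_loglik(y, P, init, obs_dens):
        """Forward recursion: log p(y_1:T) under a hidden discrete state with K values."""
        alpha = init * obs_dens(y[0])           # unnormalised filter at t = 1
        loglik = np.log(alpha.sum())
        alpha /= alpha.sum()
        for yt in y[1:]:
            alpha = (alpha @ P) * obs_dens(yt)  # predict, then weight by the observation
            loglik += np.log(alpha.sum())
            alpha /= alpha.sum()
        return loglik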

    Some New Results in Sequential Monte Carlo

    Sequential Monte Carlo (SMC) methods have been well studied within the context of performing inference with respect to partially observed Markov processes, and their use in this context relies upon the ability to evaluate or estimate the likelihood of a set of observed data, given the state of the latent process. In many real-world applications such as the study of population genetics and econometrics, however, this likelihood can neither be analytically evaluated nor replaced by an unbiased estimator, and so the application of exact SMC methods to these problems may be infeasible, or even impossible. The models in many of these applications are complex, yet realistic, and so development of techniques that can deal with problems of likelihood intractability can help us to perform inference for many important yet otherwise inaccessible problems; this motivates the research presented within this thesis. The main focus of this work is the application of approximate Bayesian computation (ABC) methodology to state-space models (SSMs) and the development of SMC methods in the context of these ABC SSMs for filtering and smoothing of the latent process. The introduction of ABC here avoids the need to evaluate the likelihood, at the cost of introducing a bias into the resulting filtering and smoothing estimators; this bias is explored theoretically and through simulation studies. An alternative SMC procedure, incorporating an additional rejection step, is also considered, along with its novel application to the ABC approximation of the SSM. This thesis will also consider the application of MCMC and SMC methods to a class of partially observed point process (PP) models. We investigate the problem of performing sequential inference for these models and note that current methods often fail. We present a new approach to smoothing in this context, using SMC samplers (Del Moral et al., 2006). This approach is illustrated, with some theoretical discussion, on a doubly stochastic PP applied in the context of finance.
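
    A minimal sketch of the ABC filtering idea for a state-space model: particles are propagated through the state dynamics, and the intractable observation density is replaced by a uniform ABC kernel that accepts particles whose simulated pseudo-observation falls within a tolerance eps of the actual observation. The functions propagate and simulate_obs are hypothetical placeholders for the model's transition and observation simulators, observations are assumed scalar, and particles are stored in a NumPy array; this is an illustration of the general technique, not the thesis' algorithms.

    import numpy as np

    def abc_bootstrap_filter(y, n_particles, propagate, simulate_obs, eps, seed=0):
        """ABC bootstrap particle filter with a uniform kernel of tolerance eps."""
        rng = np.random.default_rng(seed)
        x = propagate(None, n_particles, rng)            # draw initial particles (state None)
        log_evidence = 0.0
        for t, yt in enumerate(y):
            if t > 0:
                x = propagate(x, n_particles, rng)       # move particles through the dynamics
            pseudo = simulate_obs(x, rng)                # one pseudo-observation per particle
            w = (np.abs(pseudo - yt) < eps).astype(float)
            if w.sum() == 0:
                raise RuntimeError("all particles rejected; increase eps")
            log_evidence += np.log(w.mean())             # estimate of the ABC-approximate likelihood
            idx = rng.choice(n_particles, size=n_particles, p=w / w.sum())
            x = x[idx]                                   # multinomial resampling
        return x, log_evidence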

    Complexity penalized methods for structured and unstructured data

    A fundamental goal of statisticians is to make inferences from the sample about characteristics of the underlying population. This is an inverse problem, since we are trying to recover a feature of the input with the availability of observations on an output. Towards this end, we consider complexity penalized methods, because they balance goodness of fit and generalizability of the solution. The data from the underlying population may come in diverse formats - structured or unstructured - such as probability distributions, text tokens, or graph characteristics. Depending on the defining features of the problem we can choose the appropriate complexity penalized approach, and assess the quality of the estimate produced by it. Favorable characteristics are strong theoretical guarantees of closeness to the true value and interpretability. Our work fits within this framework and spans the areas of simulation optimization, text mining and network inference. The first problem we consider is model calibration under the assumption that given a hypothesized input model, we can use stochastic simulation to obtain its corresponding output observations. We formulate it as a stochastic program by maximizing the entropy of the input distribution subject to moment matching. We then propose an iterative scheme via simulation to approximately solve it. We prove convergence of the proposed algorithm under appropriate conditions and demonstrate the performance via numerical studies. The second problem we consider is summarizing text documents through an inferred set of topics. We propose a frequentist reformulation of a Bayesian regularization scheme. Through our complexity-penalized perspective we lend further insight into the nature of the loss function and the regularization achieved through the priors in the Bayesian formulation. The third problem is concerned with the impact of sampling on the degree distribution of a network. Under many sampling designs, we have a linear inverse problem characterized by an ill-conditioned matrix. We investigate the theoretical properties of an approximate solution for the degree distribution found by regularizing the solution of the ill-conditioned least squares objective. Particularly, we study the rate at which the penalized solution tends to the true value as a function of network size and sampling rate.
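
    For the third problem, a common way to stabilise an ill-conditioned linear inverse problem is ridge (Tikhonov) regularisation of the least-squares objective. The sketch below is a generic illustration of that device, not the specific penalized estimator studied in the thesis.

    import numpy as np

    def ridge_solve(A, b, lam):
        """Minimise ||A x - b||^2 + lam * ||x||^2; the penalty tames the
        ill-conditioning of A^T A at the cost of introducing some bias."""
        k = A.shape[1]
        return np.linalg.solve(A.T @ A + lam * np.eye(k), A.T @ b)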

    Hidden Markov Models

    Hidden Markov Models (HMMs), although known for decades, have become widely used in recent years and are still under active development. This book presents theoretical issues and a variety of HMM applications in speech recognition and synthesis, medicine, neuroscience, computational biology, bioinformatics, seismology, environmental protection and engineering. I hope that readers will find this book useful and helpful for their own research.

    Untangling hotel industry’s inefficiency: An SFA approach applied to a renowned Portuguese hotel chain

    The present paper explores the technical efficiency of four hotels from the Teixeira Duarte Group, a renowned Portuguese hotel chain. An efficiency ranking of these four hotel units located in Portugal is established using Stochastic Frontier Analysis. This methodology makes it possible to discriminate between measurement error and systematic inefficiency in the estimation process, enabling investigation of the main causes of inefficiency. Several suggestions for efficiency improvement are made for each hotel studied.
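
    For reference, the composed-error production frontier underlying Stochastic Frontier Analysis (normal noise plus half-normal inefficiency, as in Aigner, Lovell and Schmidt, 1977) can be estimated by maximum likelihood. The sketch below is a generic illustration with hypothetical data (y the log output, X the matrix of log inputs plus a constant), not the paper's estimation.

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    def sfa_negloglik(params, y, X):
        """Negative log-likelihood of y = X @ beta + v - u, with v ~ N(0, sigma_v^2)
        and u ~ half-normal(sigma_u^2); variances parameterised on the log scale."""
        k = X.shape[1]
        beta, log_sv, log_su = params[:k], params[k], params[k + 1]
        sv, su = np.exp(log_sv), np.exp(log_su)
        sigma = np.sqrt(sv**2 + su**2)
        lam = su / sv
        eps = y - X @ beta
        ll = (np.log(2) - np.log(sigma) + norm.logpdf(eps / sigma)
              + norm.logcdf(-eps * lam / sigma))
        return -ll.sum()

    # Hypothetical usage:
    # start = np.concatenate([np.zeros(X.shape[1]), [0.0, 0.0]])
    # fit = minimize(sfa_negloglik, start, args=(y, X), method="BFGS")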