Catching the head, tail, and everything in between: a streaming algorithm for the degree distribution
The degree distribution is one of the most fundamental graph properties of
interest for real-world graphs. It has been widely observed in numerous domains
that graphs typically have a tailed or scale-free degree distribution. While
the average degree is usually quite small, the variance is quite high and there
are vertices with degrees at all scales. We focus on the problem of
approximating the degree distribution of a large streaming graph, with small
storage. We design an algorithm headtail, whose main novelty is a new estimator
of infrequent degrees using truncated geometric random variables. We give a
mathematical analysis of headtail and show that it has excellent behavior in
practice. We can process streams with millions of edges with storage less than
1% and get extremely accurate approximations for all scales in the degree
distribution.
We also introduce a new notion of Relative Hausdorff distance between tailed
histograms. Existing notions of distances between distributions are not
suitable, since they ignore infrequent degrees in the tail. The Relative
Hausdorff distance measures deviations at all scales, and is a more suitable
distance for comparing degree distributions. By tracking this new measure, we
are able to give strong empirical evidence of the convergence of headtail.
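For orientation, the quantity being approximated can be computed exactly when memory is not a constraint. The sketch below is not the headtail algorithm; it is a plain exact baseline that stores one counter per vertex. The function names `degree_histogram` and `ccdh` are illustrative assumptions, and the input is assumed to be a simple undirected graph with no repeated edges.

```python
from collections import Counter

def degree_histogram(edges):
    """Exact histogram: maps degree d to the number of vertices with degree d."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())

def ccdh(hist):
    """Complementary cumulative histogram: maps d to the number of vertices with degree >= d."""
    result = {}
    running = 0
    # Walk from the largest degree down so the running total counts the tail.
    for d in sorted(hist, reverse=True):
        running += hist[d]
        result[d] = running
    return result

# Tiny hypothetical edge stream, used only to exercise the functions.
edges = [(1, 2), (1, 3), (1, 4), (2, 3)]
hist = degree_histogram(edges)
tail = ccdh(hist)
```

A streaming method has to approximate `hist` without keeping the per-vertex `degree` table, which is what makes the small-storage setting hard.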
On finite-time ruin probabilities with reinsurance cycles influenced by large claims
Market cycles play a major role in reinsurance. Cycle transitions are not independent of the claim arrival process: a large claim or a high number of claims may accelerate cycle transitions. To take this into account, a semi-Markovian risk model is proposed and analyzed. A refined Erlangization method is developed to compute the finite-time ruin probability of a reinsurance company. As this model needs the claim amounts to be Phase-type distributed, we explain how to fit mixtures of Erlang distributions to long-tailed distributions. Numerical applications and comparisons to results obtained from simulation methods are given. The impact of dependency between claim amounts and phase changes is studied.
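The basic idea behind Erlangization, as commonly described, is to replace a fixed time horizon T by an Erlang random variable with n stages and rate n/T, whose mean is T and whose variance T^2/n shrinks as n grows. The sketch below illustrates only that concentration effect, not the refined method of the paper; the function names and the window of plus or minus 20% around T are assumptions chosen for illustration.

```python
import math

def erlang_moments(n, horizon):
    """Mean and variance of an Erlang(n, rate) variable with rate = n / horizon."""
    rate = n / horizon
    return n / rate, n / rate ** 2

def erlang_cdf(x, n, rate):
    """P(X <= x) for X ~ Erlang(n, rate), using the finite Poisson-sum form."""
    term = math.exp(-rate * x)  # i = 0 term of the sum
    total = term
    for i in range(1, n):
        term *= rate * x / i    # builds (rate*x)^i / i! incrementally
        total += term
    return 1.0 - total

def mass_near_horizon(n, horizon, rel_width=0.2):
    """Probability that the Erlang horizon falls within rel_width of the target."""
    rate = n / horizon
    lo, hi = horizon * (1 - rel_width), horizon * (1 + rel_width)
    return erlang_cdf(hi, n, rate) - erlang_cdf(lo, n, rate)

horizon = 1.0
masses = [mass_near_horizon(n, horizon) for n in (1, 10, 100)]
```

The list `masses` grows with n, which is why a larger number of stages gives a better stand-in for the deterministic horizon, at the cost of a larger state space.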
Bayesian prediction of the transient behaviour and busy period in short and long-tailed GI/G/1 queueing systems
Bayesian inference for the transient behavior and duration of a busy period in a single server queueing
system with general, unknown distributions for the interarrival and service times is investigated. Both
the interarrival and service time distributions are approximated using the dense family of Coxian distributions. A suitable reparameterization allows the definition of a non-informative prior and Bayesian
inference is then undertaken using reversible jump Markov chain Monte Carlo methods. An advantage of
the proposed procedure is that heavy tailed interarrival and service time distributions such as the Pareto
can be well approximated. The proposed procedure for estimating the system measures is based on
recent theoretical results for the Coxian/Coxian/1 system. A numerical technique is developed for every
MCMC iteration so that the transient queue length and waiting time distributions and the duration of
a busy period can be estimated. The approach is illustrated with both simulated and real data.
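As a reminder of what a Coxian distribution is, the sketch below uses the standard definition: a sequence of exponential phases with rates `rates[i]`, where after phase i the process continues to the next phase with probability `cont_probs[i]` and otherwise stops. It is not the inference procedure of the paper; the function names and the parameter values are assumptions chosen for illustration.

```python
import random

def coxian_mean(rates, cont_probs):
    """Mean of a Coxian distribution; cont_probs has one entry fewer than rates."""
    mean, reach = 0.0, 1.0  # reach = probability of entering the current phase
    for i, mu in enumerate(rates):
        mean += reach / mu
        if i < len(cont_probs):
            reach *= cont_probs[i]
    return mean

def coxian_sample(rates, cont_probs, rng):
    """Draw one value by walking through the phases until the process stops."""
    total = 0.0
    for i, mu in enumerate(rates):
        total += rng.expovariate(mu)
        last_phase = i == len(rates) - 1
        if last_phase or rng.random() >= cont_probs[i]:
            break
    return total

# Hypothetical two-phase example: a fast phase, then a rare slow phase
# that produces occasional very long values.
rates, cont_probs = [1.0, 0.1], [0.05]
rng = random.Random(0)
draws = [coxian_sample(rates, cont_probs, rng) for _ in range(20000)]
empirical_mean = sum(draws) / len(draws)
```

A rarely entered slow phase is one simple way such a distribution can mimic a long tail while remaining a phase-type model.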
A note on marginal posterior simulation via higher-order tail area approximations
We explore the use of higher-order tail area approximations for Bayesian
simulation. These approximations give rise to an alternative simulation scheme
to MCMC for Bayesian computation of marginal posterior distributions for a
scalar parameter of interest, in the presence of nuisance parameters. Its
advantage over MCMC methods is that samples are drawn independently with lower
computational time and the implementation requires only standard maximum
likelihood routines. The method is illustrated by a genetic linkage model, a
normal regression with censored data and a logistic regression model.
Statistical distributions for service times
Queueing models have been used extensively in the design of call centres. In particular, a queueing model will be used to describe a help desk, which is a form of call centre. The design of the queueing model involves modelling the arrival and service processes of the system. Conventionally, the arrival process is assumed to be Poisson and service times are assumed to be exponentially distributed. But it has been proposed that in practice these are seldom the case. Past research reveals that the log-normal distribution can be used to model the service times in call centres. Also, services may involve stages/tasks before completion. This motivates the use of a phase-type distribution to model the underlying stages of service. This research work focuses on developing statistical models for the overall service times and the service times by job types in a particular help desk. The assumption of exponential service times was investigated and a log-normal distribution was fitted to service times of this help desk. Each stage of the service in this help desk was modelled as a phase in the phase-type distribution. Results from the analysis carried out in this work confirmed that the assumption of exponential service times is inappropriate for this help desk, and it was apparent that log-normal distributions provided a reasonable fit to the service times. A phase-type distribution with three phases fitted the overall service times and the service times of administrative and miscellaneous jobs very well. For the service times of e-mail and network jobs, a phase-type distribution with two phases served as a good model. Finally, log-normal models of service times in this help desk were approximated using an order three phase-type distribution.
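Two of the checks described above can be sketched briefly. An exponential distribution has a squared coefficient of variation of 1, so a sample value far from 1 is a quick sign that the exponential assumption is doubtful. A log-normal fit by maximum likelihood reduces to the mean and standard deviation of the logged data. The code is a minimal sketch under these assumptions; the function names are illustrative and the data are synthetic, not the help desk data of the study.

```python
import math
import random
import statistics

def squared_cv(samples):
    """Sample variance divided by squared mean; equals 1 for an exponential law."""
    mean = statistics.fmean(samples)
    return statistics.pvariance(samples, mu=mean) / mean ** 2

def fit_lognormal(samples):
    """Maximum-likelihood estimates (mu, sigma) of a log-normal distribution."""
    logs = [math.log(x) for x in samples]
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs, mu=mu)
    return mu, sigma

# Synthetic service times with assumed parameters, for illustration only.
rng = random.Random(0)
service_times = [rng.lognormvariate(1.0, 0.5) for _ in range(5000)]

scv = squared_cv(service_times)
mu_hat, sigma_hat = fit_lognormal(service_times)
```

On real data, a formal goodness-of-fit test would normally follow these summary checks.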
Aggregate matrix-analytic techniques and their applications
The complexity of computer systems affects the complexity of modeling techniques that can be used for their performance analysis. In this dissertation, we develop a set of techniques that are based on tractable analytic models and enable efficient performance analysis of computer systems. Our approach is three-pronged: first, we propose new techniques to parameterize measurement data with Markovian-based stochastic processes that can be further used as input into queueing systems; second, we propose new methods to efficiently solve complex queueing models; and third, we use the proposed methods to evaluate the performance of clustered Web servers and propose new load balancing policies based on this analysis. We devise two new techniques for fitting measurement data that exhibit high variability into Phase-type (PH) distributions. These techniques apply known fitting algorithms in a divide-and-conquer fashion. We evaluate the accuracy of our methods from both the statistics and the queueing systems perspective. In addition, we propose a new methodology for fitting measurement data that exhibit long-range dependence into Markovian Arrival Processes (MAPs). We propose a new methodology, ETAQA, for the exact solution of M/G/1-type processes, GI/M/1-type processes, and their intersection, i.e., quasi birth-death (QBD) processes. ETAQA computes an aggregate steady state probability distribution and a set of measures of interest. ETAQA is numerically stable and computationally superior to alternative solution methods. Apart from ETAQA, we propose a new methodology for the exact solution of a class of GI/G/1-type processes based on aggregation/decomposition. Finally, we demonstrate the applicability of the proposed techniques by evaluating load balancing policies in clustered Web servers.
We address the high variability in the service process of Web servers by dedicating the servers of a cluster to requests of similar sizes and propose new, content-aware load balancing policies. Detailed analysis shows that the proposed policies achieve high user-perceived performance and, by continuously adapting their scheduling parameters to the current workload characteristics, provide good performance under conditions of transient overload.
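The idea of dedicating servers to requests of similar sizes can be illustrated with a simple size-interval dispatcher. This is a hypothetical sketch, not the policy of the dissertation: the rule of choosing boundaries so that each server receives roughly the same total work, and the names `equal_load_boundaries` and `pick_server`, are assumptions made for illustration.

```python
from bisect import bisect_left

def equal_load_boundaries(sizes, num_servers):
    """Pick size cut-offs so each server gets roughly equal total work.

    May return fewer than num_servers - 1 cut-offs on very skewed input.
    """
    ordered = sorted(sizes)
    target = sum(ordered) / num_servers
    boundaries, running = [], 0.0
    for size in ordered:
        running += size
        crossed = running >= target * (len(boundaries) + 1)
        if len(boundaries) < num_servers - 1 and crossed:
            boundaries.append(size)
    return boundaries

def pick_server(request_size, boundaries):
    """Server i handles sizes above boundaries[i-1] and up to boundaries[i]."""
    return bisect_left(boundaries, request_size)

# Hypothetical observed request sizes: many small requests, one large one.
observed = [1, 1, 1, 1, 2, 2, 4, 12]
boundaries = equal_load_boundaries(observed, num_servers=2)
small_target = pick_server(1, boundaries)
large_target = pick_server(12, boundaries)
```

Recomputing `boundaries` from a sliding window of recent request sizes is one way a dispatcher of this kind could adapt to a changing workload.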