Regenerative Simulation for Queueing Networks with Exponential or Heavier Tail Arrival Distributions
Multiclass open queueing networks find wide applications in communication,
computer and fabrication networks. Often one is interested in steady-state
performance measures associated with these networks. Conceptually, under mild
conditions, a regenerative structure exists in multiclass networks, making them
amenable to regenerative simulation for estimating the steady-state performance
measures. However, typically, identification of a regenerative structure in
these networks is difficult. A well-known exception is when all the
interarrival times are exponentially distributed, where the instants
corresponding to customer arrivals to an empty network constitute a
regenerative structure. In this paper, we consider networks where the
interarrival times are generally distributed but have exponential or heavier
tails. We show that these distributions can be decomposed into a mixture of
sums of independent random variables such that at least one of the components
is exponentially distributed. This allows an easily implementable embedded
regenerative structure in the Markov process. We show that under mild
conditions on the network primitives, the regenerative mean and standard
deviation estimators are consistent and satisfy a joint central limit theorem
useful for constructing asymptotically valid confidence intervals. We also show
that amongst all such interarrival time decompositions, the one with the
largest mean exponential component minimizes the asymptotic variance of the
standard deviation estimator.
Comment: A preliminary version of this paper will appear in Proceedings of
Winter Simulation Conference, Washington, DC, 201
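As an illustration of the well-known exponential case mentioned in the abstract above, the following is a minimal sketch of regenerative simulation for a single M/M/1 queue, where arrivals to an empty system are regeneration points. It does not implement the paper's decomposition for general interarrival times; the function name `regenerative_mm1` and the rates used are assumptions chosen for the example. The confidence interval uses the standard ratio-estimator form.

```python
import random

def regenerative_mm1(lam, mu, n_cycles, seed=0):
    """Estimate the steady-state mean number in an M/M/1 queue.

    A cycle starts each time a customer arrives to an empty system.
    Returns (point estimate, approximate 95% CI half-width).
    """
    rng = random.Random(seed)
    areas, lengths = [], []
    for _ in range(n_cycles):
        n = 1          # the customer whose arrival starts the cycle
        area = 0.0     # integral of the number in system over the cycle
        length = 0.0   # duration of the cycle
        while n > 0:   # busy period
            dt = rng.expovariate(lam + mu)   # time until the next event
            area += n * dt
            length += dt
            if rng.random() < lam / (lam + mu):
                n += 1                       # arrival
            else:
                n -= 1                       # departure
        # Idle period: the system stays empty until the next arrival.
        length += rng.expovariate(lam)
        areas.append(area)
        lengths.append(length)

    # Ratio estimator: total area divided by total time.
    est = sum(areas) / sum(lengths)
    # Standard regenerative variance estimate based on A_i - est * L_i.
    s2 = sum((a - est * l) ** 2 for a, l in zip(areas, lengths)) / (n_cycles - 1)
    mean_len = sum(lengths) / n_cycles
    half_width = 1.96 * (s2 / n_cycles) ** 0.5 / mean_len
    return est, half_width

est, hw = regenerative_mm1(lam=0.5, mu=1.0, n_cycles=20000)
print(est, hw)  # the exact value is rho / (1 - rho) = 1 for rho = 0.5
```

Because cycles are independent and identically distributed, the per-cycle pairs (area, length) can be treated with ordinary i.i.d. statistics, which is what makes the confidence interval straightforward.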
Column Subset Selection and Nyström Approximation via Continuous Optimization
We propose a continuous optimization algorithm for the Column Subset
Selection Problem (CSSP) and Nyström approximation. The CSSP and Nyström
method construct low-rank approximations of matrices based on a predetermined
subset of columns. It is well known that choosing the best column subset of a
given size is a difficult combinatorial problem. In this work, we show how one
can approximate the optimal solution by defining a penalized continuous loss
function which is minimized via stochastic gradient descent. We show that the
gradients of this loss function can be estimated efficiently using
matrix-vector products with a data matrix in the case of the CSSP or a
kernel matrix in the case of the Nyström approximation. We provide
numerical results for a number of real datasets showing that this continuous
optimization is competitive against existing methods.
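To make the two objectives in the abstract above concrete, here is a minimal sketch, assuming NumPy, of how the approximation error is evaluated once a column subset has been fixed. It does not implement the continuous optimization itself; the function names `cssp_error` and `nystrom_approx`, the random data, and the linear kernel are assumptions for illustration.

```python
import numpy as np

def cssp_error(X, cols):
    """Frobenius error of projecting X onto the span of its selected columns."""
    C = X[:, cols]
    return np.linalg.norm(X - C @ np.linalg.pinv(C) @ X)

def nystrom_approx(K, cols):
    """Nystrom approximation of a symmetric PSD matrix K from selected columns."""
    C = K[:, cols]                 # selected columns of K
    W = K[np.ix_(cols, cols)]      # block of K at the selected rows and columns
    return C @ np.linalg.pinv(W) @ C.T

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 8))   # hypothetical data matrix
K = X @ X.T                        # linear kernel, used only as an example
cols = [0, 3, 5]                   # an arbitrary candidate subset

print(cssp_error(X, cols))
print(np.linalg.norm(K - nystrom_approx(K, cols)))
```

Searching over all subsets of a given size with these functions is the combinatorial problem the abstract refers to; the sketch only evaluates one candidate.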
Generalized Linear Models via the Lasso: To Scale or Not to Scale?
The Lasso regression is a popular regularization method for feature selection
in statistics. Prior to computing the Lasso estimator in both linear and
generalized linear models, it is common to conduct a preliminary rescaling of
the feature matrix to ensure that all the features are standardized. Without
this standardization, it is argued, the Lasso estimate will unfortunately
depend on the units used to measure the features. We propose a new type of
iterative rescaling of the features in the context of generalized linear
models. Whilst existing Lasso algorithms perform a single scaling as a
preprocessing step, the proposed rescaling is applied iteratively throughout
the Lasso computation until convergence. We provide numerical examples, with
both real and simulated data, illustrating that the proposed iterative
rescaling can significantly improve the statistical performance of the Lasso
estimator without incurring any significant additional computational cost.
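The dependence on measurement units that motivates standardization can be seen in the simplest possible case. The sketch below uses the closed-form Lasso solution for a single feature without an intercept, with the objective (1/(2n)) * sum((y_i - b*x_i)^2) + lam*|b|. It shows the conventional one-off standardization only, not the iterative rescaling proposed in the abstract above; the data and the penalty value are made up for the example.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator S(z, lam)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_1d(x, y, lam):
    """Closed-form Lasso coefficient for one feature and no intercept."""
    n = len(x)
    rho = sum(xi * yi for xi, yi in zip(x, y)) / n
    s = sum(xi * xi for xi in x) / n
    return soft_threshold(rho, lam) / s

def standardize(x):
    """Center x and scale it to unit (population) standard deviation."""
    n = len(x)
    m = sum(x) / n
    sd = (sum((xi - m) ** 2 for xi in x) / n) ** 0.5
    return [(xi - m) / sd for xi in x]

def fitted(x, y, lam):
    b = lasso_1d(x, y, lam)
    return [b * xi for xi in x]

x_m = [1.0, 2.0, 3.0, 4.0, 5.0]        # feature in metres (made-up data)
x_cm = [100.0 * v for v in x_m]        # same feature in centimetres
y = [-2.1, -0.9, 0.1, 1.0, 1.9]        # centered response (made-up data)
lam = 0.5

raw_m, raw_cm = fitted(x_m, y, lam), fitted(x_cm, y, lam)
std_m, std_cm = fitted(standardize(x_m), y, lam), fitted(standardize(x_cm), y, lam)

print(raw_m[-1], raw_cm[-1])   # fitted values differ with the units
print(std_m[-1], std_cm[-1])   # fitted values agree after standardization
```

The penalty acts on the coefficient, and the coefficient changes scale with the units, so the same penalty value shrinks the two unscaled fits by different amounts.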
Variance Reduction for Matrix Computations with Applications to Gaussian Processes
In addition to recent developments in computing speed and memory,
methodological advances have contributed to significant gains in the
performance of stochastic simulation. In this paper, we focus on variance
reduction for matrix computations via matrix factorization. We provide insights
into existing variance reduction methods for estimating the entries of large
matrices. Popular methods do not exploit the reduction in variance that is
possible when the matrix is factorized. We show how computing the square root
factorization of the matrix can, in some important cases, achieve arbitrarily
better stochastic performance. In addition, we propose a factorized estimator
for the trace of a product of matrices and numerically demonstrate that the
estimator can be up to 1,000 times more efficient on certain problems of
estimating the log-likelihood of a Gaussian process. Additionally, we provide a
new estimator of the log-determinant of a positive semi-definite matrix where
the log-determinant is treated as a normalizing constant of a probability
density.
Comment: 20 pages, 3 figures
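For context, a commonly used baseline for stochastic trace estimation is the Hutchinson estimator, which averages z^T A z over random sign vectors z. The sketch below shows that baseline only; it is not the factorized estimator proposed in the abstract above, and the small matrix and the name `hutchinson_trace` are assumptions for illustration.

```python
import random

def hutchinson_trace(matvec, dim, n_samples, seed=0):
    """Estimate trace(A) using only matrix-vector products with A.

    Uses Rademacher probes z (entries +1 or -1), for which E[z^T A z] = trace(A).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        Az = matvec(z)
        total += sum(zi * ai for zi, ai in zip(z, Az))
    return total / n_samples

# Small symmetric example matrix (made up); its trace is 6.
A = [[2.0, 0.5, 0.0],
     [0.5, 3.0, 0.1],
     [0.0, 0.1, 1.0]]

def matvec(v):
    """Multiply the example matrix A by the vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

est = hutchinson_trace(matvec, dim=3, n_samples=20000)
print(est)  # close to the exact trace, 6
```

With Rademacher probes the variance of each sample comes entirely from the off-diagonal entries, so the estimator is exact for a diagonal matrix.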
COMBSS: Best Subset Selection via Continuous Optimization
The problem of best subset selection in linear regression is considered with
the aim to find a fixed size subset of features that best fits the response.
This is particularly challenging when the total available number of features is
very large compared to the number of data samples. Existing optimal methods for
solving this problem tend to be slow while fast methods tend to have low
accuracy. Ideally, new methods would perform best subset selection faster than
existing optimal methods with comparable accuracy, or more accurately than
methods of comparable computational speed. Here, we propose a novel
continuous optimization method that identifies a subset solution path: a small
set of models of varying size consisting of candidates for the single best
subset of features, one that is optimal in a specific sense in linear regression.
Our method turns out to be fast, making the best subset selection possible when
the number of features is well in excess of thousands. Because of the
outstanding overall performance, framing the best subset selection challenge as
a continuous optimization problem opens new research directions for feature
extraction for a large variety of regression models.
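To show why exact best subset selection becomes slow, the following sketch, assuming NumPy, solves the problem by exhaustive search: it fits least squares on every subset of a fixed size and keeps the one with the smallest residual sum of squares. This is the brute-force baseline, not the continuous optimization method described in the abstract above; the function name `best_subset` and the simulated data are assumptions for illustration.

```python
import itertools
import numpy as np

def best_subset(X, y, k):
    """Exhaustive best subset selection of size k by residual sum of squares."""
    best_rss, best_cols = np.inf, None
    for cols in itertools.combinations(range(X.shape[1]), k):
        Xs = X[:, list(cols)]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        rss = float(np.sum((y - Xs @ beta) ** 2))
        if rss < best_rss:
            best_rss, best_cols = rss, cols
    return best_cols, best_rss

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 6))                 # simulated features
y = 2.0 * X[:, 0] - 3.0 * X[:, 2] + 0.1 * rng.standard_normal(50)

cols, rss = best_subset(X, y, k=2)
print(cols, rss)
```

The loop visits every one of the "p choose k" subsets, so the cost grows combinatorially with the number of features, which is the difficulty that faster methods aim to avoid.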
Rare Events in Random Geometric Graphs
This work introduces and compares approaches for estimating rare-event probabilities related to the number of edges in the random geometric graph on a Poisson point process. In the one-dimensional setting, we derive closed-form expressions for a variety of conditional probabilities related to the number of edges in the random geometric graph and, on this basis, develop conditional Monte Carlo algorithms for estimating rare-event probabilities. We rigorously prove a reduction in variance compared to the crude Monte Carlo estimators and illustrate the magnitude of the improvements in a simulation study. In higher dimensions, we use conditional Monte Carlo to remove the fluctuations in the estimator coming from the randomness in the Poisson number of nodes. Finally, building on conceptual insights from large-deviations theory, we illustrate that importance sampling using a Gibbsian point process can further substantially reduce the estimation variance.
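As a point of reference for the abstract above, here is a minimal sketch of the crude Monte Carlo baseline in the one-dimensional setting: sample a Poisson point process on the unit interval, connect points within a given radius, and record how often the edge count falls at or below a threshold. It does not implement the conditional Monte Carlo or importance sampling algorithms; the function names, the unit-interval window, and the parameter values are assumptions for illustration.

```python
import random

def sample_rgg_1d(intensity, radius, rng):
    """Sample a random geometric graph on a Poisson process on [0, 1].

    Returns (number of nodes, number of edges).
    """
    # Poisson points via cumulative exponential gaps; already sorted.
    points = []
    t = rng.expovariate(intensity)
    while t < 1.0:
        points.append(t)
        t += rng.expovariate(intensity)

    # Sliding window: for each point, count earlier points within `radius`.
    edges = 0
    left = 0
    for right in range(len(points)):
        while points[right] - points[left] > radius:
            left += 1
        edges += right - left
    return len(points), edges

def crude_mc(intensity, radius, threshold, n_rep, seed=0):
    """Crude Monte Carlo estimate of P(number of edges <= threshold)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        _, edges = sample_rgg_1d(intensity, radius, rng)
        if edges <= threshold:
            hits += 1
    return hits / n_rep

p_hat = crude_mc(intensity=10.0, radius=0.1, threshold=0, n_rep=2000)
print(p_hat)
```

When the target event is rare, most replications contribute zero hits, so the relative error of this estimator is large for a fixed budget; that is the inefficiency the variance reduction methods are meant to address.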