214 research outputs found
Testing Statistical Hypotheses for Latent Variable Models and Some Computational Issues
In this dissertation, I address nonstandard statistical problems concerning goodness-of-fit tests
in the latent variable context, together with some related computational issues.
In epidemiological and biomedical studies, observations with measurement errors are quite
common, especially when it is difficult to calibrate the true signals accurately. In the first problem,
I develop a test of the equality of two distributions when the observed contaminated
data follow the classical additive measurement error model. Standard two-sample
homogeneity tests, such as the Kolmogorov-Smirnov, Anderson-Darling, or von Mises tests, are not
consistent when the observations are subject to measurement error. To develop a consistent test, first
the characteristic functions of the unobservable true random variables are estimated from the contaminated
data, and then the test statistic is defined as the integrated squared difference between the two
estimated characteristic functions. It is shown that, when the sample size is large and the null hypothesis
holds, the test statistic converges to an integral of a squared Gaussian process. Since the
rejection region is not easy to obtain from this limiting distribution, I propose a
bootstrap approach to compute the p-value of the test statistic. The operating characteristics of the
proposed test are assessed and compared with those of other approaches via extensive simulation studies.
The proposed method is then applied to the National Health and Nutrition Examination
Survey (NHANES) dataset. Although previous research has considered estimation of regression parameters
in the presence of exposure measurement error, this testing problem has not been studied before.
In the next problem, I consider the stochastic frontier model (SFM), a widely used
model for measuring firms’ efficiency. In productivity or cost studies in econometrics,
there is a discrepancy between the theoretically optimal output and the actual output for a
given amount of inputs; this gap is called technical inefficiency. To assess this inefficiency,
the stochastic frontier model includes the gap as a latent variable in addition to the
usual statistical noise. Since the gap is unobservable, estimation and inference depend on the distributional assumption made for the technical inefficiency term, which is typically
taken to be exponential or half-normal. I develop a Bayesian
test of whether this parametric assumption is correct. I construct a broad semiparametric
family that approximates or contains the true distribution as the alternative, and then define a Bayes
factor. I show that the Bayes factor is consistent under certain conditions and present the finite-sample
performance via Monte Carlo simulations.
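For reference, a common production-frontier form of this model (the notation below is standard textbook notation, not taken from the dissertation) is

```latex
y_i = x_i^\top \beta + v_i - u_i, \qquad
v_i \sim N(0, \sigma_v^2), \qquad u_i \ge 0,
```

where $v_i$ is the usual statistical noise and $u_i$ is the latent technical inefficiency, typically modelled as exponential or half-normal; the test above concerns the distributional assumption on $u_i$.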
The second part of my dissertation concerns statistical computation. Frequentist
standard errors are used to evaluate the uncertainty of an estimator and are utilized in many statistical
inference problems. In this dissertation, I consider standard error calculation for Bayes
estimators. Except in some idealized scenarios, estimating the frequentist variability of an estimator
typically involves bootstrapping to approximate its sampling distribution. When
Bayesian modeling is combined with Markov chain Monte Carlo (MCMC) and the bootstrap, however,
computing the standard error of a Bayes estimator is expensive and often impractical:
rerunning MCMC on each bootstrap dataset is computationally inefficient. To overcome this difficulty, I propose a use of
importance sampling that reduces the computational burden. I apply the proposed technique
to several examples, including logistic regression, the linear measurement error model, the Weibull
regression model, and the vector autoregressive model.
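The importance-sampling idea can be sketched on a toy problem. This is an illustrative sketch, not the dissertation's method: the model (a normal mean with known variance and flat prior), the sampler, and all function names are assumptions made here for concreteness.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_lik(draws, data, sigma=1.0):
    # Log-likelihood of N(theta, sigma^2) data, for every theta in `draws`.
    return -0.5 * ((data[None, :] - draws[:, None]) ** 2).sum(axis=1) / sigma**2

def metropolis(data, n_draws=2000, step=0.3):
    # Random-walk Metropolis targeting the posterior of the mean (flat prior).
    theta = data.mean()
    ll = log_lik(np.array([theta]), data)[0]
    draws = np.empty(n_draws)
    for i in range(n_draws):
        prop = theta + step * rng.normal()
        ll_prop = log_lik(np.array([prop]), data)[0]
        if np.log(rng.uniform()) < ll_prop - ll:
            theta, ll = prop, ll_prop
        draws[i] = theta
    return draws

def bootstrap_se_via_is(data, B=200):
    # Run MCMC once on the original data; for each bootstrap dataset,
    # approximate its posterior mean by importance-reweighting the same
    # draws instead of rerunning MCMC from scratch.
    draws = metropolis(data)
    base_ll = log_lik(draws, data)
    estimates = np.empty(B)
    for b in range(B):
        boot = rng.choice(data, size=len(data), replace=True)
        logw = log_lik(draws, boot) - base_ll   # likelihood-ratio weights
        w = np.exp(logw - logw.max())
        w /= w.sum()
        estimates[b] = np.sum(w * draws)        # weighted posterior mean
    return estimates.std(ddof=1)                # frequentist SE of the Bayes estimator
```

The saving is that the expensive step (MCMC) runs once rather than B times; each bootstrap replicate costs only a reweighting pass over the stored draws.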
In the second computational problem, I explore binary regression with a flexible skew-probit
link function, which contains the traditional probit link as a special case. The skew-probit
model is useful for modelling the success probability of binary or count data when that
probability is not a symmetric function of the continuous regressors. I investigate the
parameter identifiability of the skew-probit model and then demonstrate that the maximum likelihood
estimator (MLE) of the skewness parameter is highly biased. I develop a penalized likelihood
approach based on three penalty functions to reduce the finite-sample bias of the MLE in the
skew-probit model. The performance of each penalized MLE is compared through extensive
simulations, and I analyze heart-disease data using the proposed approaches.
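One standard way to write a skew-probit link is as the CDF of a skew-normal distribution; the sketch below assumes that parameterization (the function names and grid settings are illustrative, not the dissertation's code).

```python
import numpy as np
from math import erf, sqrt

def Phi(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + np.vectorize(erf)(x / sqrt(2.0)))

def skew_probit(eta, lam, lower=-10.0, n=4001):
    # Skew-normal CDF used as the link:
    #   G(eta; lam) = int_{-inf}^{eta} 2 * phi(t) * Phi(lam * t) dt,
    # where lam = 0 recovers the ordinary (symmetric) probit link.
    t = np.linspace(lower, eta, n)
    dens = 2.0 * np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi) * Phi(lam * t)
    # Trapezoidal rule over the grid.
    return float(0.5 * (dens[:-1] + dens[1:]).sum() * (t[1] - t[0]))
```

With lam != 0 the success probability rises faster on one side of zero than the other, which is exactly the asymmetry the model is meant to capture; lam = 0 gives the familiar probit curve.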
The Blacklisting Memory Scheduler: Balancing Performance, Fairness and Complexity
In a multicore system, applications running on different cores interfere at
main memory. This inter-application interference degrades overall system
performance and unfairly slows down applications. Prior works have developed
application-aware memory schedulers to tackle this problem. State-of-the-art
application-aware memory schedulers prioritize requests of applications that
are vulnerable to interference, by ranking individual applications based on
their memory access characteristics and enforcing a total rank order.
In this paper, we observe that state-of-the-art application-aware memory
schedulers have two major shortcomings. First, such schedulers trade off
hardware complexity in order to achieve high performance or fairness, since
ranking applications with a total order leads to high hardware complexity.
Second, ranking can unfairly slow down applications that are at the bottom of
the ranking stack. To overcome these shortcomings, we propose the Blacklisting
Memory Scheduler (BLISS), which achieves high system performance and fairness
while incurring low hardware complexity, based on two observations. First, we
find that, to mitigate interference, it is sufficient to separate applications
into only two groups. Second, we show that this grouping can be efficiently
performed by simply counting the number of consecutive requests served from
each application.
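The grouping mechanism described above can be sketched as a tiny scheduler model. This is a behavioral illustration only: the class name, threshold value, and blacklist-clearing policy are assumptions for this sketch, not the paper's exact hardware parameters.

```python
class BlacklistingScheduler:
    # Sketch of the BLISS idea: applications that receive too many
    # consecutive services are "blacklisted" (treated as interference-causing)
    # and deprioritized; everyone else is implicitly the vulnerable group.
    def __init__(self, threshold=4, clear_interval=1000):
        self.threshold = threshold            # consecutive services before blacklisting
        self.clear_interval = clear_interval  # cycles between blacklist resets
        self.last_app = None
        self.streak = 0
        self.blacklist = set()
        self.cycles = 0

    def on_request_served(self, app_id):
        # Count consecutive requests served from the same application.
        if app_id == self.last_app:
            self.streak += 1
        else:
            self.last_app, self.streak = app_id, 1
        if self.streak >= self.threshold:
            self.blacklist.add(app_id)

    def tick(self):
        # Periodically clear the blacklist so no application is starved.
        self.cycles += 1
        if self.cycles % self.clear_interval == 0:
            self.blacklist.clear()

    def pick(self, pending_apps):
        # Prefer requests from non-blacklisted applications.
        preferred = [a for a in pending_apps if a not in self.blacklist]
        return (preferred or pending_apps)[0]
```

The hardware appeal is visible even in this sketch: the state is one application ID, one counter, and one bit per application, instead of the per-application rank comparisons a total-order scheduler needs.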
We evaluate BLISS across a wide variety of workloads and system configurations
and compare its performance and hardware complexity with those of five state-of-the-art
memory schedulers. Our evaluations show that BLISS achieves 5% better system
performance and 25% better fairness than the best-performing previous scheduler
while greatly reducing the critical path latency and hardware area cost of the
memory scheduler (by 79% and 43%, respectively), thereby achieving a good
trade-off between performance, fairness, and hardware complexity.
DeLTA: GPU Performance Model for Deep Learning Applications with In-depth Memory System Traffic Analysis
Training convolutional neural networks (CNNs) requires intense compute
throughput and high memory bandwidth. In particular, convolution layers account
for the majority of the execution time of CNN training, and GPUs are commonly
used to accelerate these layer workloads. Optimizing GPU designs for efficient
CNN training acceleration requires accurately modeling how performance
improves as computing and memory resources are increased. We
present DeLTA, the first analytical model that accurately estimates the traffic
at each level of the GPU memory hierarchy while accounting for the complex reuse
patterns of a parallel convolution algorithm. We demonstrate that our model is
both accurate and robust for different CNNs and GPU architectures. We then show
how this model can be used to carefully balance the scaling of different GPU
resources for efficient CNN performance improvement.