Generalized extreme value regression for binary response data: An application to B2B electronic payments system adoption
In information systems research, a question of particular interest is how to
interpret and predict the probability that a firm adopts a new technology, so
that market promotions can be targeted at only those firms that are more
likely to adopt it. Typically, there is a significant difference
between the observed numbers of "adopters" and "nonadopters," an outcome that is
usually coded as a binary response. A critical issue involved in modeling such
binary response data is the appropriate choice of link functions in a
regression model. In this paper we introduce a new flexible skewed link
function for modeling binary response data based on the generalized extreme
value (GEV) distribution. We show how the proposed GEV links provide more
flexible and improved skewed link regression models than the existing skewed
links, especially when dealing with imbalance between the observed number of
0's and 1's in a data set. The flexibility of the proposed model is illustrated
through simulated data sets and a billing data set of the electronic payments
system adoption from a Fortune 100 company in 2005.
Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/)
by the Institute of Mathematical Statistics (http://www.imstat.org) at
http://dx.doi.org/10.1214/10-AOAS354
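To make the idea of a GEV-based link concrete, here is a minimal sketch. It assumes one common parameterization, p = 1 - GEV(-eta; xi), where eta is the linear predictor and xi is the shape parameter of a standard GEV distribution (location 0, scale 1). The function names `gev_cdf` and `gev_link_prob` are hypothetical, and the abstract above does not spell out the exact form, so treat this as an illustration rather than the paper's definition.

```python
import math

def gev_cdf(x, xi):
    """CDF of the standard GEV distribution (location 0, scale 1), shape xi."""
    if abs(xi) < 1e-12:
        # Gumbel limit as xi -> 0
        return math.exp(-math.exp(-x))
    t = 1.0 + xi * x
    if t <= 0.0:
        # Outside the support: below the lower bound when xi > 0,
        # above the upper bound when xi < 0.
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def gev_link_prob(eta, xi):
    """Assumed response probability p = 1 - GEV(-eta; xi)."""
    return 1.0 - gev_cdf(-eta, xi)

# The response curve is not symmetric around eta = 0, unlike logit or probit.
for eta in (-1.0, 0.0, 1.0):
    print(eta, round(gev_link_prob(eta, 0.3), 4))
```

Under this parameterization, letting xi tend to 0 gives p = 1 - exp(-exp(eta)), the complementary log-log response curve, while other values of xi change the skewness of the curve.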
Asymptotics of a Clustering Criterion for Smooth Distributions
We develop a clustering framework for observations from a population with a
smooth probability distribution function and derive its asymptotic properties.
A clustering criterion based on a linear combination of order statistics is
proposed. The asymptotic behavior of the point at which the observations are
split into two clusters is examined. The results obtained can then be utilized
to construct an interval estimate of the point which splits the data and
develop tests for bimodality and the presence of clusters.
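As an illustration of splitting ordered observations into two clusters, the sketch below sorts the sample and picks the cut that minimizes the total within-cluster sum of squares. This criterion is an assumption chosen for familiarity; the abstract only says the criterion is a linear combination of order statistics, so the paper's criterion may differ. The function name `best_split` is hypothetical.

```python
def best_split(values):
    """Return (k, split_point): the first k sorted values form the left cluster.

    The cut minimizes the total within-cluster sum of squares, and
    split_point is the midpoint between the two order statistics at the cut.
    """
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")

    total = sum(xs)
    total_sq = sum(x * x for x in xs)

    best_k, best_cost = 1, float("inf")
    left_sum = left_sq = 0.0
    for k in range(1, n):
        # Move the k-th order statistic into the left cluster.
        left_sum += xs[k - 1]
        left_sq += xs[k - 1] ** 2
        right_sum = total - left_sum
        right_sq = total_sq - left_sq
        # Sum of squared deviations from each cluster's own mean.
        cost = (left_sq - left_sum ** 2 / k) + (right_sq - right_sum ** 2 / (n - k))
        if cost < best_cost:
            best_k, best_cost = k, cost

    return best_k, (xs[best_k - 1] + xs[best_k]) / 2.0

k, point = best_split([1.0, 1.2, 0.9, 5.0, 5.1, 4.8])
print(k, point)
```

Running sums keep the search over all n - 1 cuts linear in the sample size after sorting.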
Model-Based Method for Social Network Clustering
We propose a simple mixed membership model for social network clustering in
this note. A flexible function is adopted to measure affinities among a set of
entities in a social network. The model not only allows each entity in the
network to possess more than one membership, but also provides accurate
statistical inference about network structure. We estimate the membership
parameters by using an MCMC algorithm. We evaluate the performance of the
proposed algorithm by applying our model to two empirical social network data sets,
the Zachary club data and the bottlenose dolphin network data. We also conduct
some numerical studies for different types of simulated networks for assessing
the effectiveness of our algorithm. Finally, we briefly offer some concluding
remarks and discuss future work.
Categorical data analysis using a skewed Weibull regression model
In this paper, we present a Weibull link (skewed) model for categorical
response data arising from binomial as well as multinomial models. We show that,
for such types of categorical data, the most commonly used models (logit,
probit and complementary log-log) can be obtained as limiting cases. We further
compare the proposed model with some other asymmetrical models. The Bayesian as
well as frequentist estimation procedures for binomial and multinomial data
responses are presented in detail. Two data sets are analyzed to show the
efficiency of the proposed model.
Generalized Variable Selection Algorithms for Gaussian Process Models by LASSO-like Penalty
With the rapid development of modern technology, massive amounts of data with
complex patterns are generated. Gaussian process models, which can easily fit
non-linearity in data, are becoming increasingly popular. It is often the
case that in some data only a few features are important or active. However,
unlike classical linear models, it is challenging to identify active variables
in Gaussian process models. One of the most commonly used methods for variable
selection in Gaussian process models is automatic relevance determination,
which is known to be open-ended. There is no rule of thumb to determine the
threshold for dropping features, which makes the variable selection in Gaussian
process models ambiguous. In this work, we propose two variable selection
algorithms for Gaussian process models, which use the artificial nuisance
columns as baseline for identifying the active features. Moreover, the proposed
methods work for both regression and classification problems. The algorithms
are demonstrated using comprehensive simulation experiments and an application
to multi-subject electroencephalography data that studies alcoholic levels of
experimental subjects.
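The nuisance-column idea can be sketched independently of the Gaussian process machinery. In the hedged example below, a feature's relevance is scored by its absolute Pearson correlation with the response, which is only a stand-in for a Gaussian process relevance measure, not the paper's algorithm. Artificial noise columns are scored the same way, and a real feature is kept only if it beats the best nuisance score. The names `pearson` and `select_with_nuisance`, the number of nuisance columns, and the max-based threshold are all assumptions made for illustration.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def select_with_nuisance(X, y, n_nuisance=20, seed=0):
    """Return indices of columns of X whose score exceeds every nuisance score.

    X is a list of rows. The score is |Pearson correlation| with y, used here
    as a simple stand-in for a model-based relevance measure.
    """
    rng = random.Random(seed)
    n = len(y)
    real_scores = [abs(pearson(col, y)) for col in zip(*X)]
    # Nuisance columns are pure noise, so their scores show what
    # "irrelevant" looks like for this sample size.
    nuisance_scores = [
        abs(pearson([rng.gauss(0.0, 1.0) for _ in range(n)], y))
        for _ in range(n_nuisance)
    ]
    threshold = max(nuisance_scores)
    return [j for j, score in enumerate(real_scores) if score > threshold]

# Toy data: only feature 0 drives the response.
data_rng = random.Random(1)
X = [[data_rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]
y = [3.0 * row[0] + data_rng.gauss(0.0, 0.5) for row in X]
print(select_with_nuisance(X, y))
```

The point of the baseline is that the threshold comes from the data rather than from a hand-picked cutoff, which is the ambiguity the abstract attributes to automatic relevance determination.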
Wavelet modeling of priors on triangles
Parameters in statistical problems often live in a geometry of a certain shape. For example, count probabilities in a multinomial distribution belong to a simplex. For these problems, Bayesian analysis needs to model priors satisfying certain constraints imposed by the geometry. This paper investigates modeling of priors on triangles by use of wavelets constructed specifically for triangles. Theoretical analysis and numerical simulations show that our modeling is flexible and is superior to the commonly used Dirichlet prior.