302 research outputs found

    Generalized extreme value regression for binary response data: An application to B2B electronic payments system adoption

    In information systems research, a question of particular interest is how to interpret and predict the probability that a firm adopts a new technology, so that market promotions can be targeted only at those firms that are more likely to adopt it. Typically, there is a significant difference between the observed numbers of "adopters" and "nonadopters," which are usually coded as a binary response. A critical issue in modeling such binary response data is the appropriate choice of link function in a regression model. In this paper we introduce a new flexible skewed link function for modeling binary response data based on the generalized extreme value (GEV) distribution. We show how the proposed GEV links provide more flexible and improved skewed link regression models than the existing skewed links, especially when dealing with imbalance between the observed numbers of 0's and 1's in a data set. The flexibility of the proposed model is illustrated through simulated data sets and a billing data set on electronic payments system adoption from a Fortune 100 company in 2005. Comment: Published at http://dx.doi.org/10.1214/10-AOAS354 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
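The abstract above does not state the link formula, so the following is only a minimal sketch of what a GEV-based inverse link might look like. It assumes the parameterization p = 1 - G(-eta; xi), where G is the standard GEV distribution function with shape xi and eta is the linear predictor; the function names `gev_cdf` and `gev_inverse_link` are hypothetical and chosen for illustration.

```python
import math


def gev_cdf(x, xi):
    """Standard GEV CDF (location 0, scale 1) with shape parameter xi."""
    if abs(xi) < 1e-12:
        # Gumbel limit as xi -> 0
        return math.exp(-math.exp(-x))
    t = 1.0 + xi * x
    if t <= 0.0:
        # x lies outside the support: below the lower bound when xi > 0,
        # above the upper bound when xi < 0
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def gev_inverse_link(eta, xi):
    """Assumed inverse link: success probability p = 1 - G(-eta; xi)."""
    return 1.0 - gev_cdf(-eta, xi)


if __name__ == "__main__":
    # The shape parameter changes how skewed the response curve is.
    for xi in (-0.5, 0.0, 0.5):
        probs = [round(gev_inverse_link(eta, xi), 4) for eta in (-2.0, 0.0, 2.0)]
        print(f"xi={xi:+.1f}: {probs}")
```

Under this assumed parameterization, xi = 0 reduces to the complementary log-log form 1 - exp(-exp(eta)), and the curve is not symmetric around p = 0.5, which is the kind of skewness that matters when 0's and 1's are imbalanced.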

    Asymptotics of a Clustering Criterion for Smooth Distributions

    We develop a clustering framework for observations from a population with a smooth probability distribution function and derive its asymptotic properties. A clustering criterion based on a linear combination of order statistics is proposed. The asymptotic behavior of the point at which the observations are split into two clusters is examined. The results can then be used to construct an interval estimate of the split point and to develop tests for bimodality and for the presence of clusters.
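The abstract does not spell out the criterion, so the sketch below substitutes a familiar stand-in: splitting sorted one-dimensional data at the index that minimizes the total within-cluster sum of squares. This is an illustration of the general idea of an empirical split point computed from order statistics, not the paper's criterion; the name `best_split` is hypothetical.

```python
def best_split(values):
    """Return (split_point, k): the first k sorted values form the left cluster.

    Stand-in criterion (an assumption): minimize the within-cluster sum of
    squares over all two-way splits of the order statistics.
    """
    xs = sorted(values)  # order statistics
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")

    # Prefix sums let each candidate split be scored in O(1).
    prefix = [0.0]
    prefix_sq = [0.0]
    for x in xs:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    def sse(lo, hi):
        # Sum of squared deviations from the mean for xs[lo:hi]
        m = hi - lo
        s = prefix[hi] - prefix[lo]
        return (prefix_sq[hi] - prefix_sq[lo]) - s * s / m

    best_k = min(range(1, n), key=lambda k: sse(0, k) + sse(k, n))
    # Report the midpoint between the two order statistics adjacent to the cut.
    return (xs[best_k - 1] + xs[best_k]) / 2.0, best_k


if __name__ == "__main__":
    print(best_split([1.0, 1.2, 0.9, 5.0, 5.1, 4.8]))
```

The split point here is a function of the sample, so it varies from sample to sample; the asymptotic results described in the abstract concern that kind of sampling variability.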

    Model-Based Method for Social Network Clustering

    In this note we propose a simple mixed membership model for social network clustering. A flexible function is adopted to measure affinities among a set of entities in a social network. The model not only allows each entity in the network to possess more than one membership, but also provides accurate statistical inference about network structure. We estimate the membership parameters using an MCMC algorithm. We evaluate the performance of the proposed algorithm by applying our model to two empirical social network data sets, the Zachary club data and the bottlenose dolphin network data. We also conduct numerical studies on different types of simulated networks to assess the effectiveness of our algorithm. Finally, some concluding remarks and future work are briefly addressed.

    Categorical data analysis using a skewed Weibull regression model

    In this paper, we present a Weibull link (skewed) model for categorical response data arising from binomial as well as multinomial models. We show that, for such types of categorical data, the most commonly used models (logit, probit and complementary log-log) can be obtained as limiting cases. We further compare the proposed model with some other asymmetrical models. Bayesian as well as frequentist estimation procedures for binomial and multinomial responses are presented in detail. Two data sets are analyzed to show the efficiency of the proposed model.

    Generalized Variable Selection Algorithms for Gaussian Process Models by LASSO-like Penalty

    With the rapid development of modern technology, massive amounts of data with complex patterns are generated. Gaussian process models, which can easily fit non-linearity in data, are becoming increasingly popular. It is often the case that only a few features in a data set are important or active. However, unlike in classical linear models, it is challenging to identify active variables in Gaussian process models. One of the most commonly used methods for variable selection in Gaussian process models is automatic relevance determination, which is known to be open-ended: there is no rule of thumb for determining the threshold for dropping features, which makes variable selection in Gaussian process models ambiguous. In this work, we propose two variable selection algorithms for Gaussian process models, which use artificial nuisance columns as a baseline for identifying the active features. Moreover, the proposed methods work for both regression and classification problems. The algorithms are demonstrated using comprehensive simulation experiments and an application to multi-subject electroencephalography data that studies alcoholic levels of experimental subjects.

    Wavelet modeling of priors on triangles

    Parameters in statistical problems often live in a geometry of a certain shape. For example, count probabilities in a multinomial distribution belong to a simplex. For these problems, Bayesian analysis needs to model priors satisfying certain constraints imposed by the geometry. This paper investigates modeling of priors on triangles by use of wavelets constructed specifically for triangles. Theoretical analysis and numerical simulations show that our modeling is flexible and is superior to the commonly used Dirichlet prior.
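To make the simplex constraint concrete, here is a minimal sketch of drawing from the Dirichlet prior that the abstract uses as its baseline. It relies on the standard construction of normalizing independent gamma variates; the function name `sample_dirichlet` and the concentration values are illustrative assumptions, and the wavelet-based priors themselves are not shown.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one point from Dirichlet(alpha) by normalizing Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    # Three categories: each draw is a point in a triangle (the 2-simplex).
    for _ in range(3):
        p = sample_dirichlet([2.0, 3.0, 5.0], rng)
        print([round(v, 3) for v in p], "sum =", round(sum(p), 6))
```

Every draw has non-negative coordinates that sum to one, which is the geometric constraint any prior on multinomial probabilities has to respect.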