Extended Ordered Paired Comparison Models with Application to Football Data from German Bundesliga
A general paired comparison model for the evaluation of sports competitions
is proposed. It efficiently uses the available information by allowing
for ordered response categories and team-specific home advantage effects.
Penalized estimation techniques are used to identify clusters of teams that
share the same ability. The model is extended to include team-specific
explanatory variables. It is shown that regularization techniques make it possible to
identify the contribution of explanatory variables to the success of teams.
The usefulness of the methods is demonstrated by investigating the performance
of football teams of the German Bundesliga and its dependence on their budget.
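As an illustration of ordered response categories with a team-specific home advantage, the following is a minimal sketch of a cumulative-logit paired comparison with three outcomes (home win, draw, away win). The function names, the three-category setup, and the parameter values are assumptions made for illustration; this is not the paper's exact parametrization or its penalized estimation procedure.

```python
import math


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def outcome_probabilities(ability_home, ability_away, home_advantage, thresholds):
    """Return [P(home win), P(draw), P(away win)] under a cumulative-logit model.

    Hypothetical parametrization: P(Y <= k) = logistic(threshold_k + eta),
    where eta = home_advantage + ability_home - ability_away and the
    thresholds are increasing.
    """
    eta = home_advantage + ability_home - ability_away
    # Cumulative probabilities for each category; the last category closes at 1.
    cumulative = [logistic(t + eta) for t in thresholds] + [1.0]
    probabilities = []
    previous = 0.0
    for value in cumulative:
        probabilities.append(value - previous)
        previous = value
    return probabilities


# Two equally strong teams, with and without a home advantage (made-up values).
neutral = outcome_probabilities(0.0, 0.0, 0.0, thresholds=[-0.5, 0.5])
at_home = outcome_probabilities(0.0, 0.0, 0.4, thresholds=[-0.5, 0.5])
print(neutral, at_home)
```

With symmetric thresholds and no home advantage, the home-win and away-win probabilities coincide; a positive home advantage shifts probability mass toward the home win.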
Regularization and Model Selection with Categorial Effect Modifiers
The case of continuous effect modifiers in varying-coefficient models has been well investigated. Categorial effect modifiers, however, have been largely neglected. In this paper, a regularization technique is proposed that allows for selection of covariates and fusion of categories of categorial effect modifiers in a linear model. Nominal and ordinal variables are treated separately, since the latter warrant more economical parametrizations. The proposed methods are illustrated and investigated in simulation studies and real-world data evaluations. Moreover, some asymptotic properties are derived.
Multiple Imputation Using Gaussian Copulas
Missing observations are pervasive throughout empirical research, especially
in the social sciences. Despite multiple approaches to dealing adequately with
missing data, many scholars still fail to address this vital issue. In this
paper, we present a simple-to-use method for generating multiple imputations
using a Gaussian copula. The Gaussian copula for multiple imputation (Hoff,
2007) allows scholars to attain estimation results that have good coverage and
small bias. The use of copulas to model the dependence among variables will
enable researchers to construct valid joint distributions of the data, even
without knowledge of the actual underlying marginal distributions. Multiple
imputations are then generated by drawing observations from the resulting
posterior joint distribution and replacing the missing values. Using simulated
and observational data from published social science research, we compare
imputation via Gaussian copulas with two other widely used imputation methods:
MICE and Amelia II. Our results suggest that the Gaussian copula approach has a
slightly smaller bias, higher coverage rates, and narrower confidence intervals
compared to the other methods. This is especially true when the variables with
missing data are not normally distributed. These results, combined with
theoretical guarantees and ease of use, suggest that the approach examined
provides an attractive alternative for applied researchers undertaking multiple
imputations.
On Nonregularized Estimation of Psychological Networks.
An important goal for psychological science is developing methods to characterize relationships between variables. Customary approaches use structural equation models to connect latent factors to a number of observed measurements, or test causal hypotheses between observed variables. More recently, regularized partial correlation networks have been proposed as an alternative approach for characterizing relationships among variables through off-diagonal elements in the precision matrix. While the graphical Lasso (glasso) has emerged as the default network estimation method, it was optimized in fields outside of psychology with very different needs, such as high-dimensional data where the number of variables (p) exceeds the number of observations (n). In this article, we describe the glasso method in the context of the fields where it was developed, and then we demonstrate that the advantages of regularization diminish in settings where psychological networks are often fitted (p ≪ n). We first show that improved properties of the precision matrix, such as eigenvalue estimation, and predictive accuracy with cross-validation are not always appreciable. We then introduce nonregularized methods based on multiple regression and a nonparametric bootstrap strategy, after which we characterize performance with extensive simulations. Our results demonstrate that the nonregularized methods can be used to reduce the false-positive rate, compared to glasso, and they appear to provide consistent performance across sparsity levels, sample composition (p/n), and partial correlation size. We end by reviewing recent findings in the statistics literature that suggest alternative methods often outperform glasso, as well as suggesting areas for future research in psychology. The nonregularized methods have been implemented in the R package GGMnonreg.
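The link between the precision matrix and a partial correlation network can be sketched in a few lines. The example below uses the standard identity that the partial correlation between variables i and j equals the negated off-diagonal precision entry scaled by the two diagonal entries. The function name and the example matrix are assumptions for illustration; this is not the GGMnonreg interface, and it does not perform the regression or bootstrap steps the abstract describes.

```python
import math


def partial_correlations(precision):
    """Convert a precision matrix (list of lists) into partial correlations.

    Uses rho_ij = -k_ij / sqrt(k_ii * k_jj) for i != j; the diagonal is set to 1.
    """
    size = len(precision)
    result = [[1.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i != j:
                scale = math.sqrt(precision[i][i] * precision[j][j])
                result[i][j] = -precision[i][j] / scale
    return result


# Made-up 3x3 precision matrix; the zero entry means no edge between variables 0 and 2.
example_precision = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -0.5],
    [0.0, -0.5, 1.0],
]
for row in partial_correlations(example_precision):
    print(row)
```

A zero off-diagonal entry in the precision matrix yields a zero partial correlation, which is why edge selection in these networks amounts to deciding which off-diagonal entries are nonzero.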
Scalable Privacy-Compliant Virality Prediction on Twitter
The digital town hall of Twitter has become a preferred medium of communication
for individuals and organizations across the globe. Some of them reach
audiences of millions, while others struggle to get noticed. Given the impact
of social media, the question remains more relevant than ever: how to model the
dynamics of attention in Twitter. Researchers around the world turn to machine
learning to predict the most influential tweets and authors, navigating the
volume, velocity, and variety of social big data, with many compromises. In
this paper, we revisit content popularity prediction on Twitter. We argue that
strict alignment of data acquisition, storage and analysis algorithms is
necessary to avoid the common trade-offs between scalability, accuracy and
privacy compliance. We propose a new framework for the rapid acquisition of
large-scale datasets, a high-accuracy supervisory signal and multilanguage
sentiment prediction while respecting every applicable privacy request. We then
apply a novel gradient boosting framework to achieve state-of-the-art results
in virality ranking, even before including a tweet's visual or propagation
features. Our Gradient Boosted Regression Tree is the first to offer
explainable, strong ranking performance on benchmark datasets. Since the
analysis focused on features available early, the model is immediately
applicable to incoming tweets in 18 languages.
Comment: AffCon@AAAI-19 Best Paper Award; Presented at AAAI-19 W1: Affective
Content Analysi
Efficiency characterization of a large neuronal network: a causal information approach
When inhibitory neurons constitute about 40% of neurons they could have an
important antinociceptive role, as they would easily regulate the level of
activity of other neurons. We consider a simple network of cortical spiking
neurons with axonal conduction delays and spike timing dependent plasticity,
representative of a cortical column or hypercolumn with a large proportion of
inhibitory neurons. Each neuron fires following Hodgkin-Huxley-like dynamics
and is randomly interconnected with other neurons. The network dynamics are
investigated by estimating the Bandt and Pompe probability distribution function
associated with the interspike intervals for different degrees of
inter-connectivity across neurons. More specifically, we take into account the
fine temporal "structures" of the complex neuronal signals not just by using
the probability distributions associated with the interspike intervals, but
by considering much more subtle measures accounting for their causal
information: the Shannon permutation entropy, Fisher permutation information
and permutation statistical complexity. This allows us to investigate how the
information of the system might saturate to a finite value as the degree of
inter-connectivity across neurons grows, inferring the emergent dynamical
properties of the system.
Comment: 26 pages, 3 Figures; Physica A, in pres
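The Bandt and Pompe approach mentioned above replaces each short window of a series by its ordinal pattern and computes entropy over the pattern frequencies. The following is a minimal sketch of the normalized Shannon permutation entropy only; the function names, the default embedding order of 3, and the sample interval values are assumptions for illustration, and the Fisher permutation information and statistical complexity are not covered.

```python
import math
from collections import Counter


def ordinal_pattern(window):
    """Return the ordering of positions that sorts the window.

    Ties are broken by position, because Python's sort is stable.
    """
    return tuple(sorted(range(len(window)), key=lambda k: window[k]))


def permutation_entropy(series, order=3, normalize=True):
    """Shannon entropy of the ordinal-pattern distribution of a series."""
    n_windows = len(series) - order + 1
    if n_windows < 1:
        raise ValueError("series is shorter than the embedding order")
    counts = Counter(
        ordinal_pattern(series[i:i + order]) for i in range(n_windows)
    )
    entropy = 0.0
    for count in counts.values():
        probability = count / n_windows
        entropy -= probability * math.log(probability)
    if normalize:
        # Divide by the maximum entropy, log(order!), to map into [0, 1].
        entropy /= math.log(math.factorial(order))
    return entropy


# Hypothetical interspike intervals (arbitrary units).
intervals = [12.0, 9.5, 14.2, 8.1, 11.3, 15.0, 7.4, 10.9]
print(permutation_entropy(intervals, order=3))
```

A strictly monotonic series produces a single ordinal pattern and therefore zero entropy, while a series whose patterns are all equally frequent reaches the normalized maximum of 1.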