    Extended Ordered Paired Comparison Models with Application to Football Data from German Bundesliga

    A general paired comparison model for the evaluation of sports competitions is proposed. It uses the available information efficiently by allowing for ordered response categories and team-specific home advantage effects. Penalized estimation techniques are used to identify clusters of teams that share the same ability. The model is extended to include team-specific explanatory variables. It is shown that regularization techniques make it possible to identify the contribution of explanatory variables to the success of teams. The usefulness of the methods is demonstrated by investigating the performance of football teams in the German Bundesliga and its dependence on their budgets.

    Regularization and Model Selection with Categorial Effect Modifiers

    The case of continuous effect modifiers in varying-coefficient models has been well investigated. Categorial effect modifiers, however, have been largely neglected. In this paper, a regularization technique is proposed that allows for the selection of covariates and the fusion of categories of categorial effect modifiers in a linear model. A distinction is made between nominal and ordinal variables, since the latter warrant more economical parametrizations. The proposed methods are illustrated and investigated in simulation studies and real-world data evaluations. Moreover, some asymptotic properties are derived.

    Multiple Imputation Using Gaussian Copulas

    Missing observations are pervasive throughout empirical research, especially in the social sciences. Despite the availability of multiple approaches for dealing adequately with missing data, many scholars still fail to address this vital issue. In this paper, we present a simple-to-use method for generating multiple imputations using a Gaussian copula. The Gaussian copula for multiple imputation (Hoff, 2007) allows scholars to attain estimation results that have good coverage and small bias. Using copulas to model the dependence among variables enables researchers to construct valid joint distributions of the data, even without knowledge of the actual underlying marginal distributions. Multiple imputations are then generated by drawing observations from the resulting posterior joint distribution and replacing the missing values. Using simulated and observational data from published social science research, we compare imputation via Gaussian copulas with two other widely used imputation methods: MICE and Amelia II. Our results suggest that the Gaussian copula approach has a slightly smaller bias, higher coverage rates, and narrower confidence intervals than the other methods. This is especially true when the variables with missing data are not normally distributed. These results, combined with theoretical guarantees and ease of use, suggest that the approach examined provides an attractive alternative for applied researchers undertaking multiple imputations.

    Scalable Privacy-Compliant Virality Prediction on Twitter

    The digital town hall of Twitter has become a preferred medium of communication for individuals and organizations across the globe. Some of them reach audiences of millions, while others struggle to get noticed. Given the impact of social media, the question of how to model the dynamics of attention on Twitter remains more relevant than ever. Researchers around the world turn to machine learning to predict the most influential tweets and authors, navigating the volume, velocity, and variety of social big data, with many compromises. In this paper, we revisit content popularity prediction on Twitter. We argue that strict alignment of data acquisition, storage, and analysis algorithms is necessary to avoid the common trade-offs between scalability, accuracy, and privacy compliance. We propose a new framework for the rapid acquisition of large-scale datasets, a high-accuracy supervisory signal, and multilanguage sentiment prediction, while respecting every applicable privacy request. We then apply a novel gradient boosting framework to achieve state-of-the-art results in virality ranking, even before including a tweet's visual or propagation features. Our Gradient Boosted Regression Tree is the first to offer explainable, strong ranking performance on benchmark datasets. Since the analysis focused on features available early, the model is immediately applicable to incoming tweets in 18 languages.
    Comment: AffCon@AAAI-19 Best Paper Award; Presented at AAAI-19 W1: Affective Content Analysis

    Efficiency characterization of a large neuronal network: a causal information approach

    When inhibitory neurons constitute about 40% of neurons, they could have an important antinociceptive role, as they would easily regulate the level of activity of other neurons. We consider a simple network of cortical spiking neurons with axonal conduction delays and spike-timing-dependent plasticity, representative of a cortical column or hypercolumn with a large proportion of inhibitory neurons. Each neuron fires following Hodgkin-Huxley-like dynamics and is randomly interconnected with other neurons. The network dynamics is investigated by estimating the Bandt and Pompe probability distribution function associated with the interspike intervals, for different degrees of interconnectivity across neurons. More specifically, we take into account the fine temporal "structures" of the complex neuronal signals not just by using the probability distributions associated with the interspike intervals, but by considering much more subtle measures that account for their causal information: the Shannon permutation entropy, the Fisher permutation information, and the permutation statistical complexity. This allows us to investigate how the information of the system might saturate to a finite value as the degree of interconnectivity across neurons grows, inferring the emergent dynamical properties of the system.
    Comment: 26 pages, 3 Figures; Physica A, in press
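
    As a rough illustration of the Shannon permutation entropy named in the abstract above, the following sketch estimates a Bandt and Pompe ordinal-pattern distribution from a sequence of values (for instance, interspike intervals) and computes its normalized Shannon entropy. It is not the authors' code: the function name `permutation_entropy`, the `order` parameter, the unit embedding delay, and the tie-breaking rule are all assumptions made for this example.

```python
import math
from collections import Counter


def permutation_entropy(series, order=3, normalize=True):
    """Shannon entropy of the ordinal-pattern distribution of `series`.

    Hypothetical helper: `order` is the pattern length, and consecutive
    samples are used (embedding delay of 1, an assumption of this sketch).
    """
    if order < 2:
        raise ValueError("order must be at least 2")
    n_windows = len(series) - order + 1
    if n_windows < 1:
        raise ValueError("series is shorter than the pattern length")

    # Count how often each ordinal pattern occurs. A pattern is the tuple of
    # positions that sorts the window; equal values keep their original order.
    counts = Counter()
    for start in range(n_windows):
        window = series[start:start + order]
        pattern = tuple(sorted(range(order), key=lambda k: window[k]))
        counts[pattern] += 1

    # Relative frequencies form the estimated pattern distribution.
    probabilities = [c / n_windows for c in counts.values()]
    entropy = -sum(p * math.log(p) for p in probabilities)

    # Dividing by log(order!) maps the result into [0, 1].
    if normalize:
        entropy /= math.log(math.factorial(order))
    return entropy
```

    Under these assumptions, a strictly increasing sequence yields a single pattern and an entropy of zero, while a sequence that visits all patterns equally often yields a normalized entropy of one.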