78 research outputs found

    A model based approach to Spotify data analysis: a Beta GLMM

    Digital music distribution is increasingly powered by automated mechanisms that continuously capture, sort and analyze large amounts of Web-based data. This paper deals with the management of songs' audio features from a statistical point of view. In particular, it explores the data-capture mechanisms enabled by the Spotify Web API and suggests statistical tools for analyzing these data. Special attention is devoted to song popularity, and a Beta model including random effects is proposed to give a first answer to questions such as: what are the determinants of popularity? Identifying a model able to describe this relationship, and determining which characteristics matter most in making a song popular, is of considerable interest to those who aim to predict the success of new products.
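
    For orientation, a Beta model with random effects is commonly written in the mean-precision form sketched below. This is a generic specification, not necessarily the one used in the paper: the logit link, the random intercept u_i for group i (for example, an artist), and the assumption that popularity y_ij has been rescaled to the open interval (0, 1) are all illustrative choices.

```latex
y_{ij} \mid u_i \sim \mathrm{Beta}\bigl(\mu_{ij}\phi,\ (1-\mu_{ij})\phi\bigr), \qquad
\operatorname{logit}(\mu_{ij}) = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_i, \qquad
u_i \sim \mathcal{N}(0, \sigma_u^2)
```

    Here x_ij would hold the audio features of song j in group i, β the fixed effects, and φ a precision parameter.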

    Clustering alternatives in preference-approvals via novel pseudometrics

    Preference-approval structures combine preference rankings and approval voting for declaring opinions over a set of alternatives. In this paper, we propose a new procedure for clustering alternatives in order to reduce the complexity of the preference-approval space and provide a more accessible interpretation of data. To that end, we present a new family of pseudometrics on the set of alternatives that take into account voters’ preferences via preference-approvals. To obtain clusters, we use the Ranked k-medoids (RKM) partitioning algorithm, which takes as input the similarities between pairs of alternatives based on the proposed pseudometrics. Finally, using non-metric multidimensional scaling, clusters are represented in 2-dimensional space.

    The Neutrophil-to-Lymphocyte Ratio is Related to Disease Activity in Relapsing Remitting Multiple Sclerosis

    Background: The role of the neutrophil-to-lymphocyte ratio (NLR) of peripheral blood has been investigated in relation to several autoimmune diseases. Few studies have addressed the significance of the NLR as a marker of disease activity in multiple sclerosis (MS). Methods: This is a retrospective study of relapsing–remitting MS (RRMS) patients admitted to the tertiary MS center of Catania, Italy, between 1 January and 31 December 2018. The aim of the present study was to investigate the significance of the NLR in reflecting disease activity in a cohort of early diagnosed RRMS patients. Results: Among a total sample of 132 patients diagnosed with RRMS, 84 were enrolled in the present study. In the association analysis, a relation between the NLR value and disease activity at onset was found (Cramér's V 0.271, p = 0.013). In the logistic regression model, the variable NLR (p = 0.03, Exp(B) 3.5, 95% CI 1.089–11.4) was related to disease activity at onset. Conclusion: An elevated NLR is associated with disease activity at onset in RRMS patients. More large-scale studies with a longer follow-up are needed.

    Discrete Beta and Shifted Beta-Binomial models for rating and ranking data

    Ranking and rating methods for preference data result in a different underlying organization of data that can lead to manifold probabilistic approaches to data modelling. As an alternative to existing approaches, two new flexible probability distributions are discussed as a modelling framework: the Discrete Beta and the Shifted Beta-Binomial. Through the presentation of three real-world examples, we demonstrate the practical utility of these distributions. These illustrative cases show how these novel distributions can effectively address real-world challenges, with a particular focus on data derived from surveys concerning environmental issues. Our analysis highlights the new distributions’ capability to capture the inherent structures within preference data, offering valuable insights into the field.
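
    As a rough illustration of the second distribution, the sketch below evaluates a shifted Beta-Binomial probability mass function for a rank R in 1..m. It assumes the common construction in which R - 1 follows a Beta-Binomial(m - 1, alpha, beta) law; the paper's own parametrization may differ, and the function names are hypothetical.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_comb(n, k):
    """Logarithm of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def shifted_beta_binomial_pmf(r, m, alpha, beta):
    """P(R = r) for a rank r in 1..m.

    Assumption (not taken from the paper): R - 1 ~ BetaBinomial(m - 1, alpha, beta).
    """
    if not 1 <= r <= m:
        return 0.0
    k, n = r - 1, m - 1  # shift the rank so that it starts at zero
    return exp(log_comb(n, k) + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))


# Probabilities of ranks 1..5 for illustrative shape parameters.
probs = [shifted_beta_binomial_pmf(r, 5, 2.0, 3.0) for r in range(1, 6)]
uniform = [shifted_beta_binomial_pmf(r, 5, 1.0, 1.0) for r in range(1, 6)]
```

    Under this construction, alpha = beta = 1 gives equal probability to every rank, while unequal shape parameters skew the mass toward one end of the scale.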

    Variable selection in mixed models: a graphical approach

    Model selection can be defined as the task of estimating the performance of different models in order to choose the (approximate) best one. The purpose of this article is to introduce an extension of the graphical representation of deviance, proposed in the framework of classical and generalized linear models, to the wider class of mixed models. The proposed plot is useful in determining which explanatory variables are important, conditional on the random-effects part. The applicability and easy interpretation of the graph are illustrated with a real data example.

    Random forest analysis: a new approach for classification of Beta Thalassemia

    In recent years, Thalassemia care providers started classifying patients as transfusion-dependent Thalassemia (TDT) or non-transfusion-dependent Thalassemia (NTDT), owing to the established role of transfusion therapy in defining the clinical complication profile, although this classification was also based on expert opinion and is limited by reliance on patients' current transfusion status. Starting from a vast set of variables indicating severity phenotype, and using both classification and clustering techniques, we want to explore the presence of two (TDT vs NTDT) or more clusters, in order to approach a new definition for the classification of Beta-Thalassemia in Thalassemia Syndromes (TS).

    Weighted and unweighted distances based decision tree for ranking data

    Preference data represent a particular type of ranking data (widely used in sports, web search, social sciences), where a group of people gives their preferences over a set of alternatives. Within this framework, distance-based decision trees represent a non-parametric tool for identifying the profiles of subjects giving a similar ranking. This paper aims at detecting, in the framework of (complete and incomplete) ranking data, the impact of differently structured weighted distances on building decision trees. The traditional metrics between rankings do not take into account the importance of swapping elements that are similar to one another (element weights) or elements belonging to the top (or to the bottom) of an ordering (position weights). By means of simulations, using weighted distances to build decision trees, we will compute the impact of different weighting structures both on splitting and on consensus ranking. The distances that will be used satisfy Kemeny's axioms and, accordingly, a modified version of the rank correlation coefficient τx proposed by Emond and Mason will be introduced and used for assessing the trees’ goodness.
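
    To make the starting point concrete, the sketch below computes the classical unweighted Kemeny distance between two rankings, the kind of traditional metric that the weighted distances generalize. The element and position weights studied in the paper are not implemented here, and the encoding (one rank per item, 1 = best, ties sharing a rank) is an assumption made for illustration.

```python
from itertools import combinations


def sign(x):
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def kemeny_distance(rank_a, rank_b):
    """Unweighted Kemeny distance between two rankings of the same items.

    rank_a[i] is the rank given to item i (1 = best); tied items share a rank.
    Each pair of items contributes |a_ij - b_ij|, where a_ij is +1 if i is
    preferred to j, -1 if j is preferred to i, and 0 if they are tied.
    """
    if len(rank_a) != len(rank_b):
        raise ValueError("both rankings must cover the same items")
    total = 0
    for i, j in combinations(range(len(rank_a)), 2):
        a_ij = sign(rank_a[j] - rank_a[i])
        b_ij = sign(rank_b[j] - rank_b[i])
        total += abs(a_ij - b_ij)
    return total


reversed_order = kemeny_distance([1, 2, 3], [3, 2, 1])  # every pair is swapped
adjacent_swap = kemeny_distance([1, 2, 3], [1, 3, 2])   # one pair is swapped
with_tie = kemeny_distance([1, 2, 3], [1, 1, 2])        # one pair becomes tied
```

    In this unweighted form every swapped pair costs the same, regardless of where in the ordering it occurs or how similar the two items are, which is the limitation the abstract points to.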

    GAMLSS for high-variability data: an application to liver fibrosis case.

    In this paper, we propose to manage the problem caused by overdispersed data by applying the generalized additive model for location, scale and shape (GAMLSS) framework introduced by Rigby and Stasinopoulos (2005). The idea of using a GAMLSS approach comes from Aitkin (1996), who used an EM maximum likelihood estimation algorithm (Dempster, Laird, and Rubin, 1977) to deal with overdispersed generalized linear models (GLM). As in the GLM case, the algorithm is initially derived as a form of Gaussian quadrature assuming a normal mixing distribution. The GAMLSS specification allows the extension of the Aitkin algorithm to probability distributions not belonging to the exponential family. In particular, the aim of this work is to show the importance of using a GAMLSS structure when a mixture is used to provide a natural representation of heterogeneity in a finite number of latent classes (Celeux and Diebolt, 1992).

    New Flexible Probability distributions for ranking data

    Recently, several models have been proposed in the literature for analyzing ranks assigned by people to some object. These models summarize the liking feeling for this object, possibly also with respect to a set of explanatory variables. Some recent works have suggested the use of the Shifted Binomial and of the Inverse Hypergeometric distribution for modelling the approval rate, while mixture models have been developed for taking into account the uncertainty of the ranking process. We propose two new probabilistic models, based on the Discrete Beta and the Shifted Beta-Binomial distributions, that offer considerable flexibility and allow the joint modelling of the scale (approval rate) and the shape (uncertainty) of the distribution of the ranks assigned to the object.