78 research outputs found
A model based approach to Spotify data analysis: a Beta GLMM
Digital music distribution is increasingly powered by automated mechanisms that continuously capture, sort and analyze large amounts of Web-based data. This paper deals with the management of songs' audio features from a statistical point of view. In particular, it explores the data-capture mechanisms enabled by the Spotify Web API, and suggests statistical tools for the analysis of these data. Special attention is devoted to song popularity, and a Beta model including random effects is proposed in order to give a first answer to questions such as: what are the determinants of popularity? Identifying a model able to describe this relationship, and determining which characteristics matter most in making a song popular, is of great interest to those who aim to predict the success of new products.
Clustering alternatives in preference-approvals via novel pseudometrics
Preference-approval structures combine preference rankings and approval voting for
declaring opinions over a set of alternatives. In this paper, we propose a new procedure
for clustering alternatives in order to reduce the complexity of the preference-approval
space and provide a more accessible interpretation of data. To that end,
we present a new family of pseudometrics on the set of alternatives that take into
account voters' preferences via preference-approvals. To obtain clusters, we use the
Ranked k-medoids (RKM) partitioning algorithm, which takes as input the similarities
between pairs of alternatives based on the proposed pseudometrics. Finally,
using non-metric multidimensional scaling, clusters are represented in 2-dimensional
space
The Neutrophil-to-Lymphocyte Ratio is Related to Disease Activity in Relapsing Remitting Multiple Sclerosis
Background: The role of the neutrophil-to-lymphocyte ratio (NLR) of peripheral blood
has been investigated in relation to several autoimmune diseases. Limited studies have addressed
the significance of the NLR in terms of being a marker of disease activity in multiple sclerosis (MS).
Methods: This is a retrospective study in relapsing–remitting MS patients (RRMS) admitted to the
tertiary MS center of Catania, Italy during the period of 1 January to 31 December 2018. The aim of
the present study was to investigate the significance of the NLR in reflecting the disease activity in a
cohort of early diagnosed RRMS patients. Results: Among a total sample of 132 patients diagnosed
with RRMS, 84 were enrolled in the present study. In the association analysis, a relation between
the NLR value and disease activity at onset was found (Cramér's V = 0.271, p = 0.013). In the logistic
regression model, the variable NLR (p = 0.03, Exp(B) = 3.5, 95% CI 1.089–11.4) was related to disease
activity at onset. Conclusion: An elevated NLR is associated with disease activity at onset in RRMS
patients. More large-scale studies with a longer follow-up are needed
Discrete Beta and Shifted Beta-Binomial models for rating and ranking data
Ranking and rating methods for preference data result in a different underlying
organization of data that can lead to manifold probabilistic approaches to data modelling.
As an alternative to existing approaches, two new flexible probability distributions
are discussed as a modelling framework: the Discrete Beta and the Shifted
Beta-Binomial. Through the presentation of three real-world examples, we demonstrate
the practical utility of these distributions. These illustrative cases show how
these novel distributions can effectively address real-world challenges, with a particular
focus on data derived from surveys concerning environmental issues. Our
analysis highlights the new distributions' capability to capture the inherent structures
within preference data, offering valuable insights into the field
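
As a rough illustration of the kind of distribution named in this abstract, the sketch below evaluates a shifted Beta-Binomial probability mass function on ratings 1..m. It assumes one common construction, in which R - 1 follows a Beta-Binomial(m - 1, alpha, beta) distribution; the paper itself may use a different parameterization, and the names `shifted_beta_binomial_pmf`, `m`, `alpha` and `beta` are hypothetical.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_binom(n, k):
    """Logarithm of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def shifted_beta_binomial_pmf(r, m, alpha, beta):
    """P(R = r) for a rating r in 1..m.

    Assumption (not taken from the paper): R - 1 ~ Beta-Binomial(m - 1, alpha, beta).
    """
    if not 1 <= r <= m:
        return 0.0
    n, k = m - 1, r - 1  # shift the support from 1..m to 0..m-1
    return exp(log_binom(n, k)
               + log_beta(k + alpha, n - k + beta)
               - log_beta(alpha, beta))


# Probabilities over a 5-point rating scale with illustrative shape parameters.
probs = [shifted_beta_binomial_pmf(r, 5, 2.0, 3.0) for r in range(1, 6)]
```

Under this construction, alpha = beta = 1 gives a uniform distribution over the m categories, while other values change both the location and the spread of the ratings.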
Variable selection in mixed models: a graphical approach
Model selection can be defined as the task of estimating the performance of different
models in order to choose the (approximate) best one. The purpose of this article is to
introduce an extension of the graphical representation of deviance proposed in the framework
of classical and generalized linear models to the wider class of mixed models. The proposed
plot is useful in determining which explanatory variables are important, conditional on
the random-effects part. The applicability and the easy interpretation of the graph are
illustrated with a real data example
Random forest analysis: a new approach for classification of Beta Thalassemia
In recent years, Thalassemia care providers started classifying patients as transfusion-dependent
Thalassemia (TDT) or non-transfusion-dependent Thalassemia (NTDT) owing to
the established role of transfusion therapy in defining the clinical complication profile, although
this classification was also based on expert opinion and is limited by reliance on patients' current
transfusion status. Starting from a vast set of variables indicating severity phenotype, through
the use of both classification and clustering techniques we want to explore the presence of
two (TDT vs NTDT) or more clusters, in order to approach a new definition for the
classification of Beta-Thalassemia in Thalassemia Syndromes (TS)
Weighted and unweighted distances based decision tree for ranking data
Preference data represent a particular type of ranking data (widely used
in sports, web search, social sciences), where a group of people gives their preferences
over a set of alternatives. Within this framework, distance-based decision
trees represent a non-parametric tool for identifying the profiles of subjects giving
a similar ranking. This paper aims at detecting, in the framework of (complete
and incomplete) ranking data, the impact of the differently structured weighted distances
for building decision trees. The traditional metrics between rankings don't
take into account the importance of swapping elements similar among them (element
weights) or elements belonging to the top (or to the bottom) of an ordering
(position weights). By means of simulations, using weighted distances to build decision
trees, we will compute the impact of different weighting structures both on
splitting and on consensus ranking. The distances that will be used satisfy Kemeny's
axioms and, accordingly, a modified version of the rank correlation coefficient τx,
proposed by Emond and Mason, will be introduced and used for assessing the trees'
goodness
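
For readers unfamiliar with the building blocks named in this abstract, the following minimal sketch computes the standard (unweighted) Kemeny distance and the corresponding unweighted rank correlation coefficient of the τx type between two rankings. It does not implement the weighted distances or the modified coefficient the paper proposes. Rankings are assumed to be given as rank vectors (entry i is the rank of item i, lower meaning more preferred, ties allowed), and all function names are hypothetical.

```python
def kemeny_scores(ranks):
    """Kemeny score matrix: +1 if item i is preferred to item j,
    -1 if j is preferred to i, 0 if they are tied (or i == j)."""
    n = len(ranks)
    return [[(ranks[i] < ranks[j]) - (ranks[i] > ranks[j]) for j in range(n)]
            for i in range(n)]


def kemeny_distance(r1, r2):
    """Half the sum of absolute differences between the two score matrices."""
    a, b = kemeny_scores(r1), kemeny_scores(r2)
    n = len(r1)
    return sum(abs(a[i][j] - b[i][j]) for i in range(n) for j in range(n)) / 2


def tau_x_scores(ranks):
    """Score matrix used by tau_x: +1 if i is ranked ahead of or tied with j,
    -1 otherwise, and 0 on the diagonal."""
    n = len(ranks)
    return [[0 if i == j else (1 if ranks[i] <= ranks[j] else -1)
             for j in range(n)] for i in range(n)]


def tau_x(r1, r2):
    """Unweighted tau_x: normalized inner product of the two score matrices."""
    a, b = tau_x_scores(r1), tau_x_scores(r2)
    n = len(r1)
    total = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n))
    return total / (n * (n - 1))
```

In this unweighted setting the two quantities are linked by τx = 1 - 2d / (n(n - 1)), where d is the Kemeny distance and n the number of items, so a smaller distance always corresponds to a larger correlation.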
GAMLSS for high-variability data: an application to liver fibrosis case.
In this paper, we propose a way of managing the
problem caused by overdispersed data by applying the generalized additive model
for location, scale and shape framework (GAMLSS) as introduced by Rigby and
Stasinopoulos (2005). The idea of using a GAMLSS approach for handling our
problem comes from the idea of Aitkin (1996) consisting in the use of an EM maximum
likelihood estimation algorithm (Dempster, Laird, and Rubin, 1977) to deal
with overdispersed generalized linear models (GLM). As in the GLM case, the algorithm
is initially derived as a form of Gaussian quadrature assuming a normal
mixing distribution. The GAMLSS specification allows the extension of the Aitkin
algorithm to probability distributions not belonging to the exponential family. In
particular, the aim of this work is to show the importance of using a GAMLSS structure
when a mixture is used to provide a natural representation of heterogeneity in a finite
number of latent classes (Celeux and Diebolt, 1992)
New Flexible Probability distributions for ranking data
Recently, several models have been proposed in the literature for analyzing
ranks assigned by people to some object. These models summarize the liking feeling
for this object, possibly also with respect to a set of explanatory variables. Some
recent works have suggested the use of the Shifted Binomial and of the Inverse Hypergeometric
distribution for modelling the approval rate, while mixture models
have been developed for taking into account the uncertainty of the ranking process.
We propose two new probabilistic models, based on the Discrete Beta and the
Shifted Beta-Binomial distributions, that ensure much flexibility and allow the joint
modelling of the scale (approval rate) and the shape (uncertainty) of the distribution
of the ranks assigned to the object