Clustering of discretely observed diffusion processes
In this paper a new dissimilarity measure to identify groups of asset
dynamics is proposed. The underlying generating process is assumed to be a
diffusion process, the solution of a stochastic differential equation, observed
at discrete times. The mesh of observations is not required to shrink to zero.
As the distance between two observed paths, the quadratic distance of the
corresponding estimated Markov operators is considered. Analysis of both
synthetic data and real financial data from NYSE/NASDAQ stocks gives evidence
that this distance seems capable of capturing differences in both the drift and
diffusion coefficients, contrary to other commonly used metrics.
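The idea of comparing two discretely observed paths through their estimated transition dynamics can be illustrated with a deliberately crude stand-in. The sketch below is not the Markov-operator estimator used in the paper: it assumes a common grid of equal-width bins, summarises each path by the empirical joint frequencies of consecutive observations (X_t, X_{t+1}), and takes the sum of squared cell-wise differences as a quadratic distance. The helper names (`pair_frequencies`, `quadratic_distance`, `simulate_ou`), the Ornstein-Uhlenbeck test paths and the grid are all hypothetical choices for illustration.

```python
import math
import random


def pair_frequencies(path, lo, hi, n_bins):
    """Joint frequencies of consecutive observations (X_t, X_{t+1}) on n_bins
    equal-width bins over [lo, hi]: a crude, discretised proxy for the
    transition dynamics of a discretely observed path (assumption)."""
    def bin_of(x):
        # Clamp so that values outside [lo, hi] fall into the outer bins.
        i = int((x - lo) / (hi - lo) * n_bins)
        return min(max(i, 0), n_bins - 1)

    freq = [[0.0] * n_bins for _ in range(n_bins)]
    n_pairs = len(path) - 1
    for a, b in zip(path[:-1], path[1:]):
        freq[bin_of(a)][bin_of(b)] += 1.0 / n_pairs
    return freq


def quadratic_distance(p, q):
    """Sum of squared cell-wise differences between two frequency matrices."""
    return sum((x - y) ** 2 for row_p, row_q in zip(p, q) for x, y in zip(row_p, row_q))


def simulate_ou(theta, sigma, n, dt, seed):
    """Euler scheme for dX = -theta * X dt + sigma dW, observed every dt."""
    rng = random.Random(seed)
    x, path = 0.0, [0.0]
    for _ in range(n):
        x += -theta * x * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


# Same drift and diffusion coefficient (a, b) versus a larger diffusion coefficient (c).
a = simulate_ou(theta=1.0, sigma=0.5, n=20000, dt=0.1, seed=1)
b = simulate_ou(theta=1.0, sigma=0.5, n=20000, dt=0.1, seed=2)
c = simulate_ou(theta=1.0, sigma=1.5, n=20000, dt=0.1, seed=3)

pa, pb, pc = (pair_frequencies(p, lo=-3.0, hi=3.0, n_bins=12) for p in (a, b, c))
d_same = quadratic_distance(pa, pb)
d_diff = quadratic_distance(pa, pc)
print(f"same coefficients: {d_same:.4f}  larger sigma: {d_diff:.4f}")
```

Paths generated with the same coefficients should end up much closer than a path with a different diffusion coefficient; a matrix of such pairwise distances could then be passed to any standard hierarchical clustering routine.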
Social networks, happiness and health: from sentiment analysis to a multidimensional indicator of subjective well-being
This paper applies a novel technique of opinion analysis over social media
data with the aim of proposing a new indicator of perceived and subjective
well-being. This new index, namely SWBI, examines several dimensions of
individual and social life. The indicator has been compared to some other
existing indexes of well-being and health conditions in Italy: the BES
(Benessere Equo Sostenibile), the incidence rate of influenza and the levels
of PM10 in urban environments. SWBI is a daily measure available at province
level. BES data, currently available only for 2013 and 2014, are annual and
available at regional level. Flu data are weekly and distributed as regional
data, and PM10 data are collected daily for different cities. Because the time
scale and spatial granularity of the different indexes vary, we apply a novel
statistical technique to discover nowcasting features and classical latent
variable analysis to study the relationships among them. A preliminary analysis
suggests that environmental and health conditions anticipate several
dimensions of the perception of well-being as measured by SWBI. Moreover, the
set of indicators included in the BES represents a latent dimension of
well-being which shares similarities with the latent dimension represented by
SWBI.
Comment: 26 pages, 5 figures
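Because the indexes live on different time scales, a common first step is to aggregate the finer series to the coarser one before comparing them. The sketch below is only an illustration and not the nowcasting technique used in the paper: it averages a daily index into weekly means and computes the Pearson correlation between a weekly health series at week t - k and the index at week t. A strong correlation at a positive lag k is the kind of pattern meant when one series "anticipates" another. The data, function names and lags are hypothetical.

```python
import math
import statistics


def weekly_means(daily):
    """Average a daily series over consecutive 7-day blocks (an incomplete last week is dropped)."""
    n_weeks = len(daily) // 7
    return [statistics.fmean(daily[7 * w:7 * w + 7]) for w in range(n_weeks)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(leader, follower, k):
    """Correlation between leader[t - k] and follower[t] over all valid t (k >= 0)."""
    if k == 0:
        return pearson(leader, follower)
    return pearson(leader[:-k], follower[k:])


# Hypothetical data: 30 weeks of weekly flu incidence and a daily well-being
# index whose weekly level drops two weeks after flu incidence rises.
flu = [float((7 * w) % 11) for w in range(30)]
daily_index = []
for w in range(30):
    level = 50.0 - 2.0 * flu[w - 2] if w >= 2 else 50.0
    # Day-of-week effect with zero weekly mean, removed by the weekly averaging.
    daily_index.extend(level + 0.1 * (day - 3) for day in range(7))

weekly_index = weekly_means(daily_index)
r0 = lagged_correlation(flu, weekly_index, 0)
r2 = lagged_correlation(flu, weekly_index, 2)
print(f"lag 0: {r0:+.3f}  lag 2: {r2:+.3f}")
```

In this constructed example the correlation is close to -1 at lag 2 and weak at lag 0, so the flu series leads the index by two weeks.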
Is Japanese gendered language used on Twitter? A large-scale study
This study analyzes the usage of Japanese gendered language on Twitter.
Starting from a collection of 408 million Japanese tweets from 2015 to 2019
and an additional sample of 2355 Twitter account timelines manually classified
by gender and category (politicians, musicians, etc.), a large-scale textual
analysis is performed on this corpus to identify and examine sentence-final
particles (SFPs) and first-person pronouns appearing in the texts. It turns out
that gendered language is in fact also used on Twitter, in about 6% of the
tweets, and that the prescriptive classification into "male" and "female"
language does not always meet expectations, with remarkable exceptions.
Further, SFPs and pronouns show increasing or decreasing trends, indicating an
evolution of the language used on Twitter.
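To make the counting step concrete, the sketch below flags a tweet as containing gendered language if, after trailing punctuation is stripped, it ends with one of a few sentence-final particles, or if it contains one of a few first-person pronouns; it then reports the share of flagged tweets. The marker lists are short illustrative examples of forms traditionally described as "male" or "female" (ぞ and ぜ versus わ and かしら; 俺 and 僕 versus あたし). They are assumptions, not the lexicon used in the study, and naive suffix matching produces false positives (a tweet ending in the noun かわ, "river", matches わ).

```python
import re

# Illustrative marker lists only (assumption), not the study's classification.
SFP = {"male": ("ぞ", "ぜ"), "female": ("わ", "かしら")}
PRONOUNS = {"male": ("俺", "僕"), "female": ("あたし",)}

# Trailing whitespace and punctuation removed before checking the sentence ending.
TRAILING = re.compile(r"[\s!?！？。、…〜~]+$")


def gendered_markers(tweet):
    """Return the set of (gender, kind) labels whose markers appear in the tweet."""
    labels = set()
    core = TRAILING.sub("", tweet)
    for gender, particles in SFP.items():
        if core.endswith(particles):  # str.endswith accepts a tuple of suffixes
            labels.add((gender, "sfp"))
    for gender, pronouns in PRONOUNS.items():
        if any(p in tweet for p in pronouns):
            labels.add((gender, "pronoun"))
    return labels


def gendered_share(tweets):
    """Share of tweets containing at least one gendered marker."""
    flagged = sum(1 for t in tweets if gendered_markers(t))
    return flagged / len(tweets) if tweets else 0.0


tweets = ["今日は暑いわ！", "俺も行くぞ", "明日は雨です。", "僕は学生です", "楽しかった"]
print(gendered_share(tweets))  # 3 of 5 tweets flagged -> 0.6
```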
Forecasting asylum-related migration flows with machine learning and data at scale
The effects of the so-called "refugee crisis" of 2015-16 continue to dominate
the political agenda in Europe. Migration flows were sudden and unexpected,
leaving governments unprepared and exposing significant shortcomings in the
field of migration forecasting. Migration is a complex system typified by
episodic variation, underpinned by causal factors that are interacting, highly
context-dependent and short-lived. Correspondingly, migration monitoring relies
on scattered data, while approaches to forecasting focus on specific migration
flows and often have inconsistent results that are difficult to generalise at
the regional or global levels.
Here we show that adaptive machine learning algorithms that integrate
official statistics and non-traditional data sources at scale can effectively
forecast asylum-related migration flows. We focus on asylum applications lodged
in countries of the European Union (EU) by nationals of all countries of origin
worldwide; the same approach can be applied in any context provided adequate
migration or asylum data are available.
We exploit three tiers of data (geolocated events and internet searches in
countries of origin, detections of irregular crossings at the EU border, and
asylum recognition rates in countries of destination) to effectively forecast
individual asylum-migration flows up to four weeks ahead with high accuracy.
Uniquely, our approach a) monitors potential drivers of migration in countries
of origin to detect the onset of changes early; b) models individual
country-to-country migration flows separately and on moving time windows; c)
estimates the effects of individual drivers, including lagged effects; d)
provides forecasts of asylum applications up to four weeks ahead; e) assesses
how patterns of drivers shift over time to describe the functioning and change
of migration systems.
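The moving-window, lagged-driver idea can be illustrated with a much simpler model than the adaptive machine learning used in the study. The sketch below is a hypothetical stand-in: it fits an ordinary least-squares line linking a single driver observed at week t (for instance a weekly count of internet searches in a country of origin) to asylum applications at week t + h, using only the most recent `window` weeks, and then forecasts h = 4 weeks ahead from the latest driver value. The series, window length and function names are assumptions.

```python
def ols(x, y):
    """Intercept a and slope b of the least-squares line y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def forecast_ahead(driver, target, horizon=4, window=20):
    """Forecast the target `horizon` weeks after the last observed week.

    Fits target[t + horizon] ~ driver[t] on the most recent `window` weeks for
    which both values are observed (a moving window), then applies the fitted
    line to the latest driver value. Assumes equal-length weekly series."""
    last_t = len(target) - 1 - horizon
    first_t = max(0, last_t - window + 1)
    if last_t - first_t < 1:
        raise ValueError("not enough observations for this horizon and window")
    xs = driver[first_t:last_t + 1]
    ys = target[first_t + horizon:last_t + horizon + 1]
    a, b = ols(xs, ys)
    return a + b * driver[-1]


# Hypothetical weekly series: applications respond to the driver with a 4-week lag.
driver = [10.0 + (t % 6) for t in range(40)]
applications = [0.0] * 4 + [5.0 + 3.0 * driver[t - 4] for t in range(4, 40)]
print(forecast_ahead(driver, applications))  # close to 5 + 3 * driver[-1] = 44
```

Refitting on a moving window lets the estimated driver effect change over time, which is the property the approach above relies on; a real system would use many drivers and a more flexible learner.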
On a Japanese Subjective Well-Being Indicator Based on Twitter data
This study presents for the first time the SWB-J index, a subjective
well-being indicator for Japan based on Twitter data. The index is composed of
eight dimensions of subjective well-being and is estimated relying on Twitter
data by using human-supervised sentiment analysis. The index is then compared
with the analogous SWB-I index for Italy, in order to verify possible analogies
and cultural differences. Further, through structural equation models, a causal
assumption is tested to see whether the economic and health conditions of the
country influence the well-being latent variable and how this latent dimension
affects the SWB-J and SWB-I indicators. It turns out that, as expected,
economic and health welfare is only one aspect of the multidimensional
well-being that is captured by the Twitter-based indicator.
A proposal to deal with sampling bias in social network big data
Selection bias is the bias introduced by the non-random selection of data; it leads one to question whether the sample obtained is representative of the target population. There are generally different types of selection bias, but when dealing with web surveys or data from social networks such as Twitter or Facebook, one mostly needs to focus on sampling and self-selection bias. In this work we propose to use official statistics to anchor the estimates and remove the sampling bias and unreliability of the estimations due to the use of social network big data, following a weighting method combined with a small area estimation (SAE) approach.
Iacus, S.M.; Porro, G.; Salini, S.; Siletti, E. (2018). A proposal to deal with sampling bias in social network big data. In 2nd International Conference on Advanced Research Methods and Analytics (CARMA 2018). Editorial Universitat Politècnica de València. 29-37. https://doi.org/10.4995/CARMA2018.2018.8302
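The weighting step can be sketched as simple post-stratification: official statistics give each stratum's share of the target population, and every social network user in a stratum receives the weight population share / sample share, so over-represented groups count less. This is a minimal sketch assuming known population shares for each stratum; the combination with small area estimation described above is not reproduced, and the strata, values and function names are hypothetical.

```python
from collections import Counter


def poststrat_weights(sample_strata, population_shares):
    """Weight for each stratum = population share / share in the sample."""
    n = len(sample_strata)
    counts = Counter(sample_strata)
    return {s: population_shares[s] / (counts[s] / n) for s in counts}


def weighted_mean(values, sample_strata, population_shares):
    """Post-stratified mean of a per-user quantity."""
    w = poststrat_weights(sample_strata, population_shares)
    num = sum(w[s] * v for v, s in zip(values, sample_strata))
    den = sum(w[s] for s in sample_strata)
    return num / den


# Hypothetical example: young users are over-represented on the platform.
strata = ["young", "young", "young", "old"]
opinion = [1.0, 1.0, 1.0, 0.0]          # e.g. 1 = expresses a given opinion
official = {"young": 0.5, "old": 0.5}   # population shares from official statistics
print(sum(opinion) / len(opinion))               # unweighted: 0.75
print(weighted_mean(opinion, strata, official))  # post-stratified: about 0.5
```

Weighting of this kind corrects only the composition of the sample on the chosen strata; it cannot by itself remove self-selection within a stratum.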
Are official confirmed cases and fatalities counts good enough to study the COVID-19 pandemic dynamics? A critical assessment through the case of Italy
As the COVID-19 outbreak develops, the two most frequently reported
statistics seem to be the raw counts of confirmed cases and case fatalities.
Focusing on Italy, one of the hardest-hit countries, we look at how these two
values could be put in perspective to reflect the dynamics of the virus spread.
In particular, we find that merely considering the confirmed case counts would
be very misleading. The number of daily tests grows, while the daily fraction
of confirmed cases over total tests has a change point: depending on the
region, it generally increases with strong fluctuations until around
15th-22nd March and then decreases linearly. Combined with the increasing
trend of daily performed tests, this means that the raw confirmed case counts
are not representative of the situation and are confounded with the sampling
effort. We observe this when regressing on time the log fraction of positive
tests and, for comparison, the log raw confirmed count. Hence, calibrating
model parameters for this virus's dynamics should not be based only on
confirmed case counts (without rescaling by the number of tests), but should
also take fatality and hospitalization counts into consideration, as variables
not prone to distortion by testing effort. Furthermore, reporting statistics at
the national level says little about the dynamics of the disease, which take
place at the regional level. These findings are based on the official data of
total death counts up to 15th April 2020 released by ISTAT and up to 10th May
2020 for the number of cases. In this work we do not fit models; rather, we
investigate whether this task is possible at all. This work also describes a
new tool to collect and harmonize official statistics from different sources,
in the form of a package for the R statistical environment, and presents the
COVID-19 Data Hub.
Comment: updated reference
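The confounding of raw counts with testing effort can be seen in a small synthetic example covering a single segment after the change point: if daily tests grow while the share of positive tests falls, raw confirmed counts can still rise. The sketch below regresses on time both the log fraction of positive tests and the log raw confirmed count, as described above, using made-up growth rates rather than the Italian data; the function and variable names are hypothetical.

```python
import math


def slope(ts, ys):
    """Least-squares slope of ys regressed on ts."""
    n = len(ts)
    mt, my = sum(ts) / n, sum(ys) / n
    stt = sum((t - mt) ** 2 for t in ts)
    sty = sum((t - mt) * (y - my) for t, y in zip(ts, ys))
    return sty / stt


days = list(range(30))
# Hypothetical data: testing effort grows 10% a day, positivity falls 5% a day.
tests = [1000 * 1.10 ** d for d in days]
positive_fraction = [0.30 * 0.95 ** d for d in days]
confirmed = [n * p for n, p in zip(tests, positive_fraction)]

slope_fraction = slope(days, [math.log(p) for p in positive_fraction])
slope_confirmed = slope(days, [math.log(c) for c in confirmed])
print(f"log fraction slope: {slope_fraction:+.4f}, log confirmed slope: {slope_confirmed:+.4f}")
```

The fraction of positive tests declines (slope about log 0.95) while the raw confirmed count increases (slope about log 1.045), purely because more tests are performed each day.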
A Japanese subjective well-being indicator based on Twitter data
This study presents for the first time the SWB-J index, a subjective well-being indicator for Japan based on Twitter data. The index is composed of eight dimensions of subjective well-being and is estimated relying on Twitter data by using human-supervised sentiment analysis. The index is then compared with the analogous SWB-I index for Italy in order to verify possible analogies and cultural differences. Further, through structural equation models, we investigate the relationship between the economic and health conditions of the country and the well-being latent variable, and illustrate how this latent dimension affects the SWB-J and SWB-I indicators. It turns out that, as expected, economic and health welfare is only one aspect of the multidimensional well-being that is captured by the Twitter-based indicator.