17,099 research outputs found
Measure based metrics for aggregated data
Aggregated data arise commonly from surveys and censuses where groups of individuals are studied as coherent entities. Such data can take many forms, including sets, intervals, distributions and histograms. The data analyst needs to measure the similarity between such aggregated data items, and a range of metrics is reported in the literature for this purpose (e.g. the Jaccard metric for sets and the Wasserstein metric for histograms). In this paper, a unifying theory based on measure theory is developed that not only establishes that the known metrics are essentially similar, but also suggests new metrics.
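The two metrics the abstract names can be sketched in a few lines. This is a minimal pure-Python illustration, assuming the histograms share the same equal-width bins; the function names are ours, not the paper's.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two sets: 1 - |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # two empty sets are identical
    return 1.0 - len(a & b) / len(a | b)

def wasserstein_1d(p, q, bin_width=1.0):
    """1-Wasserstein distance between two histograms on identical bins:
    the area between their cumulative distribution functions."""
    assert len(p) == len(q)
    sp, sq = sum(p), sum(q)
    cdf_p = cdf_q = 0.0
    dist = 0.0
    for wp, wq in zip(p, q):
        cdf_p += wp / sp          # normalise counts to probabilities
        cdf_q += wq / sq
        dist += abs(cdf_p - cdf_q) * bin_width
    return dist
```

For example, `jaccard_distance({1, 2}, {2, 3})` is 2/3, and `wasserstein_1d([1, 0], [0, 1])` is 1.0, since all mass must move one bin.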
On central tendency and dispersion measures for intervals and hypercubes
The uncertainty or variability of the data may be treated by considering,
rather than a single value for each observation, the interval of values in
which it may fall. This paper studies the derivation of basic descriptive
statistics for interval-valued datasets. We propose a geometrical approach
to the determination of summary statistics (central tendency and dispersion
measures) for interval-valued variables.
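One standard convention for such summary statistics (not necessarily the paper's geometrical approach) treats each observation as uniformly distributed over its interval. The mean is then the mean of the midpoints, and the variance is that of the resulting uniform mixture, a formula often attributed to Bertrand and Goupil. A sketch, assuming intervals are given as (lower, upper) pairs:

```python
def interval_mean(intervals):
    """Mean of the interval midpoints (each value assumed
    uniformly distributed within its interval)."""
    return sum((a + b) / 2 for a, b in intervals) / len(intervals)

def interval_variance(intervals):
    """Variance of the uniform mixture over the intervals:
    E[X^2] - E[X]^2, with E[X_i^2] = (a^2 + ab + b^2) / 3 for U(a, b)."""
    n = len(intervals)
    m = interval_mean(intervals)
    second_moment = sum((a * a + a * b + b * b) / 3 for a, b in intervals) / n
    return second_moment - m * m
```

With degenerate intervals (a == b) these reduce to the ordinary population mean and variance, which is a useful sanity check.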
Linear regression for numeric symbolic variables: an ordinary least squares approach based on Wasserstein Distance
In this paper we present a linear regression model for modal symbolic data.
The observed variables are histogram variables according to the definition
given in the framework of Symbolic Data Analysis, and the parameters of the
model are estimated using the classic least squares method. An appropriate
metric is introduced in order to measure the error between the observed and
the predicted distributions; in particular, the Wasserstein distance is
proposed. Some properties of this metric are exploited to predict the
response variable as a direct linear combination of other independent
histogram variables. Measures of goodness of fit are discussed. An
application on real data corroborates the proposed method.
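The key property behind this style of regression is that the squared 2-Wasserstein distance between two one-dimensional distributions is an ordinary sum of squares over their quantile functions, so histograms can be mapped to quantile vectors on a fixed probability grid and fitted with classic least squares. The following is a hedged sketch of that idea only; the function names, grid, and toy data are illustrative, not the paper's exact estimator.

```python
import numpy as np

def quantiles_from_histogram(edges, weights, probs):
    """Sample the quantile (inverse CDF) function of a histogram,
    interpolating linearly within bins."""
    weights = np.asarray(weights, dtype=float)
    cdf = np.concatenate([[0.0], np.cumsum(weights) / weights.sum()])
    return np.interp(probs, cdf, edges)

def wasserstein2_sq(q1, q2):
    """Squared 2-Wasserstein distance, approximated as the mean
    squared difference of quantiles on the shared grid."""
    return float(np.mean((q1 - q2) ** 2))

# Toy usage: recover y = 3 + 2x when x and y are quantile vectors.
probs = np.linspace(0.0, 1.0, 101)
x = quantiles_from_histogram([0, 1, 2], [1, 1], probs)  # uniform on [0, 2]
y = 3.0 + 2.0 * x                                       # noiseless response
A = np.column_stack([np.ones_like(x), x])
b0, b1 = np.linalg.lstsq(A, y, rcond=None)[0]
```

Minimising the Wasserstein error over the grid is exactly the least squares problem solved above, which is why the classic estimator applies unchanged.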
Auto-tuning Distributed Stream Processing Systems using Reinforcement Learning
Fine-tuning distributed systems is considered a craft, relying on intuition
and experience. This becomes even more challenging when the systems need to
react in near real time, as streaming engines must do to maintain pre-agreed
service-quality metrics. In this article, we present an automated approach
that builds on a combination of supervised and reinforcement learning
methods to recommend the most appropriate lever configurations based on
previous load. With this, streaming engines can be tuned automatically,
without requiring a human to determine the right way and the proper time to
deploy them. This opens the door to new configurations that are not applied
today, since the complexity of managing these systems has surpassed the
abilities of human experts. We show how reinforcement learning systems can
find substantially better configurations in less time than their human
counterparts and adapt to changing workloads.
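The core loop of such a tuner can be illustrated with the simplest reinforcement learning setting, a bandit over candidate lever configurations. Everything below is hypothetical: the configurations, the reward probe, and the epsilon-greedy strategy stand in for the paper's actual system, purely to show how a learner converges on a good setting from observed metrics.

```python
import random

configs = [1, 2, 4, 8]  # hypothetical parallelism levels (the "levers")

def observed_reward(parallelism, load=100.0):
    """Stand-in for a real metric probe: negative latency, so higher
    is better. Toy queueing-style cost: processing delay falls with
    parallelism, coordination overhead rises with it."""
    latency = load / parallelism + 0.5 * parallelism
    return -latency

q = {c: 0.0 for c in configs}  # running value estimate per configuration
n = {c: 0 for c in configs}    # times each configuration was tried
random.seed(0)

for step in range(2000):
    if random.random() < 0.1:          # explore a random configuration
        c = random.choice(configs)
    else:                              # exploit the best estimate so far
        c = max(q, key=q.get)
    r = observed_reward(c)
    n[c] += 1
    q[c] += (r - q[c]) / n[c]          # incremental mean update

best = max(q, key=q.get)               # recommended lever configuration
```

In a real deployment the reward would come from live service-quality measurements and the action space from the engine's actual tuning knobs, but the learn-by-trying structure is the same.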
- …