Symbolic Time Series Analysis in Economics
In this paper I describe and apply the methods of Symbolic Time Series Analysis (STSA) to an experimental framework. The idea behind STSA is simple: the values of a given time series are transformed into a finite set of symbols, yielding a finite string. The symbolic sequence can then be processed using tools from information theory and symbolic dynamics. I discuss data symbolization as a tool for identifying temporal patterns in experimental data and use symbol-sequence statistics in a modeling strategy. To explain these applications, I describe methods for selecting the symbolization of the data (Section 2), and I introduce symbol-sequence histograms together with tools for characterizing and comparing them (Section 3). I show that the methods of STSA are a good tool for describing and recognizing temporal patterns in complex dynamical processes and for extracting dynamical information about such systems; in particular, the method provides a language in which to express and analyze these patterns. In Section 4 I report some applications of STSA to the study of the evolution of different economies. In these applications, data symbolization is based on economic criteria, using the notion of economic regime introduced earlier in this thesis. I use STSA methods to describe the dynamical behavior of these economies and to carry out a comparative analysis of their regime dynamics. In Section 5 I use STSA to reconstruct a model of a dynamical system from measured time series data. In particular, I show how the observed symbol-sequence statistics can be used as a target for measuring the goodness of fit of proposed models.
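The core STSA pipeline described above (partition the range of the series, map values to symbols, then count symbol words) can be sketched as follows. This is a minimal illustration, not the author's implementation; the threshold value and word length are arbitrary choices for the example.

```python
import numpy as np

def symbolize(series, thresholds):
    """Map each value to a symbol by binning against a partition of the range."""
    return np.digitize(series, thresholds)

def word_histogram(symbols, word_len):
    """Histogram of overlapping symbol words of length `word_len`."""
    counts = {}
    for i in range(len(symbols) - word_len + 1):
        word = tuple(symbols[i:i + word_len])
        counts[word] = counts.get(word, 0) + 1
    return counts

series = np.array([0.1, 0.9, 0.4, 0.8, 0.2, 0.7])
syms = symbolize(series, thresholds=[0.5]).tolist()  # binary partition at 0.5
hist = word_histogram(syms, word_len=2)              # symbol-sequence histogram
```

The resulting word histogram is the "symbol sequence statistic" that can then be compared across data sets or against model output.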
Descriptive statistics for symbolic interval-valued data
It is by now well recognized that real data often come as intervals rather than points. Unfortunately, classical statistical theory is not capable of handling interval data. Here, methodology for drawing univariate and bivariate histograms and for computing descriptive statistics, such as the sample mean and sample variance, for symbolic interval-valued data is discussed. It is hoped that researchers will start applying this type of "Symbolic Data Analysis" to their datasets.
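As an illustration of descriptive statistics for interval data, the following sketch treats each interval as a uniform distribution, a common convention in symbolic data analysis; the formulas below follow that uniformity assumption and are not necessarily those of this particular paper.

```python
def interval_mean(intervals):
    """Sample mean of interval data: average of the interval midpoints."""
    return sum((a + b) / 2 for a, b in intervals) / len(intervals)

def interval_variance(intervals):
    """Sample variance, treating each interval [a, b] as a uniform distribution:
    second moment (1/3n) * sum(a^2 + a*b + b^2) minus the squared mean."""
    n = len(intervals)
    second_moment = sum(a * a + a * b + b * b for a, b in intervals) / (3 * n)
    return second_moment - interval_mean(intervals) ** 2

data = [(1.0, 3.0), (2.0, 4.0)]
m = interval_mean(data)      # 2.5
v = interval_variance(data)  # 7/12
```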
Linear regression for numeric symbolic variables: an ordinary least squares approach based on Wasserstein Distance
In this paper we present a linear regression model for modal symbolic data. The observed variables are histogram variables, according to the definition given in the framework of Symbolic Data Analysis, and the parameters of the model are estimated using the classic least squares method. An appropriate metric is introduced in order to measure the error between the observed and predicted distributions; in particular, the Wasserstein distance is proposed. Some properties of this metric are exploited to predict the response variable as a direct linear combination of the other independent histogram variables. Measures of goodness of fit are discussed. An application to real data corroborates the proposed method.
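A much-simplified sketch of the idea: represent each histogram variable by its quantile function sampled at fixed probability levels, fit ordinary least squares on those samples, and score the fit with a discretized squared Wasserstein distance. The paper's actual estimator has additional structure; the data below are synthetic.

```python
import numpy as np

levels = np.linspace(0.05, 0.95, 19)  # probability levels for sampling quantiles

rng = np.random.default_rng(0)
X = np.sort(rng.normal(size=(10, levels.size)), axis=1)  # 10 observed quantile functions
Y = 2.0 * X + 1.0                                        # toy response: exact linear link

# Pooled OLS over all observations and levels.
b1, b0 = np.polyfit(X.ravel(), Y.ravel(), 1)

def wasserstein2_sq(qx, qy):
    """Squared 2-Wasserstein distance between sampled quantile functions."""
    return float(np.mean((qx - qy) ** 2))

fit_error = sum(wasserstein2_sq(b0 + b1 * x, y) for x, y in zip(X, Y))
```

Because the toy response is exactly linear in the regressor's quantiles, the pooled OLS recovers the coefficients and the Wasserstein fit error vanishes.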
Basic statistics for probabilistic symbolic variables: a novel metric-based approach
In data mining, it is usual to describe a set of individuals using summaries (means, standard deviations, histograms, confidence intervals) that generalize individual descriptions into a typology description. In this case, data can be described by several values. In this paper, we propose an approach for computing basic statistics for such data and, in particular, for data described by numerical multi-valued variables (intervals, histograms, discrete multi-valued descriptions). We propose to treat all numerical multi-valued variables as distributional data, i.e., as individuals described by distributions. To obtain new basic statistics for measuring the variability of and the association between such variables, we extend the classic measure of inertia, calculated with the Euclidean distance, using the squared Wasserstein distance defined between probability measures; this distance can be expressed as a distance between the quantile functions of the two distributions. Some properties of this distance are shown; among them, we prove the Huygens theorem of decomposition of inertia. We illustrate the use of the Wasserstein distance and of the basic statistics by presenting a k-means-like clustering algorithm for a set of data described by modal numerical variables (distributional variables), applied to a real data set.
Keywords: Wasserstein distance, inertia, dependence, distributional data, modal variables.
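The inertia idea above can be sketched numerically: encode each distributional observation by its sampled quantile function, take the pointwise mean of the quantile functions as the barycenter, and sum the squared Wasserstein distances to it. This is an illustrative toy, assuming the standard quantile-function representation rather than reproducing the paper's computations.

```python
import numpy as np

levels = np.linspace(0.1, 0.9, 9)  # probability levels for sampling quantiles

def sq_wasserstein(q1, q2):
    """Squared 2-Wasserstein distance approximated on sampled quantile functions."""
    return float(np.mean((q1 - q2) ** 2))

# Three distributional observations, each encoded by its sampled quantile function:
Q = np.array([
    levels,        # uniform on [0, 1]: quantile(t) = t
    levels + 1.0,  # uniform on [1, 2]
    levels + 2.0,  # uniform on [2, 3]
])

# Wasserstein barycenter of 1-D distributions = pointwise mean of quantile functions.
barycenter = Q.mean(axis=0)
inertia = sum(sq_wasserstein(q, barycenter) for q in Q)  # variability measure
```

Here the barycenter is the uniform distribution on [1, 2], and the inertia decomposes exactly as in the Euclidean case, which is what the Huygens theorem mentioned above generalizes.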
Measure based metrics for aggregated data
Aggregated data commonly arise from surveys and censuses in which groups of individuals are studied as coherent entities. Aggregated data can take many forms, including sets, intervals, distributions and histograms. The data analyst needs to measure the similarity between such aggregated data items, and a range of metrics is reported in the literature to achieve this (e.g. the Jaccard metric for sets and the Wasserstein metric for histograms). In this paper, a unifying theory based on measure theory is developed, establishing not only that known metrics are essentially similar but also suggesting new metrics.
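The two example metrics named in the abstract can be computed side by side; the sketch below is illustrative and not taken from the paper's formalism.

```python
import numpy as np

def jaccard_distance(a, b):
    """Jaccard distance between two sets: 1 - |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    return 1.0 - len(a & b) / len(a | b)

def wasserstein1(p, q, bin_width=1.0):
    """1-Wasserstein distance between two histograms on a common uniform grid:
    the area between their cumulative distribution functions."""
    cdf_gap = np.cumsum(np.asarray(p, float) - np.asarray(q, float))
    return float(np.sum(np.abs(cdf_gap)) * bin_width)

d_sets = jaccard_distance({1, 2, 3}, {2, 3, 4})          # 0.5
d_hist = wasserstein1([1.0, 0.0, 0.0], [0.0, 0.0, 1.0])  # unit mass moved 2 bins -> 2.0
```

Both computations compare accumulated measure (set membership in one case, probability mass in the other), which hints at how a measure-theoretic treatment can unify them.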