Symbolic Time Series Analysis in Economics
In this paper I describe and apply the methods of Symbolic Time Series Analysis (STSA) to an experimental framework. The idea behind STSA is simple: the values of a given time series are transformed into a finite set of symbols, yielding a finite string. The symbolic sequence can then be processed using tools from information theory and symbolic dynamics. I discuss data symbolization as a tool for identifying temporal patterns in experimental data and use symbol-sequence statistics in a modeling strategy. To explain these applications, I describe methods for selecting the symbolization of the data (Section 2), and I introduce symbol-sequence histograms together with tools for characterizing and comparing them (Section 3). I show that the methods of STSA are a good tool for describing and recognizing temporal patterns in complex dynamical processes and for extracting dynamical information about such systems; in particular, the method provides a language in which to express and analyze these patterns. In Section 4 I report some applications of STSA to the study of the evolution of different economies. In these applications, data symbolization is based on economic criteria, using the notion of economic regime introduced earlier in this thesis. I use STSA methods to describe the dynamical behavior of these economies and to carry out a comparative analysis of their regime dynamics. In Section 5 I use STSA to reconstruct a model of a dynamical system from measured time series data. In particular, I show how the observed symbol-sequence statistics can be used as a target for measuring the goodness of fit of proposed models.
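The core STSA pipeline described above (partition the range of the series, map values to symbols, then count symbol words) can be sketched as follows. This is a minimal illustration, not the author's implementation; the threshold value and word length are arbitrary choices for the example.

```python
import numpy as np

def symbolize(series, thresholds):
    """Map each value to a symbol by binning against a partition of the range."""
    return np.digitize(series, thresholds)

def word_histogram(symbols, word_len):
    """Histogram of overlapping symbol words of length `word_len`."""
    counts = {}
    for i in range(len(symbols) - word_len + 1):
        word = tuple(symbols[i:i + word_len])
        counts[word] = counts.get(word, 0) + 1
    return counts

series = np.array([0.1, 0.9, 0.4, 0.8, 0.2, 0.7])
syms = symbolize(series, thresholds=[0.5]).tolist()  # binary partition at 0.5
hist = word_histogram(syms, word_len=2)              # symbol-sequence histogram
```

The resulting word histogram is the "symbol sequence statistic" that can then be compared across data sets or against model output.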
Descriptive statistics for symbolic interval-valued data
It is by now well recognized that real data often come as intervals rather than points. Unfortunately, classical statistical theory is not capable of handling interval data. Here, methodology for drawing univariate and bivariate histograms and for computing descriptive statistics, such as the sample mean and sample variance, for symbolic interval-valued data is discussed. It is hoped that researchers will start applying this type of "Symbolic Data Analysis" to their datasets.
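As an illustration of descriptive statistics for interval data, the following sketch treats each interval as a uniform distribution, a common convention in symbolic data analysis; the formulas below follow that uniformity assumption and are not necessarily those of this particular paper.

```python
def interval_mean(intervals):
    """Sample mean of interval data: average of the interval midpoints."""
    return sum((a + b) / 2 for a, b in intervals) / len(intervals)

def interval_variance(intervals):
    """Sample variance, treating each interval [a, b] as a uniform distribution:
    second moment (1/3n) * sum(a^2 + a*b + b^2) minus the squared mean."""
    n = len(intervals)
    second_moment = sum(a * a + a * b + b * b for a, b in intervals) / (3 * n)
    return second_moment - interval_mean(intervals) ** 2

data = [(1.0, 3.0), (2.0, 4.0)]
m = interval_mean(data)      # 2.5
v = interval_variance(data)  # 7/12
```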
Linear regression for numeric symbolic variables: an ordinary least squares approach based on Wasserstein Distance
In this paper we present a linear regression model for modal symbolic data. The observed variables are histogram variables, according to the definition given in the framework of Symbolic Data Analysis, and the parameters of the model are estimated using the classic least squares method. An appropriate metric is introduced in order to measure the error between the observed and predicted distributions; in particular, the Wasserstein distance is proposed. Some properties of this metric are exploited to predict the response variable as a direct linear combination of the other independent histogram variables. Measures of goodness of fit are discussed. An application to real data corroborates the proposed method.
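A much-simplified sketch of the idea: represent each histogram variable by its quantile function sampled at fixed probability levels, fit ordinary least squares on those samples, and score the fit with a discretized squared Wasserstein distance. The paper's actual estimator has additional structure; the data below are synthetic.

```python
import numpy as np

levels = np.linspace(0.05, 0.95, 19)  # probability levels for sampling quantiles

rng = np.random.default_rng(0)
X = np.sort(rng.normal(size=(10, levels.size)), axis=1)  # 10 observed quantile functions
Y = 2.0 * X + 1.0                                        # toy response: exact linear link

# Pooled OLS over all observations and levels.
b1, b0 = np.polyfit(X.ravel(), Y.ravel(), 1)

def wasserstein2_sq(qx, qy):
    """Squared 2-Wasserstein distance between sampled quantile functions."""
    return float(np.mean((qx - qy) ** 2))

fit_error = sum(wasserstein2_sq(b0 + b1 * x, y) for x, y in zip(X, Y))
```

Because the toy response is exactly linear in the regressor's quantiles, the pooled OLS recovers the coefficients and the Wasserstein fit error vanishes.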
Basic statistics for probabilistic symbolic variables: a novel metric-based approach
In data mining, it is usual to describe a set of individuals using summaries (means, standard deviations, histograms, confidence intervals) that generalize individual descriptions into a typology description. In this case, data can be described by several values. In this paper, we propose an approach for computing basic statistics for such data and, in particular, for data described by numerical multi-valued variables (intervals, histograms, discrete multi-valued descriptions). We propose to treat all numerical multi-valued variables as distributional data, i.e., as individuals described by distributions. To obtain new basic statistics for measuring the variability of and the association between such variables, we extend the classic measure of inertia, calculated with the Euclidean distance, using the squared Wasserstein distance defined between probability measures; this distance can be expressed as a distance between the quantile functions of the two distributions. Some properties of this distance are shown; among them, we prove the Huygens theorem of decomposition of inertia. We illustrate the use of the Wasserstein distance and of the basic statistics by presenting a k-means-like clustering algorithm for a set of data described by modal numerical variables (distributional variables), applied to a real data set.
Keywords: Wasserstein distance, inertia, dependence, distributional data, modal variables.
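The inertia idea above can be sketched numerically: encode each distributional observation by its sampled quantile function, take the pointwise mean of the quantile functions as the barycenter, and sum the squared Wasserstein distances to it. This is an illustrative toy, assuming the standard quantile-function representation rather than reproducing the paper's computations.

```python
import numpy as np

levels = np.linspace(0.1, 0.9, 9)  # probability levels for sampling quantiles

def sq_wasserstein(q1, q2):
    """Squared 2-Wasserstein distance approximated on sampled quantile functions."""
    return float(np.mean((q1 - q2) ** 2))

# Three distributional observations, each encoded by its sampled quantile function:
Q = np.array([
    levels,        # uniform on [0, 1]: quantile(t) = t
    levels + 1.0,  # uniform on [1, 2]
    levels + 2.0,  # uniform on [2, 3]
])

# Wasserstein barycenter of 1-D distributions = pointwise mean of quantile functions.
barycenter = Q.mean(axis=0)
inertia = sum(sq_wasserstein(q, barycenter) for q in Q)  # variability measure
```

Here the barycenter is the uniform distribution on [1, 2], and the inertia decomposes exactly as in the Euclidean case, which is what the Huygens theorem mentioned above generalizes.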
Measure based metrics for aggregated data
Aggregated data commonly arise from surveys and censuses in which groups of individuals are studied as coherent entities. Aggregated data can take many forms, including sets, intervals, distributions and histograms. The data analyst needs to measure the similarity between such aggregated data items, and a range of metrics is reported in the literature to achieve this (e.g. the Jaccard metric for sets and the Wasserstein metric for histograms). In this paper, a unifying theory based on measure theory is developed, establishing not only that known metrics are essentially similar but also suggesting new metrics.
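The two example metrics named in the abstract can be computed side by side; the sketch below is illustrative and not taken from the paper's formalism.

```python
import numpy as np

def jaccard_distance(a, b):
    """Jaccard distance between two sets: 1 - |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    return 1.0 - len(a & b) / len(a | b)

def wasserstein1(p, q, bin_width=1.0):
    """1-Wasserstein distance between two histograms on a common uniform grid:
    the area between their cumulative distribution functions."""
    cdf_gap = np.cumsum(np.asarray(p, float) - np.asarray(q, float))
    return float(np.sum(np.abs(cdf_gap)) * bin_width)

d_sets = jaccard_distance({1, 2, 3}, {2, 3, 4})          # 0.5
d_hist = wasserstein1([1.0, 0.0, 0.0], [0.0, 0.0, 1.0])  # unit mass moved 2 bins -> 2.0
```

Both computations compare accumulated measure (set membership in one case, probability mass in the other), which hints at how a measure-theoretic treatment can unify them.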