Null models and complexity science: disentangling signal from noise in complex interacting systems
The constantly increasing availability of fine-grained data has led to very detailed descriptions of many socio-economic systems (such as financial markets, interbank loans or supply chains), whose representation, however, quickly becomes too complex to allow for any meaningful intuition or insight about their functioning mechanisms. This, in turn, leads to the challenge of disentangling statistically meaningful information from noise without assuming any a priori knowledge of the particular system under study. The aim of this thesis is to develop, and test on real-world data, unsupervised techniques to extract relevant information from large complex interacting systems. The question I try to answer is the following: is it possible to disentangle statistically relevant information from noise without assuming any prior knowledge about the system under study? In particular, I tackle this challenge from the viewpoint of hypothesis testing by developing techniques based on so-called null models, i.e., partially randomised representations of the system under study. Given that complex systems can be analysed both from the perspective of their time evolution and from that of their time-aggregated properties, I have developed and tested one technique for each of these two purposes. The first technique is aimed at extracting "backbones" of relevant relationships in complex interacting systems represented as static weighted networks of pairwise interactions, and it is inspired by the well-known Pólya urn combinatorial process. The second technique is instead aimed at identifying statistically relevant events and temporal patterns in single or multiple time series by means of maximum entropy null models based on Ensemble Theory.
Both of these methodologies exploit the heterogeneity of complex systems data in order to design null models that are tailored to the systems under study, and that are therefore capable of identifying signals genuinely distinctive of the systems themselves.
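As a concrete illustration of backbone extraction via a null model, the sketch below implements not the thesis's Pólya-urn filter itself but the closely related disparity filter of Serrano et al.: under the null, each node's strength is assumed to be split uniformly at random over its edges, and an edge is kept only when its weight is too large to be explained by that null. Function names and the significance level are illustrative.

```python
import numpy as np

def disparity_pvalues(weights):
    """p-value of each edge weight at a node whose strength is split
    uniformly at random over its k edges (the null model)."""
    w = np.asarray(weights, dtype=float)
    k, s = len(w), w.sum()
    if k <= 1:
        return np.ones_like(w)  # a single edge carries no signal
    # P(normalized weight >= w_i / s) under uniform splitting:
    return (1.0 - w / s) ** (k - 1)

def backbone(edges, alpha=0.05):
    """Keep an edge if it is significant from either endpoint's viewpoint.
    `edges` is a dict: node -> {neighbor: weight}."""
    keep = set()
    for u, nbrs in edges.items():
        pv = disparity_pvalues(list(nbrs.values()))
        for (v, _), p in zip(nbrs.items(), pv):
            if p < alpha:
                keep.add(frozenset((u, v)))
    return keep
```

Applied to a hub that concentrates almost all of its strength on one neighbour, only that dominant edge survives the filter; the thesis's Pólya-urn approach generalises this kind of null by tuning it to the observed heterogeneity.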
Extensions of Positive Definite Functions: Applications and Their Harmonic Analysis
We study two classes of extension problems, and their interconnections: (i)
extension of positive definite (p.d.) continuous functions defined on subsets
of locally compact groups; (ii) in the case of Lie groups, representations of
the associated Lie algebras by unbounded skew-Hermitian operators acting in a
reproducing kernel Hilbert space (RKHS).
Why extensions? In science, experimentalists frequently gather spectral data
in cases when the observed data is limited, for example limited by the
precision of instruments; or on account of a variety of other limiting external
factors. Given this fact of life, it is both an art and a science to still
produce solid conclusions from restricted or limited data. In a general sense,
our monograph deals with the mathematics of extending some such given partial
data-sets obtained from experiments. More specifically, we are concerned with
the problems of extending available partial information, obtained, for example,
from sampling. In our case, the limited information is a restriction, and the
extension in turn is the full positive definite function (in a dual variable);
so an extension, if available, will be an everywhere-defined generating
function for the exact probability distribution that the data would reflect if
they were fully available. Such extensions of local information (in the form of positive
definite functions) will in turn furnish us with spectral information. In this
form, the problem becomes an operator extension problem, referring to operators
in a suitable reproducing kernel Hilbert space (RKHS). In our presentation we
have stressed hands-on examples. Extensions are almost never unique, and so we
deal with both the question of existence, and if there are extensions, how they
relate back to the initial completion problem.

Comment: 235 pages, 42 figures, 7 tables. arXiv admin note: substantial text overlap with arXiv:1401.478
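In the simplest setting of the real line, the extension problem described above can be summarised as follows (an illustrative formulation of the standard problem, not a quotation from the monograph):

```latex
% F is positive definite on \Omega - \Omega, with \Omega = (0, a) \subset \mathbb{R}:
\sum_{i,j} c_i \overline{c_j}\, F(x_i - x_j) \;\ge\; 0
\quad \text{for all finite } \{x_i\} \subset \Omega,\ \{c_i\} \subset \mathbb{C},

% and an everywhere-defined p.d. extension \widetilde{F} to \mathbb{R} is, by
% Bochner's theorem, the generating function of a probability measure \mu:
\widetilde{F}(x) \;=\; \int_{\mathbb{R}} e^{i \lambda x}\, d\mu(\lambda),
\qquad \widetilde{F}\big|_{(-a,\,a)} = F .
```

In the operator-theoretic reformulation, distinct extensions of F correspond to distinct skew-adjoint extensions of the associated skew-Hermitian operator acting in the RKHS, which is where the non-uniqueness mentioned above lives.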
A multilayered block network model to forecast large dynamic transportation graphs: An application to US air transport
Dynamic transportation networks have been analyzed for years by means of
static graph-based indicators in order to study the temporal evolution of
relevant network components, and to reveal complex dependencies that would not
be easily detected by a direct inspection of the data. This paper presents a
state-of-the-art latent network model to forecast multilayer dynamic graphs
that are increasingly common in transportation and proposes a community-based
extension to reduce the computational burden. Flexible time series analysis is
obtained by modeling the probability of edges between vertices through latent
Gaussian processes. The models and Bayesian inference are illustrated on a
sample of 10-year data from four major airlines within the US air
transportation system. Results show how the estimated latent parameters from
the models relate to the airlines' connectivity dynamics, and demonstrate the
models' ability to project the multilayer graph into the future for
out-of-sample full-network forecasts, while stochastic blockmodeling allows for
the identification of relevant communities. Reliable network predictions would
allow policy-makers to better understand the dynamics of the transport system,
and help in their planning on, e.g., route development or the deployment of new
regulations.
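The core mechanism, edge probabilities driven by latent Gaussian processes, can be sketched as follows; the additive-logit link, the squared-exponential kernel, and all parameter values here are illustrative assumptions rather than the paper's exact specification.

```python
import numpy as np

rng = np.random.default_rng(0)

def se_kernel(t, length=3.0, var=1.0):
    """Squared-exponential covariance over time points t."""
    d = t[:, None] - t[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def simulate_dynamic_graph(n_nodes=5, n_times=10):
    """Draw one latent GP trajectory per node and link nodes i and j at
    time t with probability sigmoid(x_i(t) + x_j(t))."""
    t = np.arange(n_times, dtype=float)
    K = se_kernel(t) + 1e-8 * np.eye(n_times)  # jitter for stability
    # latent node trajectories: shape (n_nodes, n_times)
    X = rng.multivariate_normal(np.zeros(n_times), K, size=n_nodes)
    logits = X[:, None, :] + X[None, :, :]      # x_i(t) + x_j(t)
    P = 1.0 / (1.0 + np.exp(-logits))           # edge probabilities
    A = rng.random(P.shape) < P                 # one sampled adjacency tensor
    return P, A
```

Because the GP kernel makes the latent trajectories smooth in time, the implied edge probabilities evolve smoothly as well, which is what makes out-of-sample extrapolation of the whole graph possible.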
Structured Mixture of Continuation-ratio Logits Models for Ordinal Regression
We develop a nonparametric Bayesian modeling approach to ordinal regression
based on priors placed directly on the discrete distribution of the ordinal
responses. The prior probability models are built from a structured mixture of
multinomial distributions. We leverage a continuation-ratio logits
representation to formulate the mixture kernel, with mixture weights defined
through the logit stick-breaking process that incorporates the covariates
through a linear function. The implied regression functions for the response
probabilities can be expressed as weighted sums of parametric regression
functions, with covariate-dependent weights. Thus, the modeling approach
achieves flexible ordinal regression relationships, avoiding linearity or
additivity assumptions in the covariate effects. A key model feature is that
the parameters for both the mixture kernel and the mixture weights can be
associated with a continuation-ratio logits regression structure. Hence, an
efficient and relatively easy-to-implement posterior simulation method can be
designed, using Pólya-Gamma data augmentation. Moreover, the model is built
from a conditional independence structure for category-specific parameters,
which results in additional computational efficiency gains through partial
parallel sampling. In addition to the general mixture structure, we study
simplified model versions that incorporate covariate dependence only in the
mixture kernel parameters or only in the mixture weights. For all proposed
models, we discuss approaches to prior specification and develop Markov chain
Monte Carlo methods for posterior simulation. The methodology is illustrated
with several synthetic and real data examples.
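The continuation-ratio logits representation that underlies the mixture kernel can be sketched directly: the logit of P(Y = j | Y ≥ j) for each of the first C − 1 categories determines the full ordinal distribution. The code below (an illustration of the representation, not the authors' implementation) performs that map.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cr_logits_to_probs(theta):
    """Map continuation-ratio logits theta_1..theta_{C-1} to category
    probabilities pi_1..pi_C, where sigmoid(theta_j) = P(Y = j | Y >= j)."""
    h = sigmoid(np.asarray(theta, dtype=float))          # conditional probs
    surv = np.concatenate(([1.0], np.cumprod(1.0 - h)))  # P(Y > j - 1)
    pi = np.empty(len(h) + 1)
    pi[:-1] = surv[:-1] * h       # P(Y = j) = P(Y >= j) * P(Y = j | Y >= j)
    pi[-1] = surv[-1]             # last category absorbs the remaining mass
    return pi
```

For example, `cr_logits_to_probs([0.0, 0.0, 0.0])` yields the four-category distribution (0.5, 0.25, 0.125, 0.125): each conditional probability is 1/2, halving the remaining mass at every step.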
Bayesian spectral modeling for multiple time series
We develop a novel Bayesian modeling approach to spectral density estimation for multiple time series. The log-periodogram distribution for each series is modeled as a mixture of Gaussian distributions with frequency-dependent weights and mean functions. The implied model for the log-spectral density is a mixture of linear mean functions with frequency-dependent weights. The mixture weights are built through successive differences of a logit-normal distribution function with frequency-dependent parameters. Building from the construction for a single spectral density, we develop a hierarchical extension for multiple time series. Specifically, we set the mean functions to be common to all spectral densities and make the weights specific to the time series through the parameters of the logit-normal distribution. In addition to accommodating flexible spectral density shapes, a practically important feature of the proposed formulation is that it allows for ready posterior simulation through a Gibbs sampler with closed-form full conditional distributions for all model parameters. The modeling approach is illustrated with simulated datasets, and used for spectral analysis of multichannel electroencephalographic recordings (EEGs), which provides a key motivating application for the proposed methodology.
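The data such a model works with, log-periodogram ordinates at the Fourier frequencies, can be computed in a few lines. This follows the standard periodogram definition; the 1/(2πn) normalisation is one common convention and not necessarily the paper's.

```python
import numpy as np

def log_periodogram(x):
    """Log-periodogram ordinates of a series x at the Fourier frequencies
    omega_k = 2*pi*k/n, k = 1..floor((n-1)/2)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    dft = np.fft.fft(x - x.mean())                 # demean before the DFT
    k = np.arange(1, (n - 1) // 2 + 1)
    I = (np.abs(dft[k]) ** 2) / (2.0 * np.pi * n)  # periodogram ordinates
    return 2.0 * np.pi * k / n, np.log(I)
```

For a sinusoid of period 8, the log-periodogram peaks at frequency 2π/8; a mixture model for these ordinates then smooths them into a log-spectral-density estimate.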
- …