440 research outputs found

    Null models and complexity science: disentangling signal from noise in complex interacting systems

    The constantly increasing availability of fine-grained data has led to very detailed descriptions of many socio-economic systems (such as financial markets, interbank loans or supply chains), whose representation, however, quickly becomes too complex to allow for any meaningful intuition or insight about their functioning mechanisms. This, in turn, leads to the challenge of disentangling statistically meaningful information from noise without assuming any a priori knowledge of the particular system under study. The aim of this thesis is to develop unsupervised techniques for extracting relevant information from large complex interacting systems, and to test them on real-world data. The question I try to answer is the following: is it possible to disentangle statistically relevant information from noise without assuming any prior knowledge about the system under study? In particular, I tackle this challenge from the viewpoint of hypothesis testing, by developing techniques based on so-called null models, i.e., partially randomised representations of the system under study. Given that complex systems can be analysed both from the perspective of their time evolution and from that of their time-aggregated properties, I have developed and tested one technique for each of these two purposes. The first technique is aimed at extracting “backbones” of relevant relationships in complex interacting systems represented as static weighted networks of pairwise interactions, and it is inspired by the well-known Pólya urn combinatorial process. The second technique is instead aimed at identifying statistically relevant events and temporal patterns in single or multiple time series, by means of maximum entropy null models based on Ensemble Theory.
Both of these methodologies exploit the heterogeneity of complex systems data in order to design null models that are tailored to the systems under study, and are therefore capable of identifying signals that are genuinely distinctive of the systems themselves.
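The Pólya-urn null-model idea behind the backbone technique can be sketched with a small Monte Carlo toy (an illustration only, not the thesis's actual filter; the function names, the unit-weight allocation scheme, and the reinforcement parameter are assumptions of this sketch): under the null, a node of strength s and degree k spreads its weight over its edges via a reinforcement urn, and an edge belongs to the backbone when its observed weight is improbable under that random allocation.

```python
import random

def polya_urn_weights(strength, degree, reinforcement=1, rng=None):
    """Allocate `strength` unit weights across `degree` edges via a Polya urn.
    Each edge starts with one ball; every draw adds `reinforcement` balls
    of the drawn colour (rich-get-richer dynamics)."""
    rng = rng or random.Random()
    balls = [1.0] * degree
    counts = [0] * degree
    for _ in range(strength):
        total = sum(balls)
        r = rng.random() * total
        acc = 0.0
        for i, b in enumerate(balls):
            acc += b
            if r <= acc:
                counts[i] += 1
                balls[i] += reinforcement
                break
    return counts

def edge_pvalue(observed_weight, strength, degree, reinforcement=1,
                n_sim=2000, seed=0):
    """Empirical p-value: fraction of urn allocations in which a given edge
    ends up at least as heavy as the observed one."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        counts = polya_urn_weights(strength, degree, reinforcement, rng)
        if counts[0] >= observed_weight:
            hits += 1
    return hits / n_sim
```

With reinforcement 1 and one initial ball per edge, the urn allocates weight uniformly over all compositions of the strength, so an edge carrying 18 of a node's 20 units across 5 edges receives a very small p-value, while one carrying 4 units does not.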

    Extensions of Positive Definite Functions: Applications and Their Harmonic Analysis

    We study two classes of extension problems, and their interconnections: (i) extension of positive definite (p.d.) continuous functions defined on subsets of locally compact groups G; (ii) in the case of Lie groups, representations of the associated Lie algebras La(G) by unbounded skew-Hermitian operators acting in a reproducing kernel Hilbert space (RKHS) H_F. Why extensions? In science, experimentalists frequently gather spectral data in cases where the observed data are limited, for example by the precision of instruments, or on account of a variety of other limiting external factors. Given this fact of life, it is both an art and a science to still draw solid conclusions from restricted or limited data. In a general sense, our monograph deals with the mathematics of extending such partial data-sets obtained from experiments. More specifically, we are concerned with the problem of extending available partial information, obtained, for example, from sampling. In our case, the limited information is a restriction, and the extension is the full positive definite function (in a dual variable); an extension, if available, will be an everywhere-defined generating function for the exact probability distribution which reflects the data, were they fully available. Such extensions of local information (in the form of positive definite functions) will in turn furnish us with spectral information. In this form, the problem becomes an operator extension problem, referring to operators in a suitable reproducing kernel Hilbert space (RKHS). In our presentation we have stressed hands-on examples. Extensions are almost never unique, and so we deal both with the question of existence and, when extensions do exist, with how they relate back to the initial completion problem. Comment: 235 pages, 42 figures, 7 tables. arXiv admin note: substantial text overlap with arXiv:1401.478
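The extension problem can be made concrete with a classical one-dimensional example (a standard illustration, not taken from the monograph itself):

```latex
% Positive definiteness of a function F on a subset of G = (\mathbb{R}, +):
\[
  \sum_{i,j} c_i \overline{c_j}\, F(x_i - x_j) \;\ge\; 0
  \quad \text{for all finite } \{x_i\} \subset \Omega,\ \{c_i\} \subset \mathbb{C}.
\]
% Classical example: F(x) = e^{-|x|}, restricted to the interval (-1, 1),
% is positive definite there. One extension to all of \mathbb{R} is
% e^{-|x|} itself, which is the characteristic function of the standard
% Cauchy distribution with density
\[
  \frac{1}{\pi}\,\frac{1}{1+\xi^{2}},
\]
% so the extended F generates an honest probability distribution -- the
% "spectral information" referred to above. Other extensions (and hence
% other spectral measures) may exist, which is the non-uniqueness issue
% the abstract raises.
```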

    A multilayered block network model to forecast large dynamic transportation graphs: An application to US air transport

    Dynamic transportation networks have been analyzed for years by means of static graph-based indicators in order to study the temporal evolution of relevant network components, and to reveal complex dependencies that would not be easily detected by a direct inspection of the data. This paper presents a state-of-the-art latent network model to forecast multilayer dynamic graphs, which are increasingly common in transportation, and proposes a community-based extension to reduce the computational burden. Flexible time series analysis is obtained by modeling the probability of edges between vertices through latent Gaussian processes. The models and Bayesian inference are illustrated on a sample of 10-year data from four major airlines within the US air transportation system. Results show how the estimated latent parameters from the models relate to the airlines' connectivity dynamics, and demonstrate their ability to project the multilayer graph into the future for out-of-sample full-network forecasts, while stochastic blockmodeling allows for the identification of relevant communities. Reliable network predictions would allow policy-makers to better understand the dynamics of the transport system, and would support their planning on, e.g., route development or the deployment of new regulations.
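The latent-variable construction can be illustrated with a minimal generative sketch (not the paper's actual model: scalar latent positions with AR(1) dynamics stand in here for the latent Gaussian processes, and all names and parameter choices are assumptions of this toy):

```python
import math
import random

def sigmoid(x):
    """Logistic link mapping a real latent score to an edge probability."""
    return 1.0 / (1.0 + math.exp(-x))

def simulate_dynamic_network(n_nodes=5, n_steps=3, rho=0.9, seed=1):
    """Minimal latent-space sketch of a dynamic graph: each node carries a
    scalar latent position evolving as a stationary AR(1) process (a crude
    discrete stand-in for a latent Gaussian process); the probability of an
    undirected edge (i, j) at each time step is a logistic function of the
    product of the two latent positions."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n_nodes)]
    snapshots = []
    for _ in range(n_steps):
        # AR(1) latent dynamics: z_t = rho * z_{t-1} + sqrt(1-rho^2) * noise
        z = [rho * zi + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for zi in z]
        adj = [[0] * n_nodes for _ in range(n_nodes)]
        for i in range(n_nodes):
            for j in range(i + 1, n_nodes):
                p = sigmoid(z[i] * z[j])
                adj[i][j] = adj[j][i] = int(rng.random() < p)
        snapshots.append(adj)
    return snapshots
```

Because the latent positions evolve smoothly, consecutive snapshots are correlated, which is what makes out-of-sample network forecasting possible in this class of models; a blockmodel extension would additionally tie nodes in the same community to shared latent parameters.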

    Structured Mixture of Continuation-ratio Logits Models for Ordinal Regression

    We develop a nonparametric Bayesian modeling approach to ordinal regression based on priors placed directly on the discrete distribution of the ordinal responses. The prior probability models are built from a structured mixture of multinomial distributions. We leverage a continuation-ratio logits representation to formulate the mixture kernel, with mixture weights defined through the logit stick-breaking process that incorporates the covariates through a linear function. The implied regression functions for the response probabilities can be expressed as weighted sums of parametric regression functions, with covariate-dependent weights. Thus, the modeling approach achieves flexible ordinal regression relationships, avoiding linearity or additivity assumptions in the covariate effects. A key model feature is that the parameters for both the mixture kernel and the mixture weights can be associated with a continuation-ratio logits regression structure. Hence, an efficient and relatively easy to implement posterior simulation method can be designed, using Pólya-Gamma data augmentation. Moreover, the model is built from a conditional independence structure for category-specific parameters, which results in additional computational efficiency gains through partial parallel sampling. In addition to the general mixture structure, we study simplified model versions that incorporate covariate dependence only in the mixture kernel parameters or only in the mixture weights. For all proposed models, we discuss approaches to prior specification and develop Markov chain Monte Carlo methods for posterior simulation. The methodology is illustrated with several synthetic and real data examples.
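The continuation-ratio logits representation itself is easy to state concretely (a generic sketch of the standard parameterisation, not the authors' code; the function name is illustrative): each logit governs the conditional probability of stopping at category j given that the response has reached at least category j, and chaining these conditionals recovers the full set of category probabilities.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def continuation_ratio_probs(etas):
    """Map continuation-ratio logits eta_j = logit P(Y = j | Y >= j),
    for j = 1..C-1, to the probabilities of C ordinal categories."""
    probs = []
    remaining = 1.0                     # P(Y >= j), starts at 1
    for eta in etas:
        p_stop = sigmoid(eta)           # P(Y = j | Y >= j)
        probs.append(remaining * p_stop)
        remaining *= (1.0 - p_stop)     # update to P(Y > j)
    probs.append(remaining)             # last category absorbs the rest
    return probs
```

In the regression setting each eta_j would be a (possibly covariate-dependent) linear predictor; the mixture model in the abstract places such kernels inside covariate-dependent mixture weights.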

    Bayesian spectral modeling for multiple time series

    We develop a novel Bayesian modeling approach to spectral density estimation for multiple time series. The log-periodogram distribution for each series is modeled as a mixture of Gaussian distributions with frequency-dependent weights and mean functions. The implied model for the log-spectral density is a mixture of linear mean functions with frequency-dependent weights. The mixture weights are built through successive differences of a logit-normal distribution function with frequency-dependent parameters. Building from the construction for a single spectral density, we develop a hierarchical extension for multiple time series. Specifically, we set the mean functions to be common to all spectral densities and make the weights specific to the time series through the parameters of the logit-normal distribution. In addition to accommodating flexible spectral density shapes, a practically important feature of the proposed formulation is that it allows for ready posterior simulation through a Gibbs sampler with closed form full conditional distributions for all model parameters. The modeling approach is illustrated with simulated datasets, and used for spectral analysis of multichannel electroencephalographic recordings (EEGs), which provides a key motivating application for the proposed methodology.
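The log-periodogram that serves as the data object in this kind of approach can be computed directly from a series (a generic textbook computation via a direct DFT, not the paper's implementation; names are illustrative):

```python
import cmath
import math

def log_periodogram(x):
    """Log-periodogram of a real time series of length n, evaluated at the
    Fourier frequencies 2*pi*k/n for k = 1..floor((n-1)/2), computed with
    a direct discrete Fourier transform (O(n^2); fine for a sketch)."""
    n = len(x)
    out = []
    for k in range(1, (n - 1) // 2 + 1):
        dft = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        ordinate = abs(dft) ** 2 / (2 * math.pi * n)  # periodogram I(w_k)
        out.append(math.log(ordinate))
    return out
```

For a series dominated by a sinusoid at the k-th Fourier frequency, the log-periodogram peaks at that frequency; spectral models such as the one above treat these noisy ordinates as data and smooth them into an estimate of the log-spectral density.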
