
    Efficient Optimization of Echo State Networks for Time Series Datasets

    Echo State Networks (ESNs) are recurrent neural networks that train only their output layer, thereby precluding the need to backpropagate gradients through time, which leads to significant computational gains. Nevertheless, a common issue in ESNs is determining their hyperparameters, which are crucial in instantiating a well-performing reservoir but are often set manually or using heuristics. In this work we optimize the ESN hyperparameters using Bayesian optimization which, given a limited budget of function evaluations, outperforms a grid search strategy. In the context of large volumes of time series data, such as light curves in the field of astronomy, we can further reduce the optimization cost of ESNs. In particular, we wish to avoid tuning hyperparameters per individual time series, as this is costly; instead, we want to find ESNs with hyperparameters that perform well not just on individual time series but on groups of similar time series, without sacrificing predictive performance significantly. This naturally leads to a notion of clusters, where each cluster is represented by an ESN tuned to model a group of time series of similar temporal behavior. We demonstrate this approach both on synthetic datasets and on real-world light curves from the MACHO survey. We show that our approach results in a significant reduction in the number of ESN models required to model a whole dataset, while retaining predictive performance for the series in each cluster.
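
The point that an ESN trains only its output layer can be illustrated with a minimal sketch. This is not the authors' implementation: the reservoir size, the leak rate, the weight scale `res_scale` (a crude stand-in for a spectral-radius setting), the ridge penalty, and the sine-wave task are all assumptions chosen for illustration. The input and recurrent weights are drawn once and left fixed; only the linear readout is fitted, by ridge regression.

```python
import math
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def run_reservoir(series, w_in, w_res, leak):
    """Drive the fixed (untrained) reservoir; return one state vector per step."""
    n = len(w_in)
    state = [0.0] * n
    states = []
    for u in series:
        pre = [w_in[i] * u + sum(w_res[i][j] * state[j] for j in range(n))
               for i in range(n)]
        state = [(1 - leak) * state[i] + leak * math.tanh(pre[i])
                 for i in range(n)]
        states.append(state + [1.0])  # append a constant bias feature
    return states


def fit_readout(states, targets, ridge):
    """Ridge regression for the readout: the only trained part of the ESN."""
    d = len(states[0])
    xtx = [[sum(s[i] * s[j] for s in states) + (ridge if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(s[i] * t for s, t in zip(states, targets)) for i in range(d)]
    return solve_linear(xtx, xty)


# Hypothetical hyperparameters; these are what one would tune in practice.
n_units, leak, res_scale, ridge, washout = 20, 0.5, 0.3, 1e-4, 50

rng = random.Random(0)
w_in = [rng.uniform(-1, 1) for _ in range(n_units)]
w_res = [[res_scale * rng.uniform(-1, 1) for _ in range(n_units)]
         for _ in range(n_units)]

# Toy task (an assumption): one-step-ahead prediction of a sine wave.
series = [math.sin(0.2 * t) for t in range(301)]
inputs, targets = series[:-1], series[1:]

# Discard the initial transient before fitting the readout.
states = run_reservoir(inputs, w_in, w_res, leak)[washout:]
targets = targets[washout:]
w_out = fit_readout(states, targets, ridge)

preds = [sum(w * s for w, s in zip(w_out, st)) for st in states]
mse = sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)
mean_t = sum(targets) / len(targets)
baseline = sum((t - mean_t) ** 2 for t in targets) / len(targets)
print(f"readout MSE: {mse:.3g}  (predict-the-mean baseline: {baseline:.3g})")
```

No gradient is propagated through the recurrent weights at any point. A hyperparameter search of the kind the abstract describes would wrap the final block in a function of `n_units`, `leak`, `res_scale`, and `ridge`, and score it on held-out data rather than on the training error printed here.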

    Bayesian comparison of latent variable models: Conditional vs marginal likelihoods

    Typical Bayesian methods for models with latent variables (or random effects) involve directly sampling the latent variables along with the model parameters. In high-level software code for model definitions (using, e.g., BUGS, JAGS, Stan), the likelihood is therefore specified as conditional on the latent variables. This can lead researchers to perform model comparisons via conditional likelihoods, where the latent variables are considered model parameters. In other settings, however, typical model comparisons involve marginal likelihoods where the latent variables are integrated out. This distinction is often overlooked despite the fact that it can have a large impact on the comparisons of interest. In this paper, we clarify and illustrate these issues, focusing on the comparison of conditional and marginal Deviance Information Criteria (DICs) and Watanabe-Akaike Information Criteria (WAICs) in psychometric modeling. The conditional/marginal distinction corresponds to whether the model should be predictive for the clusters that are in the data or for new clusters (where "clusters" typically correspond to higher-level units like people or schools). Correspondingly, we show that marginal WAIC corresponds to leave-one-cluster out (LOcO) cross-validation, whereas conditional WAIC corresponds to leave-one-unit out (LOuO). These results lead to recommendations on the general application of the criteria to models with latent variables.
    Comment: Manuscript in press at Psychometrika; 31 pages, 8 figures
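
The two likelihoods being contrasted can be written out as follows. The notation is an assumption introduced here for illustration, not taken from the paper: $y_{ij}$ is the response of unit $i$ in cluster $j$, $\zeta_j$ is the latent variable for cluster $j$, $\theta$ collects the remaining model parameters, and units are assumed conditionally independent given $\zeta_j$.

```latex
% Conditional likelihood of cluster j: the latent variable is treated as a parameter.
p(y_j \mid \theta, \zeta_j) = \prod_{i} p(y_{ij} \mid \theta, \zeta_j)

% Marginal likelihood of cluster j: the latent variable is integrated out.
p(y_j \mid \theta) = \int p(y_j \mid \theta, \zeta_j)\, p(\zeta_j \mid \theta)\, \mathrm{d}\zeta_j
```

Under this reading, a criterion built from the first expression scores predictions for units within clusters already in the data, while one built from the second scores predictions for a new cluster whose $\zeta_j$ is unknown.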

    Detection of regulator genes and eQTLs in gene networks

    Genetic differences between individuals associated with quantitative phenotypic traits, including disease states, are usually found in non-coding genomic regions. These genetic variants are often also associated with differences in expression levels of nearby genes (they are "expression quantitative trait loci", or eQTLs for short) and presumably play a gene regulatory role, affecting the status of molecular networks of interacting genes, proteins and metabolites. Computational systems biology approaches to reconstruct causal gene networks from large-scale omics data have therefore become essential to understand the structure of networks controlled by eQTLs together with other regulatory genes, and to generate detailed hypotheses about the molecular mechanisms that lead from genotype to phenotype. Here we review the main analytical methods and software to identify eQTLs and their associated genes, to reconstruct co-expression networks and modules, to reconstruct causal Bayesian gene and module networks, and to validate predicted networks in silico.
    Comment: minor revision with typos corrected; review article; 24 pages, 2 figures

    Evaluation of geospatial methods to generate subnational HIV prevalence estimates for local level planning

    Objective: There is evidence of substantial subnational variation in the HIV epidemic. However, robust spatial HIV data are often only available at high levels of geographic aggregation and not at the finer resolution needed for decision making. Therefore, spatial analysis methods that leverage available data to provide local estimates of HIV prevalence may be useful. Such methods exist but have not been formally compared when applied to HIV. Design/methods: Six candidate methods – including those used by the Joint United Nations Programme on HIV/AIDS to generate maps and a Bayesian geostatistical approach applied to other diseases – were used to generate maps and subnational estimates of HIV prevalence across three countries using cluster level data from household surveys. Two approaches were used to assess the accuracy of predictions: internal validation, whereby a proportion of input data is held back (test dataset) to challenge predictions; and comparison with location-specific data from household surveys in earlier years. Results: Each of the methods can generate usefully accurate predictions of prevalence at unsampled locations, with the magnitude of the error in predictions similar across approaches. However, the Bayesian geostatistical approach consistently gave the strongest statistical performance, by a small margin, across countries and validation procedures. Conclusions: Available methods may be able to furnish estimates of HIV prevalence at finer spatial scales than the data currently allow. The subnational variation revealed can be integrated into planning to ensure responsiveness to the spatial features of the epidemic. The Bayesian geostatistical approach is a promising strategy for integrating HIV data to generate robust local estimates.

    A PAC-Bayesian Analysis of Graph Clustering and Pairwise Clustering

    We formulate weighted graph clustering as a prediction problem: given a subset of edge weights, we analyze the ability of graph clustering to predict the remaining edge weights. This formulation enables practical and theoretical comparison of different approaches to graph clustering as well as comparison of graph clustering with other possible ways to model the graph. We adapt the PAC-Bayesian analysis of co-clustering (Seldin and Tishby, 2008; Seldin, 2009) to derive a PAC-Bayesian generalization bound for graph clustering. The bound shows that graph clustering should optimize a trade-off between empirical data fit and the mutual information that clusters preserve on the graph nodes. A similar trade-off derived from information-theoretic considerations was already shown to produce state-of-the-art results in practice (Slonim et al., 2005; Yom-Tov and Slonim, 2009). This paper supports the empirical evidence by providing a better theoretical foundation, suggesting formal generalization guarantees, and offering a more accurate way to deal with finite sample issues. We derive a bound minimization algorithm and show that it provides good results in real-life problems and that the derived PAC-Bayesian bound is reasonably tight.

    Spatio-Temporal Modelling of Perfusion Cardiovascular MRI

    Myocardial perfusion MRI provides valuable insight into how coronary artery and microvascular diseases affect myocardial tissue. Stenosis in a coronary vessel leads to reduced maximum blood flow (MBF), although collaterals may secure the blood supply of the myocardium, albeit with altered tracer kinetics. To date, quantitative analysis of myocardial perfusion MRI has only been performed on a local level, largely ignoring the contextual information inherent in different myocardial segments. This paper proposes to quantify the spatial dependencies between the local kinetics via a Hierarchical Bayesian Model (HBM). In the proposed framework, all local systems are modelled simultaneously along with their dependencies, thus allowing more robust context-driven estimation of local kinetics. Detailed validation on both simulated and patient data is provided.