Efficient Optimization of Echo State Networks for Time Series Datasets
Echo State Networks (ESNs) are recurrent neural networks that only train
their output layer, thereby removing the need to backpropagate gradients
through time, which leads to significant computational gains. Nevertheless, a
common issue in ESNs is determining their hyperparameters, which are crucial in
instantiating a well-performing reservoir but are often set manually or using
heuristics. In this work we optimize the ESN hyperparameters using Bayesian
optimization which, given a limited budget of function evaluations, outperforms
a grid search strategy. In the context of large volumes of time series data,
such as light curves in the field of astronomy, we can further reduce the
optimization cost of ESNs. In particular, we wish to avoid tuning
hyperparameters per individual time series as this is costly; instead, we want
to find ESNs with hyperparameters that perform well not just on individual time
series but rather on groups of similar time series without sacrificing
predictive performance significantly. This naturally leads to a notion of
clusters, where each cluster is represented by an ESN tuned to model a group of
time series of similar temporal behavior. We demonstrate this approach both on
synthetic datasets and real world light curves from the MACHO survey. We show
that our approach results in a significant reduction in the number of ESN
models required to model a whole dataset, while retaining predictive
performance for the series in each cluster.
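The ESN training scheme described above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the uniform weight initialization, the leaky-tanh state update and the ridge-regression readout are common ESN choices assumed here, and `spectral_radius`, `input_scale`, `leak` and `ridge` are hypothetical stand-ins for the kind of hyperparameters the Bayesian optimization would tune. Only `train_readout` fits anything; the reservoir weights stay fixed, which is why no backpropagation through time is needed.

```python
import numpy as np

def make_reservoir(n_in, n_res, spectral_radius=0.9, input_scale=0.5, seed=0):
    """Draw fixed random input and recurrent weights; these are never trained."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-input_scale, input_scale, size=(n_res, n_in))
    w = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
    # Rescale so the largest absolute eigenvalue equals spectral_radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w

def run_reservoir(w_in, w, u, leak=0.3):
    """Collect leaky-integrator reservoir states for inputs u of shape (T, n_in)."""
    x = np.zeros(w.shape[0])
    states = []
    for u_t in u:
        x = (1 - leak) * x + leak * np.tanh(w_in @ u_t + w @ x)
        states.append(x.copy())
    return np.array(states)

def train_readout(states, targets, ridge=1e-6):
    """Ridge regression for the output weights: the only trained part of the ESN."""
    n = states.shape[1]
    a = states.T @ states + ridge * np.eye(n)
    return np.linalg.solve(a, states.T @ targets)

# Usage: one-step-ahead prediction of a synthetic sine series (illustrative data).
t = np.arange(1000)
series = np.sin(0.1 * t)[:, None]
w_in, w = make_reservoir(1, 100)
states = run_reservoir(w_in, w, series[:-1])
washout = 200  # discard initial states that still depend on the zero start
targets = series[1:][washout:]
w_out = train_readout(states[washout:], targets)
pred = states[washout:] @ w_out
mse = float(np.mean((pred - targets) ** 2))
```

Wrapping this in a search loop over held-out error (grid search or Bayesian optimization) is where the tuning cost arises; assigning one tuned setting to a cluster of similar series avoids repeating that loop for every series.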
Bayesian comparison of latent variable models: Conditional vs marginal likelihoods
Typical Bayesian methods for models with latent variables (or random effects)
involve directly sampling the latent variables along with the model parameters.
In high-level software code for model definitions (using, e.g., BUGS, JAGS,
Stan), the likelihood is therefore specified as conditional on the latent
variables. This can lead researchers to perform model comparisons via
conditional likelihoods, where the latent variables are considered model
parameters. In other settings, however, typical model comparisons involve
marginal likelihoods where the latent variables are integrated out. This
distinction is often overlooked despite the fact that it can have a large
impact on the comparisons of interest. In this paper, we clarify and illustrate
these issues, focusing on the comparison of conditional and marginal Deviance
Information Criteria (DICs) and Watanabe-Akaike Information Criteria (WAICs) in
psychometric modeling. The conditional/marginal distinction corresponds to
whether the model should be predictive for the clusters that are in the data or
for new clusters (where "clusters" typically correspond to higher-level units
like people or schools). Correspondingly, we show that marginal WAIC
corresponds to leave-one-cluster out (LOcO) cross-validation, whereas
conditional WAIC corresponds to leave-one-unit out (LOuO). These results lead
to recommendations on the general application of the criteria to models with
latent variables. Comment: Manuscript in press at Psychometrika; 31 pages, 8 figures
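The conditional and marginal WAICs differ only in what the pointwise log-likelihood matrix contains, which a short Python/NumPy sketch makes concrete. It assumes the standard WAIC definition (log pointwise predictive density minus the summed posterior variances of the pointwise log-likelihood, reported on the deviance scale); the function name `waic` is illustrative. For conditional WAIC, each column would hold log p(y_ij | θ, z_j) for a single observation, evaluated at posterior draws of the latent variable z_j; for marginal WAIC, each column would hold log p(y_j | θ) for a whole cluster, with z_j integrated out. Computing those columns is model-specific and not shown.

```python
import numpy as np

def waic(log_lik):
    """WAIC from an (S draws x N units) matrix of pointwise log-likelihoods.

    Returns (waic, lppd, p_waic), with waic = -2 * (lppd - p_waic).
    """
    log_lik = np.asarray(log_lik, dtype=float)
    # Log pointwise predictive density: log of the posterior-mean likelihood
    # for each column, computed stably with the log-sum-exp trick.
    m = log_lik.max(axis=0)
    lppd = np.sum(m + np.log(np.mean(np.exp(log_lik - m), axis=0)))
    # Effective number of parameters: sample variance of each column over draws.
    p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))
    return -2.0 * (lppd - p_waic), lppd, p_waic
```

Because the columns are the "units" being predicted, using observations as columns gives a criterion aligned with leaving out single units, while using clusters as columns aligns it with leaving out whole clusters.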
Detection of regulator genes and eQTLs in gene networks
Genetic differences between individuals associated with quantitative phenotypic
traits, including disease states, are usually found in non-coding genomic
regions. These genetic variants are often also associated with differences in
expression levels of nearby genes (they are "expression quantitative trait
loci" or eQTLs for short) and presumably play a gene regulatory role, affecting
the status of molecular networks of interacting genes, proteins and
metabolites. Computational systems biology approaches to reconstruct causal
gene networks from large-scale omics data have therefore become essential to
understand the structure of networks controlled by eQTLs together with other
regulatory genes, and to generate detailed hypotheses about the molecular
mechanisms that lead from genotype to phenotype. Here we review the main
analytical methods and software tools to identify eQTLs and their associated genes,
to reconstruct co-expression networks and modules, to reconstruct causal
Bayesian gene and module networks, and to validate predicted networks in
silico. Comment: minor revision with typos corrected; review article; 24 pages, 2
figures
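Identifying an eQTL typically starts from an association test between a genetic variant and the expression level of a nearby gene. The Python/NumPy sketch below shows one such test under simple assumptions (additive genotype coding 0/1/2, ordinary least squares, no covariates); it is not one of the specific tools surveyed in the review, and the synthetic data and the name `eqtl_test` are purely illustrative. Genome-wide scans repeat this test over many variant-gene pairs and then correct for multiple testing.

```python
import numpy as np

def eqtl_test(genotype, expression):
    """Regress expression on genotype dosage (0, 1 or 2 minor-allele copies).

    Returns the additive effect (slope) and its t-statistic.
    """
    g = np.asarray(genotype, dtype=float)
    y = np.asarray(expression, dtype=float)
    n = len(g)
    g_c = g - g.mean()
    slope = np.sum(g_c * (y - y.mean())) / np.sum(g_c ** 2)
    intercept = y.mean() - slope * g.mean()
    resid = y - (intercept + slope * g)
    sigma2 = np.sum(resid ** 2) / (n - 2)  # residual variance, 2 fitted parameters
    se = np.sqrt(sigma2 / np.sum(g_c ** 2))
    return slope, slope / se

# Usage on synthetic data with a true additive effect of 0.8 per allele.
rng = np.random.default_rng(1)
genotype = rng.integers(0, 3, size=200)
expression = 0.8 * genotype + rng.normal(0.0, 1.0, size=200)
slope, t_stat = eqtl_test(genotype, expression)
```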
Evaluation of geospatial methods to generate subnational HIV prevalence estimates for local level planning
Objective: There is evidence of substantial subnational variation in the HIV epidemic.
However, robust spatial HIV data are often only available at high levels of geographic
aggregation and not at the finer resolution needed for decision making. Therefore,
spatial analysis methods that leverage available data to provide local estimates of HIV
prevalence may be useful. Such methods exist but have not been formally compared
when applied to HIV.
Design/methods: Six candidate methods – including those used by the Joint United
Nations Programme on HIV/AIDS to generate maps and a Bayesian geostatistical
approach applied to other diseases – were used to generate maps and subnational
estimates of HIV prevalence across three countries using cluster level data from
household surveys. Two approaches were used to assess the accuracy of predictions:
internal validation, whereby a proportion of input data is held back (test dataset) to
challenge predictions; and comparison with location-specific data from household
surveys in earlier years.
Results: Each of the methods can generate usefully accurate predictions of prevalence
at unsampled locations, with the magnitude of the error in predictions similar across
approaches. However, the Bayesian geostatistical approach consistently gave the strongest statistical performance, by a small margin, across countries and validation procedures.
Conclusions: Available methods may be able to furnish estimates of HIV prevalence at
finer spatial scales than the data currently allow. The subnational variation revealed can
be integrated into planning to ensure responsiveness to the spatial features of the
epidemic. The Bayesian geostatistical approach is a promising strategy for integrating
HIV data to generate robust local estimates.
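The internal validation procedure (holding back part of the cluster-level data and scoring predictions on it) can be sketched in plain Python. Inverse-distance weighting is used here only as a simple stand-in interpolator; it is not claimed to be one of the six methods compared, and the function names, the `predict` parameter and the 20% test fraction are hypothetical choices.

```python
import math
import random

def idw_predict(train, x, y, power=2.0):
    """Inverse-distance-weighted prevalence at (x, y) from (x, y, prevalence) tuples."""
    num = den = 0.0
    for xi, yi, pi in train:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return pi  # location was sampled: return its observed value
        w = 1.0 / d ** power
        num += w * pi
        den += w
    return num / den

def holdout_rmse(clusters, predict=idw_predict, test_fraction=0.2, seed=0):
    """Hold back a fraction of cluster-level observations and score predictions on them."""
    rng = random.Random(seed)
    shuffled = clusters[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    errors = [predict(train, x, y) - p for x, y, p in test]
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

Passing different `predict` functions to `holdout_rmse` with the same seed scores each method on the same held-out clusters, which is the kind of like-for-like comparison the validation requires.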
A PAC-Bayesian Analysis of Graph Clustering and Pairwise Clustering
We formulate weighted graph clustering as a prediction problem: given a
subset of edge weights, we analyze the ability of graph clustering to predict
the remaining edge weights. This formulation enables practical and theoretical
comparison of different approaches to graph clustering as well as comparison of
graph clustering with other possible ways to model the graph. We adapt the
PAC-Bayesian analysis of co-clustering (Seldin and Tishby, 2008; Seldin, 2009)
to derive a PAC-Bayesian generalization bound for graph clustering. The bound
shows that graph clustering should optimize a trade-off between empirical data
fit and the mutual information that clusters preserve on the graph nodes. A
similar trade-off derived from information-theoretic considerations was already
shown to produce state-of-the-art results in practice (Slonim et al., 2005;
Yom-Tov and Slonim, 2009). This paper supports the empirical evidence by
providing a better theoretical foundation, suggesting formal generalization
guarantees, and offering a more accurate way to deal with finite sample issues.
We derive a bound minimization algorithm and show that it provides good results
in real-life problems and that the derived PAC-Bayesian bound is reasonably
tight.
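The trade-off in the bound, between empirical fit and the mutual information I(X; C) that the clustering preserves about the graph nodes, can be made concrete with a small Python sketch. It assumes a uniform distribution over nodes and soft assignments q(c | x); `tradeoff_objective` and its weight `beta` are hypothetical and only illustrate the shape of the trade-off, not the exact form of the derived bound or the authors' bound minimization algorithm.

```python
import math

def mutual_information(q):
    """I(X; C) in nats for soft assignments q[x][c] = q(c | x), with uniform p(x)."""
    n = len(q)
    k = len(q[0])
    # Marginal cluster distribution q(c) = (1/n) * sum_x q(c | x).
    q_c = [sum(row[c] for row in q) / n for c in range(k)]
    mi = 0.0
    for row in q:
        for c, p in enumerate(row):
            if p > 0:
                mi += (1.0 / n) * p * math.log(p / q_c[c])
    return mi

def tradeoff_objective(empirical_loss, q, beta):
    """Hypothetical objective: data-fit loss plus beta times I(X; C)."""
    return empirical_loss + beta * mutual_information(q)
```

A hard, even split of four nodes into two clusters preserves log 2 nats about the nodes, while putting every node in one cluster preserves none: finer clusterings can fit the observed weights better but pay a larger information cost.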
Spatio-Temporal Modelling of Perfusion Cardiovascular MRI
Myocardial perfusion MRI provides valuable insight into how coronary artery and microvascular diseases affect myocardial tissue. Stenosis in a coronary vessel leads to reduced maximum blood flow (MBF), but collateral vessels may secure the blood supply of the myocardium, albeit with altered tracer kinetics. To date, quantitative analysis of myocardial perfusion MRI has only been performed on a local level, largely ignoring the contextual information inherent in different myocardial segments. This paper proposes to quantify the spatial dependencies between the local kinetics via a Hierarchical Bayesian Model (HBM). In the proposed framework, all local systems are modelled simultaneously along with their dependencies, thus allowing more robust context-driven estimation of local kinetics. Detailed validation on both simulated and patient data is provided.
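Why modelling local systems jointly can make estimates more robust is easiest to see in the simplest hierarchical model, a normal-normal model with partial pooling, sketched here in plain Python. This is a simplified stand-in, not the paper's HBM: it assumes each myocardial segment yields a noisy local estimate of one kinetic parameter with known variance, treats segments as exchangeable (ignoring their spatial neighbourhood structure), and plugs in a precision-weighted group mean with a fixed, hypothetical between-segment variance `tau2`.

```python
def partial_pooling(estimates, variances, tau2):
    """Posterior means of segment parameters under a normal-normal model.

    Segment j has a local estimate y_j with known sampling variance s2_j, and
    true parameters theta_j ~ Normal(mu, tau2). mu is a plug-in estimate.
    """
    weights = [1.0 / (s2 + tau2) for s2 in variances]
    mu = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    shrunk = []
    for y, s2 in zip(estimates, variances):
        b = s2 / (s2 + tau2)  # shrinkage factor: noisier segments borrow more strength
        shrunk.append(b * mu + (1 - b) * y)
    return mu, shrunk
```

Segments with noisy local fits are pulled further toward the group mean than well-determined ones; a spatial HBM refines this by letting each segment borrow mainly from its neighbours.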