Duration and Interval Hidden Markov Model for Sequential Data Analysis
Analysis of sequential event data has been recognized as one of the essential
tools in the field of data modeling and analysis. In this paper, after examining
the technical requirements and issues involved in modeling complex but practical
situations, we propose a new sequential data model, dubbed Duration and Interval
Hidden Markov Model (DI-HMM), that efficiently represents "state duration" and
"state interval" of data events. Representing both quantities is important for
modeling practical time-series sequential data, and it enables efficient and
flexible sequential data retrieval.
Numerical experiments on synthetic and real data demonstrate the efficiency and
accuracy of the proposed DI-HMM.
Using parallel computation to improve Independent Metropolis--Hastings based estimation
In this paper, we consider the implications of the fact that parallel
raw-power can be exploited by a generic Metropolis--Hastings algorithm if the
proposed values are independent. In particular, we present improvements to the
independent Metropolis--Hastings algorithm that significantly decrease the
variance of any estimator derived from the MCMC output, at no additional
computing cost since those improvements are based on a fixed number of target density
evaluations. Furthermore, the techniques developed in this paper do not
jeopardize the Markovian convergence properties of the algorithm, since they
are based on the Rao--Blackwell principles of Gelfand and Smith (1990), already
exploited in Casella and Robert (1996), Atchade and Perron (2005) and Douc and
Robert (2010). We illustrate those improvements both on a toy normal example
and on a classical probit regression model, but stress the fact that they are
applicable in any case where the independent Metropolis-Hastings is applicable.
Comment: 19 pages, 8 figures, to appear in Journal of Computational and
Graphical Statistics
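The abstract above relies on the fact that, in independent Metropolis--Hastings, proposals do not depend on the current state, so all proposals and their importance weights can be computed up front (and hence in parallel). The following is a minimal sketch of the plain sampler only, not of the Rao--Blackwellized estimators the paper develops. The toy N(0, 1) target, the N(0, 2^2) proposal, and all function names are assumptions made for illustration.

```python
import math
import random


def log_target(x):
    # Unnormalised log-density of the assumed toy target N(0, 1).
    return -0.5 * x * x


def log_proposal(x, scale=2.0):
    # Unnormalised log-density of the assumed proposal N(0, scale^2).
    return -0.5 * (x / scale) ** 2


def independent_mh(n_iter, scale=2.0, seed=0):
    rng = random.Random(seed)
    # Proposals are independent of the chain, so they and their log
    # importance weights log(target/proposal) can be evaluated in one batch.
    # This batch is the part that could be spread over parallel workers.
    proposals = [rng.gauss(0.0, scale) for _ in range(n_iter)]
    log_w = [log_target(y) - log_proposal(y, scale) for y in proposals]

    x = 0.0
    lw_x = log_target(x) - log_proposal(x, scale)
    chain = []
    for y, lw_y in zip(proposals, log_w):
        # Accept the proposal with probability min(1, w(y) / w(x)).
        if rng.random() < math.exp(min(0.0, lw_y - lw_x)):
            x, lw_x = y, lw_y
        chain.append(x)
    return chain
```

Calling `independent_mh(20000)` returns a list of chain states whose sample mean and variance should be close to those of the assumed target. The constant normalising factors are dropped because only ratios of weights enter the acceptance step.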
HYPA: Efficient Detection of Path Anomalies in Time Series Data on Networks
The unsupervised detection of anomalies in time series data has important
applications in user behavioral modeling, fraud detection, and cybersecurity.
Anomaly detection has, in fact, been extensively studied in categorical
sequences. However, we often have access to time series data that represent
paths through networks. Examples include transaction sequences in financial
networks, click streams of users in networks of cross-referenced documents, or
travel itineraries in transportation networks. To reliably detect anomalies, we
must account for the fact that such data contain a large number of independent
observations of paths constrained by a graph topology. Moreover, the
heterogeneity of real systems rules out frequency-based anomaly detection
techniques, which do not account for highly skewed edge and degree statistics.
To address this problem, we introduce HYPA, a novel framework for the
unsupervised detection of anomalies in large corpora of variable-length
temporal paths in a graph. HYPA provides an efficient analytical method to
detect paths with anomalous frequencies that result from nodes being traversed
in unexpected chronological order.
Comment: 11 pages with 8 figures and supplementary material. To appear at SIAM
Data Mining (SDM 2020)
Handwritten digit recognition by bio-inspired hierarchical networks
The human brain processes information in ways that exhibit learning and
prediction abilities, but the underlying neuronal mechanisms remain unknown.
Recently, many studies have shown that neuronal networks are capable of both
generalization and association of sensory inputs. In this paper, following a
body of neurophysiological evidence, we propose a learning framework with
strong biological plausibility that mimics prominent functions of cortical
circuitry. We developed the Inductive Conceptual Network (ICN), a
hierarchical bio-inspired network able to learn invariant patterns using
Variable-order Markov Models implemented in its nodes. The outputs of the
top-most node of the ICN hierarchy, representing the highest input generalization,
allow for automatic classification of inputs. We found that the ICN clustered
MNIST images with an error of 5.73% and USPS images with an error of 12.56%.
Probabilistic uncertainty quantification and experiment design for nonlinear models: Applications in systems biology
Despite the ever-increasing interest in understanding biology at the system level, there are several factors that hinder studies and analyses of biological systems. First, unlike systems from other applied fields whose parameters can be effectively identified, biological systems are usually unidentifiable, even in the ideal case when all possible system outputs are known with high accuracy. Second, the presence of multivariate bifurcations often leads the system to behaviors that are completely different in nature. In such cases, system outputs (as function of parameters/inputs) are usually discontinuous or have sharp transitions across domains with different behaviors. Finally, models from systems biology are usually strongly nonlinear with large numbers of parameters and complex interactions. This results in high computational costs of model simulations that are required to study the systems, an issue that becomes more and more problematic when the dimensionality of the system increases. Similarly, wet-lab experiments to gather information about the biological model of interest are usually strictly constrained by research budget and experimental settings. The choice of experiments/simulations for inference, therefore, needs to be carefully addressed.
The work presented in this dissertation develops strategies to address theoretical and practical limitations in uncertainty quantification and experimental design of non-linear mathematical models, applied in the context of systems biology. This work resolves those issues by focusing on three separate but related approaches: (i) the use of probabilistic frameworks for uncertainty quantification in the face of unidentifiability, (ii) the use of behavior discrimination algorithms to study systems with discontinuous model responses, and (iii) the use of effective sampling schemes and optimal experimental design to reduce the computational/experimental costs.
This cumulative work also places strong emphasis on providing theoretical foundations for the use of the proposed framework: theoretical properties of algorithms at each step in the process are investigated carefully to give more insights about how the algorithms perform, and in many cases, to provide feedback to improve the performance of existing approaches. Through the newly developed procedures, we successfully created a general probabilistic framework for uncertainty quantification and experiment design for non-linear models in the face of unidentifiability, sharp model responses with limited number of model simulations, constraints on experimental setting, and even in the absence of data. The proposed methods have strong theoretical foundations and have also proven to be effective in studies of expensive high-dimensional biological systems in various contexts.
On computational tools for Bayesian data analysis
While Robert and Rousseau (2010) addressed the foundational aspects of
Bayesian analysis, the current chapter details its practical aspects through a
review of the computational methods available for approximating Bayesian
procedures. Recent innovations like Markov chain Monte Carlo, sequential Monte
Carlo methods and, more recently, Approximate Bayesian Computation techniques
have considerably increased the potential for Bayesian applications and they
have also opened new avenues for Bayesian inference, first and foremost
Bayesian model choice.
Comment: This is a chapter for the book "Bayesian Methods and Expert
Elicitation" edited by Klaus Bocker, 23 pages, 9 figures
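Of the methods named in the abstract above, Approximate Bayesian Computation is the simplest to illustrate. The following is a minimal sketch of a rejection-style ABC sampler, assuming a toy problem that is not taken from the chapter: inferring the mean of a unit-variance normal under a Uniform(-5, 5) prior, with the sample mean as summary statistic. The function name, the tolerance `eps`, and the data are all hypothetical.

```python
import random
import statistics


def abc_rejection(observed, n_draws=20000, eps=0.1, seed=0):
    """Return prior draws of mu whose simulated data resemble `observed`."""
    rng = random.Random(seed)
    n = len(observed)
    s_obs = statistics.fmean(observed)  # summary statistic of the real data
    accepted = []
    for _ in range(n_draws):
        mu = rng.uniform(-5.0, 5.0)  # draw a candidate from the assumed prior
        simulated = [rng.gauss(mu, 1.0) for _ in range(n)]
        # Keep the candidate only if the simulated summary statistic falls
        # within eps of the observed one; no likelihood is ever evaluated.
        if abs(statistics.fmean(simulated) - s_obs) < eps:
            accepted.append(mu)
    return accepted
```

For example, `abc_rejection([1.2, 0.8, 1.9, 0.4, 1.1, 1.6, 0.7, 1.3])` returns a list of accepted values of `mu` that approximates the posterior. Shrinking `eps` tightens the approximation at the cost of fewer accepted draws.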