7,341 research outputs found

    Duration and Interval Hidden Markov Model for Sequential Data Analysis

    Analysis of sequential event data has been recognized as one of the essential tools in the field of data modeling and analysis. In this paper, after examining the technical requirements and issues involved in modeling complex but practical situations, we propose a new sequential data model, dubbed Duration and Interval Hidden Markov Model (DI-HMM), that efficiently represents the "state duration" and "state interval" of data events. This plays an important role in representing practical time-series sequential data and, in turn, enables efficient and flexible sequential data retrieval. Numerical experiments on synthetic and real data demonstrate the efficiency and accuracy of the proposed DI-HMM.

    Using parallel computation to improve Independent Metropolis--Hastings based estimation

    In this paper, we consider the implications of the fact that parallel raw-power can be exploited by a generic Metropolis--Hastings algorithm if the proposed values are independent. In particular, we present improvements to the independent Metropolis--Hastings algorithm that significantly decrease the variance of any estimator derived from the MCMC output, at no additional computing cost, since those improvements are based on a fixed number of target density evaluations. Furthermore, the techniques developed in this paper do not jeopardize the Markovian convergence properties of the algorithm, since they are based on the Rao--Blackwell principles of Gelfand and Smith (1990), already exploited in Casella and Robert (1996), Atchade and Perron (2005) and Douc and Robert (2010). We illustrate those improvements both on a toy normal example and on a classical probit regression model, but stress the fact that they are applicable in any case where the independent Metropolis--Hastings algorithm is applicable. Comment: 19 pages, 8 figures, to appear in Journal of Computational and Graphical Statistics.
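
As a rough illustration of why independent proposals lend themselves to parallel computation, the sketch below runs a plain independent Metropolis--Hastings sampler on a toy standard normal target. It is not the Rao--Blackwellized estimator described in the abstract; the function names, the N(0, 2^2) proposal, and the seed are assumptions chosen only for this example. Because the proposals do not depend on the chain state, they and their importance weights are computed up front, which is the step that could be distributed across workers.

```python
import math
import random


def log_target(x):
    # Unnormalized log-density of the toy target, a standard normal (assumed).
    return -0.5 * x * x


def log_proposal(x, scale):
    # Unnormalized log-density of the independent proposal N(0, scale^2) (assumed).
    return -0.5 * (x / scale) ** 2


def independent_mh(n_iter, scale=2.0, seed=0):
    """Plain independent Metropolis-Hastings; returns (chain, acceptance rate)."""
    rng = random.Random(seed)

    # Proposals are independent of the current state, so all of them and
    # their log importance weights can be evaluated before the chain runs
    # (this loop is the part that could be run in parallel).
    proposals = [rng.gauss(0.0, scale) for _ in range(n_iter)]
    log_w = [log_target(y) - log_proposal(y, scale) for y in proposals]

    x = 0.0
    lw_x = log_target(x) - log_proposal(x, scale)
    chain = []
    accepted = 0
    for y, lw_y in zip(proposals, log_w):
        # Accept with probability min(1, w(y) / w(x)), where w = target / proposal.
        if rng.random() < math.exp(min(0.0, lw_y - lw_x)):
            x, lw_x = y, lw_y
            accepted += 1
        chain.append(x)
    return chain, accepted / n_iter


if __name__ == "__main__":
    chain, rate = independent_mh(20000)
    mean = sum(chain) / len(chain)
    print(f"acceptance rate: {rate:.2f}, sample mean: {mean:.3f}")
```

The sequential accept/reject loop only compares precomputed weights, so the number of target density evaluations is fixed in advance, which matches the setting the abstract describes.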

    HYPA: Efficient Detection of Path Anomalies in Time Series Data on Networks

    The unsupervised detection of anomalies in time series data has important applications in user behavioral modeling, fraud detection, and cybersecurity. Anomaly detection has, in fact, been extensively studied in categorical sequences. However, we often have access to time series data that represent paths through networks. Examples include transaction sequences in financial networks, click streams of users in networks of cross-referenced documents, or travel itineraries in transportation networks. To reliably detect anomalies, we must account for the fact that such data contain a large number of independent observations of paths constrained by a graph topology. Moreover, the heterogeneity of real systems rules out frequency-based anomaly detection techniques, which do not account for highly skewed edge and degree statistics. To address this problem, we introduce HYPA, a novel framework for the unsupervised detection of anomalies in large corpora of variable-length temporal paths in a graph. HYPA provides an efficient analytical method to detect paths with anomalous frequencies that result from nodes being traversed in unexpected chronological order. Comment: 11 pages with 8 figures and supplementary material. To appear at SIAM Data Mining (SDM 2020).

    Handwritten digit recognition by bio-inspired hierarchical networks

    The human brain processes information showing learning and prediction abilities, but the underlying neuronal mechanisms remain unknown. Recently, many studies have shown that neuronal networks are capable of both generalization and association of sensory inputs. In this paper, following a body of neurophysiological evidence, we propose a learning framework with strong biological plausibility that mimics prominent functions of cortical circuitries. We developed the Inductive Conceptual Network (ICN), a hierarchical bio-inspired network able to learn invariant patterns by means of Variable-order Markov Models implemented in its nodes. The outputs of the top-most node of the ICN hierarchy, representing the highest input generalization, allow for automatic classification of inputs. We found that the ICN clustered MNIST images with an error of 5.73% and USPS images with an error of 12.56%.

    Probabilistic uncertainty quantification and experiment design for nonlinear models: Applications in systems biology

    Despite the ever-increasing interest in understanding biology at the system level, there are several factors that hinder studies and analyses of biological systems. First, unlike systems from other applied fields whose parameters can be effectively identified, biological systems are usually unidentifiable, even in the ideal case when all possible system outputs are known with high accuracy. Second, the presence of multivariate bifurcations often leads the system to behaviors that are completely different in nature. In such cases, system outputs (as functions of parameters/inputs) are usually discontinuous or have sharp transitions across domains with different behaviors. Finally, models from systems biology are usually strongly nonlinear with large numbers of parameters and complex interactions. This results in high computational costs for the model simulations that are required to study the systems, an issue that becomes more and more problematic as the dimensionality of the system increases. Similarly, wet-lab experiments to gather information about the biological model of interest are usually strictly constrained by research budget and experimental settings. The choice of experiments/simulations for inference, therefore, needs to be carefully addressed.
    The work presented in this dissertation develops strategies to address theoretical and practical limitations in uncertainty quantification and experimental design of non-linear mathematical models, applied in the context of systems biology. This work resolves those issues by focusing on three separate but related approaches: (i) the use of probabilistic frameworks for uncertainty quantification in the face of unidentifiability, (ii) the use of behavior discrimination algorithms to study systems with discontinuous model responses, and (iii) the use of effective sampling schemes and optimal experimental design to reduce the computational/experimental costs.
    This cumulative work also places strong emphasis on providing theoretical foundations for the use of the proposed framework: theoretical properties of the algorithms at each step in the process are investigated carefully to give more insight into how the algorithms perform and, in many cases, to provide feedback to improve the performance of existing approaches. Through the newly developed procedures, we successfully created a general probabilistic framework for uncertainty quantification and experiment design for non-linear models in the face of unidentifiability, sharp model responses with a limited number of model simulations, constraints on experimental settings, and even the absence of data. The proposed methods have strong theoretical foundations and have also proven to be effective in studies of expensive high-dimensional biological systems in various contexts.

    On computational tools for Bayesian data analysis

    While Robert and Rousseau (2010) addressed the foundational aspects of Bayesian analysis, the current chapter details its practical aspects through a review of the computational methods available for approximating Bayesian procedures. Recent innovations like Markov chain Monte Carlo, sequential Monte Carlo methods and, more recently, Approximate Bayesian Computation techniques have considerably increased the potential for Bayesian applications, and they have also opened new avenues for Bayesian inference, first and foremost Bayesian model choice. Comment: This is a chapter for the book "Bayesian Methods and Expert Elicitation" edited by Klaus Bocker, 23 pages, 9 figures.