
    Imprecise Continuous-Time Markov Chains

    Continuous-time Markov chains are mathematical models that describe the state evolution of dynamical systems under stochastic uncertainty, and have found widespread application in various fields. To keep these models computationally tractable, a number of assumptions are imposed that may not be realistic for the domain of application; in particular, the availability of exact numerical parameter assessments, and the applicability of time-homogeneity and the eponymous Markov property. In this work, we extend these models to imprecise continuous-time Markov chains (ICTMCs), a robust generalisation that relaxes these assumptions while remaining computationally tractable. More technically, an ICTMC is a set of "precise" continuous-time finite-state stochastic processes, and rather than computing expected values of functions, we seek to compute lower expectations: tight lower bounds on the expectations that correspond to such a set of "precise" models. Note that, in contrast to e.g. Bayesian methods, all the elements of such a set are treated on equal grounds; we do not consider a distribution over this set. The first part of this paper develops a formalism for describing continuous-time finite-state stochastic processes that does not require the aforementioned simplifying assumptions. Next, this formalism is used to characterise ICTMCs and to investigate their properties. The concept of lower expectation is then given an alternative operator-theoretic characterisation by means of a lower transition operator, and the properties of this operator are investigated as well. Finally, we use this lower transition operator to derive tractable algorithms (with polynomial runtime complexity w.r.t. the maximum numerical error) for computing the lower expectation of functions that depend on the state at any finite number of time points.
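
    The operator-theoretic characterisation suggests a simple numerical scheme. Below is a minimal sketch, assuming a hypothetical two-state chain whose transition rates range over intervals (the RATE_BOUNDS values, function names and step count are illustrative, not from the paper): the lower expectation is obtained by repeatedly applying a discretised lower transition operator.

```python
import numpy as np

# Hypothetical two-state example: each transition rate ranges over an
# interval, and the rate-matrix set is the product of those intervals.
# The lower expectation is computed by Euler-stepping the discretised
# lower transition operator  f <- f + delta * min_{Q} (Q f),
# which for interval rates reduces to a per-state minimisation.

RATE_BOUNDS = {          # (lower, upper) bound on the rate of leaving a state
    0: (1.0, 2.0),       # rate of jumping 0 -> 1
    1: (0.5, 1.5),       # rate of jumping 1 -> 0
}

def lower_Q(f):
    """Lower rate operator: per state, pick the rate in its interval that
    minimises the drift lam * (f[other] - f[this])."""
    out = np.empty_like(f)
    for x in (0, 1):
        lo, hi = RATE_BOUNDS[x]
        drift = f[1 - x] - f[x]
        out[x] = (lo if drift >= 0 else hi) * drift
    return out

def lower_expectation(f, t, n_steps=10_000):
    """Lower expectation of f(X_t), for every initial state."""
    f = np.asarray(f, dtype=float)
    delta = t / n_steps
    for _ in range(n_steps):
        f = f + delta * lower_Q(f)
    return f

# Indicator of state 1: yields the lower probability of being in state 1
# at time t = 1, for each initial state.
print(lower_expectation([0.0, 1.0], t=1.0))
```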

    Priors on network structures. Biasing the search for Bayesian networks

    In this paper we show how a user can influence the recovery of Bayesian networks from a database by specifying prior knowledge. The main novelty of our approach is that the user only has to provide partial prior knowledge, which is then completed to a full prior over all possible network structures. This partial prior knowledge is expressed among variables in an intuitive pairwise way, which embodies the uncertainty of the user about his/her own prior knowledge. Thus, the uncertainty of the model is updated in the normal Bayesian way.
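
    To make the completion step concrete, here is a minimal sketch, assuming the pairwise input takes the form of edge-presence probabilities (the PAIR_BELIEFS dictionary, the neutral 0.5 default and the factorised log-prior are our own illustrative assumptions, not the paper's exact scheme): pairs the user says nothing about receive a neutral factor, which completes the partial knowledge to a prior over all structures.

```python
import math

# Hypothetical pairwise input: for a few ordered pairs (a, b), the user's
# probability that the edge a -> b is present.  Unmentioned pairs default
# to the neutral value 0.5, so the prior neither rewards nor penalises them.
PAIR_BELIEFS = {
    ("smoking", "cancer"): 0.9,   # fairly sure this edge exists
    ("cancer", "smoking"): 0.05,  # ... and that the reverse one does not
}

def log_structure_prior(dag_edges, variables):
    """Log-prior of a DAG, given as a set of (parent, child) edges:
    log P(G) = sum over ordered pairs of log p or log(1 - p)."""
    logp = 0.0
    for a in variables:
        for b in variables:
            if a == b:
                continue
            p = PAIR_BELIEFS.get((a, b), 0.5)
            logp += math.log(p if (a, b) in dag_edges else 1.0 - p)
    return logp

variables = ["smoking", "cancer", "tea"]
print(log_structure_prior({("smoking", "cancer")}, variables))  # favoured
print(log_structure_prior({("cancer", "smoking")}, variables))  # penalised
```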

    Complex Data: Mining using Patterns

    There is a growing need to analyse sets of complex data, i.e., data in which the individual data items are (semi-)structured collections of data themselves, such as sets of time series. To perform such analysis, one has to redefine familiar notions such as similarity for such complex data types. One can do that either on the data items directly, or indirectly, based on features or patterns computed from the individual data items. In this paper, we argue that wavelet decomposition is a general tool for the latter approach.
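
    As a rough illustration of the indirect, pattern-based route, the sketch below maps each time series to a small feature vector of Haar wavelet energies per scale and defines similarity on those features rather than on the raw data; the function names and the choice of cosine similarity are our own assumptions.

```python
import numpy as np

def haar_energy_features(x):
    """Energy of the Haar detail coefficients at each decomposition level."""
    x = np.asarray(x, dtype=float)
    feats = []
    while len(x) >= 2:
        if len(x) % 2:                  # pad odd lengths
            x = np.append(x, x[-1])
        detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
        x = (x[0::2] + x[1::2]) / np.sqrt(2.0)
        feats.append(float(np.sum(detail ** 2)))
    return np.array(feats)

def feature_similarity(a, b):
    """Cosine similarity between the scale-energy signatures of two series."""
    fa, fb = haar_energy_features(a), haar_energy_features(b)
    n = min(len(fa), len(fb))
    fa, fb = fa[:n], fb[:n]
    return float(fa @ fb / (np.linalg.norm(fa) * np.linalg.norm(fb) + 1e-12))

rng = np.random.default_rng(0)
print(feature_similarity(rng.normal(size=256), rng.normal(size=256)))
```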

    Wavelet transform in similarity paradigm

    Searching for similarity in time series finds ever broader applications in data mining. However, due to the very broad spectrum of data involved, no single notion of similarity can serve all applications. We present a powerful framework based on wavelet decomposition, which allows designing and implementing a variety of criteria for the evaluation of similarity between time series. As an example, two main classes of similarity measures are considered. One is a global, statistical similarity which uses the wavelet-transform-derived Hurst exponent to classify time series according to their global scaling properties. The second measure estimates similarity locally, using the scale-position bifurcation representation derived from the wavelet transform modulus maxima representation of the time series. A variety of generic or custom-designed matching criteria can be incorporated into the local similarity measure. We demonstrate the ability of the technique to deal with the presence of scaling, translation and polynomial bias, and we also test sensitivity to the addition of random noise. Other criteria can be designed, and this flexibility can be built into the data mining system to allow for specific user requirements.
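
    The global measure can be illustrated with a standard wavelet Hurst estimator, sketched below as a stand-in for the paper's implementation: for fractional-Gaussian-noise-like series the variance of the Haar detail coefficients at level j scales roughly as 2^(j(2H-1)), so a log-log regression over levels recovers H, and series can then be grouped by their estimated exponent.

```python
import numpy as np

def haar_detail_variances(x):
    """Mean squared Haar detail coefficient at each decomposition level."""
    x = np.asarray(x, dtype=float)
    variances = []
    while len(x) >= 8:                  # keep a few coefficients per level
        if len(x) % 2:
            x = x[:-1]
        detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
        x = (x[0::2] + x[1::2]) / np.sqrt(2.0)
        variances.append(float(np.mean(detail ** 2)))
    return np.array(variances)

def hurst_exponent(x):
    """Slope of log2(variance) versus level j is roughly 2H - 1."""
    v = haar_detail_variances(x)
    j = np.arange(1, len(v) + 1)
    slope = np.polyfit(j, np.log2(v), 1)[0]
    return (slope + 1.0) / 2.0

rng = np.random.default_rng(1)
print(hurst_exponent(rng.normal(size=4096)))   # white noise: H close to 0.5
```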

    The Haar Wavelet Transform in the Time Series Similarity Paradigm

    Similarity measures play an important role in many data mining algorithms. To allow the use of such algorithms on non-standard databases, such as databases of financial time series, their similarity measure has to be defined. We present a simple and powerful technique which allows for the rapid evaluation of similarity between time series in large databases. It is based on the orthonormal decomposition of the time series into the Haar basis. We demonstrate that this approach is capable of providing estimates of the local slope of the time series in the sequence of multi-resolution steps. The Haar representation, and a number of related representations derived from it, are suitable for direct comparison, e.g. evaluation of the correlation product. We demonstrate that the distance between such representations closely corresponds to the subjective feeling of similarity between the time series. In order to test the validity of subjective criteria, we test records of currency exchanges, finding convincing levels of correlation.
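
    A minimal sketch of this Haar route, under our own assumptions about which levels to keep: the detail coefficient at a given level and position is proportional to the difference of adjacent block averages, i.e. a local slope estimate at that scale, and the correlation product of the coarse-level coefficients serves as the similarity score.

```python
import numpy as np

def haar_representation(x, keep_levels=4):
    """Detail coefficients of the `keep_levels` coarsest levels, coarse first.
    Each coefficient is a (scaled) local slope estimate at its scale."""
    x = np.asarray(x, dtype=float)
    details = []
    while len(x) >= 2:
        if len(x) % 2:
            x = x[:-1]
        details.append((x[0::2] - x[1::2]) / np.sqrt(2.0))
        x = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    return np.concatenate(details[-keep_levels:][::-1])  # compact signature

def haar_similarity(a, b, keep_levels=4):
    """Correlation product between the two compact Haar representations."""
    ra = haar_representation(a, keep_levels)
    rb = haar_representation(b, keep_levels)
    n = min(len(ra), len(rb))
    ra, rb = ra[:n] - ra[:n].mean(), rb[:n] - rb[:n].mean()
    return float(ra @ rb / (np.linalg.norm(ra) * np.linalg.norm(rb) + 1e-12))

t = np.linspace(0.0, 1.0, 512)
print(haar_similarity(np.sin(8 * t), np.sin(8 * t) + 0.1 * np.cos(40 * t)))
```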

    Wavelet transform in similarity paradigm II

    For the majority of data mining applications, there are no models of the data that would facilitate the task of comparing records of time series, thus leaving one with 'noise' as the only description. We propose a generic approach to comparing noisy time series using the largest deviations from consistent statistical behaviour. For this purpose we use a powerful framework based on wavelet decomposition, which allows filtering out polynomial bias while capturing the essential singular behaviour. In particular, we are able to reveal a scale-wise ranking of singular events, including their scale-free characteristic: the Hölder exponent. We use such characteristics to design a compact representation of the time series suitable for direct comparison, e.g. evaluation of the correlation product. We demonstrate that the distance between such representations closely corresponds to the subjective feeling of similarity between the time series. In order to test the validity of subjective criteria, we test records of currency exchanges, finding convincing levels of (local) correlation.
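
    As a drastically simplified stand-in for this "largest deviations" idea (the paper works on the wavelet-transform modulus-maxima tree and extracts Hölder exponents; the per-level standardisation, the choice of k and the sparse-correlation score below are our own assumptions), one can standardise the Haar detail coefficients level by level, keep only the k largest-magnitude deviations per level as the singular events, and correlate the resulting sparse signatures.

```python
import numpy as np

def deviation_signature(x, k=5):
    """Per level, keep the k largest standardised Haar details, zero the rest."""
    x = np.asarray(x, dtype=float)
    levels = []
    while len(x) >= 2 * k:
        if len(x) % 2:
            x = x[:-1]
        d = (x[0::2] - x[1::2]) / np.sqrt(2.0)
        x = (x[0::2] + x[1::2]) / np.sqrt(2.0)
        z = (d - d.mean()) / (d.std() + 1e-12)   # deviation from the level norm
        sparse = np.zeros_like(z)
        keep = np.argsort(np.abs(z))[-k:]        # the k largest deviations
        sparse[keep] = z[keep]
        levels.append(sparse)
    return np.concatenate(levels)

def deviation_similarity(a, b, k=5):
    sa, sb = deviation_signature(a, k), deviation_signature(b, k)
    n = min(len(sa), len(sb))
    sa, sb = sa[:n], sb[:n]
    return float(sa @ sb / (np.linalg.norm(sa) * np.linalg.norm(sb) + 1e-12))

rng = np.random.default_rng(2)
base = rng.normal(size=1024)
base[500] += 8.0                                 # a shared singular event
print(deviation_similarity(base, base + 0.2 * rng.normal(size=1024)))
```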

    Scaling Bayesian network discovery through incremental recovery

    Bayesian networks are a type of graphical model that, e.g., allows one to analyze the interaction among the variables in a database. A well-known problem with the discovery of such models from a database is the "problem of high dimensionality": the discovery of a network from a database with a moderate to large number of variables quickly becomes intractable. Most solutions to this problem have relied on prior knowledge of the structure of the network, e.g., through the definition of an order on the variables. With a growing number of variables, however, this becomes a considerable burden on the data miner. Moreover, mistakes in such prior knowledge have large effects on the final network. Another approach is, rather than asking the expert for insight into the structure of the final network, to ask the database. Our work fits in this approach. More specifically, before we start recovering the network, we first cluster the variables based on a chi-squared measure of association, as sketched below. Then we use an incremental algorithm to discover the network. This algorithm uses the small networks discovered for the individual clusters of variables as its starting point. We illustrate the feasibility of our approach with experiments. In particular, we show that in the case where one knows the network, and thus the order, our algorithm yields almost the same network, which is, moreover, still an I-map.
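
    The clustering step might look as follows; the use of Cramér's V as the chi-squared association measure, the average-linkage clustering and the fixed cluster count are our own illustrative choices, and the incremental recovery stage itself is not shown.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import chi2_contingency

def cramers_v(x, y):
    """Chi-squared-based association in [0, 1] between two discrete columns."""
    table = np.zeros((x.max() + 1, y.max() + 1))
    for a, b in zip(x, y):
        table[a, b] += 1
    chi2 = chi2_contingency(table)[0]
    return float(np.sqrt(chi2 / (len(x) * (min(table.shape) - 1))))

def cluster_variables(data, n_clusters):
    """data: (n_samples, n_vars) integer array -> cluster label per variable."""
    n_vars = data.shape[1]
    assoc = np.eye(n_vars)
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            assoc[i, j] = assoc[j, i] = cramers_v(data[:, i], data[:, j])
    dist = squareform(1.0 - assoc, checks=False)  # strong association = near
    return fcluster(linkage(dist, method="average"), n_clusters, "maxclust")

rng = np.random.default_rng(3)
a, c = rng.integers(0, 2, 500), rng.integers(0, 2, 500)
data = np.column_stack([a, a ^ (rng.random(500) < 0.1),   # noisy copy of a
                        c, c ^ (rng.random(500) < 0.1)])  # noisy copy of c
print(cluster_variables(data, n_clusters=2))   # expect labels like [1 1 2 2]
```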

    Outlier detection and localisation with wavelet based multifractal formalism

    We present a method for detecting and localising outliers in stochastic processes. The method checks the internal consistency of the scaling behaviour of the process within the paradigm of the multifractal spectrum; deviation from the expected spectrum is interpreted as the potential presence of outliers. The detection part of the method is supplemented by a localisation analysis, using the local scaling properties of the time series. Localised outliers can then be removed one by one, with the possibility of dynamic verification of spectral properties. Both the multifractal spectrum formalism and the local scaling properties of the time series are implemented on the wavelet transform modulus maxima tree.
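
    A heavily simplified sketch of the detection-and-localisation idea (the paper's method is built on the modulus-maxima tree and the full multifractal spectrum; the dyadic scales, the Haar kernel, the robust z-score and the cut-off below are all our own assumptions): estimate a local scaling exponent at each position from the decay of Haar wavelet magnitudes across scales, and flag positions that are both extreme at the finest scale and decay with increasing scale, the signature of an isolated singular event rather than a trend.

```python
import numpy as np

def haar_magnitudes(x, scales=(1, 2, 4, 8, 16)):
    """|Haar wavelet coefficient| at each dyadic scale, for every position."""
    rows = []
    for s in scales:
        kernel = np.concatenate([np.ones(s), -np.ones(s)]) / np.sqrt(2 * s)
        rows.append(np.abs(np.convolve(x, kernel, mode="same")))
    return np.array(rows), np.log(np.array(scales, dtype=float))

def find_outliers(x, z_cut=4.0):
    x = np.asarray(x, dtype=float)
    w, logs = haar_magnitudes(x)
    # Robust z-score at the finest scale (median of |N(0, s^2)| = 0.6745 s).
    z_fine = w[0] * 0.6745 / (np.median(w[0]) + 1e-12)
    # Per-position slope of log|W| versus log(scale): a local scaling exponent.
    logw = np.log(w + 1e-12)
    logs_c = logs - logs.mean()
    slope = logs_c @ (logw - logw.mean(axis=0)) / (logs_c @ logs_c)
    return np.where((z_fine > z_cut) & (slope < 0.0))[0]

rng = np.random.default_rng(4)
x = rng.normal(size=2048)
x[700] += 10.0                    # inject a single spike
print(find_outliers(x))           # expect position(s) at or near 700
```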