
    Statistical Detection of Collective Data Fraud

    Statistical divergence is widely applied in multimedia processing, largely because such data exhibit regular and interpretable features. In broader data domains, however, these properties may not hold, so a more general approach is required. In data fraud detection, statistical divergence can serve as a similarity measure based on collective features. In this paper, we present a collective detection technique based on statistical divergence. The technique extracts distribution similarities among data collections and then uses statistical divergence to detect collective anomalies. Evaluation shows that the technique is applicable in real-world settings.

    Comment: 6 pages, 6 figures and tables, submitted to ICME 202
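    The abstract gives no concrete algorithm, so the following is only a rough illustrative sketch of the general idea (not the paper's method): score each data collection by its KL divergence from the pooled empirical distribution and flag the most divergent one. All function names and the toy data are assumptions for illustration.

    ```python
    import math
    from collections import Counter

    def distribution(sample, support):
        """Empirical distribution of `sample` over `support`, add-one smoothed."""
        counts = Counter(sample)
        total = len(sample) + len(support)
        return [(counts[s] + 1) / total for s in support]

    def kl(p, q):
        """Kullback-Leibler divergence D(p || q) in nats."""
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

    def most_divergent(collections):
        """Index of the collection whose distribution diverges most
        from the pooled distribution of all collections, plus all scores."""
        support = sorted({x for c in collections for x in c})
        pooled = distribution([x for c in collections for x in c], support)
        scores = [kl(distribution(c, support), pooled) for c in collections]
        return max(range(len(scores)), key=scores.__getitem__), scores

    # Toy data: two similar collections and one collective anomaly.
    normal1 = [0, 1, 1, 2, 1, 0, 2, 1]
    normal2 = [1, 0, 1, 2, 1, 1, 0, 2]
    anomaly = [5, 5, 4, 5, 5, 4, 5, 5]
    idx, scores = most_divergent([normal1, normal2, anomaly])
    print(idx)  # -> 2, the anomalous collection
    ```

    A Jensen-Shannon divergence could be swapped in for a symmetric, bounded score; the structure of the sketch stays the same.
    
    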

    Sequence Modeling with Mixtures of Conditional Maximum Entropy Distributions

    We present a novel approach to modeling sequences using mixtures of conditional maximum entropy distributions. Our method generalizes the mixture of first-order Markov models by including "long-term" dependencies in the model components. The "long-term" dependencies are represented by probabilistic triggers or rules frequently used in the natural language processing (NLP) domain (such as "A occurred k positions back ⇒ the current symbol is B with probability P"). The maximum entropy framework is then used to create a coherent probabilistic model from all triggers selected for modeling. To represent hidden or unobserved effects in the data, we use probabilistic mixtures with maximum entropy models as components. We demonstrate how our mixture of conditional maximum entropy models can be learned from data using the EM algorithm, which scales linearly in the dimensionality of the data and the number of mixture components. We present empirical results on simulated and real-world data sets and demonstrate that the proposed approach yields better-quality models than mixtures of first-order Markov models while resisting the overfitting and curse of dimensionality that would inevitably arise with higher-order Markov models.
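    The abstract includes no code. As a rough sketch of the baseline that the paper generalizes, here is an EM fit for a mixture of first-order Markov chains; the maximum-entropy trigger components described in the abstract are not implemented here. All names, the add-one smoothing, and the toy data are illustrative assumptions.

    ```python
    import math
    import random

    def seq_loglik(seq, init, trans):
        """Log-likelihood of a symbol sequence under one first-order Markov chain."""
        ll = math.log(init[seq[0]])
        for a, b in zip(seq, seq[1:]):
            ll += math.log(trans[a][b])
        return ll

    def em_mixture_markov(seqs, n_symbols, n_components, n_iter=50, seed=0):
        """Fit a mixture of first-order Markov chains with EM (add-one smoothing)."""
        rng = random.Random(seed)

        def rand_dist(n):
            w = [rng.random() + 0.1 for _ in range(n)]
            s = sum(w)
            return [x / s for x in w]

        weights = [1.0 / n_components] * n_components
        inits = [rand_dist(n_symbols) for _ in range(n_components)]
        transs = [[rand_dist(n_symbols) for _ in range(n_symbols)]
                  for _ in range(n_components)]
        for _ in range(n_iter):
            # E-step: posterior responsibility of each component for each sequence.
            resp = []
            for s in seqs:
                logs = [math.log(weights[k]) + seq_loglik(s, inits[k], transs[k])
                        for k in range(n_components)]
                m = max(logs)
                ws = [math.exp(l - m) for l in logs]
                z = sum(ws)
                resp.append([w / z for w in ws])
            # M-step: re-estimate each component from responsibility-weighted counts.
            for k in range(n_components):
                rk = [r[k] for r in resp]
                weights[k] = sum(rk) / len(seqs)
                init_c = [1.0] * n_symbols
                trans_c = [[1.0] * n_symbols for _ in range(n_symbols)]
                for r, s in zip(rk, seqs):
                    init_c[s[0]] += r
                    for a, b in zip(s, s[1:]):
                        trans_c[a][b] += r
                inits[k] = [c / sum(init_c) for c in init_c]
                transs[k] = [[c / sum(row) for c in row] for row in trans_c]
        return weights, inits, transs

    def dominant(seq, weights, inits, transs):
        """Index of the mixture component most responsible for `seq`."""
        logs = [math.log(weights[k]) + seq_loglik(seq, inits[k], transs[k])
                for k in range(len(weights))]
        return max(range(len(logs)), key=logs.__getitem__)

    # Toy data with two regimes: alternating 0/1 sequences vs. constant-2 sequences.
    seqs = [[0, 1, 0, 1, 0, 1]] * 5 + [[2, 2, 2, 2, 2, 2]] * 5
    weights, inits, transs = em_mixture_markov(seqs, n_symbols=3, n_components=2)
    ```

    Each EM iteration touches every sequence once per component, which matches the linear scaling in data size and component count claimed in the abstract; the paper's contribution is replacing the Markov components with conditional maximum entropy models over trigger features.
    
    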