Statistical Detection of Collective Data Fraud
Statistical divergence is widely applied in multimedia processing, largely
because such data exhibit regular and interpretable features. In broader data
domains, however, these advantages may no longer hold, and a more general
approach is therefore required. In detecting data fraud, statistical
divergence can be used as a similarity measure based on collective features.
In this paper, we present a collective detection technique based on
statistical divergence. The technique extracts distribution similarities among
data collections and then uses statistical divergence to detect collective
anomalies. Evaluation shows that the technique is applicable to real-world data.
Comment: 6 pages, 6 figures and tables, submitted to ICME 202
Sequence Modeling with Mixtures of Conditional Maximum Entropy Distributions
We present a novel approach to modeling sequences using mixtures of conditional maximum entropy distributions. Our method generalizes the mixture of first-order Markov models by including "long-term" dependencies in the model components. These "long-term" dependencies are represented by probabilistic triggers or rules frequently used in the natural language processing (NLP) domain (such as "A occurred k positions back => the current symbol is B with probability P"). The maximum entropy framework is then used to build a coherent probabilistic model from all the triggers selected for modeling. To represent hidden or unobserved effects in the data, we use probabilistic mixtures with maximum entropy models as components. We demonstrate how our mixture of conditional maximum entropy models can be learned from data using the EM algorithm, which scales linearly in the dimensionality of the data and the number of mixture components. We present empirical results on simulated and real-world data sets and demonstrate that the proposed approach yields better-quality models than mixtures of first-order Markov models while resisting the overfitting and curse of dimensionality that would inevitably afflict higher-order Markov models.
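The trigger rules and conditional maximum entropy form described above can be sketched minimally. This is not the paper's implementation: the feature encoding, function names, and weights are illustrative assumptions, and no EM training is shown, only the conditional distribution p(y | history) proportional to exp(sum of weighted trigger features).

```python
import math

def trigger(symbol_back, k, current):
    # Feature for the rule "symbol_back occurred k positions back
    # => the current symbol is current": fires (returns 1.0) when
    # both conditions hold.
    def f(history, y):
        return 1.0 if (len(history) >= k and history[-k] == symbol_back
                       and y == current) else 0.0
    return f

def conditional_maxent(history, alphabet, features, weights):
    # Maximum entropy conditional distribution:
    #   p(y | history) = exp(sum_i w_i * f_i(history, y)) / Z(history)
    scores = {y: math.exp(sum(w * f(history, y)
                              for w, f in zip(weights, features)))
              for y in alphabet}
    z = sum(scores.values())
    return {y: s / z for y, s in scores.items()}
```

With a single trigger "A occurred 2 positions back => current symbol is B" and a positive weight, histories containing A two positions back shift probability mass toward B, while other histories leave the distribution uniform.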