
2 research outputs found

Goodness-of-Fit Testing for the Newcomb-Benford Law With Application to the Detection of Customs Fraud

Authors: Andrea Cerasa (2592886), Andrea Cerioli (2592877), Domenico Perrotta (2592883), Lucio Barabesi (2592880)
Publication date: 28/04/2017

The Newcomb-Benford law for digit sequences has recently attracted interest in antifraud analysis. However, most of its applications rely either on diagnostic checks of the data, or on informal decision rules. We suggest a new way of testing the Newcomb-Benford law that turns out to be particularly attractive for the detection of frauds in customs data collected from international trade. Our approach has two major advantages. The first one is that we control the rate of false rejections at each stage of the procedure, as required in antifraud applications. The second improvement is that our testing procedure leads to exact significance levels and does not rely on large-sample approximations. Another contribution of our work is the derivation of a simple expression for the digit distribution when the Newcomb-Benford law is violated, and a bound for a chi-squared type of distance between the actual digit distribution and the Newcomb-Benford one.
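
For readers unfamiliar with the law, the following is a minimal sketch of the standard first-digit Newcomb-Benford probabilities, log10(1 + 1/d) for d = 1, ..., 9, together with a plain Pearson chi-squared distance between observed and expected digit counts. It is not the exact testing procedure proposed in the paper, which the abstract does not specify; the function names are hypothetical and chosen for illustration only.

```python
import math
from collections import Counter


def benford_probs():
    """First-digit Newcomb-Benford probabilities: P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """First significant digit of a nonzero finite number."""
    # Scientific notation puts the leading digit first, e.g. '3.14...e+02'.
    return int(f"{abs(x):.15e}"[0])


def chi_squared_distance(values):
    """Pearson chi-squared statistic of observed first digits vs. Benford.

    Illustrative only: the paper's own test and distance are assumptions
    here and may differ from this textbook statistic.
    """
    probs = benford_probs()
    digits = [first_digit(v) for v in values if v != 0]
    n = len(digits)
    counts = Counter(digits)
    return sum(
        (counts.get(d, 0) - n * p) ** 2 / (n * p) for d, p in probs.items()
    )
```

A call such as `chi_squared_distance(declared_values)` on a list of declared transaction values would return a non-negative number that grows as the observed first digits move away from the Newcomb-Benford proportions; `declared_values` is a placeholder name, not a data set from the paper.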

FigShare

Finding the Number of Normal Groups in Model-Based Clustering via Constrained Likelihoods

Authors: Agustín Mayo-Iscar (4515589), Andrea Cerioli (2592877), Luis Angel García-Escudero (4515586), Marco Riani (4515592)
Publication date: 16/10/2017

Deciding the number of clusters k is one of the most difficult problems in cluster analysis. For this purpose, complexity-penalized likelihood approaches have been introduced in model-based clustering, such as the well-known BIC and ICL criteria. However, the classification/mixture likelihoods considered in these approaches are unbounded without any constraint on the cluster scatter matrices. Constraints also prevent traditional EM and CEM algorithms from being trapped in (spurious) local maxima. Controlling the maximal ratio between the eigenvalues of the scatter matrices to be smaller than a fixed constant c ≥ 1 is a sensible idea for setting such constraints. A new penalized likelihood criterion, which takes into account the higher model complexity that a higher value of c entails, is proposed. Based on this criterion, a novel and fully automated procedure leading to a small ranked list of optimal (k, c) couples is provided. A new plot called “car-bike”, which provides a concise summary of the solutions, is introduced. The performance of the procedure is assessed both in empirical examples and through a simulation study as a function of cluster overlap. Supplemental materials for the article are available online.
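
The eigenvalue-ratio constraint mentioned above can be illustrated with a minimal sketch. It assumes the bivariate case (2x2 symmetric scatter matrices, so the eigenvalues have a closed form) and assumes the ratio is taken between the largest and smallest eigenvalue pooled over all clusters; the abstract does not spell out either detail, and the function names are hypothetical.

```python
import math


def eigenvalues_2x2(m):
    """Eigenvalues of a symmetric 2x2 matrix [[a, b], [b, d]], ascending."""
    (a, b), (_, d) = m
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b * b)
    return mean - radius, mean + radius


def eigenvalue_ratio(scatter_matrices):
    """Largest over smallest eigenvalue, pooled across all clusters.

    Pooling across clusters is an assumption made for this illustration.
    """
    eigs = [e for m in scatter_matrices for e in eigenvalues_2x2(m)]
    return max(eigs) / min(eigs)


def satisfies_constraint(scatter_matrices, c):
    """True when the pooled eigenvalue ratio does not exceed c (c >= 1)."""
    return eigenvalue_ratio(scatter_matrices) <= c


# Two clusters: a spherical one and an elongated one.
clusters = [[[1.0, 0.0], [0.0, 1.0]], [[4.0, 0.0], [0.0, 2.0]]]
print(eigenvalue_ratio(clusters))          # 4.0
print(satisfies_constraint(clusters, 3))   # False
```

With c = 1 only equal spherical scatters pass the check, while larger values of c admit increasingly different cluster shapes and sizes, which is the extra model complexity the proposed criterion is said to penalize.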

FigShare