30 research outputs found

    Scoring anomalies: a M-estimation formulation

    No full text
    International audience. It is the purpose of this paper to formulate the issue of scoring multivariate observations depending on their degree of abnormality/novelty as an unsupervised learning task. In the 1-d situation, this problem can be dealt with by means of tail estimation techniques, observations being viewed as all the more "abnormal" as they are located far in the tail(s) of the underlying probability distribution. In a wide variety of applications, it is desirable to have a scalar-valued "scoring" function that allows the degree of abnormality of multivariate observations to be compared. Here we formulate the issue of scoring anomalies as an M-estimation problem. A (functional) performance criterion is proposed, whose optimal elements are, as expected, nondecreasing transforms of the density. The question of empirical estimation of this criterion is tackled, and preliminary statistical results on the accuracy of partition-based techniques for optimizing empirical estimates of the empirical performance measure are established.

    Kantorovich Distances between Rankings with Applications to Rank Aggregation

    No full text
    International audience

    ON-LINE GOSSIP-BASED DISTRIBUTED EXPECTATION MAXIMIZATION ALGORITHM

    No full text
    IEEE Statistical Signal Processing Workshop (SSP

    Multiresolution analysis of incomplete rankings with applications to prediction

    No full text
    International audience. Data representing user preferences are a typical example of the big datasets that modern technologies, such as e-commerce portals, now make it possible to collect, explicitly or implicitly. Such data are highly complex, insofar as the number of items n for which users may possibly express their preferences is explosive, and the collection of items or products a given user actually examines and is capable of comparing is highly variable and of extremely low cardinality compared to n. It is the main purpose of this paper to promote a new representation of preference data, viewed as incomplete rankings. In contrast to alternative approaches, the very nature of preference data is preserved by the "multiscale analysis" we propose, which identifies "scale" with the set of items over which preferences are expressed and whose construction relies on recent results in algebraic topology. The representation of preference data it provides shares similarities with wavelet multiresolution analysis on a Euclidean space and can be computed at a reasonable cost given the complexity of the original data. Beyond computational and theoretical advantages, the "wavelet-like" transform is shown to compress preference data into relatively few basis coefficients and thus facilitates statistical tasks such as distribution estimation or prediction. This is illustrated here by very encouraging empirical work based on popular real benchmark datasets.