231 research outputs found

High-dimensional clustering

Authors: Biernacki Christophe, Maugis Cathy
Publication venue: HAL CCSD
Publication date: 19/09/2017

High-dimensional (HD) data sets are now frequent, mostly for technological reasons: automated variable acquisition, cheaper data storage, and more powerful standard computers for fast data management. All fields are affected by this general inflation in the number of variables, only the definition of "high" being domain dependent. In marketing, this number can be of order 10^2, in microarray gene expression between 10^2 and 10^4, in text mining 10^3 or more, of order 10^6 for single nucleotide polymorphism (SNP) data, etc. Note also that sometimes many more variables can be involved, which is typically the case with discretized curves, for instance curves coming from temporal sequences. Such a technological revolution has a huge impact on other scientific fields, societal as well as mathematical. In particular, high-dimensional data management brings new challenges to statisticians, since standard (low-dimensional) data analysis methods struggle to apply directly to the new (high-dimensional) data sets. The reasons are twofold and sometimes linked: combinatorial difficulties, or a disastrously large increase in estimate variance. Data analysis methods are essential for providing a synthetic view of data sets, allowing data summary and data exploration for future decision making, for instance. This need is even more acute in the high-dimensional setting since, on the one hand, the large number of variables suggests that a lot of information is conveyed by the data but, on the other hand, such information may be hidden behind their volume.

Scientific Publications of the University of Toulouse II Le Mirail

INRIA a CCSD electronic archive server

HAL Descartes

HAL-INSA Toulouse

MASSICCC: A SaaS Platform for Clustering and Co-Clustering of Mixed Data

Author: Biernacki Christophe
Publication venue: HAL CCSD
Publication date: 15/10/2019

INRIA a CCSD electronic archive server

Mixture models

Author: Biernacki Christophe
Publication venue: Technip
Publication date: 25/09/2017

Finite mixture models are one of the probabilistic frameworks that reach an especially diverse community, including statisticians and practitioners (scientific or not). The initial reasons for encountering mixtures may differ between these communities, but they ultimately lead to close interconnections between them. Indeed, applied statisticians and practitioners usually discover finite mixture models through the many application fields where they have proved successful, typically supervised, unsupervised and semi-supervised classification, and density estimation. The keys to these successes are both their high meaningfulness and their flexibility. However, this flexibility in turn raises algorithmic and mathematical questions for methodological and theoretical statisticians. In particular, it raises estimation and model selection issues, in both their computational and mathematical aspects. Solutions to these issues, however, benefit greatly from depending on the related application fields from which they originate.

INRIA a CCSD electronic archive server

HAL Descartes

Model selection theory and considerations in large scale scenarios

Author: Biernacki Christophe
Publication venue: HAL CCSD
Publication date: 15/06/2018

INRIA a CCSD electronic archive server

HAL Descartes

About Two Disinherited Sides of Statistics: Data Units and Computational Saving

Author: Biernacki Christophe
Publication venue: HAL CCSD
Publication date: 06/04/2017

INRIA a CCSD electronic archive server

BigStat for Big Data: Big Data clustering through the BigStat SaaS platform

Author: Biernacki Christophe
Publication venue: HAL CCSD
Publication date: 28/10/2016

BigStat is a web platform devoted to clustering big data sets through two hosted software packages, MixtComp and BlockCluster. The former addresses mixed, missing and uncertain data in a moderate-dimensional setting, whereas the latter is devoted to high-dimensional data sets with non-mixed, non-missing and non-uncertain data. The mathematical foundations of both rely on mixture models and related algorithms.

INRIA a CCSD electronic archive server

Introduction to cluster analysis and classification: Evaluating clustering

Author: Biernacki Christophe
Publication venue: HAL CCSD
Publication date: 21/05/2018

INRIA a CCSD electronic archive server