MixtComp software: Model-based clustering/imputation with mixed data, missing data and uncertain data

Christophe Biernacki

Conference paper, Year: 2015

French title: Logiciel MixtComp : Classification et imputation à base de modèles pour données mixtes, manquantes et incertaines

Author affiliations:

1 MOdel for Data Analysis and Learning
2 Laboratoire Paul Painlevé - UMR 8524

Abstract

The "Big Data" paradigm involves large and complex data sets. Complexity includes both variety (mixed data: continuous and/or categorical and/or ordinal and/or functional...) and missing, or partially missing (binned), items. Clustering is a suitable response to volume, but it must also cope with complexity, especially since complexity tends to grow with volume. Model-based clustering has demonstrated many theoretical and practical successes (McLachlan 2000), including for multivariate mixed data, with (Biernacki 2013) or without (Marbac et al. 2014) conditional independence. In addition, its fully generative design makes it straightforward to handle missing or binned data (McLachlan 2000; Biernacki 2007). Model estimation can also be performed by simple EM-like algorithms, such as SEM (Celeux and Diebolt 1985).

MixtComp is a new R software, written in C++, implementing model-based clustering for multivariate missing/binned/mixed data under the conditional independence assumption (Goodman 1974). The data types currently implemented are continuous (Gaussian), categorical (multinomial) and integer (Poisson). However, the architecture of MixtComp is designed for the incremental insertion of new kinds of data (ordinal, ranks, functional...) and related models. MixtComp is not currently freely available as an R package, but it will soon be freely available through a dedicated web interface. Beyond clustering, it can also impute missing/binned data (with associated confidence intervals) by exploiting the mixture model's ability to estimate densities.

Topics will include: mixture models - conditional independence - SEM algorithm - model selection criteria.

Prerequisites: elementary knowledge of general statistical concepts, of mixture models, of the EM algorithm and of standard model selection criteria is assumed. Moreover, basic programming in R is useful.
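The combination of a conditional independence mixture, SEM estimation and model-based imputation described in the abstract can be illustrated with a minimal Python sketch. This is not MixtComp's code or interface: the function names (`log_density`, `posteriors`, `m_step`, `sem`, `impute_continuous`), the toy data, the add-one smoothing and the choice to return the last SEM iterate are all assumptions made for illustration. Each row holds one continuous, one integer and one categorical value, with `None` marking a missing entry.

```python
import math
import random

def log_density(row, comp):
    """Log density of the observed entries of one row under one component.

    Conditional independence makes the joint density a product over
    variables, so a missing entry (None) is marginalized by skipping it.
    """
    x, n, c = row
    ll = 0.0
    if x is not None:  # continuous variable: Gaussian
        mu, sd = comp["mu"], comp["sd"]
        ll += -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)
    if n is not None:  # integer variable: Poisson
        ll += n * math.log(comp["lam"]) - comp["lam"] - math.lgamma(n + 1)
    if c is not None:  # categorical variable: multinomial
        ll += math.log(comp["p"][c])
    return ll

def posteriors(row, props, comps):
    """Posterior class probabilities of one row given its observed entries."""
    logs = [math.log(pk) + log_density(row, ck) for pk, ck in zip(props, comps)]
    top = max(logs)  # subtract the max before exponentiating, for stability
    w = [math.exp(v - top) for v in logs]
    total = sum(w)
    return [v / total for v in w]

def m_step(rows, z, K, n_levels):
    """Per-class estimates from observed entries only (smoothed and floored)."""
    props, comps = [], []
    for k in range(K):
        members = [r for r, zi in zip(rows, z) if zi == k]
        xs = [r[0] for r in members if r[0] is not None]
        ns = [r[1] for r in members if r[1] is not None]
        cs = [r[2] for r in members if r[2] is not None]
        mu = sum(xs) / len(xs) if xs else 0.0
        var = sum((x - mu) ** 2 for x in xs) / len(xs) if xs else 1.0
        lam = sum(ns) / len(ns) if ns else 1.0
        # Add-one smoothing (a choice of this sketch) keeps probabilities > 0.
        p = [(cs.count(lv) + 1) / (len(cs) + n_levels) for lv in range(n_levels)]
        props.append((len(members) + 1) / (len(rows) + K))
        comps.append({"mu": mu, "sd": max(math.sqrt(var), 1e-3),
                      "lam": max(lam, 1e-3), "p": p})
    return props, comps

def sem(rows, K, n_levels, n_iter=100, seed=0):
    """SEM: alternate sampling the labels (S-step) and re-estimating (M-step)."""
    rng = random.Random(seed)
    z = [rng.randrange(K) for _ in rows]  # random initial partition
    props, comps = m_step(rows, z, K, n_levels)
    for _ in range(n_iter):
        # E-step computes posteriors; S-step draws one label per row from them.
        z = [rng.choices(range(K), weights=posteriors(r, props, comps))[0]
             for r in rows]
        props, comps = m_step(rows, z, K, n_levels)
    return props, comps  # last iterate, kept for brevity

def impute_continuous(row, props, comps):
    """Conditional mean of the continuous entry given the observed entries."""
    t = posteriors(row, props, comps)
    return sum(tk * ck["mu"] for tk, ck in zip(t, comps))

# Toy data: two well-separated classes, about 15% of entries set to missing.
rng = random.Random(1)
rows = []
for i in range(200):
    if i % 2 == 0:
        full = (rng.gauss(0, 1), rng.choice([1, 2, 2, 3]), int(rng.random() < 0.1))
    else:
        full = (rng.gauss(8, 1), rng.choice([10, 11, 12, 13, 14]), int(rng.random() < 0.9))
    rows.append(tuple(None if rng.random() < 0.15 else v for v in full))

props, comps = sem(rows, K=2, n_levels=2)
print(sorted(round(ck["mu"], 2) for ck in comps))
print(round(impute_continuous((None, 12, 1), props, comps), 2))
```

Because the variables are independent within a class, handling a missing entry reduces to dropping its factor from the product, which is why no separate missing-data machinery appears in the sketch. A fuller SEM implementation would typically also sample the missing values and summarize the iterates after a burn-in period; both are omitted here to keep the example short.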

Domains

Methodology [stat.ME]

slides.pdf (1.62 MB)

Format: Presentation
Origin: Files produced by the author(s)

https://hal.science/hal-01253393

Submitted on: Sunday, January 10, 2016, 10:17:41

Last modified on: Monday, February 12, 2024, 15:38:10

Dates and versions

hal-01253393, version 1 (10-01-2016)

Identifiers

HAL Id: hal-01253393, version 1

Cite

Christophe Biernacki. MixtComp software: Model-based clustering/imputation with mixed data, missing data and uncertain data. MISSDATA 2015, Jun 2015, Rennes, France. ⟨hal-01253393⟩

Collections

CNRS INRIA INRIA2 UNIV-LILLE LPP-MATH
