Conference paper, Year: 2015

MixtComp software: Model-based clustering/imputation with mixed data, missing data and uncertain data

French title: Logiciel MixtComp : Classification et imputation à base de modèles pour données mixtes, manquantes et incertaines

Abstract

The "Big Data" paradigm involves large and complex data sets. Complexity covers both variety (mixed data: continuous and/or categorical and/or ordinal and/or functional...) and missing or partially missing (binned) items. Clustering is a suitable response to volume, but it must also deal with complexity, especially as growing volume tends to bring complexity with it. Model-based clustering has demonstrated many theoretical and practical successes (McLachlan 2000), including for multivariate mixed data with (Biernacki 2013) or without (Marbac et al. 2014) conditional independence. In addition, its fully generative design makes it straightforward to handle missing or binned data (McLachlan 2000; Biernacki 2007). Model estimation can also be performed by simple EM-like algorithms, such as SEM (Celeux and Diebolt 1985). MixtComp is a new R software package, written in C++, that implements model-based clustering for multivariate missing/binned/mixed data under the conditional independence assumption (Goodman 1974). The data types currently implemented are continuous (Gaussian), categorical (multinomial) and integer (Poisson). However, the architecture of MixtComp is designed for the incremental addition of new kinds of data (ordinal, ranks, functional...) and related models. Currently, MixtComp is not freely available as an R package, but it will soon be freely available through a dedicated web interface. Beyond clustering, it can also impute missing/binned data (with associated confidence intervals) by using the mixture model's ability to estimate densities.

Topics will include: mixture models, conditional independence, SEM algorithm, model selection criteria.

Prerequisites: elementary knowledge of general statistical concepts, mixture models, the EM algorithm and standard model selection criteria is assumed. Basic programming in R is also useful.
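The approach described in the abstract can be illustrated with a minimal sketch. It is not MixtComp's code or API: the function names (`log_pdf`, `draw`, `fit`, `sem`), the initialisation, the smoothing constants and the synthetic data are all assumptions made for illustration. It shows a mixture with class-conditional independence over one Gaussian, one categorical and one Poisson variable, estimated by a simplified SEM loop in which missing cells (`None`) are redrawn from the model of the sampled class. For brevity it returns the last parameter draw rather than an average over iterations.

```python
import math
import random

def log_pdf(kind, x, theta):
    """Log-density of one observed value under one class-specific model."""
    if kind == "gaussian":
        mu, sd = theta
        return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)
    if kind == "categorical":
        return math.log(theta[x])
    return x * math.log(theta) - theta - math.lgamma(x + 1)  # Poisson

def draw(kind, theta, rng):
    """Sample one value from a class-specific model (used to impute missing cells)."""
    if kind == "gaussian":
        return rng.gauss(*theta)
    if kind == "categorical":
        return rng.choices(range(len(theta)), weights=theta)[0]
    # Poisson draw by CDF inversion; the cap on k guards against float saturation
    u, k, p = rng.random(), 0, math.exp(-theta)
    c = p
    while u > c and k < 1000:
        k += 1
        p *= theta / k
        c += p
    return k

def fit(kind, values, n_levels):
    """Estimate one class-specific model from completed values (lightly regularised)."""
    n = len(values)
    if kind == "gaussian":
        mu = sum(values) / n
        var = sum((v - mu) ** 2 for v in values) / n
        return (mu, max(math.sqrt(var), 1e-3))
    if kind == "categorical":
        # add-one smoothing keeps every level probability strictly positive
        return [(values.count(l) + 1) / (n + n_levels) for l in range(n_levels)]
    return max(sum(values) / n, 1e-3)

def sem(data, kinds, n_levels, K, n_iter=100, seed=0):
    rng = random.Random(seed)
    n, d = len(data), len(kinds)
    z = [rng.randrange(K) for _ in range(n)]  # random initial class labels
    comp = [row[:] for row in data]           # completed copy of the data
    for j in range(d):                        # initial fill: random observed value
        obs = [row[j] for row in data if row[j] is not None]
        for row in comp:
            if row[j] is None:
                row[j] = rng.choice(obs)
    for _ in range(n_iter):
        # M-step: re-estimate proportions and per-class, per-variable parameters
        pi = [(z.count(k) + 1) / (n + K) for k in range(K)]
        theta = []
        for k in range(K):
            rows = [comp[i] for i in range(n) if z[i] == k] or comp  # empty class fallback
            theta.append([fit(kinds[j], [r[j] for r in rows], n_levels[j]) for j in range(d)])
        # S-step: sample each label from its posterior, using observed cells only
        for i in range(n):
            logw = [math.log(pi[k]) + sum(log_pdf(kinds[j], data[i][j], theta[k][j])
                                          for j in range(d) if data[i][j] is not None)
                    for k in range(K)]
            m = max(logw)
            z[i] = rng.choices(range(K), weights=[math.exp(l - m) for l in logw])[0]
            for j in range(d):                # redraw missing cells given the label
                if data[i][j] is None:
                    comp[i][j] = draw(kinds[j], theta[z[i]][j], rng)
    return z, comp, pi, theta

# Synthetic two-class example with roughly 10% of cells hidden
gen = random.Random(1)
cat_probs = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
data = []
for i in range(200):
    k = i % 2
    row = [gen.gauss(10.0 * k, 1.0),
           gen.choices([0, 1, 2], weights=cat_probs[k])[0],
           draw("poisson", 2.0 + 8.0 * k, gen)]
    data.append([None if gen.random() < 0.1 else v for v in row])

kinds = ["gaussian", "categorical", "poisson"]
z, completed, pi, theta = sem(data, kinds, n_levels=[None, 3, None], K=2)
```

Because the variables are independent within a class, the class posterior is a product of one-dimensional densities, so a missing cell is simply left out of the product and then redrawn from its own univariate model. Repeating such draws over many iterations is one way to obtain imputations with an associated spread, in the spirit of the confidence intervals mentioned in the abstract.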
slides.pdf (1.62 MB)
Format: Presentation
Origin: Files produced by the author(s)

Dates and versions

hal-01253393, version 1 (10-01-2016)

Identifiers

  • HAL Id: hal-01253393, version 1

Cite

Christophe Biernacki. MixtComp software: Model-based clustering/imputation with mixed data, missing data and uncertain data. MISSDATA 2015, Jun 2015, Rennes, France. ⟨hal-01253393⟩