Comparison of dimensionality assessment methods in Principal Component Analysis based on permutation tests
Abstract
We compare the performance of several permutation-based methods for assessing dimensionality in Principal Component Analysis. We consider the classical Horn's Parallel Analysis, Dray's approach, based on the similarity between the data matrix under study and its lower-rank approximation, and Vitale et al.’s method, based on sequential deflation and rank reduction. Their performance is assessed on a large array of simulated data sets covering different correlation structures, data distributions, and homo- and heteroscedastic noise, and on 15 experimental data sets from different disciplines, such as metabolomics, proteomics, chemometrics and sensory analysis. In both the simulated and real-life case studies, the techniques behave differently, and we propose theoretical explanations for these differences. The paper also discusses their limits of applicability and offers some guidelines to practitioners.
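To make the general idea of permutation-based dimensionality assessment concrete, the following is a minimal sketch of one common permutation variant of Horn's Parallel Analysis. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the function name `parallel_analysis`, its parameters (`n_perm`, `quantile`, `seed`), the autoscaling of the variables and the sequential stopping rule at the 95th percentile are all choices made here for illustration.

```python
import numpy as np

def parallel_analysis(X, n_perm=200, quantile=0.95, seed=0):
    """Illustrative permutation-based parallel analysis (hypothetical helper).

    Returns the number of retained components, the observed eigenvalues
    and the permutation thresholds.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, p = X.shape

    # Autoscale each variable (assumption: PCA on the correlation matrix).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

    # Eigenvalues of the correlation matrix via the singular values of Z.
    observed = np.linalg.svd(Z, compute_uv=False) ** 2 / (n - 1)

    # Null distribution: permute every column independently, which
    # destroys the correlations while keeping each marginal distribution.
    null = np.empty((n_perm, observed.size))
    for b in range(n_perm):
        Zp = np.column_stack([rng.permutation(Z[:, j]) for j in range(p)])
        null[b] = np.linalg.svd(Zp, compute_uv=False) ** 2 / (n - 1)

    thresholds = np.quantile(null, quantile, axis=0)

    # Retain components sequentially until one fails to exceed its threshold.
    n_components = 0
    for obs, thr in zip(observed, thresholds):
        if obs <= thr:
            break
        n_components += 1
    return n_components, observed, thresholds
```

Calling `parallel_analysis(X)` on an observations-by-variables array returns the estimated number of components together with the observed eigenvalues and the permutation thresholds, so the two curves can be inspected side by side. Dray's and Vitale et al.'s methods differ in the test statistic and in how previously extracted components are handled, and they are not reproduced by this sketch.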