Solving the missing value problem in PCA by Orthogonalized-Alternating Least Squares (O-ALS) - Université de Lille
Journal article. Chemometrics and Intelligent Laboratory Systems. Year: 2024

Abstract

Missing data pose a challenge in Principal Component Analysis (PCA) because the most common algorithms are not designed to handle them. Several approaches have been proposed to solve the missing value problem in PCA. One is Imputation based on SVD (I-SVD), where missing entries are filled by imputation and updated in every iteration until the PCA model converges. Another is an adaptation of the Nonlinear Iterative Partial Least Squares (NIPALS) algorithm, which skips the missing entries during the least-squares estimation of scores and loadings. However, limitations have been reported for both approaches. On the one hand, convergence of the I-SVD algorithm can be very slow for data sets with a high percentage of missing data. On the other hand, the orthogonality properties among scores and loadings might be lost when using NIPALS. To solve these issues and perform PCA of data sets with missing values without the need for imputation steps, a novel algorithm called Orthogonalized-Alternating Least Squares (O-ALS) is proposed. O-ALS is an alternating least-squares algorithm that estimates the scores and loadings subject to a Gram-Schmidt orthogonalization constraint. The estimation of scores and loadings is adapted to use only the available entries. In this study, the performance of O-ALS is tested and compared with that of NIPALS and I-SVD on simulated data sets and in a real case study. The results show that O-ALS is an accurate and fast algorithm for analyzing data with any percentage and distribution pattern of missing entries, and that it provides correct scores and loadings in cases where I-SVD and NIPALS do not perform satisfactorily.
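
The general idea of alternating least squares restricted to the available entries, followed by an orthogonalization step, can be illustrated with a minimal sketch. This is not the published O-ALS algorithm, whose exact steps are not given in the abstract. The function name `masked_als_pca`, its parameters, the use of `NaN` to mark missing entries, the QR factorization standing in for Gram-Schmidt, and the final SVD rotation are all assumptions made for illustration. The data are also assumed to be already centered.

```python
import numpy as np


def masked_als_pca(X, n_components, n_iter=200, seed=0):
    """Illustrative sketch (hypothetical helper, not the authors' code).

    Alternating least squares on the observed entries of X (NaN = missing),
    with the loadings orthonormalized in every iteration.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)  # True where a value is available
    n_rows, n_cols = X.shape
    rng = np.random.default_rng(seed)

    # Random orthonormal starting loadings (n_cols x n_components).
    P = np.linalg.qr(rng.standard_normal((n_cols, n_components)))[0]
    T = np.zeros((n_rows, n_components))

    for _ in range(n_iter):
        # Scores: regress the observed part of each row on the matching
        # rows of the loadings matrix.
        for i in range(n_rows):
            obs = observed[i]
            T[i] = np.linalg.lstsq(P[obs], X[i, obs], rcond=None)[0]

        # Loadings: regress the observed part of each column on the
        # matching rows of the scores matrix.
        for j in range(n_cols):
            obs = observed[:, j]
            P[j] = np.linalg.lstsq(T[obs], X[obs, j], rcond=None)[0]

        # Orthonormalize the loadings (QR is used here in place of explicit
        # Gram-Schmidt) and move the triangular factor into the scores so
        # that the product T @ P.T is unchanged.
        Q, R = np.linalg.qr(P)
        P = Q
        T = T @ R.T

    # Final rotation within the fitted subspace so that the score columns
    # are mutually orthogonal while the loadings stay orthonormal.
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    return U * s, P @ Vt.T
```

Calling `masked_als_pca(X, n_components=2)` on a matrix with `NaN` entries returns a scores matrix and a loadings matrix whose product approximates the observed entries. The sketch requires every row and every column to contain at least `n_components` observed values; otherwise the per-row or per-column regressions are underdetermined.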

Dates and versions

hal-04818014, version 1 (04-12-2024)

Cite

Adrian Gomez Sanchez, Raffaele Vitale, Cyril Ruckebusch, Anna de Juan. Solving the missing value problem in PCA by Orthogonalized-Alternating Least Squares (O-ALS). Chemometrics and Intelligent Laboratory Systems, 2024, Chemometrics Intell. Lab. Syst., 250, ⟨10.1016/j.chemolab.2024.105153⟩. ⟨hal-04818014⟩