Online Principal Component Analysis in High Dimension: Which Algorithm to Choose?
Author affiliation | Affiliation OK |
Title | Online Principal Component Analysis in High Dimension: Which Algorithm to Choose? |
Publication type | Journal Article |
Year of Publication | 2018 |
Authors | Cardot H, Degras D |
Journal | INTERNATIONAL STATISTICAL REVIEW |
Volume | 86 |
Pagination | 29-50 |
Date Published | APR |
Type of Article | Article |
ISSN | 0306-7734 |
Keywords | Eigenvalue decomposition, generalised Hebbian algorithm, incremental SVD, perturbation methods, Stochastic gradient |
Abstract | Principal component analysis (PCA) is a method of choice for dimension reduction. In the current context of data explosion, online techniques that do not require storing all data in memory are indispensable to perform the PCA of streaming data and/or massive data. Despite the wide availability of recursive algorithms that can efficiently update the PCA when new data are observed, the literature offers little guidance on how to select a suitable algorithm for a given application. This paper reviews the main approaches to online PCA, namely, perturbation techniques, incremental methods and stochastic optimisation, and compares the most widely employed techniques in terms of statistical accuracy, computation time and memory requirements using artificial and real data. Extensions of online PCA to missing data and to functional data are detailed. All studied algorithms are available in the package onlinePCA on CRAN. |
DOI | 10.1111/insr.12220 |
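
Illustration | The abstract names stochastic optimisation as one family of online PCA methods. As a rough illustration of that idea only, the sketch below applies an Oja-type stochastic gradient update to estimate the first principal direction from a stream, processing one observation at a time without storing the data. It is not taken from the paper or from the onlinePCA package: the function name `oja_first_component`, the 1/n step size, the running-mean centring and the synthetic data are all assumptions made for this example. |

```python
import numpy as np


def oja_first_component(stream, dim, seed=0):
    """Estimate the leading principal direction of a stream, one observation at a time.

    Hypothetical helper for illustration; not part of the onlinePCA package.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(dim)
    w /= np.linalg.norm(w)              # random unit-norm starting vector
    mean = np.zeros(dim)                # running estimate of the data mean
    for n, x in enumerate(stream, start=1):
        mean += (x - mean) / n          # recursive mean update
        xc = x - mean                   # centre the new observation
        w += (1.0 / n) * xc * (xc @ w)  # gradient step; step size 1/n is an assumption
        w /= np.linalg.norm(w)          # renormalise to unit length
    return w


# Synthetic stream (assumed for the demo): strong variance along the first axis,
# weak isotropic noise elsewhere.
rng = np.random.default_rng(1)
dim, n_obs = 5, 5000
u = np.zeros(dim)
u[0] = 1.0                              # true leading direction
scores = 3.0 * rng.standard_normal(n_obs)
data = scores[:, None] * u + 0.3 * rng.standard_normal((n_obs, dim))

w_hat = oja_first_component(iter(data), dim)
alignment = abs(w_hat @ u)              # |cosine| between estimate and truth
print(f"alignment with true direction: {alignment:.3f}")
```

Memory use here is two vectors of length `dim`, whatever the number of observations. The step-size schedule strongly affects accuracy in practice, so the 1/n choice should be treated as a starting point, not a recommendation.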