Background: Sparse principal component analysis (PCA) is a popular tool for dimensionality reduction, pattern recognition, and visualization of high-dimensional data. It has been recognized that complex biological mechanisms occur through concerted relationships of multiple genes working in networks that are often represented by graphs. Recent work has shown that incorporating such biological information improves feature selection and prediction performance in regression analysis, but there has been limited work on extending this approach to PCA. In this article, we propose two new sparse PCA methods, called Fused and Grouped sparse PCA, that enable incorporation of prior biological information in variable selection.

Results: Our simulation studies suggest that, compared to existing sparse PCA methods, the proposed methods achieve higher sensitivity and specificity when the graph structure is correctly specified, and are fairly robust to misspecified graph structures. Application to a glioblastoma gene expression dataset identified pathways that are suggested in the literature to be related to glioblastoma.

Conclusions: The proposed methods, Fused and Grouped sparse PCA, can effectively incorporate prior biological information in variable selection, leading to improved feature selection and more interpretable principal component loadings, and potentially providing insights into the molecular underpinnings of complex diseases.