Current Issue: January-March | Volume: 2018 | Issue Number: 1 | Articles: 5
This paper addresses the question of biomarker discovery in proteomics. Given clinical data regarding a list of proteins for a set of individuals, the tackled problem is to extract a short subset of proteins whose concentrations are an indicator of the biological status (healthy or pathological). In this paper, it is formulated as a specific instance of variable selection. The originality is that the proteins are not investigated one after the other; instead, the best partition between discriminant and non-discriminant proteins is sought directly. In this way, correlations between the proteins are intrinsically taken into account in the decision. The developed strategy is derived in a Bayesian setting, and the decision is optimal in the sense that it minimizes a global mean error. It is ultimately based on the posterior probabilities of the partitions. The main difficulty is to calculate these probabilities, since they rely on the so-called evidence, which requires marginalization over all the unknown model parameters. Two models are presented that relate the status to the protein concentrations, depending on whether the latter are biomarkers or not. The first model accounts for biological variability by assuming that the concentrations are Gaussian distributed, with a mean and a covariance matrix that depend on the status only for the biomarkers. The second is an extension that also takes into account the technical variability that may significantly impact the observed concentrations. The main contributions of the paper are: (1) a new Bayesian formulation of the biomarker selection problem, (2) the closed-form expression of the posterior probabilities in the noiseless case, and (3) a suitable approximate solution in the noisy case. The methods are numerically assessed and compared to state-of-the-art methods (t-test, LASSO, Bhattacharyya distance, FOHSIC) on synthetic and real data from proteins quantified in human serum by mass spectrometry in selected reaction monitoring mode.
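The first (noiseless) model can be illustrated with a simple plug-in approximation: the sketch below scores a candidate partition by fitting class-dependent Gaussians to the putative biomarkers and a single status-independent Gaussian to the remaining proteins. It is not the paper's closed-form evidence, which marginalizes the Gaussian parameters; the function name and regularization constant are hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal

def partition_score(X, y, biomarkers):
    """Plug-in log-likelihood score for one candidate partition.

    X : (n, p) protein concentrations; y : (n,) binary status.
    biomarkers : boolean mask of length p marking the discriminant subset.
    For biomarkers, mean/covariance are fit per class; for the rest, a
    single pooled Gaussian is fit, mirroring the noiseless model's structure.
    """
    score = 0.0
    B, N = X[:, biomarkers], X[:, ~biomarkers]
    for c in (0, 1):                      # class-conditional fit on biomarkers
        Xc = B[y == c]
        if Xc.shape[1] > 0:
            cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(Xc.shape[1])
            score += multivariate_normal.logpdf(Xc, Xc.mean(0), cov).sum()
    if N.shape[1] > 0:                    # status-independent fit on the rest
        cov = np.cov(N, rowvar=False) + 1e-6 * np.eye(N.shape[1])
        score += multivariate_normal.logpdf(N, N.mean(0), cov).sum()
    return score
```

Comparing this score across candidate masks mimics, in a crude maximum-likelihood way, the comparison of posterior partition probabilities that drives the paper's decision rule.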
Background: Sparse principal component analysis (PCA) is a popular tool for dimensionality reduction, pattern recognition, and visualization of high-dimensional data. It has been recognized that complex biological mechanisms occur through concerted relationships of multiple genes working in networks, which are often represented by graphs. Recent work has shown that incorporating such biological information improves feature selection and prediction performance in regression analysis, but there has been limited work on extending this approach to PCA. In this article, we propose two new sparse PCA methods, called Fused and Grouped sparse PCA, that enable the incorporation of prior biological information in variable selection.

Results: Our simulation studies suggest that, compared to existing sparse PCA methods, the proposed methods achieve higher sensitivity and specificity when the graph structure is correctly specified, and are fairly robust to misspecified graph structures. Application to a glioblastoma gene expression dataset identified pathways that the literature suggests are related to glioblastoma.

Conclusions: The proposed Fused and Grouped sparse PCA methods can effectively incorporate prior biological information in variable selection, leading to improved feature selection and more interpretable principal component loadings, and potentially providing insights into the molecular underpinnings of complex diseases.
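A minimal sketch of the general idea, not the authors' algorithm: the truncated-power-method-style iteration below computes one sparse loading vector while encouraging smoothness over a prior gene network. For simplicity it replaces the fused penalty (absolute differences of loadings across edges) with a quadratic graph-Laplacian term; all names and the penalty weights are assumptions.

```python
import numpy as np

def graph_sparse_pc(S, L, lam1=0.1, lam2=0.5, iters=200, seed=0):
    """One sparse, graph-smoothed principal component (illustrative).

    S : (p, p) sample covariance matrix.
    L : (p, p) graph Laplacian built from the prior biological network.
    lam1 drives sparsity of the loadings; lam2 drives smoothness of
    loadings over graph edges (a quadratic stand-in for the fused penalty).
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(S.shape[0])
    v /= np.linalg.norm(v)
    M = S - lam2 * L                       # covariance shrunk along the graph
    for _ in range(iters):
        z = M @ v
        z = np.sign(z) * np.maximum(np.abs(z) - lam1, 0.0)   # soft-threshold
        if np.linalg.norm(z) == 0:
            break                          # penalty too strong: all loadings zero
        v = z / np.linalg.norm(z)
    return v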
Background: Gene Ontology (GO) is a community effort to represent functional features of gene products. GO annotations (GOA) provide functional associations between GO terms and gene products. Due to resource limitations, only a small portion of annotations are manually checked by curators; the others are electronically inferred. Although quality-control techniques have been applied to ensure the quality of annotations, the community consistently reports that a considerable number of noisy (or incorrect) annotations remain. Given the wide application of annotations, how to identify noisy annotations is an important but seldom-studied open problem.

Results: We introduce a novel approach called NoGOA to predict noisy annotations. NoGOA applies sparse representation to the gene-term association matrix to reduce the impact of noisy annotations, and takes advantage of the sparse representation coefficients to measure the semantic similarity between genes. It then preliminarily predicts noisy annotations of a gene based on aggregated votes from that gene's semantic neighborhood. Next, NoGOA estimates the ratio of noisy annotations for each evidence code based on direct annotations in GOA files archived over different periods, then weights entries of the association matrix via the estimated ratios and propagates the weights to ancestors of direct annotations using the GO hierarchy. Finally, it integrates the evidence-weighted association matrix and the aggregated votes to predict noisy annotations. Experiments on archived GOA files of six model species (H. sapiens, A. thaliana, S. cerevisiae, G. gallus, B. taurus and M. musculus) demonstrate that NoGOA achieves significantly better results than other related methods, and that removing noisy annotations improves the performance of gene function prediction.

Conclusions: The comparative study justifies the effectiveness of integrating evidence codes with sparse representation for predicting noisy GO annotations. Code and datasets are available at http://mlda.swu.edu.cn/codes.php?name=NoGOA.
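The sparse-representation-plus-voting core can be sketched as follows; this is an illustrative simplification under assumed names, omitting the paper's evidence-code weighting and GO-hierarchy propagation steps. Each gene's annotation profile is reconstructed as a sparse combination of the other genes' profiles, and annotations with weak support from similar genes receive high noise scores.

```python
import numpy as np
from sklearn.linear_model import Lasso

def noisy_annotation_scores(A, alpha=0.01):
    """Illustrative sparse-representation noise scoring.

    A : (n_genes, n_terms) binary gene-term association matrix.
    Returns an (n_genes, n_terms) array where larger values flag
    annotations that look noisy.
    """
    n = A.shape[0]
    scores = np.zeros_like(A, dtype=float)
    for i in range(n):
        others = np.delete(np.arange(n), i)
        model = Lasso(alpha=alpha, positive=True, max_iter=5000)
        model.fit(A[others].T, A[i])        # columns = other genes' profiles
        w = model.coef_                     # sparse similarity weights
        support = w @ A[others]             # weighted neighbor votes per term
        # annotated terms with little support from similar genes look noisy
        scores[i] = A[i] * (1.0 - support / (support.max() + 1e-12))
    return scores
```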
Background: As a newly emerged research area, RNA epigenetics has drawn increasing attention recently owing to the participation of RNA methylation and other modifications in a number of crucial biological processes. Thanks to high-throughput sequencing techniques such as MeRIP-Seq, transcriptome-wide RNA methylation profiles are now available in the form of count-based data, with which it is often of interest to study the dynamics at the epitranscriptomic layer. However, the sample size of an RNA methylation experiment is usually very small due to its cost; additionally, there usually exist a large number of genes whose methylation level cannot be accurately estimated due to their low expression level, making differential RNA methylation analysis a difficult task.

Results: We present QNB, a statistical approach for differential RNA methylation analysis with count-based small-sample sequencing data. Compared with previous approaches such as the DRME model, whose statistical test covers the IP samples only with two negative binomial distributions, QNB is based on four independent negative binomial distributions with their variances and means linked by local regressions; in this way, the input control samples are also properly taken care of. In addition, unlike the DRME approach, which relies on the input control samples alone for estimating the background, QNB uses a more robust estimator of gene expression that combines information from both input and IP samples, which can largely improve the testing performance for very lowly expressed genes.

Conclusion: QNB showed improved performance on both simulated and real MeRIP-Seq datasets when compared with competing algorithms. The QNB model is also applicable to other datasets related to RNA modifications, including but not limited to RNA bisulfite sequencing, m1A-Seq, Par-CLIP, RIP-Seq, etc.
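The flavor of a four-negative-binomial test can be conveyed with a toy likelihood-ratio test; the sketch below is not QNB itself (it fixes the dispersion rather than linking variances and means by local regression, and estimates expression from input alone), and all names and defaults are assumptions.

```python
import numpy as np
from scipy.stats import nbinom, chi2
from scipy.optimize import minimize_scalar

def nb_logpmf(k, mean, disp):
    """NB log-pmf, mean/dispersion parameterization (var = mean + disp*mean**2)."""
    r = 1.0 / disp
    return nbinom.logpmf(k, r, r / (r + mean))

def lrt_diff_methylation(ip1, in1, ip2, in2, disp=0.1):
    """Toy likelihood-ratio test in the spirit of a four-NB model.

    ip*/in* : replicate read counts for IP and input in conditions 1 and 2.
    Per-condition expression mu is estimated from input counts; the
    methylation level rho scales the IP mean.
    H0: common rho across conditions; H1: separate rho1, rho2.
    """
    mu1, mu2 = np.mean(in1), np.mean(in2)
    nll = lambda rho, ip, mu: -nb_logpmf(np.asarray(ip), rho * mu, disp).sum()
    fit = lambda f: minimize_scalar(f, bounds=(1e-3, 100.0), method="bounded")
    h1 = fit(lambda r: nll(r, ip1, mu1)).fun + fit(lambda r: nll(r, ip2, mu2)).fun
    h0 = fit(lambda r: nll(r, ip1, mu1) + nll(r, ip2, mu2)).fun
    return chi2.sf(2.0 * (h0 - h1), df=1)   # p-value for differential methylation
```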
Background: Detecting local correlations in expression between neighboring genes along the genome has proved to be an effective strategy for identifying possible causes of transcriptional deregulation in cancer. It has been successfully used to illustrate the role of mechanisms such as copy number variation (CNV) or epigenetic alterations as factors that may significantly alter expression in large chromosomal regions (gene silencing or gene activation).

Results: The identification of correlated regions requires segmenting the gene expression correlation matrix into regions of homogeneously correlated genes and assessing whether the observed local correlation is significantly higher than the background chromosomal correlation. A unified statistical framework is proposed to achieve these two tasks, where optimal segmentation is performed efficiently using a dynamic programming algorithm, and detection of highly correlated regions is then achieved using an exact test procedure. We also propose a simple and efficient procedure to correct the expression signal for mechanisms already known to impact expression correlation. The performance and robustness of the proposed procedure, called SegCorr, are evaluated on simulated data. The procedure is illustrated on cancer data, where the signal is corrected for correlations caused by copy number variation; it permitted the detection of regions with high correlations linked to epigenetic marks such as DNA methylation.

Conclusions: SegCorr is a novel method that performs correlation matrix segmentation and applies a test procedure in order to detect highly correlated regions in gene expression.
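Dynamic-programming segmentation of an ordered correlation matrix can be sketched as below. The segment cost used here (squared deviation of within-segment off-diagonal correlations from their mean) is a simple stand-in for SegCorr's model-based criterion, and the function names are hypothetical.

```python
import numpy as np

def optimal_segmentation(R, K):
    """Exact partition of genome-ordered genes into K segments via DP.

    R : (p, p) correlation matrix with genes in genome order.
    Returns the list of (start, end) segments and the optimal total cost.
    """
    p = R.shape[0]

    def seg_cost(i, j):                          # genes i..j-1 as one segment
        block = R[i:j, i:j][np.triu_indices(j - i, k=1)]
        return 0.0 if block.size == 0 else float(((block - block.mean()) ** 2).sum())

    cost = np.full((K + 1, p + 1), np.inf)
    back = np.zeros((K + 1, p + 1), dtype=int)
    cost[0, 0] = 0.0
    for k in range(1, K + 1):                    # number of segments used
        for j in range(k, p + 1):                # first j genes segmented
            for i in range(k - 1, j):            # position of last change point
                c = cost[k - 1, i] + seg_cost(i, j)
                if c < cost[k, j]:
                    cost[k, j], back[k, j] = c, i
    segments, j = [], p                          # backtrack the change points
    for k in range(K, 0, -1):
        segments.append((int(back[k, j]), j))
        j = int(back[k, j])
    return segments[::-1], float(cost[K, p])
```

In SegCorr the number of segments is chosen by model selection and each candidate region is then screened with the exact test; the cubic scan above is kept deliberately naive for readability.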