Current Issue: July - September; Volume: 2016; Issue Number: 3; Articles: 5
Background: Low-cost DNA sequencing allows organizations to accumulate massive amounts of genomic data and use that data to answer a diverse range of research questions. Presently, users must search for relevant genomic data using a keyword, accession number or meta-data tag. However, in this search paradigm the form of the query, a text-based string, is mismatched with the form of the target, a genomic profile. Results: To improve access to massive genomic data resources, we have developed a fast search engine, GEMINI, that uses a genomic profile as a query to search for similar genomic profiles. GEMINI implements a nearest-neighbor search algorithm using a vantage-point tree to store a database of n profiles and in certain circumstances achieves an O(log n) expected query time in the limit. We tested GEMINI on breast and ovarian cancer gene expression data from The Cancer Genome Atlas project and show that it achieves a query time that scales as the logarithm of the number of records in practice on genomic data. In a database with 10^5 samples, GEMINI identifies the nearest neighbor in 0.05 sec compared to a brute force search time of 0.6 sec. Conclusions: GEMINI is a fast search engine that uses a query genomic profile to search for similar profiles in a very large genomic database. It enables users to identify similar profiles independent of sample label, data origin or other meta-data information....
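The vantage-point tree search named in the abstract above can be illustrated with a minimal sketch. This is not GEMINI's code: the Euclidean metric, the random choice of vantage point, and the names `VPNode`, `build_vp_tree` and `nearest` are all assumptions made for illustration.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class VPNode:
    """One vantage point plus the radius that splits the remaining points."""

    def __init__(self, point, radius=0.0, inside=None, outside=None):
        self.point = point
        self.radius = radius
        self.inside = inside    # points with distance to `point` < radius
        self.outside = outside  # points with distance to `point` >= radius


def build_vp_tree(points, dist=euclidean, rng=None):
    """Recursively build a vantage-point tree from a list of profiles."""
    if rng is None:
        rng = random.Random(0)
    points = list(points)
    if not points:
        return None
    vantage = points.pop(rng.randrange(len(points)))
    if not points:
        return VPNode(vantage)
    dists = [dist(vantage, p) for p in points]
    radius = sorted(dists)[len(dists) // 2]  # (upper) median distance
    inside = [p for p, d in zip(points, dists) if d < radius]
    outside = [p for p, d in zip(points, dists) if d >= radius]
    return VPNode(vantage, radius,
                  build_vp_tree(inside, dist, rng),
                  build_vp_tree(outside, dist, rng))


def nearest(node, query, dist=euclidean, best=None):
    """Return (distance, point) of the nearest stored profile, or None."""
    if node is None:
        return best
    d = dist(query, node.point)
    if best is None or d < best[0]:
        best = (d, node.point)
    if d < node.radius:
        near, far = node.inside, node.outside
    else:
        near, far = node.outside, node.inside
    best = nearest(near, query, dist, best)
    # By the triangle inequality, the far side can only hold a closer point
    # if the current best ball around the query crosses the split radius.
    if abs(d - node.radius) <= best[0]:
        best = nearest(far, query, dist, best)
    return best
```

The pruning test in `nearest` is what allows whole subtrees to be skipped. How often it succeeds depends on the data and the metric, which is consistent with the abstract's wording that logarithmic query time holds only "in certain circumstances".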
Protein-protein interaction (PPI) prediction is a central task in achieving a better understanding of cellular and intracellular processes. Because high-throughput experimental methods are expensive and time-consuming, and are also known to suffer from incompleteness and noise, many computational methods have been developed, with varying degrees of success. However, the inference of PPI networks from multiple heterogeneous data sources remains a great challenge. In this work, we developed a novel method based on approximate Bayesian computation and modified differential evolution sampling (ABC-DEP) and the regularized Laplacian (RL) kernel. The method enables inference of PPI networks from topological properties and multiple heterogeneous features, including gene expression and Pfam domain profiles, in the form of weighted kernels. The optimal weights are obtained by ABC-DEP, and the kernel fusion built from the optimal weights serves as input to RL to infer missing or new edges in the PPI network. Detailed comparisons with control methods have been made, and the results show that the accuracy of PPI prediction measured by AUC is increased by up to 23 %, as compared to a baseline without using optimal weights. The method can provide insights into the relations between PPIs and various feature kernels and demonstrates a strong capability for predicting faraway interactions that cannot be well detected by the traditional RL method....
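Two building blocks mentioned in the abstract above, a weighted kernel fusion and a regularized Laplacian kernel, can be sketched as follows. The sketch assumes the commonly used definition of the RL kernel, (I + alpha*L)^-1 with L = D - A, and approximates it by a truncated series, which is only valid when alpha times the largest eigenvalue of L is below 1. The abstract does not specify how the fused kernel enters RL, nor how ABC-DEP chooses the weights, so both are left out. All function names and the toy graph are hypothetical.

```python
def fuse_kernels(kernels, weights):
    """Weighted sum of equally sized square kernel matrices."""
    n = len(kernels[0])
    return [[sum(w * k[i][j] for w, k in zip(weights, kernels))
             for j in range(n)] for i in range(n)]


def laplacian(adj):
    """Graph Laplacian L = D - A for a symmetric adjacency matrix."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


def regularized_laplacian(adj, alpha=0.1, terms=60):
    """Approximate (I + alpha*L)^-1 by the series sum over k of (-alpha*L)^k."""
    n = len(adj)
    lap = laplacian(adj)
    step = [[-alpha * lap[i][j] for j in range(n)] for i in range(n)]
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total, power = identity, identity
    for _ in range(terms):
        power = matmul(power, step)
        total = [[total[i][j] + power[i][j] for j in range(n)]
                 for i in range(n)]
    return total


# Toy path graph 0-1-2-3 (hypothetical data, not from the paper).
path = [[0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0]]
rl = regularized_laplacian(path, alpha=0.1)
# Entries for non-adjacent pairs such as rl[0][2] and rl[0][3] can be read
# as scores for candidate (missing) edges.
```

On this toy graph the score for the pair (0, 2) is larger than for the more distant pair (0, 3), which illustrates why a plain RL kernel tends to favor nearby nodes.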
Ever...
High dimensionality of microarray data sets may lead to low efficiency and overfitting. In this paper, a multiphase cooperative game theoretic feature selection approach is proposed for microarray data classification. In the first phase, because of the high dimensionality of microarray data sets, the features are reduced using one of two filter-based feature selection methods, namely, mutual information and Fisher ratio. In the second phase, the Shapley index is used to evaluate the power of each feature. The main innovation of the proposed approach is to employ Qualitative Mutual Information (QMI) for this purpose. Using Qualitative Mutual Information makes the selected features more stable, and this stability helps to deal with the problem of data imbalance and scarcity. In the third phase, a forward selection scheme is applied which uses a scoring function to weight each feature. The performance of the proposed method is compared with other popular feature selection algorithms such as Fisher ratio, minimum redundancy maximum relevance, and previous works on cooperative game based feature selection. The average classification accuracy on eleven microarray data sets shows that the proposed method improves both average accuracy and average stability compared to other approaches....
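The second phase described above scores each feature by its Shapley index. A minimal sketch of an exact Shapley computation follows, treating features as players and averaging each feature's marginal contribution over all orderings. The coalition score `toy_value` is a hypothetical stand-in; it is not the paper's QMI-based function.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley value of each player for a set function `value`."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when it joins the current coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}


def toy_value(features):
    """Hypothetical score: 'a' and 'b' are redundant, 'c' is independent."""
    score = 0.0
    if "a" in features or "b" in features:
        score += 1.0
    if "c" in features:
        score += 2.0
    return score


values = shapley_values(["a", "b", "c"], toy_value)
```

The redundant features `a` and `b` share the credit for their joint contribution, while `c` keeps all of its own. Enumerating every ordering costs n! evaluations, so an exact computation like this is only feasible for small feature sets; this is one plausible reason for filtering features first, as the first phase does.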
RNA interference (RNAi) screening is extensively used in the field of reverse genetics. RNAi libraries constructed using random oligonucleotides have made this technology affordable. However, the new methodology requires exploration of the RNAi target gene information after screening because the RNAi library includes non-natural sequences that are not found in genes. Here, we developed a web-based tool to support RNAi screening. The system performs short hairpin RNA (shRNA) target prediction that is informed by comprehensive enquiry (SPICE). SPICE automates several tasks that are laborious but indispensable to evaluate the shRNAs obtained by RNAi screening. SPICE has four main functions: (i) sequence identification of shRNA in the input sequence (the sequence might be obtained by sequencing clones in the RNAi library), (ii) searching the target genes in the database, (iii) displaying biological information obtained from the database, and (iv) preparation of search result files that can be used on a local personal computer (PC). Using this system, we demonstrated that genes targeted by random oligonucleotide-derived shRNAs were not different from those targeted by organism-specific shRNA. The system facilitates RNAi screening, which requires sequence analysis after screening. The SPICE web application is available at http://www.spice.sugysun.org/....
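Function (i) above, locating an shRNA inside a sequenced clone, can be illustrated with a small sketch. The abstract does not describe how SPICE does this, so the approach here is an assumption: scan the read for a stem followed by a short loop and then the stem's exact reverse complement. The default stem length, the loop length range and the name `find_hairpin` are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_hairpin(read, stem_len=19, loop_min=4, loop_max=12):
    """Return (start, stem, loop) for the first perfect hairpin, else None.

    Assumes a perfectly paired stem; real reads may contain mismatches
    or sequencing errors that this sketch does not handle.
    """
    read = read.upper()
    for start in range(len(read) - 2 * stem_len - loop_min + 1):
        stem = read[start:start + stem_len]
        antisense = reverse_complement(stem)
        for loop_len in range(loop_min, loop_max + 1):
            tail = start + stem_len + loop_len
            if read[tail:tail + stem_len] == antisense:
                return start, stem, read[start + stem_len:tail]
    return None


# Toy read: flank + stem + loop + reverse-complemented stem + flank.
toy_stem = "GATTACAGATTACAGATTA"
toy_read = "CCCC" + toy_stem + "TTCAAGAGA" + reverse_complement(toy_stem) + "TTTT"
hit = find_hairpin(toy_read)
```

The recovered stem would then be the query for the target gene search in function (ii).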