Current Issue : January - March Volume : 2017 Issue Number : 1 Articles : 7 Articles
The evolution of sequencing technology has led to an enormous increase in the number of genomes that have been sequenced. This is especially true in the field of virus genomics. In order to extract meaningful biological information from these genomes, whole genome data mining software tools must be utilized. Hundreds of tools have been developed to analyze biological sequence data. However, only some of these tools are user-friendly to biologists. Several of these tools that have been successfully used to analyze adenovirus genomes are described here. These include Artemis, EMBOSS, pDRAW, zPicture, CoreGenes, Gene Order, and PipMaker. These tools provide functionalities such as visualization, restriction enzyme analysis, alignment, and proteome comparisons that are extremely useful in the bioinformatics analysis of adenovirus genomes....
The rapidly expanding corpus of medical research literature presents major challenges in the understanding of previous work, the extraction of maximum information from collected data, and the identification of promising research directions. We present a case for the use of advanced machine learning techniques as an aid in this task and introduce a novel methodology that is shown to be capable of extracting meaningful information from large longitudinal corpora and of tracking complex temporal changes within them. Our framework is based on (i) the discretization of time into epochs, (ii) epoch-wise topic discovery using a hierarchical Dirichlet process-based model, and (iii) a temporal similarity graph which allows for the modelling of complex topic changes. More specifically, this is the first work that discusses and distinguishes between two groups of particularly challenging topic evolution phenomena: topic splitting and speciation, and topic convergence and merging, in addition to the more widely recognized emergence and disappearance, and gradual evolution. The proposed framework is evaluated on a public medical literature corpus...
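The temporal similarity graph in step (iii) can be illustrated with a minimal sketch. It assumes that each epoch's topics are already available as word-probability dictionaries (the hierarchical Dirichlet process step is not implemented here), and it uses the Bhattacharyya coefficient with a threshold of 0.5 as the similarity test. The function names, the similarity measure, the threshold, and the toy topics are all hypothetical choices for illustration, not the authors' method.

```python
import math

def bhattacharyya(p, q):
    """Overlap of two word distributions (1.0 = identical, 0.0 = disjoint)."""
    return sum(math.sqrt(p[w] * q[w]) for w in p.keys() & q.keys())

def link_epochs(prev_topics, next_topics, threshold=0.5):
    """Return edges (prev_id, next_id) whose similarity meets the threshold."""
    return [
        (i, j)
        for i, p in prev_topics.items()
        for j, q in next_topics.items()
        if bhattacharyya(p, q) >= threshold
    ]

def describe(prev_topics, next_topics, edges):
    """Classify topics by how many edges leave or enter them."""
    out_deg = {i: 0 for i in prev_topics}
    in_deg = {j: 0 for j in next_topics}
    for i, j in edges:
        out_deg[i] += 1
        in_deg[j] += 1
    return {
        "split": sorted(i for i, d in out_deg.items() if d > 1),
        "merged": sorted(j for j, d in in_deg.items() if d > 1),
        "disappeared": sorted(i for i, d in out_deg.items() if d == 0),
        "emerged": sorted(j for j, d in in_deg.items() if d == 0),
    }

# Toy topics for two consecutive epochs (made-up distributions).
epoch1 = {"a": {"gene": 0.5, "protein": 0.5}, "b": {"trial": 1.0}}
epoch2 = {"c": {"gene": 1.0}, "d": {"protein": 1.0}, "e": {"imaging": 1.0}}

edges = link_epochs(epoch1, epoch2)
summary = describe(epoch1, epoch2, edges)
```

In this toy case topic "a" links to both "c" and "d" (a split), "b" has no successor, and "e" has no predecessor. A real pipeline would need a principled way to choose the threshold.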
By 2050, it is estimated that the number of worldwide Alzheimer's disease (AD) patients will quadruple from the current number of 36 million, while no proven disease-modifying treatments are available. At present, the underlying disease mechanisms remain under investigation, and recent studies suggest that the disease involves multiple etiological pathways. To better understand the disease and develop treatment strategies, a number of ongoing studies including the Alzheimer's Disease Neuroimaging Initiative (ADNI) enroll many study participants and acquire a large number of biomarkers from various modalities including demographics, genotyping, fluid biomarkers, neuroimaging, neuropsychometric tests, and clinical assessments. However, a systematic approach that can integrate all the collected data is lacking. The overarching goal of our study is to use machine learning techniques to understand the relationships among different biomarkers and to establish a system-level model that can better describe the interactions among biomarkers and provide superior diagnostic and prognostic information. In this pilot study, we use a Bayesian network (BN) to analyze multimodal data from ADNI, including demographics, volumetric MRI, PET, genotypes, and neuropsychometric measurements, and demonstrate that our approach has superior prediction accuracy....
With the exponential growth in the volume of information generated and the emerging need for data to be stored for prolonged periods of time, there is a need for a storage medium with high capacity, high storage density, and the ability to withstand extreme environmental conditions. DNA emerges as a prospective medium for data storage with its striking features. Diverse encoding models for reading and writing data onto DNA, codes for encrypting data that address issues of error generation, and approaches for developing codons and storage styles have been developed in the recent past. DNA has been identified as a potential medium for secret writing, which opens the way towards DNA cryptography and steganography. DNA utilized as an organic memory device, along with big data storage and analytics in DNA, has paved the way towards DNA computing for solving computational problems. This paper critically analyzes the various methods used for encoding and encrypting data onto DNA while identifying the advantages and capability of every scheme to overcome the drawbacks identified previously. Cryptography and steganography techniques are analyzed critically while identifying the limitations of each method. This paper also identifies the advantages and limitations of DNA as a memory device and memory applications....
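To make the idea of encoding data onto DNA concrete, here is a minimal sketch of the simplest possible mapping: two bits per nucleotide. This is an illustrative assumption, not one of the schemes surveyed in the paper. The bit-to-base table and the function names are hypothetical, and the sketch does nothing about error correction or long runs of the same base, which practical schemes typically have to handle.

```python
# Hypothetical 2-bit mapping; real schemes choose codes with more care.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def encode_bytes(data: bytes) -> str:
    """Turn each byte into 8 bits, then each bit pair into one base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def decode_bases(strand: str) -> bytes:
    """Invert encode_bytes: bases back to bit pairs, then to bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

strand = encode_bytes(b"DNA")      # 3 bytes become 12 bases
restored = decode_bases(strand)    # round trip back to b"DNA"
```

Each byte becomes four bases, so the strand length is always four times the input length under this mapping.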
A method for predicting protein disulfide bonds based on a support vector machine and sample selection is proposed in this paper. First, the selected protein sequences are encoded according to a certain encoding scheme to generate input data for the disulfide bond prediction model. Then, a sample selection technique is used to select a portion of the input data as training samples for the support vector machine. Finally, the prediction model trained on these samples is used to predict protein disulfide bonds. The results of the simulation experiments show that the prediction model based on support vector machine and sample selection can increase the prediction accuracy for protein disulfide bonds....
Background. Quaternary structures of proteins are closely relevant to gene regulation, signal transduction, and many other biological functions of proteins. In the current study, a new method based on protein-conserved motif composition in block format for feature extraction is proposed, which is termed block composition. Results. The protein quaternary assembly states prediction system which combines blocks with functional domain composition, called QuaBingo, is constructed from three layers of classifiers that can categorize the quaternary structural attributes of monomer, homooligomer, and heterooligomer. The first layer classifier uses support vector machines (SVM) based on blocks and functional domains of proteins, and the second layer SVM is utilized to process the outputs of the first layer. Finally, the result is determined by the Random Forest of the third layer. We compared the effectiveness of the combination of block composition, functional domain composition, and pseudoamino acid composition of the model. In the 11 kinds of functional protein families, QuaBingo achieves a Matthews Correlation Coefficient (MCC) 23% higher than the existing prediction system. The results also revealed the biological characterization of the top five block compositions. Conclusions. QuaBingo provides better predictive ability for predicting the quaternary structural attributes of proteins....
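Since the comparison above is reported in terms of the Matthews Correlation Coefficient, a short sketch of how MCC is computed from a binary confusion matrix may help. It assumes a one-vs-rest treatment of a single class (for example, monomer against everything else). The abstract does not say how MCC was aggregated over the three classes, so that treatment, the function names, and the toy labels are assumptions.

```python
import math

def confusion_counts(y_true, y_pred, positive):
    """Count TP, TN, FP, FN, treating `positive` as the class of interest."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred != positive:
            tn += 1
        elif truth != positive and pred == positive:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC in [-1, 1]; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Made-up labels, for illustration only.
y_true = ["mono", "homo", "hetero", "mono"]
y_pred = ["mono", "homo", "mono", "mono"]
counts = confusion_counts(y_true, y_pred, positive="mono")
score = matthews_corrcoef(*counts)
```

Unlike plain accuracy, MCC uses all four cells of the confusion matrix, which makes it a common choice when classes are imbalanced.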
The recognition of splice sites is a very important step in eukaryotic DNA sequence analysis. Many researchers are working to improve the accuracy of identification. Our team carried out research on this issue based on the support vector machine, a well-known algorithm in data mining. The training and testing data are from the HS3D dataset, and an excellent accuracy rate is achieved using orthogonal coding of nucleic acid sequences and an RBF kernel function. The cross-validation experiment suggests that base pattern information is mainly located within 20 nucleotides upstream and downstream of splice sites....
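The two ingredients named above, orthogonal coding and the RBF kernel, can be sketched in a few lines. This assumes that "orthogonal coding" means the usual one-hot scheme in which each base becomes a 4-dimensional unit vector. The base order, the `gamma` value, and the example sequences are hypothetical, and no SVM training is shown.

```python
import math

# Orthogonal (one-hot) code: one unit vector per base (order is an assumption).
ORTHOGONAL_CODE = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}

def encode(seq):
    """Concatenate the one-hot vectors of every base in seq."""
    vec = []
    for base in seq.upper():
        vec.extend(ORTHOGONAL_CODE[base])
    return vec

def rbf_kernel(x, y, gamma=0.1):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Two made-up 8-base windows that differ at a single position.
x = encode("AGGTAAGT")
y = encode("AGGTGAGT")
similarity = rbf_kernel(x, y)
```

With one-hot vectors, every mismatched position adds exactly 2 to the squared distance, so the kernel value depends only on the number of mismatches between two equal-length windows.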