Frequency: Quarterly E-ISSN: 2250-2920 P-ISSN: Awaited Abstracted/Indexed in: Ulrich's International Periodical Directory, Google Scholar, SCIRUS, getCITED, Genamics JournalSeek, EBSCO Information Services
Published quarterly in print and online, "Inventi Impact: Bioinformatics" publishes high-quality unpublished research as well as high-impact pre-published research and reviews, catering to the needs of researchers and professionals from both the IT and life sciences domains. It focuses on storing, retrieving, organizing, and analyzing biological data, and on new developments in genome bioinformatics and computational biology.
Background: The advent of Next-Generation Sequencing (NGS) has catalyzed a paradigm shift in medical genetics, enabling the identification of disease-associated variants. However, the vast quantity of data produced by NGS necessitates a robust and dependable mechanism for filtering irrelevant variants. Annotation-based variant filtering, a pivotal step in this process, demands a profound understanding of the case-specific conditions and the relevant annotation instruments. To tackle this complex task, we sought to design an accessible, efficient, and, more importantly, easy-to-understand variant filtering tool. Results: Our efforts culminated in the creation of 123VCF, a tool capable of processing both compressed and uncompressed Variant Calling Format (VCF) files. Built on a Java framework, the tool employs a disk-streaming real-time filtering algorithm, allowing it to manage sizable variant files on conventional desktop computers. 123VCF filters input variants according to a predefined filter sequence. Users are given the flexibility to define various filtering parameters, such as quality, coverage depth, and variant frequency within populations. Additionally, 123VCF accommodates user-defined filters tailored to specific case requirements, affording users enhanced control over the filtering process. We evaluated the performance of 123VCF by analyzing different types of variant files and comparing its runtimes to those of the most similar tools, BCFtools filter and GATK VariantFiltration. The results indicated that 123VCF performs relatively well. The tool's intuitive interface and potential for reproducibility make it a valuable asset for both researchers and clinicians. Conclusion: The 123VCF filtering tool provides an effective, dependable approach for filtering variants in both research and clinical settings. As an open-source tool available at https://project123vcf.sourceforge.io, it is accessible to the global scientific and clinical community, paving the way for the discovery of disease-causing variants and facilitating the advancement of personalized medicine....
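To illustrate the disk-streaming, record-at-a-time filtering the abstract describes, here is a minimal Python sketch (123VCF itself is written in Java; the function name, thresholds, and INFO-field handling below are illustrative assumptions, not the authors' implementation):

```python
import gzip

def stream_filter_vcf(path, min_qual=30.0, min_depth=10, max_af=0.01):
    """Stream a plain or gzipped VCF line by line, yielding records that
    pass simple quality, coverage-depth, and population-frequency filters.
    Assumes single-allelic DP/AF INFO entries for simplicity."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        for line in fh:
            if line.startswith("#"):           # pass header lines through
                yield line
                continue
            fields = line.rstrip("\n").split("\t")
            qual = fields[5]
            if qual != "." and float(qual) < min_qual:
                continue                        # fails quality filter
            info = dict(kv.split("=", 1)
                        for kv in fields[7].split(";") if "=" in kv)
            if int(info.get("DP", 0)) < min_depth:
                continue                        # fails coverage-depth filter
            if float(info.get("AF", 0.0)) > max_af:
                continue                        # too frequent in population
            yield line

# Usage: filter a large file without loading it into memory.
# with open("filtered.vcf", "w") as out:
#     out.writelines(stream_filter_vcf("sample.vcf.gz"))
```

Because each record is read, tested, and either emitted or discarded before the next one is touched, memory use stays constant regardless of file size, which is the point of a disk-streaming design.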
Background: Pulmonary acoustic parameters extracted from recorded respiratory sounds provide valuable information for the detection of respiratory pathologies. The automated analysis of pulmonary acoustic signals can serve as a differential diagnosis tool for medical professionals, a learning tool for medical students, and a self-management tool for patients. In this context, we intend to evaluate and compare the performance of the support vector machine (SVM) and K-nearest neighbour (K-nn) classifiers in diagnosing respiratory pathologies using respiratory sounds from the R.A.L.E database. Results: The pulmonary acoustic signals used in this study were obtained from the R.A.L.E lung sound database. The pulmonary acoustic signals were manually categorised into three different groups, namely normal, airway obstruction pathology, and parenchymal pathology. The mel-frequency cepstral coefficient (MFCC) features were extracted from the pre-processed pulmonary acoustic signals. The MFCC features were analysed by one-way ANOVA and then fed separately into the SVM and K-nn classifiers. The performances of the classifiers were analysed using the confusion matrix technique. The statistical analysis of the MFCC features using one-way ANOVA showed that the extracted MFCC features are significantly different (p < 0.001). The classification accuracies of the SVM and K-nn classifiers were found to be 92.19% and 98.26%, respectively. Conclusion: Although the data used to train and test the classifiers are limited, the classification accuracies found are satisfactory. The K-nn classifier was better than the SVM classifier at discriminating pulmonary acoustic signals from pathological and normal subjects obtained from the R.A.L.E database....
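A minimal sketch of the pipeline described above, using librosa for MFCC extraction and scikit-learn for the two classifiers (the library choices, mean-pooling of frame-wise MFCCs, and hyperparameters are assumptions; the authors' exact settings are not given in the abstract):

```python
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

def mfcc_features(wav_path, n_mfcc=13):
    """One mean-MFCC vector per recording (frame-wise MFCCs averaged)."""
    y, sr = librosa.load(wav_path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

def evaluate(wav_paths, labels):
    """labels: 0 = normal, 1 = airway obstruction, 2 = parenchymal.
    Trains SVM and K-nn on the same split and prints both confusion
    matrices for comparison."""
    X = np.array([mfcc_features(p) for p in wav_paths])
    y = np.array(labels)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y,
                                              random_state=0)
    for clf in (SVC(kernel="rbf"), KNeighborsClassifier(n_neighbors=5)):
        clf.fit(X_tr, y_tr)
        pred = clf.predict(X_te)
        print(type(clf).__name__, accuracy_score(y_te, pred))
        print(confusion_matrix(y_te, pred))
```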
Background: Improvements in sequencing technology now allow easy acquisition of large datasets; however, analyzing these data for phylogenetics can be challenging. We have developed a novel method to rapidly obtain homologous genomic data for phylogenetics directly from next-generation sequencing reads without the use of a reference genome. This software, called SISRS, avoids the time-consuming steps of de novo whole genome assembly, multiple genome alignment, and annotation. Results: In simulations, SISRS is able to identify large numbers of loci containing variable sites with phylogenetic signal. For genomic data from apes, SISRS identified thousands of variable sites, from which we produced an accurate phylogeny. Finally, we used SISRS to identify phylogenetic markers that we used to estimate the phylogeny of placental mammals. We recovered eight phylogenies that resolved the basal relationships among mammals using datasets with different levels of missing data. The three alternate resolutions of the basal relationships are consistent with the major hypotheses for the relationships among mammals, all of which have been supported previously by different molecular datasets. Conclusions: SISRS has the potential to transform phylogenetic research. This method eliminates the need for expensive marker development in many studies by using whole genome shotgun sequence data directly. SISRS is open source and freely available at https://github.com/rachelss/SISRS/releases....
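The core of such an approach is picking alignment columns that carry phylogenetic signal. A minimal sketch under simplified assumptions (gapless orthologous columns already in hand, which SISRS itself derives from raw reads):

```python
from collections import Counter

def informative_sites(alignment, max_missing=1):
    """Return indices of parsimony-informative columns: at least two
    character states, each shared by at least two taxa, tolerating up
    to `max_missing` gap/N characters per column (missing data)."""
    keep = []
    for j in range(len(alignment[0])):
        column = [seq[j] for seq in alignment]
        if sum(c in "-N" for c in column) > max_missing:
            continue
        counts = Counter(c for c in column if c not in "-N")
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            keep.append(j)
    return keep

# Toy four-taxon example: sites 0 and 3 are informative.
aln = ["ACGTA", "ACGTA", "GCGAA", "GCGAA"]
print(informative_sites(aln))   # -> [0, 3]
```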
Genetic microarrays give researchers a huge amount of data on many diseases, represented as gene expression intensities. In genomic medicine, gene expression analysis is directed at finding strategies for the prevention and treatment of diseases with high mortality rates, such as the various cancers, so genomic medicine requires complex information technology. The purpose of our paper is to present a multi-agent system developed to improve gene expression analysis by automating the identification of genes involved in a cancer and the classification of tumors according to molecular biology. The agents that make up the system read files of gene intensity data from microarrays, pre-process this information, and use machine learning methods to form groups of genes involved in the disease process, as well as to classify samples, which could suggest new subtypes of tumors that are difficult to identify from morphology alone. Our results show that the multi-agent system requires minimal user intervention, and that the agents generate knowledge that reduces the time and complexity of prevention and diagnosis work, thus allowing more effective treatment of tumors....
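A minimal sketch of the two automated tasks the agents perform, grouping co-expressed genes and classifying tumor samples, using scikit-learn (the agent framework, clustering method, and classifier below are illustrative assumptions, not the system's actual components):

```python
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def analyze_expression(expr, sample_labels, n_gene_groups=8):
    """expr: genes x samples matrix of (log) microarray intensities.
    Returns one cluster label per gene and a cross-validated accuracy
    for classifying the tumour samples."""
    # Standardize each gene across samples before clustering genes.
    X = StandardScaler().fit_transform(expr.T).T
    gene_groups = KMeans(n_clusters=n_gene_groups, n_init=10,
                         random_state=0).fit_predict(X)
    # Classify samples (columns) from their full expression profiles.
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    acc = cross_val_score(clf, expr.T, sample_labels, cv=5).mean()
    return gene_groups, acc
```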
Background: Human leukocyte antigen (HLA) genes are critical genes involved in important biomedical aspects, including organ transplantation, autoimmune diseases, and infectious diseases. The gene family contains the most polymorphic genes in humans, and in many cases the difference between two alleles is only a single base pair substitution. Next-generation sequencing (NGS) technologies can be used for high-throughput HLA typing, but in silico methods are still needed to correctly assign the alleles of a sample. Computer scientists have developed such methods for various NGS platforms, such as Illumina, Roche 454, and Ion Torrent, based on the characteristics of the reads they generate. However, methods for PacBio reads have been less addressed, probably owing to their high error rates. The PacBio system has the longest read length among available NGS platforms, and is therefore the only platform capable of covering exon 2 and exon 3 of HLA genes on the same read to unequivocally solve the ambiguity problem caused by the "phasing" issue. Results: We propose a new method, BayesTyping1, to assign HLA alleles for PacBio circular consensus sequencing reads using Bayes' theorem. The method was applied to simulated data of the three loci HLA-A, HLA-B, and HLA-DRB1. The experimental results showed its capability to tolerate the disturbance of sequencing errors and external noise reads. Conclusions: The BayesTyping1 method could overcome the problems of HLA typing using PacBio reads, which mostly arise from the sequencing errors of PacBio reads and the divergence of HLA genes, to some extent....
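The Bayesian assignment idea can be sketched in a few lines: score each candidate allele by the probability of the observed reads under a per-base error model, then normalize via Bayes' theorem (the error rate, toy sequences, and gapless alignment below are simplifying assumptions, not the BayesTyping1 implementation):

```python
import math

def log_likelihood(read, allele, err=0.02):
    """log P(read | allele) under a uniform per-base error model,
    assuming a gapless alignment of equal-length sequences."""
    return sum(math.log(1 - err) if r == a else math.log(err / 3)
               for r, a in zip(read, allele))

def allele_posterior(reads, alleles):
    """P(allele | reads) with a uniform prior over candidate alleles."""
    logpost = {name: sum(log_likelihood(r, seq) for r in reads)
               for name, seq in alleles.items()}
    m = max(logpost.values())                      # stabilize the exp()
    z = sum(math.exp(v - m) for v in logpost.values())
    return {name: math.exp(v - m) / z for name, v in logpost.items()}

# Toy candidates differing at one position; noisy CCS-like reads.
alleles = {"A*01:01": "ACGTACGT", "A*02:01": "ACGTTCGT"}
reads = ["ACGTACGT", "ACGTACGA", "ACGTACGT"]
print(allele_posterior(reads, alleles))   # A*01:01 dominates
```

Because the error model tolerates mismatches rather than rejecting reads outright, occasional sequencing errors shift the posterior only slightly, which is the property the abstract highlights.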
Identifying the various gene expression response patterns is a challenging issue in expression microarray time-course experiments. Due to heterogeneity in the regulatory reaction among the thousands of genes tested, it is impossible to manually characterize a parametric form for each time-course pattern in a gene-by-gene manner. We introduce a growth curve model with fractional polynomials to automatically capture the various time-dependent expression patterns while efficiently handling missing values due to incomplete observations. For each gene, our procedure compares the performance of fractional polynomial models with power terms drawn from a set of fixed values that offer a wide range of curve shapes, and suggests a best-fitting model. After a limited simulation study, the model was applied to our human in vivo irritated epidermis data with missing observations to investigate time-dependent transcriptional responses to a chemical irritant. Our method was able to identify the various nonlinear time-course expression trajectories. The integration of growth curves with fractional polynomials provides a flexible way to model different time-course patterns, together with model selection and significant-gene identification strategies, that can be applied in microarray-based time-course gene expression experiments with missing observations....
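A minimal sketch of degree-2 fractional-polynomial model selection for one gene, using the conventional power set and ordinary least squares (the selection criterion and missing-value handling below are simplified assumptions, not the paper's exact procedure):

```python
import numpy as np
from itertools import combinations_with_replacement

POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)   # conventional FP power set

def fp_term(t, p):
    """FP basis term; power 0 denotes log(t) by convention (t > 0)."""
    return np.log(t) if p == 0 else t ** float(p)

def best_fp2(t, y):
    """Fit y ~ b0 + b1*t^p1 + b2*t^p2 for every power pair (a repeated
    power p becomes t^p and t^p * log t) and return the pair with the
    lowest residual sum of squares. NaNs (missing observations) are
    dropped before fitting."""
    ok = ~np.isnan(y)
    t, y = t[ok], y[ok]
    best = None
    for p1, p2 in combinations_with_replacement(POWERS, 2):
        x1 = fp_term(t, p1)
        x2 = x1 * np.log(t) if p1 == p2 else fp_term(t, p2)
        X = np.column_stack([np.ones_like(t), x1, x2])
        beta, rss, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = rss[0] if rss.size else np.sum((y - X @ beta) ** 2)
        if best is None or rss < best[0]:
            best = (rss, (p1, p2))
    return best

t = np.array([1.0, 2.0, 4.0, 8.0, 24.0, 48.0])    # sampling times (toy)
y = np.array([0.1, 0.9, 1.4, np.nan, 0.7, 0.2])   # one gene, one missing
print(best_fp2(t, y)[1])    # selected power pair
```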
Gene function prediction is a complicated and challenging hierarchical multi-label classification (HMC) task, in which genes may have many functions at the same time and these functions are organized in a hierarchy. This paper proposes a novel HMC algorithm for solving this problem based on the Gene Ontology (GO), whose hierarchy is a directed acyclic graph (DAG) and is therefore more difficult to tackle. In the proposed algorithm, the HMC task is first decomposed into a set of binary classification tasks. Two measures are then implemented to enhance HMC performance by considering the hierarchy structure during the learning procedures. First, a negative-instance selection policy combined with the SMOTE approach is proposed to alleviate the imbalanced data set problem. Second, a nodes-interaction method is introduced to combine the results of the binary classifiers; it guarantees that the predictions are consistent with the hierarchy constraint. Experiments on eight benchmark yeast data sets annotated with the Gene Ontology show the promising performance of the proposed algorithm compared with other state-of-the-art algorithms...
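The hierarchy-constraint step can be illustrated concisely: after the per-term binary classifiers produce scores, cap each term's score by its parents' corrected scores in topological order, so that thresholding never predicts a term without its ancestors (a common consistency correction, sketched here under assumptions rather than the paper's exact nodes-interaction method):

```python
from graphlib import TopologicalSorter

def hierarchy_consistent(scores, parents):
    """Cap each GO term's score at the minimum of its parents' corrected
    scores, traversing the DAG parents-first, so any thresholded
    prediction automatically satisfies the hierarchy constraint."""
    deps = {term: set(parents.get(term, ())) for term in scores}
    corrected = {}
    for term in TopologicalSorter(deps).static_order():
        s = scores[term]
        for p in parents.get(term, ()):
            s = min(s, corrected[p])
        corrected[term] = s
    return corrected

# Toy DAG: the child scored above a parent gets capped to 0.4.
parents = {"GO:b": ["GO:a"], "GO:c": ["GO:a", "GO:b"]}
scores = {"GO:a": 0.9, "GO:b": 0.4, "GO:c": 0.7}
print(hierarchy_consistent(scores, parents))
```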
Ebola virus (EBOV) is a deadly virus that has caused several fatal outbreaks. Recently it caused another outbreak, resulting in thousands of afflicted cases. An effective and approved vaccine or therapeutic treatment against this virus is still absent. In this study, we aimed to predict B-cell epitopes from several EBOV-encoded proteins, which may aid in developing new antibody-based therapeutics or viral antigen detection methods against this virus. Multiple sequence alignment (MSA) was performed to identify conserved regions among the glycoprotein (GP), nucleoprotein (NP), and viral structural proteins (VP40, VP35, and VP24) of EBOV. Next, different consensus immunogenic and conserved sites were predicted from the conserved region(s) using various computational tools available in the Immune Epitope Database (IEDB). Among the GP, NP, VP40, VP35, and VP30 proteins, only NP gave a 100% conserved GEQYQQLR B-cell epitope that fulfills the ideal features of an effective B-cell epitope and could lead the way in the milieu of Ebola treatment. However, successful in vivo and in vitro studies are a prerequisite to determine the actual potency of our predicted epitope and to establish it as a preventive medication against all the fatal strains of EBOV....
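The conserved-region step lends itself to a compact sketch: slide a window over the MSA and keep only windows identical across all sequences (the toy fragments below are illustrative, not the real EBOV sequences):

```python
def conserved_windows(msa, width=8):
    """Yield (position, peptide) for gap-free windows of length `width`
    that are 100% conserved across every aligned sequence."""
    for i in range(len(msa[0]) - width + 1):
        window = msa[0][i:i + width]
        if "-" in window:
            continue
        if all(seq[i:i + width] == window for seq in msa[1:]):
            yield i, window

# Toy aligned NP fragments; the shared 8-mer core is recovered.
msa = ["MDSGEQYQQLRAK",
       "MDSGEQYQQLRSK",
       "MDAGEQYQQLRAK"]
for pos, pep in conserved_windows(msa):
    print(pos, pep)   # -> 3 GEQYQQLR
```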
This paper presents a hybrid method to extract the endocardial contour of the right ventricle (RV) in 4 slices from a 3D echocardiography dataset. The overall framework comprises four processing phases. In Phase I, the region of interest (ROI) is identified by estimating the cavity boundary. Speckle noise reduction and contrast enhancement are implemented in Phase II as preprocessing tasks. In Phase III, the RV cavity region is segmented by generating an intensity threshold that is used once for all frames. Finally, Phase IV extracts the RV endocardial contour over a complete cardiac cycle using a combination of shape-based contour detection and an improved radial search algorithm. The proposed method was applied to 16 datasets of 3D echocardiography encompassing the RV in long-axis view. The accuracy of the experimental results was evaluated qualitatively and quantitatively by comparing the segmentation results of the RV cavity, based on endocardial contour extraction, with the ground truth. The comparative analysis shows that the proposed method performs efficiently in all datasets, with an overall performance of 95%, and the root mean square distance (RMSD) in terms of mean ± SD was found to be 2.21 ± 0.35 mm for RV endocardial contours....
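The radial-search idea in Phase IV can be sketched simply: cast rays outward from a seed point inside the thresholded cavity and record where each ray first leaves the cavity mask (a toy reimplementation under assumptions, not the paper's improved algorithm):

```python
import numpy as np

def radial_contour(mask, center, n_rays=64):
    """Trace a cavity boundary by stepping outward along n_rays rays
    from `center` in a binary mask, stopping each ray at the first
    pixel that is outside the image or outside the cavity."""
    cy, cx = center
    h, w = mask.shape
    contour = []
    for theta in np.linspace(0, 2 * np.pi, n_rays, endpoint=False):
        dy, dx = np.sin(theta), np.cos(theta)
        r = 0.0
        while True:
            y = int(round(cy + r * dy))
            x = int(round(cx + r * dx))
            if not (0 <= y < h and 0 <= x < w) or not mask[y, x]:
                break
            r += 1.0
        contour.append((y, x))
    return contour

# Toy circular "cavity": rays from the centre recover its boundary.
yy, xx = np.mgrid[0:64, 0:64]
mask = (yy - 32) ** 2 + (xx - 32) ** 2 < 20 ** 2
print(radial_contour(mask, (32, 32), n_rays=8))
```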
Background: Drug-target interaction prediction is of great significance for narrowing down the scope of candidate medications, and thus is a vital step in drug discovery. Because of the particularity of biochemical experiments, the development of new drugs is not only costly but also time-consuming. Therefore, the computational prediction of drug-target interactions has become an essential part of the drug discovery process, aiming to greatly reduce the experimental cost and time....