Frequency: Quarterly E-ISSN: 2250-2920 P-ISSN: Awaited Abstracted/Indexed in: Ulrich's International Periodical Directory, Google Scholar, SCIRUS, getCITED, Genamics JournalSeek, EBSCO Information Services
Published quarterly in print and online, "Inventi Impact: Bioinformatics" publishes high-quality unpublished research as well as high-impact pre-published research and reviews, catering to the needs of researchers and professionals from both the IT and life sciences domains. It focuses on storing, retrieving, organizing, and analyzing biological data, and on new developments in genome bioinformatics and computational biology.
Background: The advent of Next-Generation Sequencing (NGS) has catalyzed a paradigm shift in medical genetics, enabling the identification of disease-associated variants. However, the vast quantity of data produced by NGS necessitates a robust and dependable mechanism for filtering irrelevant variants. Annotation-based variant filtering, a pivotal step in this process, demands a profound understanding of the case-specific conditions and the relevant annotation instruments. To tackle this complex task, we sought to design an accessible, efficient, and, more importantly, easy-to-understand variant filtering tool. Results: Our efforts culminated in the creation of 123VCF, a tool capable of processing both compressed and uncompressed Variant Calling Format (VCF) files. Built on a Java framework, the tool employs a disk-streaming real-time filtering algorithm, allowing it to manage sizable variant files on conventional desktop computers. 123VCF filters input variants according to a predefined filter sequence. Users are given the flexibility to define various filtering parameters, such as quality, coverage depth, and variant frequency within populations. Additionally, 123VCF accommodates user-defined filters tailored to specific case requirements, affording users enhanced control over the filtering process. We evaluated the performance of 123VCF by analyzing different types of variant files and comparing its runtimes to those of the most similar tools, BCFtools filter and GATK VariantFiltration. The results indicated that 123VCF performs relatively well. The tool's intuitive interface and potential for reproducibility make it a valuable asset for both researchers and clinicians. Conclusion: The 123VCF filtering tool provides an effective, dependable approach for filtering variants in both research and clinical settings. As an open-source tool available at https://project123vcf.sourceforge.io, it is accessible to the global scientific and clinical community, paving the way for the discovery of disease-causing variants and facilitating the advancement of personalized medicine....
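To illustrate the disk-streaming, record-at-a-time filtering the abstract describes, here is a minimal Python sketch (123VCF itself is written in Java; the function name, thresholds, and INFO-field handling below are illustrative assumptions, not the authors' implementation):

```python
import gzip

def stream_filter_vcf(path, min_qual=30.0, min_depth=10, max_af=0.01):
    """Stream a plain or gzipped VCF line by line, yielding records that
    pass simple quality, coverage-depth, and population-frequency filters.
    Assumes single-allelic DP/AF INFO entries for simplicity."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        for line in fh:
            if line.startswith("#"):           # pass header lines through
                yield line
                continue
            fields = line.rstrip("\n").split("\t")
            qual = fields[5]
            if qual != "." and float(qual) < min_qual:
                continue                        # fails quality filter
            info = dict(kv.split("=", 1)
                        for kv in fields[7].split(";") if "=" in kv)
            if int(info.get("DP", 0)) < min_depth:
                continue                        # fails coverage-depth filter
            if float(info.get("AF", 0.0)) > max_af:
                continue                        # too frequent in population
            yield line

# Usage: filter a large file without loading it into memory.
# with open("filtered.vcf", "w") as out:
#     out.writelines(stream_filter_vcf("sample.vcf.gz"))
```

Because each record is read, tested, and either emitted or discarded before the next one is touched, memory use stays constant regardless of file size, which is the point of a disk-streaming design.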
Background: Pulmonary acoustic parameters extracted from recorded respiratory sounds provide valuable information for the detection of respiratory pathologies. The automated analysis of pulmonary acoustic signals can serve as a differential diagnosis tool for medical professionals, a learning tool for medical students, and a self-management tool for patients. In this context, we intend to evaluate and compare the performance of the support vector machine (SVM) and K-nearest neighbour (K-nn) classifiers in diagnosing respiratory pathologies using respiratory sounds from the R.A.L.E database. Results: The pulmonary acoustic signals used in this study were obtained from the R.A.L.E lung sound database. The pulmonary acoustic signals were manually categorised into three different groups, namely normal, airway obstruction pathology, and parenchymal pathology. The mel-frequency cepstral coefficient (MFCC) features were extracted from the pre-processed pulmonary acoustic signals. The MFCC features were analysed by one-way ANOVA and then fed separately into the SVM and K-nn classifiers. The performances of the classifiers were analysed using the confusion matrix technique. The statistical analysis of the MFCC features using one-way ANOVA showed that the extracted MFCC features are significantly different (p < 0.001). The classification accuracies of the SVM and K-nn classifiers were found to be 92.19% and 98.26%, respectively. Conclusion: Although the data used to train and test the classifiers are limited, the classification accuracies found are satisfactory. The K-nn classifier was better than the SVM classifier at discriminating pulmonary acoustic signals from pathological and normal subjects obtained from the R.A.L.E database....
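A minimal sketch of the pipeline described above, using librosa for MFCC extraction and scikit-learn for the two classifiers (the library choices, mean-pooling of frame-wise MFCCs, and hyperparameters are assumptions; the authors' exact settings are not given in the abstract):

```python
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

def mfcc_features(wav_path, n_mfcc=13):
    """One mean-MFCC vector per recording (frame-wise MFCCs averaged)."""
    y, sr = librosa.load(wav_path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

def evaluate(wav_paths, labels):
    """labels: 0 = normal, 1 = airway obstruction, 2 = parenchymal.
    Trains SVM and K-nn on the same split and prints both confusion
    matrices for comparison."""
    X = np.array([mfcc_features(p) for p in wav_paths])
    y = np.array(labels)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y,
                                              random_state=0)
    for clf in (SVC(kernel="rbf"), KNeighborsClassifier(n_neighbors=5)):
        clf.fit(X_tr, y_tr)
        pred = clf.predict(X_te)
        print(type(clf).__name__, accuracy_score(y_te, pred))
        print(confusion_matrix(y_te, pred))
```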
Background: Improvements in sequencing technology now allow easy acquisition of large datasets; however, analyzing these data for phylogenetics can be challenging. We have developed a novel method to rapidly obtain homologous genomic data for phylogenetics directly from next-generation sequencing reads without the use of a reference genome. This software, called SISRS, avoids the time-consuming steps of de novo whole genome assembly, multiple genome alignment, and annotation. Results: In simulations, SISRS is able to identify large numbers of loci containing variable sites with phylogenetic signal. For genomic data from apes, SISRS identified thousands of variable sites, from which we produced an accurate phylogeny. Finally, we used SISRS to identify phylogenetic markers that we used to estimate the phylogeny of placental mammals. We recovered eight phylogenies that resolved the basal relationships among mammals using datasets with different levels of missing data. The three alternate resolutions of the basal relationships are consistent with the major hypotheses for the relationships among mammals, all of which have been supported previously by different molecular datasets. Conclusions: SISRS has the potential to transform phylogenetic research. This method eliminates the need for expensive marker development in many studies by using whole genome shotgun sequence data directly. SISRS is open source and freely available at https://github.com/rachelss/SISRS/releases....
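The core of such an approach is picking alignment columns that carry phylogenetic signal. A minimal sketch under simplified assumptions (gapless orthologous columns already in hand, which SISRS itself derives from raw reads):

```python
from collections import Counter

def informative_sites(alignment, max_missing=1):
    """Return indices of parsimony-informative columns: at least two
    character states, each shared by at least two taxa, tolerating up
    to `max_missing` gap/N characters per column (missing data)."""
    keep = []
    for j in range(len(alignment[0])):
        column = [seq[j] for seq in alignment]
        if sum(c in "-N" for c in column) > max_missing:
            continue
        counts = Counter(c for c in column if c not in "-N")
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            keep.append(j)
    return keep

# Toy four-taxon example: sites 0 and 3 are informative.
aln = ["ACGTA", "ACGTA", "GCGAA", "GCGAA"]
print(informative_sites(aln))   # -> [0, 3]
```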
Genetic microarrays give researchers a huge amount of data on many diseases, represented as gene expression intensities. In genomic medicine, gene expression analysis is directed at finding strategies for the prevention and treatment of diseases with high mortality rates, such as the various cancers, so genomic medicine requires complex information technology. The purpose of our paper is to present a multi-agent system developed to improve gene expression analysis by automating the identification of genes involved in a cancer and the classification of tumors according to molecular biology. The agents that make up the system read files of gene intensity data from microarrays, pre-process this information, and use machine learning methods to form groups of genes involved in the disease process, as well as to classify samples, which could suggest new subtypes of tumors that are difficult to identify from morphology alone. Our results show that the multi-agent system requires minimal user intervention, and that the agents generate knowledge that reduces the time and complexity of prevention and diagnosis work, thus allowing more effective treatment of tumors....
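A minimal sketch of the two automated tasks the agents perform, grouping co-expressed genes and classifying tumor samples, using scikit-learn (the agent framework, clustering method, and classifier below are illustrative assumptions, not the system's actual components):

```python
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def analyze_expression(expr, sample_labels, n_gene_groups=8):
    """expr: genes x samples matrix of (log) microarray intensities.
    Returns one cluster label per gene and a cross-validated accuracy
    for classifying the tumour samples."""
    # Standardize each gene across samples before clustering genes.
    X = StandardScaler().fit_transform(expr.T).T
    gene_groups = KMeans(n_clusters=n_gene_groups, n_init=10,
                         random_state=0).fit_predict(X)
    # Classify samples (columns) from their full expression profiles.
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    acc = cross_val_score(clf, expr.T, sample_labels, cv=5).mean()
    return gene_groups, acc
```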
Background: Human leukocyte antigen (HLA) genes are critical genes involved in important biomedical aspects, including organ transplantation, autoimmune diseases, and infectious diseases. The gene family contains the most polymorphic genes in humans, and in many cases the difference between two alleles is only a single base pair substitution. Next-generation sequencing (NGS) technologies can be used for high-throughput HLA typing, but in silico methods are still needed to correctly assign the alleles of a sample. Computer scientists have developed such methods for various NGS platforms, such as Illumina, Roche 454, and Ion Torrent, based on the characteristics of the reads they generate. However, methods for PacBio reads have been less addressed, probably owing to their high error rates. The PacBio system has the longest read length among available NGS platforms, and is therefore the only platform capable of covering exon 2 and exon 3 of HLA genes on the same read to unequivocally solve the ambiguity problem caused by the "phasing" issue. Results: We propose a new method, BayesTyping1, to assign HLA alleles for PacBio circular consensus sequencing reads using Bayes' theorem. The method was applied to simulated data of the three loci HLA-A, HLA-B, and HLA-DRB1. The experimental results showed its capability to tolerate the disturbance of sequencing errors and external noise reads. Conclusions: The BayesTyping1 method could overcome the problems of HLA typing using PacBio reads, which mostly arise from the sequencing errors of PacBio reads and the divergence of HLA genes, to some extent....
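The Bayesian assignment idea can be sketched in a few lines: score each candidate allele by the probability of the observed reads under a per-base error model, then normalize via Bayes' theorem (the error rate, toy sequences, and gapless alignment below are simplifying assumptions, not the BayesTyping1 implementation):

```python
import math

def log_likelihood(read, allele, err=0.02):
    """log P(read | allele) under a uniform per-base error model,
    assuming a gapless alignment of equal-length sequences."""
    return sum(math.log(1 - err) if r == a else math.log(err / 3)
               for r, a in zip(read, allele))

def allele_posterior(reads, alleles):
    """P(allele | reads) with a uniform prior over candidate alleles."""
    logpost = {name: sum(log_likelihood(r, seq) for r in reads)
               for name, seq in alleles.items()}
    m = max(logpost.values())                      # stabilize the exp()
    z = sum(math.exp(v - m) for v in logpost.values())
    return {name: math.exp(v - m) / z for name, v in logpost.items()}

# Toy candidates differing at one position; noisy CCS-like reads.
alleles = {"A*01:01": "ACGTACGT", "A*02:01": "ACGTTCGT"}
reads = ["ACGTACGT", "ACGTACGA", "ACGTACGT"]
print(allele_posterior(reads, alleles))   # A*01:01 dominates
```

Because the error model tolerates mismatches rather than rejecting reads outright, occasional sequencing errors shift the posterior only slightly, which is the property the abstract highlights.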
Identifying the various gene expression response patterns is a challenging issue in expression microarray time-course experiments. Due to heterogeneity in the regulatory reaction among the thousands of genes tested, it is impossible to manually characterize a parametric form for each time-course pattern in a gene-by-gene manner. We introduce a growth curve model with fractional polynomials to automatically capture the various time-dependent expression patterns while efficiently handling missing values due to incomplete observations. For each gene, our procedure compares the performance of fractional polynomial models with power terms drawn from a set of fixed values that offer a wide range of curve shapes, and suggests a best-fitting model. After a limited simulation study, the model was applied to our human in vivo irritated epidermis data with missing observations to investigate time-dependent transcriptional responses to a chemical irritant. Our method was able to identify the various nonlinear time-course expression trajectories. The integration of growth curves with fractional polynomials provides a flexible way to model different time-course patterns, together with model selection and significant-gene identification strategies, that can be applied in microarray-based time-course gene expression experiments with missing observations....
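A minimal sketch of degree-2 fractional-polynomial model selection for one gene, using the conventional power set and ordinary least squares (the selection criterion and missing-value handling below are simplified assumptions, not the paper's exact procedure):

```python
import numpy as np
from itertools import combinations_with_replacement

POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)   # conventional FP power set

def fp_term(t, p):
    """FP basis term; power 0 denotes log(t) by convention (t > 0)."""
    return np.log(t) if p == 0 else t ** float(p)

def best_fp2(t, y):
    """Fit y ~ b0 + b1*t^p1 + b2*t^p2 for every power pair (a repeated
    power p becomes t^p and t^p * log t) and return the pair with the
    lowest residual sum of squares. NaNs (missing observations) are
    dropped before fitting."""
    ok = ~np.isnan(y)
    t, y = t[ok], y[ok]
    best = None
    for p1, p2 in combinations_with_replacement(POWERS, 2):
        x1 = fp_term(t, p1)
        x2 = x1 * np.log(t) if p1 == p2 else fp_term(t, p2)
        X = np.column_stack([np.ones_like(t), x1, x2])
        beta, rss, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = rss[0] if rss.size else np.sum((y - X @ beta) ** 2)
        if best is None or rss < best[0]:
            best = (rss, (p1, p2))
    return best

t = np.array([1.0, 2.0, 4.0, 8.0, 24.0, 48.0])    # sampling times (toy)
y = np.array([0.1, 0.9, 1.4, np.nan, 0.7, 0.2])   # one gene, one missing
print(best_fp2(t, y)[1])    # selected power pair
```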
Gene function prediction is a complicated and challenging hierarchical multi-label classification (HMC) task, in which genes may have many functions at the same time and these functions are organized in a hierarchy. This paper proposes a novel HMC algorithm for solving this problem based on the Gene Ontology (GO), whose hierarchy is a directed acyclic graph (DAG) and is therefore more difficult to tackle. In the proposed algorithm, the HMC task is first decomposed into a set of binary classification tasks. Two measures are then implemented to enhance HMC performance by considering the hierarchy structure during the learning procedures. First, a negative-instance selection policy combined with the SMOTE approach is proposed to alleviate the imbalanced data set problem. Second, a nodes-interaction method is introduced to combine the results of the binary classifiers; it guarantees that the predictions are consistent with the hierarchy constraint. Experiments on eight benchmark yeast data sets annotated with the Gene Ontology show the promising performance of the proposed algorithm compared with other state-of-the-art algorithms...
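The hierarchy-constraint step can be illustrated concisely: after the per-term binary classifiers produce scores, cap each term's score by its parents' corrected scores in topological order, so that thresholding never predicts a term without its ancestors (a common consistency correction, sketched here under assumptions rather than the paper's exact nodes-interaction method):

```python
from graphlib import TopologicalSorter

def hierarchy_consistent(scores, parents):
    """Cap each GO term's score at the minimum of its parents' corrected
    scores, traversing the DAG parents-first, so any thresholded
    prediction automatically satisfies the hierarchy constraint."""
    deps = {term: set(parents.get(term, ())) for term in scores}
    corrected = {}
    for term in TopologicalSorter(deps).static_order():
        s = scores[term]
        for p in parents.get(term, ()):
            s = min(s, corrected[p])
        corrected[term] = s
    return corrected

# Toy DAG: the child scored above a parent gets capped to 0.4.
parents = {"GO:b": ["GO:a"], "GO:c": ["GO:a", "GO:b"]}
scores = {"GO:a": 0.9, "GO:b": 0.4, "GO:c": 0.7}
print(hierarchy_consistent(scores, parents))
```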
Ebola virus (EBOV) is a deadly virus that has caused several fatal outbreaks. Recently it caused another outbreak, resulting in thousands of afflicted cases. An effective and approved vaccine or therapeutic treatment against this virus is still absent. In this study, we aimed to predict B-cell epitopes from several EBOV-encoded proteins, which may aid in developing new antibody-based therapeutics or viral antigen detection methods against this virus. Multiple sequence alignment (MSA) was performed to identify conserved regions among the glycoprotein (GP), nucleoprotein (NP), and viral structural proteins (VP40, VP35, and VP24) of EBOV. Next, different consensus immunogenic and conserved sites were predicted from the conserved region(s) using various computational tools available in the Immune Epitope Database (IEDB). Among the GP, NP, VP40, VP35, and VP30 proteins, only NP gave a 100% conserved GEQYQQLR B-cell epitope that fulfills the ideal features of an effective B-cell epitope and could lead the way in the milieu of Ebola treatment. However, successful in vivo and in vitro studies are a prerequisite to determine the actual potency of our predicted epitope and to establish it as a preventive medication against all the fatal strains of EBOV....
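The conserved-region step lends itself to a compact sketch: slide a window over the MSA and keep only windows identical across all sequences (the toy fragments below are illustrative, not the real EBOV sequences):

```python
def conserved_windows(msa, width=8):
    """Yield (position, peptide) for gap-free windows of length `width`
    that are 100% conserved across every aligned sequence."""
    for i in range(len(msa[0]) - width + 1):
        window = msa[0][i:i + width]
        if "-" in window:
            continue
        if all(seq[i:i + width] == window for seq in msa[1:]):
            yield i, window

# Toy aligned NP fragments; the shared 8-mer core is recovered.
msa = ["MDSGEQYQQLRAK",
       "MDSGEQYQQLRSK",
       "MDAGEQYQQLRAK"]
for pos, pep in conserved_windows(msa):
    print(pos, pep)   # -> 3 GEQYQQLR
```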
This paper presents a hybrid method to extract the endocardial contour of the right ventricle (RV) in 4 slices from a 3D echocardiography dataset. The overall framework comprises four processing phases. In Phase I, the region of interest (ROI) is identified by estimating the cavity boundary. Speckle noise reduction and contrast enhancement are implemented in Phase II as preprocessing tasks. In Phase III, the RV cavity region is segmented by generating an intensity threshold that is used once for all frames. Finally, Phase IV extracts the RV endocardial contour over a complete cardiac cycle using a combination of shape-based contour detection and an improved radial search algorithm. The proposed method was applied to 16 datasets of 3D echocardiography encompassing the RV in long-axis view. The accuracy of the experimental results was evaluated qualitatively and quantitatively by comparing the segmentation results of the RV cavity, based on endocardial contour extraction, with the ground truth. The comparative analysis shows that the proposed method performs efficiently in all datasets, with an overall performance of 95%, and the root mean square distance (RMSD) in terms of mean ± SD was found to be 2.21 ± 0.35 mm for RV endocardial contours....
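The radial-search idea in Phase IV can be sketched simply: cast rays outward from a seed point inside the thresholded cavity and record where each ray first leaves the cavity mask (a toy reimplementation under assumptions, not the paper's improved algorithm):

```python
import numpy as np

def radial_contour(mask, center, n_rays=64):
    """Trace a cavity boundary by stepping outward along n_rays rays
    from `center` in a binary mask, stopping each ray at the first
    pixel that is outside the image or outside the cavity."""
    cy, cx = center
    h, w = mask.shape
    contour = []
    for theta in np.linspace(0, 2 * np.pi, n_rays, endpoint=False):
        dy, dx = np.sin(theta), np.cos(theta)
        r = 0.0
        while True:
            y = int(round(cy + r * dy))
            x = int(round(cx + r * dx))
            if not (0 <= y < h and 0 <= x < w) or not mask[y, x]:
                break
            r += 1.0
        contour.append((y, x))
    return contour

# Toy circular "cavity": rays from the centre recover its boundary.
yy, xx = np.mgrid[0:64, 0:64]
mask = (yy - 32) ** 2 + (xx - 32) ** 2 < 20 ** 2
print(radial_contour(mask, (32, 32), n_rays=8))
```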
Background: Drug-target interaction prediction is of great significance for narrowing down the scope of candidate medications, and thus is a vital step in drug discovery. Because of the particularity of biochemical experiments, the development of new drugs is not only costly but also time-consuming. Therefore, the computational prediction of drug-target interactions has become an essential part of the drug discovery process, aiming to greatly reduce the experimental cost and time....