Current Issue : January-March Volume : 2024 Issue Number : 1 Articles : 5 Articles
Background: Accurate case report data are essential to understand arbovirus dynamics, including spread and evolution of arboviruses such as Zika, dengue and chikungunya viruses. Giving the multi-country nature of arbovirus epidemics in the Americas, these data are not often accessible or are reported at different time scales (weekly, monthly) from different sources. Results: We developed a publicly available and user-friendly database for arboviral case data in the Americas: ARCA. ARCA is a relational database that is hosted on the ARCA website. Users can interact with the database through the website by submitting queries through the website, which generates displays results and allows users to download these results in different, convenient file formats. Users can choose to view arboviral case data through a table which containscontaining the number of cases for a particular week, a plot, or through a map. Conclusion: Our ARCA database is a useful tool for arboviral epidemiology research allowing for complex queries, data visualization, integration, and formatting....
Background: With the continuous advances in third-generation sequencing technology and the increasing affordability of next-generation sequencing technology, sequencing data from different sequencing technology platforms is becoming more common. While numerous benchmarking studies have been conducted to compare variant-calling performance across different platforms and approaches, little attention has been paid to the potential of leveraging the strengths of different platforms to optimize overall performance, especially integrating Oxford Nanopore and Illumina sequencing data. Results: We investigated the impact of multi-platform data on the performance of variant calling through carefully designed experiments with a deep learningbased variant caller named Clair3-MP (Multi-Platform). Through our research, we not only demonstrated the capability of ONT-Illumina data for improved variant calling, but also identified the optimal scenarios for utilizing ONT-Illumina data. In addition, we revealed that the improvement in variant calling using ONT-Illumina data comes from an improvement in difficult genomic regions, such as the large low-complexity regions and segmental and collapse duplication regions. Moreover, Clair3-MP can incorporate reference genome stratification information to achieve a small but measurable improvement in variant calling. Clair3-MP is accessible as an open-source project at: https:// github. com/ HKU- BAL/ Clair3- MP. Conclusions: These insights have important implications for researchers and practitioners alike, providing valuable guidance for improving the reliability and efficiency of genomic analysis in diverse applications....
Background: In cancer, genomic rearrangements can create fusion genes that either combine protein-coding sequences from two different partner genes or place one gene under the control of the promoter of another gene. These fusion genes can act as oncogenic drivers in tumor development and several fusions involving kinases have been successfully exploited as drug targets. Expressed fusions can be identified in RNA sequencing (RNA-Seq) data, but fusion prediction software often has a high fraction of false positive fusion transcript predictions. This is problematic for both research and clinical applications. Results: We describe a method for validation of fusion transcripts detected by RNASeq in matched whole-genome sequencing (WGS) data. Our pipeline uses discordant read pairs to identify supported fusion events and analyzes soft-clipped read alignments to determine genomic breakpoints. We have tested it on matched RNA-Seq and WGS data for both tumors and cancer cell lines and show that it can be used to validate both new predicted gene fusions and experimentally validated fusion events. It was considerably faster and more sensitive than using BreakDancer and Manta, software that is instead designed to detect many different types of structural variants on a genome-wide scale. Conclusions: We have developed a fast and very sensitive pipeline for validation of gene fusions detected by RNA-Seq in matched WGS data. It can be used to identify high-quality gene fusions for further bioinformatic and experimental studies, including validation of genomic breakpoints and studies of the mechanisms that generate fusions. In a clinical setting, it could help find expressed gene fusions for personalized therapy....
Background: Visualizing genome coverage is of vital importance to inspect and interpret various next-generation sequencing (NGS) data. Besides genome coverage, genome annotations are also crucial in the visualization. While different NGS data require different annotations, how to visualize genome coverage and add the annotations appropriately and conveniently is challenging. Many tools have been developed to address this issue. However, existing tools are often inflexible, complicated, lack necessary preprocessing steps and annotations, and the figures generated support limited customization. Results: Here, we introduce ggcoverage, an R package to visualize and annotate genome coverage of multi-groups and multi-omics. The input files for ggcoverage can be in BAM, BigWig, BedGraph and TSV formats. For better usability, ggcoverage provides reliable and efficient ways to perform read normalization, consensus peaks generation and track data loading with state-of-the-art tools. ggcoverage provides various available annotations to adapt to different NGS data (e.g. WGS/WES, RNA-seq, ChIP-seq) and all the available annotations can be easily superimposed with ‘ + ’. ggcoverage can generate publication-quality plots and users can customize the plots with ggplot2. In addition, ggcoverage supports the visualization and annotation of protein coverage. Conclusions: ggcoverage provides a flexible, programmable, efficient and user-friendly way to visualize and annotate genome coverage of multi-groups and multi-omics. The ggcoverage package is available at https:// github. com/ showt eeth/ ggcov erage under the MIT license, and the vignettes are available at https:// showt eeth. github. io/ ggcov erage/....
Background: The central biological clock governs numerous facets of mammalian physiology, including sleep, metabolism, and immune system regulation. Understanding gene regulatory relationships is crucial for unravelling the mechanisms that underlie various cellular biological processes. While it is possible to infer circadian gene regulatory relationships from time-series gene expression data, relying solely on correlation-based inference may not provide sufficient information about causation. Moreover, gene expression data often have high dimensions but a limited number of observations, posing challenges in their analysis. Methods: In this paper, we introduce a new hybrid framework, referred to as Circadian Gene Regulatory Framework (CGRF), to infer circadian gene regulatory relationships from gene expression data of rats. The framework addresses the challenges of high-dimensional data by combining the fuzzy C-means clustering algorithm with dynamic time warping distance. Through this approach, we efficiently identify the clusters of genes related to the target gene. To determine the significance of genes within a specific cluster, we employ the Wilcoxon signed-rank test. Subsequently, we use a dynamic vector autoregressive method to analyze the selected significant gene expression profiles and reveal directed causal regulatory relationships based on partial correlation. Conclusion: The proposed CGRF framework offers a comprehensive and efficient solution for understanding circadian gene regulation. Circadian gene regulatory relationships are inferred from the gene expression data of rats based on the Aanat target gene. The results show that genes Pde10a, Atp7b, Prok2, Per1, Rhobtb3 and Dclk1 stand out, which have been known to be essential for the regulation of circadian activity. The potential relationships between genes Tspan15, Eprs, Eml5 and Fsbp with a circadian rhythm need further experimental research....
Loading....