Current Issue: July - September | Volume: 2013 | Issue Number: 3 | Articles: 6
Background: The Illumina sequencing platform is widely used in genome research. Quality assessment and control of sequence reads are needed for downstream analysis. However, software that provides efficient quality assessment and versatile filtration methods is still lacking.
Results: We have developed a toolkit for sequence read quality control named HTQC (an abbreviation of High-Throughput Quality Control), which consists of six programs for read quality assessment, read filtration and generation of graphic reports.
Conclusions: The HTQC toolkit can generate read quality assessments faster than existing tools, providing guidance for read filtration utilities that allow users to choose different strategies to remove low-quality reads....
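To make the idea of quality-based read filtration concrete, here is a minimal Python sketch of one possible strategy: dropping reads whose mean Phred score falls below a threshold. It is not HTQC's actual implementation. The function names, the threshold of 20, and the Phred+33 quality encoding are all assumptions made for illustration (older Illumina pipelines used a Phred+64 offset instead).

```python
from statistics import mean


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def filter_reads(reads, min_mean_quality: float = 20.0):
    """Keep reads whose mean Phred score is at least min_mean_quality.

    `reads` is an iterable of (name, sequence, quality) tuples. This is a
    hypothetical strategy for illustration, not one taken from HTQC.
    """
    kept = []
    for name, seq, qual in reads:
        if mean(phred_scores(qual)) >= min_mean_quality:
            kept.append((name, seq, qual))
    return kept


# Toy reads: 'I' encodes Phred 40 and '!' encodes Phred 0 under Phred+33.
reads = [
    ("read1", "ACGT", "IIII"),  # mean quality 40
    ("read2", "ACGT", "!!!!"),  # mean quality 0
    ("read3", "ACGT", "II!!"),  # mean quality 20
]
kept = filter_reads(reads)
print([name for name, _, _ in kept])
```

Other strategies, such as trimming low-quality tails or rejecting reads with too many low-quality bases, would follow the same decode-then-decide pattern.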
Background: Constraint-based modeling uses mass balances, flux capacity, and reaction directionality constraints to predict fluxes through metabolism. Although transcriptional regulation and thermodynamic constraints have been integrated into constraint-based modeling, kinetic rate laws have not been extensively used.
Results: In this study, an in vivo kinetic parameter estimation problem was formulated and solved using multi-omic data sets for Escherichia coli. To narrow the confidence intervals for kinetic parameters, a series of kinetic model simplifications were made, resulting in fewer kinetic parameters than the full kinetic model. These new parameter values are able to account for flux and concentration data from 20 different experimental conditions used in our training dataset. Concentration estimates from the simplified kinetic model were within one standard deviation for 92.7% of the 790 experimental measurements in the training set. Gibbs free energy changes of reaction were calculated to identify reactions that were often operating close to or far from equilibrium. In addition, enzymes whose activities were positively or negatively influenced by metabolite concentrations were also identified. The kinetic model was then used to calculate the maximum and minimum possible flux values for individual reactions from independent metabolite and enzyme concentration data that were not used to estimate parameter values. Incorporating these kinetically-derived flux limits into the constraint-based metabolic model improved predictions for uptake and secretion rates and intracellular fluxes in constraint-based models of central metabolism.
Conclusions: This study has produced a method for in vivo kinetic parameter estimation and identified strategies and outcomes of kinetic model simplification. We have also illustrated how kinetic constraints can be used to improve constraint-based model predictions for intracellular fluxes and biomass yield and identify potential metabolic limitations through the integrated analysis of multi-omics datasets....
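One way to picture how kinetically derived flux limits enter a constraint-based model is the generic flux balance formulation below. It is a sketch under stated assumptions, not necessarily the exact formulation used in the study. The symbols are introduced here for illustration: S is the stoichiometric matrix, v the flux vector, c a vector of objective weights (a linear objective such as biomass yield is a common choice and an assumption here), the superscripts lb and ub denote the capacity and directionality bounds, and the superscripts (kin,min) and (kin,max) denote the minimum and maximum fluxes computed from the kinetic model.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S v = 0, \\
& \max\!\left(v_j^{\mathrm{lb}},\, v_j^{\mathrm{kin,min}}\right)
  \;\le\; v_j \;\le\;
  \min\!\left(v_j^{\mathrm{ub}},\, v_j^{\mathrm{kin,max}}\right)
  \quad \text{for each reaction } j.
\end{aligned}
```

The mass balance S v = 0 is unchanged. The kinetic limits only tighten the box constraints on individual fluxes, which shrinks the feasible flux space.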
Background: Traditional methods for computational motif discovery often suffer from poor performance. In particular, methods that search for sequence matches to known binding motifs tend to predict many non-functional binding sites because they fail to take into consideration the biological state of the cell. In recent years, genome-wide studies have generated a large amount of data that has the potential to improve our ability to identify functional motifs and binding sites, such as information about chromatin accessibility and epigenetic states in different cell types. However, it is not always trivial to make use of this data in combination with existing motif discovery tools, especially for researchers who are not skilled in bioinformatics programming.
Results: Here we present MotifLab, a general workbench for analysing regulatory sequence regions and discovering transcription factor binding sites and cis-regulatory modules. MotifLab supports comprehensive motif discovery and analysis by allowing users to integrate several popular motif discovery tools as well as different kinds of additional information, including phylogenetic conservation, epigenetic marks, DNase hypersensitive sites, ChIP-Seq data, positional binding preferences of transcription factors, transcription factor interactions and gene expression. MotifLab offers several data-processing operations that can be used to create, manipulate and analyse data objects, and complete analysis workflows can be constructed and automatically executed within MotifLab, including graphical presentation of the results.
Conclusions: We have developed MotifLab as a flexible workbench for motif analysis in a genomic context. The flexibility and effectiveness of this workbench have been demonstrated on selected test cases, in particular two previously published benchmark data sets for single motifs and modules, and a realistic example of genes responding to treatment with forskolin. MotifLab is freely available at http://www.motiflab.org....
Background: The size of the protein sequence database has been exponentially increasing due to advances in genome sequencing. However, experimentally characterized proteins only constitute a small portion of the database, such that the majority of sequences have been annotated by computational approaches. Current automatic annotation pipelines inevitably introduce errors, making the annotations unreliable. Instead of such error-prone automatic annotations, functional interpretation should rely on annotations of 'reference proteins' that have been experimentally characterized or manually curated.
Results: The Seq2Ref server uses BLAST to detect proteins homologous to a query sequence and identifies the reference proteins among them. Seq2Ref then reports publications with experimental characterizations of the identified reference proteins that might be relevant to the query. Furthermore, a plurality-based rating system is developed to evaluate the homologous relationships and rank the reference proteins by their relevance to the query.
Conclusions: The reference proteins detected by our server will lend insight into proteins of unknown function and provide extensive information to develop in-depth understanding of uncharacterized proteins. Seq2Ref is available at: http://prodata.swmed.edu/seq2ref....
Background: The Sequence Read Archive (SRA) is the largest public repository of sequencing data from the next generation of sequencing platforms including Illumina (Genome Analyzer, HiSeq, MiSeq, etc.), Roche 454 GS System, Applied Biosystems SOLiD System, Helicos Heliscope, PacBio RS, and others.
Results: SRAdb is an attempt to make queries of the metadata associated with SRA submission, study, sample, experiment and run more robust and precise, and make access to sequencing data in the SRA easier. We have parsed all the SRA metadata into a SQLite database that is routinely updated and can be easily distributed. The SRAdb R/Bioconductor package then utilizes this SQLite database for querying and accessing metadata. Full text search functionality makes querying metadata very flexible and powerful. Fastq files associated with query results can be downloaded easily for local analysis. The package also includes an interface from R to a popular genome browser, the Integrative Genomics Viewer.
Conclusions: The SRAdb Bioconductor package provides a convenient and integrated framework to query and access SRA metadata quickly and powerfully from within R....
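The approach of parsing metadata into a SQLite database and then searching it by keyword can be sketched in a few lines. The example below uses Python's standard `sqlite3` module rather than the SRAdb R package, and it does not reproduce SRAdb's API or schema. The table `run_metadata`, its columns, the accessions, and the titles are all made up for illustration, and a simple `LIKE` match stands in for true full text search.

```python
import sqlite3

# Build a small in-memory database with a hypothetical metadata table.
# The schema and rows are illustrative placeholders, not real SRA records.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE run_metadata ("
    "run_accession TEXT, study_title TEXT, platform TEXT)"
)
conn.executemany(
    "INSERT INTO run_metadata VALUES (?, ?, ?)",
    [
        ("RUN0001", "Breast cancer transcriptome profiling", "ILLUMINA"),
        ("RUN0002", "Soil metagenome survey", "LS454"),
        ("RUN0003", "Exome sequencing of breast tumours", "ILLUMINA"),
    ],
)


def search_runs(connection: sqlite3.Connection, keyword: str) -> list[str]:
    """Return run accessions whose study title contains the keyword.

    SQLite's LIKE is case-insensitive for ASCII text by default; the
    keyword is passed as a bound parameter to avoid SQL injection.
    """
    rows = connection.execute(
        "SELECT run_accession FROM run_metadata "
        "WHERE study_title LIKE ? ORDER BY run_accession",
        (f"%{keyword}%",),
    ).fetchall()
    return [row[0] for row in rows]


print(search_runs(conn, "breast"))
```

Because the whole metadata set lives in a single SQLite file, the same query runs locally without network access, which is what makes a distributable database convenient.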
Background: The digitization of biodiversity data is leading to the widespread application of taxon names that are superfluous, ambiguous or incorrect, resulting in mismatched records and inflated species numbers. The ultimate consequences of misspelled names and bad taxonomy are erroneous scientific conclusions and faulty policy decisions. The lack of tools for correcting this 'names problem' has become a fundamental obstacle to integrating disparate data sources and advancing the progress of biodiversity science.
Results: The TNRS, or Taxonomic Name Resolution Service, is an online application for automated and user-supervised standardization of plant scientific names. The TNRS builds upon and extends existing open-source applications for name parsing and fuzzy matching. Names are standardized against multiple reference taxonomies, including the Missouri Botanical Garden's Tropicos database. Capable of processing thousands of names in a single operation, the TNRS parses and corrects misspelled names and authorities, standardizes variant spellings, and converts nomenclatural synonyms to accepted names. Family names can be included to increase match accuracy and resolve many types of homonyms. Partial matching of higher taxa combined with extraction of annotations, accession numbers and morphospecies allows the TNRS to standardize taxonomy across a broad range of active and legacy datasets.
Conclusions: We show how the TNRS can resolve many forms of taxonomic semantic heterogeneity, correct spelling errors and eliminate spurious names. As a result, the TNRS can aid the integration of disparate biological datasets. Although the TNRS was developed to aid in standardizing plant names, its underlying algorithms and design can be extended to all organisms and nomenclatural codes. The TNRS is accessible via a web interface at http://tnrs.iplantcollaborative.org/ and as a RESTful web service and application programming interface. Source code is available at https://github.com/iPlantCollaborativeOpenSource/TNRS/....
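The two steps described above, correcting misspellings by fuzzy matching and converting synonyms to accepted names, can be illustrated with a short Python sketch. It uses the standard library's `difflib` as a stand-in for the TNRS matching algorithms, which it does not reproduce. The function `resolve_name`, the similarity cutoff of 0.8, and the tiny reference list are assumptions for illustration only.

```python
import difflib

# A tiny illustrative reference taxonomy: accepted names plus one
# well-known synonym mapped to its accepted name.
ACCEPTED = ["Solanum lycopersicum", "Quercus alba", "Acer saccharum"]
SYNONYMS = {"Lycopersicon esculentum": "Solanum lycopersicum"}


def resolve_name(name, accepted=ACCEPTED, synonyms=SYNONYMS, cutoff=0.8):
    """Return the accepted name for a possibly misspelled input, or None.

    The input is fuzzy-matched against accepted names and synonyms; if the
    best match is a synonym, it is converted to its accepted name.
    """
    candidates = list(accepted) + list(synonyms)
    matches = difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff)
    if not matches:
        return None  # no candidate is similar enough
    best = matches[0]
    return synonyms.get(best, best)


print(resolve_name("Quercus albba"))            # misspelled accepted name
print(resolve_name("Lycopersicon esculentun"))  # misspelled synonym
print(resolve_name("Zzzz qqqq"))                # no plausible match
```

A production resolver would also parse authorities and annotations out of the input before matching, which this sketch skips.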