Current Issue: April-June | Volume: 2014 | Issue Number: 2 | Articles: 5
Background: Multicellular organisms consist of cells of many different types that are established during development. Each cell type is characterized by a unique combination of expressed gene products resulting from spatiotemporal gene regulation. A fundamental challenge in regulatory biology is to elucidate the gene expression controls that generate complex body plans during development. Recent advances in high-throughput biotechnologies have generated spatiotemporal expression patterns for thousands of genes in the model organism Drosophila melanogaster (fruit fly). Enhancing existing qualitative methods with quantitative analysis based on the computational tools presented in this paper offers a promising way to address key scientific questions.
Results: We develop a set of computational methods and open source tools for simultaneously identifying co-expressed embryonic domains and their associated genes. To map the expression patterns of many genes into the same coordinate space and to account for variation in embryo shape, we develop a mesh generation method that deforms a meshed generic ellipse to fit each individual embryo. We then develop a co-clustering formulation that clusters the genes and the mesh elements together, thereby identifying co-expressed embryonic domains and their associated genes simultaneously. Experimental results indicate that the gene and mesh co-clusters can be correlated with key developmental events during the stages of embryogenesis we study. The open source software tool is available at http://compbio.cs.odu.edu/fly/.
Conclusions: Our mesh generation and machine learning methods and tools improve upon the flexibility, ease of use and accuracy of existing methods...
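The abstract does not spell out its co-clustering formulation. This sketch therefore substitutes a well-known alternative, bipartite spectral co-clustering (Dhillon, 2001), to show how genes and mesh elements can be grouped in a single step. The input is assumed to be a nonnegative genes × mesh-elements expression matrix. The function names are hypothetical, and this is not the authors' method.

```python
import math
import numpy as np


def spectral_cocluster(A, k, max_iter=100):
    """Hypothetical helper: bipartite spectral co-clustering (Dhillon, 2001).

    A[i, j] >= 0 is the expression of gene i in mesh element j (assumed input).
    Returns (gene_labels, mesh_labels); genes and mesh elements that share a
    label form one co-cluster.
    """
    A = np.asarray(A, dtype=float)
    row_sums, col_sums = A.sum(axis=1), A.sum(axis=0)
    if np.any(row_sums <= 0) or np.any(col_sums <= 0):
        raise ValueError("every gene and mesh element needs nonzero total expression")
    d1 = 1.0 / np.sqrt(row_sums)
    d2 = 1.0 / np.sqrt(col_sums)
    # Degree-normalised matrix D1^{-1/2} A D2^{-1/2}.
    An = d1[:, None] * A * d2[None, :]
    U, s, Vt = np.linalg.svd(An, full_matrices=False)
    # Skip the trivial leading singular pair; keep l = ceil(log2 k) vectors.
    l = max(1, math.ceil(math.log2(k)))
    # Stack gene and mesh-element embeddings so both are clustered together.
    Z = np.vstack([d1[:, None] * U[:, 1:1 + l],
                   d2[:, None] * Vt[1:1 + l, :].T])
    labels = _kmeans(Z, k, max_iter)
    n_genes = A.shape[0]
    return labels[:n_genes], labels[n_genes:]


def _kmeans(Z, k, max_iter):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [Z[0]]
    for _ in range(1, k):
        # Next seed: the point farthest from all seeds chosen so far.
        dist = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(dist)])
    centers = np.array(centers)
    labels = None
    for _ in range(max_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stable: converged
        labels = new_labels
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels
```

For example, take a toy matrix in which genes 0-3 are high in mesh elements 0-4 and genes 4-7 are high in elements 5-9. With `k=2`, the sketch should return one co-cluster per block. The small constant background (0.1) keeps every row and column sum positive, as the degree normalisation requires.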
Background: RNA-seq is now widely used to quantitatively assess gene expression, expression differences and isoform switching, and promises to deliver results for the entire transcriptome. However, whether the transcriptional state of a gene can be captured accurately depends critically on library preparation, read alignment, expression estimation and the tests for differential expression and isoform switching. Comparisons are available for the individual steps, but there is not yet a systematic investigation of which specific genes are affected by biases throughout the entire analysis workflow. In particular, it is unclear whether, for a given gene, expression changes and isoform switches can be detected with current methods and protocols.
Results: For the human genes, we report their detectability under various conditions using different approaches. Overall, we find that the input material has the biggest influence. Depending on the protocol and RNA degradation, it may already exhibit strong length-dependent over- and underrepresentation of transcripts. For 50% of the isoforms, the alignment step correctly aligns up to 99% of the reads; only in the presence of transcript modifications do isoforms, mainly short ones, have a low alignment rate. In our dataset, depending on the aligner and the input material used, the expression estimates of up to 93% of the genes were accurate within a factor of two, with the deviations being due to ambiguous alignments. Detection of differential expression using a negative-binomial count model works reliably for our simulated data but depends on count accuracy. Interestingly, using the fold-change instead of the p-value as a score for differential expression yields the same performance with three replicates and a true two-fold change. Isoform switching is harder to detect, and for at least 109 genes the isoform differences evade detection regardless of the method used.
Conclusions: RNA-seq is a reliable tool, but the repetitive nature of the human genome makes the origin of reads ambiguous and limits the detectability of certain genes. RNA-seq does not represent isoforms equally well regardless of their size, which may range from ~200 nt to ~100,000 nt. Researchers are advised to verify that their target genes do not have extreme properties with respect to repeated regions, GC content, and isoform length and complexity...
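This sketch illustrates two quantities the abstract relies on: the fraction of genes whose expression estimate is "accurate within a factor of two", and a fold-change score for ranking genes across replicates. The mean-of-replicates summary, the pseudocount of 1 and all function names are assumptions made for illustration. It does not reproduce the authors' pipeline or their negative-binomial test.

```python
import math
from statistics import mean


def within_factor(true_expr, est_expr, factor=2.0):
    """Hypothetical helper: fraction of genes estimated within `factor` of truth.

    A gene counts as accurate when |log2(est / true)| <= log2(factor), so a
    two-fold over- and a two-fold under-estimate are both still accepted.
    """
    limit = math.log2(factor)
    hits = sum(
        1 for t, e in zip(true_expr, est_expr)
        if t > 0 and e > 0 and abs(math.log2(e / t)) <= limit
    )
    return hits / len(true_expr)


def log2_fold_change(reps_a, reps_b, pseudocount=1.0):
    """log2 ratio of mean counts (condition B over A); pseudocount is an assumption."""
    return math.log2((mean(reps_b) + pseudocount) / (mean(reps_a) + pseudocount))


def rank_by_fold_change(counts):
    """Rank genes by |log2 fold change|; `counts` maps gene -> (reps_a, reps_b)."""
    scores = {g: abs(log2_fold_change(a, b)) for g, (a, b) in counts.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

For instance, `within_factor([10, 100, 50, 8], [12, 260, 30, 4])` counts three of the four genes as accurate, because only the 2.6-fold overestimate exceeds the two-fold bound. Ranking by |log2 fold change| needs no variance model, which is why it can match p-value ranking when, as in the abstract's setting, replicates are few and the true change is large.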
Background: Coalescent simulation is pivotal for understanding population evolutionary models and demographic histories, as well as for developing novel analytical methods for genetic association studies of DNA sequence data. A plethora of coalescent simulators have been developed, but selecting the most appropriate program remains challenging.
Results: We extensively compared the performance of five widely used coalescent simulators (Hudson's ms, msHOT, MaCS, Simcoal2, and fastsimcoal) to provide a practical guide considering three crucial factors: 1) speed, 2) scalability, and 3) accuracy of recombination hotspot position and intensity. Although ms is a popular standard coalescent simulator, it cannot simulate sequences with recombination hotspots. The extended program msHOT compensates for this deficiency of ms by incorporating recombination hotspots and gene conversion events at arbitrarily chosen locations and intensities, but it remains limited in simulating long stretches of DNA sequence. Simcoal2, based on a discrete generation-by-generation approach, can simulate more complex demographic scenarios but runs comparatively slowly. MaCS and fastsimcoal are both built on fast, modified sequential Markov coalescent algorithms that approximate the standard coalescent. They are much more efficient while keeping the salient features of msHOT and Simcoal2, respectively. Our simulations demonstrate that they are advantageous over the other programs for a spectrum of evolutionary models. To validate recombination hotspots, the LDhat 2.2 rhomap package, sequenceLDhot and Haploview were compared for hotspot detection, and sequenceLDhot exhibited the best performance on both real and simulated data.
Conclusions: While ms remains an excellent choice for general coalescent simulations of DNA sequences, MaCS and fastsimcoal are much more scalable and flexible in simulating a variety of demographic events under different recombination hotspot models. Furthermore, sequenceLDhot appears to give the best performance in detecting and validating crossover hotspots...
Background: RNA silencing is a process triggered by 21–24 nt small RNAs to repress gene expression. Many organisms, including plants, use RNA silencing to regulate development and physiology and to maintain genome stability. Plants possess two classes of small RNAs: microRNAs (miRNAs) and small interfering RNAs (siRNAs). The frameworks of the miRNA and siRNA pathways have been established in the model plant Arabidopsis thaliana (Arabidopsis).
Results: Here we report the identification of putative genes required for the generation and function of miRNAs and siRNAs in soybean and sorghum, based on knowledge obtained from Arabidopsis. The gene families, including DCL, HEN1, SE, HYL1, HST, RDR, NRPD1, NRPD2/NRPE2, NRPE1, and AGO, were analyzed for gene structures, phylogenetic relationships, and protein motifs. Gene expression was validated using RNA-seq, expressed sequence tags (ESTs), and reverse transcription PCR (RT-PCR).
Conclusions: The identification of these components could not only provide insight into the RNA silencing mechanism in soybean and sorghum but also provide a basis for further investigation. All data are available at http://sysbio.unl.edu/...
Background: The Acel_2062 protein from Acidothermus cellulolyticus is a protein of unknown function. Initial sequence analysis predicted that it was a metallopeptidase, based on the presence of a motif (HEXXHXXGXXD) conserved among the Asp-zincins. Asp-zincins are peptidases that contain a single catalytic zinc ion ligated by the histidines and aspartic acid within this motif. The Acel_2062 protein was chosen by the Joint Center for Structural Genomics for crystal structure determination to explore novel protein sequence space and structure-based function annotation.
Results: The crystal structure confirmed that the Acel_2062 protein consists of a single zincin-like metallopeptidase-like domain. The Met-turn, a structural feature thought to be important for a Met-zincin because it stabilizes the active site, is absent; its stabilizing role may have been taken over by the C-terminal Tyr113. In our crystallographic model there are two molecules in the asymmetric unit, and size-exclusion chromatography shows that the protein dimerizes in solution. A water molecule is present in the putative zinc-binding site in one monomer; in the other, it is replaced by one of two observed conformations of His95.
Conclusions: The Acel_2062 protein is structurally related to the zincins. It contains the minimum structural features of a member of this protein superfamily and can be described as a “mini-zincin”. There is a striking parallel with the structure of a mini-Glu-zincin, which represents the minimum structure of a Glu-zincin (a metallopeptidase in which the third zinc ligand is a glutamic acid). Rather than the mini-zincins representing an ancestral state, phylogenetic analysis suggests that they are derived from larger proteins...
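The initial function prediction for Acel_2062 rested on the HEXXHXXGXXD motif. The sketch below scans a protein sequence for that motif with a regular expression, treating each X as any residue. The function and constant names are hypothetical, and the example sequence is synthetic, not the real Acel_2062 sequence. As the abstract shows, a motif hit is only a prediction, which is why a crystal structure was needed to confirm the fold.

```python
import re

# Hypothetical constant: X in HEXXHXXGXXD means "any residue", i.e. "." here.
# The zero-width lookahead lets overlapping motif occurrences all be reported.
ASP_ZINCIN_MOTIF = re.compile(r"(?=(HE..H..G..D))")


def find_asp_zincin_motifs(sequence):
    """Hypothetical helper: one dict per motif hit, using 1-based positions."""
    hits = []
    for m in ASP_ZINCIN_MOTIF.finditer(sequence.upper()):
        start = m.start() + 1  # convert 0-based index to 1-based residue number
        hits.append({
            "start": start,
            "motif": m.group(1),
            # Putative zinc ligands: His (motif pos 1), His (pos 5), Asp (pos 11).
            "zinc_ligands": (start, start + 4, start + 10),
            # Glu (motif pos 2) is the catalytic residue in HEXXH zincins.
            "catalytic_glu": start + 1,
        })
    return hits
```

For the synthetic sequence `"MKTAHEAAHLLGKKDWW"`, the scan reports one hit starting at residue 5 (`HEAAHLLGKKD`), with putative zinc ligands at residues 5, 9 and 15.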