Background. Large-scale bisulfite treatment and short-read sequencing technology allow comprehensive estimation of the methylation states of cytosines (Cs) in the genomes of different tissues, cell types, and developmental stages. Accurate characterization of DNA methylation is essential for understanding genotype-phenotype association, gene-environment interaction, diseases, and cancer. Aligning bisulfite short reads to a reference genome has been a challenging task. We compared five bisulfite short-read mapping tools, BSMAP, Bismark, BS-Seeker, BiSS, and BRAT-BW, representing two classes of mapping algorithms (hash table and suffix/prefix tries). We examined their mapping efficiency (i.e., the percentage of reads that can be mapped to the genomes), usability, running time, and the effects of changing default parameter settings, using both real and simulated reads. We also investigated how preprocessing the data might affect mapping efficiency. Conclusion. Among the five programs compared, in terms of mapping efficiency, Bismark performs best on the real data, followed by BiSS, BSMAP, and finally BRAT-BW and BS-Seeker, which perform very similarly. If CPU time is not a constraint, Bismark is a good choice for mapping bisulfite-treated short reads. Data quality has a large impact on mapping efficiency. Although increasing the number of mismatches allowed can increase mapping efficiency, it not only slows the program significantly but also risks increasing the number of false positives. Therefore, users should set the related parameters carefully, depending on the quality of their sequencing data.
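As a small illustration of the mapping efficiency defined above (the percentage of reads that can be mapped to the genome), the calculation can be sketched as follows. This is a minimal sketch only: the function name `mapping_efficiency` and the read counts are hypothetical and are not taken from the study or from any of the five tools.

```python
def mapping_efficiency(mapped_reads: int, total_reads: int) -> float:
    """Return the percentage of reads that were mapped to the genome."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= mapped_reads <= total_reads:
        raise ValueError("mapped_reads must be between 0 and total_reads")
    return 100.0 * mapped_reads / total_reads


# Hypothetical counts, for illustration only (not results from the comparison).
efficiency = mapping_efficiency(mapped_reads=750, total_reads=1000)
print(f"Mapping efficiency: {efficiency:.1f}%")
```

Because the denominator is the total number of input reads, preprocessing that discards low-quality reads changes the value of `total_reads`, so efficiencies computed before and after filtering are not directly comparable unless the same read set is used.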