Background: In biological experiments on soybean, molecular markers are widely used to verify the soybean genome or to construct its genetic map. Among the variety of molecular markers, insertions and deletions (InDels) are preferred because of their wide distribution and high density at the whole-genome level. Detecting InDels from next-generation sequencing data is therefore of great importance for the design of InDel markers. To tackle this problem, this paper integrated machine learning techniques with existing software and developed two algorithms for InDel detection: the best F-score method (BF-M) and the Support Vector Machine (SVM) method (SVM-M), which is based on the classical SVM model.

Results: The experimental results show that BF-M performed promisingly, as indicated by its high precision and recall scores, whereas SVM-M yielded the best performance in terms of recall and F-score. Moreover, from the InDel markers detected by SVM-M in soybeans collected from 56 different regions, highly polymorphic loci were selected to construct an InDel marker database for soybean.

Conclusions: Compared to existing software tools, the two algorithms proposed in this work produced substantially higher precision and recall scores and remained stable across various types of genomic regions. Moreover, based on SVM-M, we have constructed a database of soybean InDel markers and published it for academic research.