Cancerlectins have an inhibitory effect on the growth of cancer cells and are currently being employed as therapeutic agents. The accurate identification of cancerlectins should provide insight into the molecular mechanisms of cancers. In this study, a new computational method based on the RF (Random Forest) algorithm is proposed to further improve the performance of cancerlectin identification. A hybrid feature space is built before feature selection by combining four individual feature spaces: CTD (Composition, Transition, and Distribution), PseAAC (Pseudo Amino Acid Composition), PSSM (Position-Specific Scoring Matrix), and disorder. SMOTE (Synthetic Minority Oversampling Technique) is applied to address the imbalanced data problem. To reduce feature redundancy and computational complexity, we propose a two-step feature selection process to select informative features. A 5-fold cross-validation technique is used to evaluate the various prediction strategies. The proposed method achieves a sensitivity of 0.779, a specificity of 0.717, an accuracy of 0.748, and an MCC (Matthews Correlation Coefficient) of 0.497. The prediction results are also compared with other existing methods on the same dataset using 5-fold cross-validation. The comparison results demonstrate the effectiveness of our method for predicting cancerlectins.