Purpose: To explore imaging biomarkers that can be used for diagnosis and prediction of pathologic stage in non-small cell lung cancer (NSCLC), using multiple machine learning algorithms based on CT image feature analysis.
Methods: Patients with stage IA to IV NSCLC were included, and the whole dataset was divided into training and testing sets plus an external validation set. To address class imbalance in the NSCLC data, we used the synthetic minority over-sampling technique (SMOTE) to generate a new dataset with a balanced class distribution. The balanced dataset was randomly split into training and testing sets. We calculated the importance of each CT image feature as the mean decrease in Gini impurity produced by a random forest algorithm, and selected the optimal features according to this importance (mean decrease in Gini impurity > 0.005). The performance of the prediction model in the training and testing sets was evaluated in terms of classification accuracy, average precision (AP) score, and the precision-recall curve. The predictive accuracy of the model was externally validated using lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) samples from the TCGA database.
Results: The prediction model, which incorporated nine image features, achieved high classification accuracy, precision, and recall in the training and testing sets. In the external validation, the model's predictive accuracy was higher for LUAD than for LUSC.
Conclusions: The pathologic stage of patients with NSCLC can be accurately predicted from CT image features, especially for LUAD. Our findings extend the application of machine learning algorithms to CT image feature-based prediction of pathologic stage and identify potential imaging biomarkers that can be used to diagnose pathologic stage in NSCLC patients.
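
The class-balancing step described in the Methods can be illustrated with a minimal sketch of the interpolation idea behind SMOTE. This is not the authors' implementation: the function names `smote_oversample` and `balance_two_classes`, the neighbour count `k=5`, the fixed seed, and the restriction to two classes are all assumptions made for illustration (pathologic staging itself involves more than two classes).

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority-class
    samples and their k nearest minority-class neighbours (the SMOTE idea).
    Hypothetical helper; k and seed are illustrative defaults."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment
        # from the base sample to the chosen neighbour.
        u = rng.random()
        synthetic.append([b + u * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance_two_classes(X, y, k=5, seed=0):
    """Oversample the smaller of two classes until both have equal counts.
    Hypothetical helper; a binary simplification of the multi-stage setting."""
    labels = sorted(set(y))
    if len(labels) != 2:
        raise ValueError("this sketch handles exactly two classes")
    groups = {c: [x for x, lab in zip(X, y) if lab == c] for c in labels}
    minority_label = min(labels, key=lambda c: len(groups[c]))
    # With two classes, majority count minus minority count:
    deficit = len(X) - 2 * len(groups[minority_label])
    if deficit == 0:
        return list(X), list(y)
    new_points = smote_oversample(groups[minority_label], deficit, k=k, seed=seed)
    return list(X) + new_points, list(y) + [minority_label] * deficit
```

Each synthetic sample lies on a line segment between two existing minority-class samples, so the new points stay inside the region already occupied by that class. In practice a maintained implementation, such as the `SMOTE` class in the imbalanced-learn package, would normally be used instead of a hand-written version.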