Background: Statistical learning (SL) techniques can address non-linear relationships and small datasets but do not provide an output that has an epidemiologic interpretation.

Methods: A small set of clinical variables (CVs) for stage-1 non-small cell lung cancer patients was used to evaluate an approach that uses SL methods as a preprocessing step for survival analysis. A probabilistic neural network (PNN) was trained stochastically with differential evolution (DE) optimization. Survival scores were derived stochastically by combining CVs with the PNN. Patients (n = 151) were dichotomized into favorable (n = 92) and unfavorable (n = 59) survival outcome groups. The PNN-derived scores were used with logistic regression (LR) modeling to predict favorable survival outcome and were integrated into the survival analysis (i.e., Kaplan-Meier analysis and Cox regression). The hybrid modeling was compared with the corresponding modeling using raw CVs. The area under the receiver operating characteristic curve (Az) was used to compare model predictive capability. Odds ratios (ORs) and hazard ratios (HRs) with 95% confidence intervals (CIs) were used to compare disease associations.

Results: Among the LR models built from raw CVs, the one with the best predictive capability gave Az = 0.703. In this model, controlling for gender and tumor grade, the OR = 0.63 (CI: 0.43, 0.91) per standard deviation (SD) increase in age indicates that increasing age is associated with unfavorable outcome. The hybrid LR model, which combined age and tumor grade with the PNN while controlling for gender, gave Az = 0.778. The PNN score and age relate to risk in opposite directions: higher age indicates higher risk, whereas a higher PNN score indicates lower risk. The OR = 0.27 (CI: 0.14, 0.53) per SD increase in PNN score indicates that patients with lower scores are more likely to have unfavorable outcome.
After adjusting for tumor grade, the hazard for patients above the median age compared with those below the median was HR = 1.78 (CI: 1.06, 3.02), whereas the hazard for patients below the median PNN score compared with those above the median was HR = 4.0 (CI: 2.13, 7.14).

Conclusion: We have provided preliminary evidence that SL preprocessing may offer benefits over accepted approaches. The work will require further evaluation with different datasets to confirm these findings.
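The abstract does not specify how the PNN was trained stochastically, so the following is only a minimal sketch of one possible reading of the Methods, under stated assumptions: the PNN is treated as a Gaussian Parzen-window classifier with equal class priors and one smoothing width per standardized clinical variable; the widths are tuned with a plain DE/rand/1/bin loop that minimizes leave-one-out log-loss; and Az is computed as the Mann-Whitney statistic of the resulting score. The helper names (`loo_scores`, `auc`, `de_minimize`, `fit_sigma`), the bounds and the DE settings are all hypothetical, and the demo uses synthetic data, not the study cohort.

```python
import numpy as np


def loo_scores(X, y, sigma):
    """Hypothetical leave-one-out PNN score P(favorable | x) for each row of X.

    X: (n, d) standardized clinical variables; y: 0/1 labels (1 = favorable);
    sigma: (d,) per-variable Gaussian widths. Assumes >= 2 patients per class.
    """
    diff = (X[:, None, :] - X[None, :, :]) / sigma      # (n, n, d) scaled differences
    k = np.exp(-0.5 * np.sum(diff ** 2, axis=2))        # Gaussian kernel matrix (n, n)
    np.fill_diagonal(k, 0.0)                            # exclude each patient from its own estimate
    fav = y == 1
    # Class-conditional Parzen densities; each class size excludes the held-out patient
    dens1 = k[:, fav].sum(axis=1) / (fav.sum() - fav.astype(int))
    dens0 = k[:, ~fav].sum(axis=1) / ((~fav).sum() - (~fav).astype(int))
    eps = 1e-12                                         # gives 0.5 if both densities underflow
    return (dens1 + eps) / (dens1 + dens0 + 2 * eps)


def auc(scores, y):
    """Az as the fraction of (favorable, unfavorable) pairs ranked correctly; ties count half."""
    pos, neg = scores[y == 1], scores[y == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))


def de_minimize(f, bounds, pop_size=20, F=0.8, CR=0.9, generations=100, seed=0):
    """Plain DE/rand/1/bin minimizer (an assumed DE variant, not necessarily the paper's)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    dim = len(lo)
    pop = lo + rng.random((pop_size, dim)) * (hi - lo)  # uniform initial population
    cost = np.array([f(p) for p in pop])
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct population members other than the target i
            idx = rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)
            a, b, c = pop[idx]
            mutant = np.clip(a + F * (b - c), lo, hi)
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True             # ensure at least one mutated component
            trial = np.where(cross, mutant, pop[i])
            trial_cost = f(trial)
            if trial_cost <= cost[i]:                   # greedy one-to-one selection
                pop[i], cost[i] = trial, trial_cost
    best = np.argmin(cost)
    return pop[best], cost[best]


def fit_sigma(X, y, bounds=(0.05, 5.0), **de_kwargs):
    """Tune per-variable PNN widths by minimizing leave-one-out log-loss with DE."""
    def loss(sigma):
        p = np.clip(loo_scores(X, y, sigma), 1e-9, 1 - 1e-9)
        return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    return de_minimize(loss, [bounds] * X.shape[1], **de_kwargs)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic stand-in for three standardized CVs (NOT the study data)
    X = rng.normal(size=(60, 3))
    y = (X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=60) > 0).astype(int)
    sigma, _ = fit_sigma(X, y, generations=50)
    s = loo_scores(X, y, sigma)
    print("tuned widths:", sigma, "leave-one-out Az:", auc(s, y))
```

In the hybrid modeling the abstract describes, a score like `s` would then be standardized and entered as a covariate in an LR or Cox model alongside age, tumor grade and gender; that step is omitted here. Leave-one-out scoring is used so that each patient's score is not fitted to that patient's own label, which would otherwise inflate Az.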