Equally partitioned data are essential for prediction. However, in some important cases, the data distribution is severely unbalanced. In this study, several algorithms are combined to maximize learning accuracy on a highly unbalanced dataset. Fuzzy c-Means (FCM), a linguistic algorithm for evaluating the relationship between inputs and outputs, is applied as a clustering algorithm to the majority class in order to balance it against the minority class data, from about 3 million cases. Each cluster is used to train several artificial neural network (ANN) models. Different techniques are then applied to select among these models and generate an ensemble genetic fuzzy neuro model (EGFNM). The first ensemble technique, the intra-cluster EGFNM, evaluates the best combination of all the models generated by each cluster. The second, the inter-cluster EGFNM, selects the best model from each cluster. The accuracy of these techniques is evaluated using the area under the curve (AUC) of the receiver operating characteristic (ROC). Results show that training on the unbalanced data yields an AUC of 0.67974. The random cluster and the best single ANN model have AUCs of 0.7177 and 0.72806, respectively. For the ensemble evaluations, the intra-cluster and inter-cluster EGFNMs produce AUCs of 0.7293 and 0.73038, respectively. In conclusion, the EGFNM method improves on training with the unbalanced data, and selecting several of the best models produces a better result than combining all models.
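As a rough illustration of the inter-cluster selection idea described above, the following Python sketch picks the model with the highest validation AUC in each cluster and then combines the chosen models. The averaging rule used to combine scores, the helper names (`auc`, `inter_cluster_ensemble`), and the toy labels and scores are all assumptions made for illustration; they are not the study's data or its exact combination procedure.

```python
from statistics import mean


def auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def inter_cluster_ensemble(labels, cluster_scores):
    """Select the best model per cluster by AUC, then average their scores.

    cluster_scores maps cluster id -> {model name -> validation scores}.
    Averaging is an assumed combination rule, not taken from the study.
    """
    chosen = {}
    for cluster_id, models in cluster_scores.items():
        chosen[cluster_id] = max(models, key=lambda name: auc(labels, models[name]))
    per_case = zip(*(cluster_scores[c][m] for c, m in chosen.items()))
    ensemble = [mean(case) for case in per_case]
    return chosen, ensemble


# Hypothetical validation set: 2 minority (1) and 4 majority (0) cases.
labels = [1, 1, 0, 0, 0, 0]
cluster_scores = {
    "c0": {
        "ann_a": [0.9, 0.4, 0.5, 0.2, 0.1, 0.3],
        "ann_b": [0.6, 0.5, 0.7, 0.8, 0.2, 0.1],
    },
    "c1": {
        "ann_a": [0.3, 0.8, 0.2, 0.6, 0.1, 0.4],
        "ann_b": [0.2, 0.3, 0.9, 0.1, 0.5, 0.4],
    },
}

chosen, ensemble = inter_cluster_ensemble(labels, cluster_scores)
ensemble_auc = auc(labels, ensemble)
```

In this toy setup each cluster contributes exactly one model, so the ensemble size equals the number of clusters. An intra-cluster variant would instead search over combinations of all candidate models.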