We introduce a new model for describing word frequency distributions in documents for automatic text classification tasks. In the model, the gamma-Poisson probability distribution is used to achieve better text modeling. The framework of the modeling and its application to text categorization are demonstrated with practical techniques for parameter estimation and vector normalization. To investigate the efficiency of our model, text categorization experiments were performed on the 20 Newsgroups, Reuters-21578, Industry Sector, and TechTC-100 datasets. The results show that the model achieves performance comparable to that of the support vector machine and clearly exceeding that of the multinomial model and the Dirichlet-multinomial model. The time complexity of the proposed classifier and its advantage in practical applications are also discussed.