Extracting information about academic activity transactions from unstructured documents is a key problem in analyzing the academic behavior of researchers. An academic activity transaction comprises five elements: person, activity, object, attribute, and time phrase. Traditional information extraction methods first extract shallow text features and then recognize advanced features from the text under supervision. Because the different levels of information processing are completed in separate steps, the errors generated at each step accumulate and reduce the accuracy of the final results. By contrast, a Deep Belief Network (DBN) model can learn advanced features from shallow text features automatically and without supervision, so we employ it to extract academic activity transactions. In addition, we use character-based features to describe the raw features of the named entities of academic activities, so as to improve the accuracy of named entity recognition. In this paper, we compare the accuracy of academic activity extraction when the text features are expressed as character-based feature vectors and as word-based feature vectors, and we also compare against traditional text information extraction based on Conditional Random Fields. The results show that the DBN model is more effective for extracting academic activity transaction information.
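To make the distinction between character-based and word-based feature vectors concrete, the following is a minimal sketch and not the feature scheme used in the paper. It assumes a simple binary (presence/absence) encoding over a vocabulary built from a one-sentence toy corpus; the sentence, the helper names (`build_vocab`, `binary_vector`, `word_units`, `char_units`), and the encoding itself are all hypothetical illustrations.

```python
def build_vocab(units):
    """Map each distinct unit to a column index, in first-seen order."""
    vocab = {}
    for unit in units:
        vocab.setdefault(unit, len(vocab))
    return vocab


def binary_vector(units, vocab):
    """Return a 0/1 vector marking which vocabulary units occur in `units`."""
    vec = [0] * len(vocab)
    for unit in units:
        if unit in vocab:  # units outside the vocabulary are ignored
            vec[vocab[unit]] = 1
    return vec


def word_units(text):
    """Word-based view: split on whitespace."""
    return text.split()


def char_units(text):
    """Character-based view: every non-whitespace character."""
    return [ch for ch in text if not ch.isspace()]


# Hypothetical toy corpus, invented for illustration only.
corpus = ["Li Ming attended the 2012 ACL conference"]

word_vocab = build_vocab(u for s in corpus for u in word_units(s))
char_vocab = build_vocab(u for s in corpus for u in char_units(s))

phrase = "ACL conference"
word_vec = binary_vector(word_units(phrase), word_vocab)
char_vec = binary_vector(char_units(phrase), char_vocab)

print(word_vec)
print(char_vec)
```

Under this toy encoding, a word form that is absent from the vocabulary (for example "conferences") contributes nothing to the word-based vector, while its individual characters that appear in the corpus still activate entries in the character-based vector. Vectors of this kind would serve as the shallow input layer from which a model such as a DBN learns higher-level features.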