Inventi Impact: Bioinformatics

Articles

Inventi:ebi/43457/21

Improving the Recall of Biomedical Named Entity Recognition with Label Re-Correction and Knowledge Distillation

01-Oct-2021 Research 2021: October-December

Huiwei Zhou, Zhe Liu, Chengkun Lang, Yibin Xu, Yingyu Lin, Junjie Hou

Background: Biomedical named entity recognition (BioNER) is one of the most essential tasks in biomedical information extraction. Previous studies suffer from inadequate annotated datasets, and in particular from the limited knowledge those datasets contain.

Methods: To remedy this issue, we propose a novel BioNER framework with label re-correction and knowledge distillation strategies, which can not only create large, high-quality datasets but also yield a high-performance recognition model. The framework is motivated by two observations: (1) named entity recognition should be considered from the perspectives of both coverage and accuracy; (2) trustworthy annotations should be produced by iterative correction. First, for coverage, we annotate chemical and disease entities in a large-scale unlabeled dataset with PubTator to generate a weakly labeled dataset. Then, for accuracy, we filter this dataset using multiple knowledge bases to generate a second weakly labeled dataset. Next, both datasets are revised by a label re-correction strategy to construct two high-quality datasets, each of which is used to train a separate recognition model. Finally, we compress the knowledge in the two models into a single recognition model with knowledge distillation.

Results: Experiments on the BioCreative V chemical-disease relation corpus and the NCBI Disease corpus show that knowledge from large-scale datasets significantly improves BioNER performance, especially recall, leading to new state-of-the-art results.

Conclusions: We propose a framework with label re-correction and knowledge distillation strategies. Comparison results show that the two kinds of knowledge captured in the two re-corrected datasets are complementary and that both are effective for BioNER.
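The abstract does not give implementation details, so the following is only a rough illustration of two of the steps it names, built on our own assumptions rather than on the paper. It sketches (a) filtering weakly labeled entity spans against knowledge-base name lists, and (b) distilling the two re-corrected models into one student by matching the student's per-token label distribution to the averaged, temperature-softened distributions of the two teachers (standard Hinton-style distillation). The span format, the exact lower-cased string match, the equal-weight teacher averaging, the temperature value and all function names (`filter_by_knowledge_bases`, `ensemble_soft_targets`, `kd_loss`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical span format: (start_token, end_token_exclusive, entity_type).
Span = Tuple[int, int, str]


def filter_by_knowledge_bases(
    tokens: Sequence[str],
    spans: Sequence[Span],
    kb_names: Dict[str, Set[str]],
) -> List[Span]:
    """Keep only weakly labeled spans whose lower-cased surface string appears
    in the (hypothetical) knowledge-base name set for that entity type."""
    kept: List[Span] = []
    for start, end, etype in spans:
        surface = " ".join(tokens[start:end]).lower()
        if surface in kb_names.get(etype, set()):
            kept.append((start, end, etype))
    return kept


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over label scores, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_soft_targets(
    teacher_logits: Sequence[Sequence[float]], temperature: float
) -> List[float]:
    """Average the softened label distributions of several teachers for one token
    (assumption: equal weights for the teacher models)."""
    dists = [softmax(logits, temperature) for logits in teacher_logits]
    n_labels = len(dists[0])
    return [sum(d[k] for d in dists) / len(dists) for k in range(n_labels)]


def kd_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[Sequence[float]],
    temperature: float = 2.0,
) -> float:
    """KL(averaged teachers || student) at temperature T, scaled by T**2
    as in standard knowledge distillation."""
    target = ensemble_soft_targets(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(target, student) if p > 0)
    return temperature ** 2 * kl


if __name__ == "__main__":
    tokens = ["Cisplatin", "induced", "nephrotoxicity", "in", "rats"]
    weak = [(0, 1, "Chemical"), (2, 3, "Disease"), (3, 5, "Disease")]
    kbs = {"Chemical": {"cisplatin"}, "Disease": {"nephrotoxicity"}}
    print(filter_by_knowledge_bases(tokens, weak, kbs))
    # prints [(0, 1, 'Chemical'), (2, 3, 'Disease')]

    # Scores for one token over three hypothetical tags (O, B-Chemical, B-Disease),
    # from the student and from the two teacher models.
    print(kd_loss([0.2, 2.0, 0.1], [[0.1, 2.5, 0.3], [0.0, 1.8, 0.4]]))
```

In a training loop, a per-token loss like this would usually be averaged over all tokens and may be combined with a cross-entropy term on hard labels; the abstract does not say how (or whether) the authors weight such terms.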

How to Cite this Article
Attribution/CC-Compliant Citation: Zhou, H., Liu, Z., Lang, C. et al. Improving the recall of biomedical named entity recognition with label re-correction and knowledge distillation. BMC Bioinformatics 22, 295 (2021). https://doi.org/10.1186/s12859-021-04200-w
License: http://creativecommons.org/licenses/by/4.0/
Some formatting elements, header, footer, logos, dates and pagination were modified while adapting this article.