Machine learning techniques are a standard approach in spam detection. Their quality depends on the quality of the learning set, and when the set is out of date, the quality of classification falls rapidly. The most popular public web spam dataset that can be used to train a spam detector, WEBSPAM-UK2007, is over ten years old. Therefore, there is a place for a lifelong machine learning system that can replace the detectors based on a static learning set. In this paper, we propose a novel web spam recognition system. The system automatically rebuilds the learning set, using external data from spam traps and popular web services, to avoid classification based on outdated data. Using a built-in automatic selection of the active classifier, the system very quickly attains productive accuracy despite a limited learning set. A test on real data from Quora, Reddit, and Stack Overflow demonstrated high recognition quality. Both the average accuracy and the F-measure were 0.98 in semiautomatic mode and 0.96 in fully automatic mode.
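The abstract does not state how the active classifier is selected. As an illustration only, the following minimal sketch assumes that each candidate detector is scored on a small labeled validation set and that the one with the highest F-measure (ties broken by accuracy) becomes active. The names `accuracy_and_f1`, `select_active_classifier`, `keyword_rule`, and `link_rule`, as well as the sample messages, are hypothetical and do not come from the paper.

```python
from typing import Callable, Dict, Sequence, Tuple

Sample = Tuple[str, int]  # (text, label), where 1 = spam and 0 = not spam


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, F-measure) for binary labels with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


def select_active_classifier(
    candidates: Dict[str, Callable[[str], int]],
    validation: Sequence[Sample],
) -> str:
    """Return the name of the candidate with the best (F-measure, accuracy) pair."""
    labels = [label for _, label in validation]
    best_name, best_score = "", (-1.0, -1.0)
    for name, predict in candidates.items():
        predictions = [predict(text) for text, _ in validation]
        accuracy, f1 = accuracy_and_f1(labels, predictions)
        if (f1, accuracy) > best_score:
            best_name, best_score = name, (f1, accuracy)
    return best_name


# Hypothetical stand-ins for trained detectors (assumptions, not the paper's models).
def keyword_rule(text: str) -> int:
    return 1 if "cheap" in text.lower() else 0


def link_rule(text: str) -> int:
    return 1 if "http://" in text else 0


validation_set = [
    ("Buy cheap pills at http://x.example", 1),
    ("How do I reverse a list in Python?", 0),
    ("Cheap watches http://a.example http://b.example", 1),
    ("See http://docs.example for details", 0),
]

active = select_active_classifier(
    {"keyword_rule": keyword_rule, "link_rule": link_rule}, validation_set
)
print(active)
```

In a lifelong setting, such a selection step could be rerun whenever the learning set is rebuilt, so that the active classifier follows the current data rather than a fixed choice.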