Analyzing frame sequences in talk show videos, a prerequisite for media mining and television production, requires significant manual effort and is very time-consuming. Given the vast number of unlabeled face frames in talk show videos, we address the problem of recognizing and clustering faces. In this paper, we propose a TV media mining system based on a deep convolutional neural network trained by triplet loss minimization. The main functions of the proposed system are indexing and clustering video data to support effective media production analysis of individuals in talk show videos, and rapidly identifying a specific individual in video data in real time. Our system uses several face datasets: Labeled Faces in the Wild (LFW), which is a collection of unlabeled web face images, as well as the YouTube Faces and talk show faces datasets. In the recognition (person spotting) task, our system achieves an F-measure of 0.996 on the LFW dataset and 0.972 on the talk show faces dataset. In the clustering task, our system achieves F-measures of 0.764 and 0.935 on the YouTube Faces database and the LFW dataset, respectively, and 0.832 on the talk show faces dataset, improvements of 5.4%, 6.5%, and 8.2% over previous methods.
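The abstract names triplet loss minimization and the F-measure without defining them. As a minimal sketch, the commonly used forms of both can be written as below. The squared Euclidean distance, the margin value of 0.2, and the function names are illustrative assumptions and are not taken from the paper; the system's actual loss formulation and evaluation protocol may differ.

```python
from typing import Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 0.2,  # assumed value, not stated in the abstract
) -> float:
    """Hinge-style triplet loss for a single (anchor, positive, negative) triplet.

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin`; otherwise it penalizes the violation.
    """
    pos_dist = squared_distance(anchor, positive)
    neg_dist = squared_distance(anchor, negative)
    return max(pos_dist - neg_dist + margin, 0.0)


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

In this sketch, a triplet whose negative embedding already lies well away from the anchor contributes no loss, so training effort concentrates on triplets that still violate the margin.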