The learning-based speech recovery approach using statistical spectral conversion has been applied to certain kinds of distorted speech, such as alaryngeal speech and body-conducted (bone-conducted) speech. This approach attempts to recover clean (undistorted) speech from noisy (distorted) speech by converting the statistical models of noisy speech into those of clean speech, without prior knowledge of the characteristics or distribution of the noise source. So far, the approach has not attracted many researchers for general noisy speech enhancement because of two major problems: the difficulty of noise adaptation and the lack of noise-robust synthesizable features across different noisy environments. In this paper, we adopt state-of-the-art methods from voice conversion and from speaker adaptation in speech recognition in the proposed speech recovery approach, and apply it to different kinds of noisy environments, especially adverse environments that require joint compensation of additive and convolutive noise. We propose decorrelated wavelet packet coefficients as a low-dimensional, robust, synthesizable feature for noisy environments. We also propose a noise adaptation method for speech recovery based on the eigennoise, analogous to the eigenvoice in voice conversion. The experimental results show that the proposed approach substantially outperforms traditional non-learning-based approaches.