Current Issue : April - June Volume : 2017 Issue Number : 2 Articles : 5 Articles
The paper describes the key concepts of a word spotting system for Russian based on large vocabulary continuous speech recognition. Key algorithms and system settings are described, including the pronunciation variation algorithm, and experimental results on real-life telecom data are provided. The system architecture and the user interface are also described. The system is based on the CMU Sphinx open-source speech recognition platform and on the linguistic models and algorithms developed by Speech Drive LLC. The effective combination of baseline statistical methods, real-world training data, and intensive use of linguistic knowledge led to a quality result applicable to industrial use....
The adaptive multi-rate wideband (AMR-WB) codec is a speech codec based on the algebraic code-excited linear-prediction (ACELP) coding technique, and it offers the double advantage of low bit rates and high speech quality. This coding technique is widely used in modern mobile communication systems to deliver high speech quality in handheld devices. However, a major disadvantage is that vector quantization (VQ) of immittance spectral frequency (ISF) coefficients accounts for a significant computational load in the AMR-WB encoder. Hence, this paper presents a triangular inequality elimination (TIE) algorithm combined with a dynamic mechanism and an intersection mechanism, abbreviated as the DI-TIE algorithm, to remarkably reduce the complexity of ISF coefficient quantization in the AMR-WB speech codec. Both mechanisms are designed in a way that recursively enhances the performance of the TIE algorithm. At the end of this work, the proposal is experimentally validated as a superior search algorithm relative to a conventional TIE, a multiple TIE (MTIE), and an equal-average equal-variance equal-norm nearest neighbor search (EEENNS) approach. With a full search algorithm as the benchmark for search load comparison, this work provides a search load reduction above 77%, well beyond the 36% of the TIE, 49% of the MTIE, and 68% of the EEENNS approach....
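The idea behind triangular inequality elimination can be illustrated with a minimal sketch. It shows only the basic TIE pruning rule for a nearest-codeword search, not the DI-TIE algorithm of the paper. The function names, the random codebook, and the choice of reference codeword are assumptions made for illustration. The rule is: if a codeword lies at least twice as far from the reference codeword as the input vector does, the triangle inequality guarantees it cannot be closer to the input than the reference, so its distance need not be computed.

```python
import numpy as np

def pairwise_distances(codebook):
    """Precompute Euclidean distances between all codeword pairs (done once, offline)."""
    diff = codebook[:, None, :] - codebook[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def tie_search(x, codebook, dist_table, ref_index):
    """Nearest-codeword search with basic triangle-inequality pruning.

    ref_index is a hypothetical initial guess (for example, the codeword
    chosen for the previous frame). Returns the best index and the number
    of codewords whose distance to x was actually evaluated in the loop.
    """
    d_ref = np.linalg.norm(x - codebook[ref_index])
    # Keep only codewords c_i with d(c_i, c_ref) < 2 * d(x, c_ref);
    # all others satisfy d(x, c_i) >= d(x, c_ref) and can be skipped.
    candidates = np.flatnonzero(dist_table[ref_index] < 2.0 * d_ref)
    best_index, best_dist = ref_index, d_ref
    for i in candidates:
        d = np.linalg.norm(x - codebook[i])
        if d < best_dist:
            best_index, best_dist = int(i), d
    return best_index, len(candidates)

def full_search(x, codebook):
    """Exhaustive search, used here only as the reference result."""
    return int(np.argmin(np.linalg.norm(codebook - x, axis=1)))
```

The search load reduction depends on how close the reference codeword is to the input: a good initial guess shrinks the candidate set, which is presumably the kind of effect the dynamic and intersection mechanisms aim to strengthen.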
The development and popularity of voice-user interfaces have made spontaneous speech processing an important research field. One of the main focus areas in this field is automatic speech recognition (ASR), which enables the recognition and translation of spoken language into text by computers. However, ASR systems often work less efficiently for spontaneous speech than for read speech, since the former differs from any other type of speech in many ways, and the presence of speech disfluencies is a prominent characteristic. These phenomena are an important feature of human-human communication, and at the same time they are a challenging obstacle for speech processing tasks. In this paper we address the detection of voiced hesitations (filled pauses and sound lengthenings) in Russian spontaneous speech by utilizing different machine learning techniques, from grid search and gradient descent in rule-based approaches to data-driven ones such as ELM and SVM based on automatically extracted acoustic features. Experimental results on a mixed, quality-diverse corpus of spontaneous Russian speech indicate the efficiency of the techniques for the task in question, with SVM outperforming the other methods....
We present a novel non-iterative and rigorously motivated approach for estimating hidden Markov models (HMMs) and factorial hidden Markov models (FHMMs) of high-dimensional signals. Our approach utilizes the asymptotic properties of a spectral, graph-based approach for dimensionality reduction and manifold learning, namely the diffusion framework. We exemplify our approach by applying it to the problem of single microphone speech separation, where the log-spectra of two unmixed speakers are modeled as HMMs, while their mixture is modeled as an FHMM. We derive two diffusion-based FHMM estimation schemes. One is experimentally shown to provide separation results that compare with contemporary speech separation approaches based on HMMs. The second scheme allows a reduced computational burden....
Statistics of pauses appearing in Polish are described as a potential source of biometric information for automatic speaker recognition. The usage of three main types of acoustic pauses (silent, filled and breath pauses) and syntactic pauses (punctuation marks in speech transcripts) was investigated quantitatively in three types of spontaneous speech (presentations, simultaneous interpretation and radio interviews) and in read speech (audio books). Selected parameters of pauses, extracted for each speaker separately or for speaker groups, were examined statistically to verify the usefulness of information on pauses for speaker recognition and speaker profile estimation. The quantity and duration of filled pauses, audible breaths, and the correlation between the temporal structure of speech and the syntactic structure of the spoken language were the features that characterize speakers most. An experiment using pauses in a speaker biometry system (using a Universal Background Model and i-vectors) resulted in a 30% equal error rate. Adding pause-related features to the baseline Mel-frequency cepstral coefficient system did not significantly improve its performance. In the experiment with automatic recognition of three types of spontaneous speech, we achieved 78% accuracy using a GMM classifier. Silent pause-related features allowed distinguishing between read and spontaneous speech by extreme gradient boosting with 75% accuracy....
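For readers unfamiliar with the equal error rate quoted above, the following is a minimal sketch of how such a figure can be computed from verification scores. It is a generic threshold sweep, not the evaluation procedure of the paper. The function name and the toy scores are assumptions for illustration. The EER is the operating point where the false acceptance rate (impostors accepted) equals the false rejection rate (genuine speakers rejected).

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold. The returned
    value is the mean of the two error rates at the threshold where they
    are closest to each other.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        far = np.mean(impostor >= t)  # false acceptance rate
        frr = np.mean(genuine < t)    # false rejection rate
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return float(eer)
```

With finite score sets the two error curves rarely cross exactly, so this sketch reports the midpoint at the closest threshold; evaluation toolkits often interpolate instead.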