Current Issue: October-December 2019, Issue Number: 4, Articles: 5
The purpose of this work is to develop a spoken language processing system for smart device troubleshooting using human-machine interaction. This system combines a software Bidirectional Long Short-Term Memory (BLSTM)-based speech recognizer and a hardware LSTM-based language processor for Natural Language Processing (NLP) using the serial RS232 interface. Mel Frequency Cepstral Coefficient (MFCC)-based feature vectors from the speech signal are input directly into a BLSTM network. A dropout layer is added to the BLSTM layer to reduce over-fitting and improve robustness. The speech recognition component combines an acoustic modeler, a pronunciation dictionary, and a BLSTM network to generate query text, and executes in real time with an 81.5% Word Error Rate (WER) and an average training time of 45 s. The language processor comprises a vectorizer, lookup dictionary, key encoder, Long Short-Term Memory (LSTM)-based training and prediction network, and dialogue manager, and transforms query intent to generate response text with a processing time of 0.59 s, 5% hardware utilization, and an F1 score of 95.2%. The proposed system has a 4.17% decrease in accuracy compared with existing systems, which use parallel processing and high-speed cache memories to perform additional training that improves accuracy. However, the language processor achieves a 36.7% decrease in processing time and a 50% decrease in hardware utilization, making it suitable for troubleshooting smart devices.
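As a rough illustration of the bidirectional pass at the heart of such a recognizer, the toy sketch below runs a minimal pure-Python LSTM cell forward and backward over a few stand-in "MFCC" frames and concatenates the two hidden-state sequences per frame. The dimensions, weights, and input values are invented for the example; a real BLSTM uses separate trained weights per direction, shared here only for brevity.

```python
import math
import random

H = 2  # hidden size (toy)
D = 3  # input size, standing in for a truncated MFCC vector (toy)

random.seed(0)

def mat(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# One weight matrix per gate: input, forget, output, candidate.
W = {name: mat(H, D + H) for name in "ifoc"}

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def lstm_step(x, h, c):
    z = x + h  # concatenate input frame and previous hidden state
    def lin(name, j):
        return sum(w * v for w, v in zip(W[name][j], z))
    i = [sigmoid(lin("i", j)) for j in range(H)]    # input gate
    f = [sigmoid(lin("f", j)) for j in range(H)]    # forget gate
    o = [sigmoid(lin("o", j)) for j in range(H)]    # output gate
    g = [math.tanh(lin("c", j)) for j in range(H)]  # candidate cell state
    c = [f[j] * c[j] + i[j] * g[j] for j in range(H)]
    h = [o[j] * math.tanh(c[j]) for j in range(H)]
    return h, c

def run(seq):
    h, c, out = [0.0] * H, [0.0] * H, []
    for x in seq:
        h, c = lstm_step(x, h, c)
        out.append(h)
    return out

def blstm(seq):
    fwd = run(seq)
    bwd = run(seq[::-1])[::-1]                # backward pass, re-aligned in time
    return [f + b for f, b in zip(fwd, bwd)]  # concatenate -> 2*H features/frame

frames = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1], [0.2, 0.2, 0.2]]  # stand-in frames
out = blstm(frames)
print(len(out), len(out[0]))  # 3 frames, 2*H = 4 features each
```

A dropout layer, as in the abstract, would randomly zero a fraction of each concatenated vector during training.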
Temporal feature integration refers to a set of strategies attempting to capture the information conveyed in the temporal evolution of the signal. It has been extensively applied in the context of semantic audio, showing performance improvements over standard frame-based audio classification methods. This paper investigates the potential of an enhanced temporal feature integration method to classify environmental sounds. The proposed method utilizes newly introduced integration functions that capture the texture window shape, in combination with standard functions such as mean and standard deviation, in a classification scheme of 10 environmental sound classes. The results obtained from three classification algorithms exhibit an increase in recognition accuracy over standard temporal integration with simple statistics, which reveals the discriminative ability of the new metrics.
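The baseline integration the abstract refers to, summarizing each texture window of frame-level features with simple statistics, can be sketched as follows. The window and hop sizes and the toy feature values are arbitrary, and the paper's additional shape-describing integration functions are not reproduced here.

```python
import statistics

def integrate(frames, win, hop):
    """Summarize frame-level feature vectors over texture windows with
    per-dimension mean and population standard deviation."""
    out = []
    for start in range(0, len(frames) - win + 1, hop):
        window = frames[start:start + win]
        mean = [statistics.mean(col) for col in zip(*window)]
        std = [statistics.pstdev(col) for col in zip(*window)]
        out.append(mean + std)  # one integrated vector per texture window
    return out

# 6 frames of a 2-dimensional frame-level feature (e.g. two MFCCs)
frames = [[1.0, 0.0], [3.0, 0.0], [1.0, 2.0], [3.0, 2.0], [1.0, 4.0], [3.0, 4.0]]
vecs = integrate(frames, win=4, hop=2)
print(vecs[0])  # [2.0, 1.0, 1.0, 1.0]: means of both dims, then both stds
```

A classifier then operates on these window-level vectors instead of the raw frames.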
Fractional linear prediction (FLP), as a generalization of conventional linear prediction (LP), was recently successfully applied in different fields of research and engineering, such as biomedical signal processing, speech modeling, and image processing. The FLP model has a similar design to the conventional LP model, i.e., it uses a linear combination of "fractional terms" with different orders of fractional derivative. Assuming only one "fractional term" and using a limited number of previous samples for prediction, an FLP model with "restricted memory" is presented in this paper, and closed-form expressions for the calculation of the FLP coefficients are derived. This FLP model is fully comparable with the widely used low-order LP, as it uses the same number of previous samples but fewer predictor coefficients, making it more efficient. Two different datasets, MIDI Aligned Piano Sounds (MAPS) and Orchset, were used for the experiments. Triads representing chords composed of three randomly chosen notes and usual Western musical chords (both from the MAPS dataset) served as the test signals, while the piano recordings from the MAPS dataset and the orchestra recordings from the Orchset dataset served as the musical signals. The results show the advantage of FLP over LP in terms of model complexity, while the prediction performance remains comparable.
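For illustration only: one common way to realize a single "fractional term" over a restricted memory of K past samples is the truncated Grünwald-Letnikov expansion, whose binomial weights obey a simple recursion. The sketch below uses that construction with an assumed prediction rule (setting the truncated fractional difference to zero and solving for the current sample); it is not the paper's closed-form coefficient derivation, and the signal, order, and memory length are invented for the example.

```python
def gl_weights(alpha, K):
    """Grunwald-Letnikov weights w_k = (-1)^k * C(alpha, k) for k = 0..K,
    computed by the standard recursion w_k = w_{k-1} * (1 - (alpha + 1) / k)."""
    w = [1.0]
    for k in range(1, K + 1):
        w.append(w[-1] * (1.0 - (alpha + 1.0) / k))
    return w

def flp_predict(x, n, alpha, K):
    """Predict x[n] from its K previous samples with one 'fractional term':
    zeroing the truncated alpha-order difference gives
    x[n] ~= -sum_{k=1..K} w_k * x[n-k]."""
    w = gl_weights(alpha, K)
    return -sum(w[k] * x[n - k] for k in range(1, K + 1))

print(gl_weights(0.5, 3))            # [1.0, -0.5, -0.125, -0.0625]
print(flp_predict([1.0] * 4, 3, 0.5, 3))  # 0.6875 for a constant signal
```

Note how a single order parameter alpha generates all K weights, which is the sense in which FLP needs fewer predictor coefficients than an order-K LP model.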
Several researchers have contemplated deep learning-based post-filters to increase the quality of statistical parametric speech synthesis. These post-filters perform a mapping of the synthetic speech to the natural speech, considering the different parameters separately and trying to reduce the gap between them. Long Short-Term Memory (LSTM) neural networks have been applied successfully for this purpose, but there are still many aspects to improve in the results and in the process itself. In this paper, we introduce a new pre-training approach for the LSTM, with the objective of enhancing the quality of the synthesized speech, particularly in the spectrum, in a more efficient manner. Our approach begins with an auto-associative training of one LSTM network, which is then used as an initialization for the post-filters. We show the advantages of this initialization for enhancing the Mel-Frequency Cepstral parameters of synthetic speech. Results show that this initialization achieves better results in enhancing the statistical parametric speech spectrum in most cases when compared to the common random initialization of the networks.
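The pre-training scheme, first train a network auto-associatively (input equals target) and then reuse its weights to initialize the post-filter before fine-tuning on synthetic-to-natural pairs, can be shown with a deliberately tiny stand-in: a single scalar weight trained by SGD in place of the LSTM's parameter matrices. All data values and hyperparameters here are invented for the example.

```python
def sgd_fit(w, pairs, lr=0.1, steps=200):
    """Minimize squared error of the linear map x -> w*x by gradient descent."""
    for _ in range(steps):
        for x, y in pairs:
            w -= lr * 2.0 * (w * x - y) * x  # d/dw of (w*x - y)^2
    return w

# Stage 1: auto-associative pre-training -- learn to reproduce the input.
feats = [0.5, 1.0, 1.5, 2.0]
w0 = sgd_fit(0.0, [(x, x) for x in feats])   # converges near 1.0 (identity map)

# Stage 2: start the post-filter from w0 (instead of a random value) and
# fine-tune on (synthetic, natural) feature pairs.
pairs = [(x, 1.2 * x) for x in feats]        # toy synthetic -> natural mapping
w_post = sgd_fit(w0, pairs, steps=50)
print(round(w0, 3), round(w_post, 3))        # close to 1.0 and 1.2
```

The identity-like starting point is the whole benefit: the post-filter begins from "change nothing" and only has to learn the residual correction.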
O2 is a communication protocol for music systems that extends and interoperates with the popular Open Sound Control (OSC) protocol. Many computer musicians routinely deal with problems of interconnection, unreliable message delivery, and clock synchronization. O2 solves these problems, offering named services, automatic network address discovery, clock synchronization, and a reliable message delivery option, as well as interoperability with existing OSC libraries and applications. Aside from these new features, O2 owes much of its design to OSC, making it easy to migrate existing OSC applications to O2 or for developers familiar with OSC to begin using O2. O2 addresses the problems of interprocess communication within distributed music applications.
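To make the addressing difference concrete, the toy dispatcher below mimics O2-style addresses, whose first path component names a service that discovery resolves to a process, instead of requiring the sender to know a host and port as plain OSC does. This is a conceptual sketch only: it does not use or resemble the real O2 C API, and every name in it is hypothetical.

```python
# Toy illustration of service-based addressing (NOT the real O2 API).
services = {}

def offer_service(name, handler):
    """Register a handler under a service name (real O2 advertises this
    on the network via automatic discovery)."""
    services[name] = handler

def send(address, *args):
    """Route '/service/rest/of/path' to the handler for 'service'."""
    service, _, path = address.lstrip("/").partition("/")
    handler = services.get(service)
    if handler is None:
        return "no such service"  # real O2 distinguishes delivery modes here
    return handler("/" + path, args)

offer_service("synth", lambda path, args: f"synth got {path} {args}")
print(send("/synth/freq", 440.0))  # -> synth got /freq (440.0,)
```

The point of the sketch is that senders address "/synth/freq" by service name alone; which machine currently provides "synth" is resolved by the protocol, not hard-coded by the sender.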