Current Issue: July - September, Volume 2014, Issue 3 (6 articles)
We present in this paper a voice conversion (VC) method for a person with an articulation disorder resulting from athetoid cerebral palsy. The movement of such speakers is limited by their athetoid symptoms, and their consonants are often unstable or unclear, which makes it difficult for them to communicate. In this paper, exemplar-based spectral conversion using nonnegative matrix factorization (NMF) is applied to a voice with an articulation disorder. To preserve the speaker's individuality, we used an individuality-preserving dictionary that is constructed from the source speaker's vowels and the target speaker's consonants. Using this dictionary, we can create a natural, clear voice that preserves the speaker's individuality. Experimental results indicate that the performance of NMF-based VC is considerably better than that of conventional GMM-based VC.
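As a rough sketch of the exemplar-based conversion described above, the following Python/NumPy fragment estimates NMF activations against a source-speaker exemplar dictionary and reuses them with a parallel target dictionary. The dictionary variables `W_src` and `W_tgt` and the update count are illustrative assumptions; the individuality-preserving dictionary construction is assumed to have been done beforehand, and this is a generic baseline, not the authors' implementation.

```python
# Minimal sketch of exemplar-based NMF voice conversion (illustrative names).
import numpy as np

def nmf_activations(X, W, n_iter=100, eps=1e-12):
    """Estimate nonnegative activations H so that X ~ W @ H, using the
    standard multiplicative updates for the KL divergence."""
    H = np.random.rand(W.shape[1], X.shape[1])
    for _ in range(n_iter):
        H *= (W.T @ (X / (W @ H + eps))) / (W.T.sum(axis=1, keepdims=True) + eps)
    return H

def convert(X_src, W_src, W_tgt):
    """Decompose source spectra on the source-speaker exemplar dictionary,
    then rebuild the spectrogram with the parallel target dictionary."""
    H = nmf_activations(X_src, W_src)
    return W_tgt @ H  # converted magnitude spectrogram
```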
This paper presents an optical music recognition (OMR) system to process the handwritten musical scores of Kunqu Opera written in Gong-Che Notation (GCN). First, it introduces the background of Kunqu Opera and GCN. Kunqu Opera is one of the oldest forms of musical activity, spanning the sixteenth to eighteenth centuries, and GCN has been the most popular notation for recording musical works in China since the seventh century. Many Kunqu Operas that use GCN are available as original manuscripts or photocopies, and transforming these versions into a machine-readable format is a pressing need. The OMR system comprises six stages: image pre-processing, segmentation, feature extraction, symbol recognition, musical semantics, and musical instrument digital interface (MIDI) representation. This paper focuses on the symbol recognition stage and obtains the musical information with Bayesian, genetic algorithm, and K-nearest neighbor classifiers. The experimental results indicate that symbol recognition for Kunqu Opera's handwritten musical scores is effective. This work will help to preserve and popularize Chinese cultural heritage and to store Kunqu Opera scores in a machine-readable format, thereby ensuring the possibility of spreading and performing original Kunqu Opera musical scores.
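For orientation only, a minimal version of the K-nearest neighbor stage might look as follows, assuming feature vectors have already been extracted from the segmented GCN symbols; the file names and K value are hypothetical, and the Bayesian and genetic algorithm classifiers are not shown.

```python
# Hypothetical KNN stage for GCN symbol recognition (scikit-learn).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = np.load("gcn_features_train.npy")  # extracted symbol features
y_train = np.load("gcn_labels_train.npy")    # symbol class labels
X_test = np.load("gcn_features_test.npy")

clf = KNeighborsClassifier(n_neighbors=5)    # K chosen here arbitrarily
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
```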
In this paper, an analytical approach to estimating the instantaneous frequencies of a multicomponent signal is presented. A non-stationary signal composed of oscillation modes or resonances is described by a multicomponent AM-FM model. The proposed method has two main stages. First, the signal is decomposed into its oscillation components; the instantaneous frequency of each component is then estimated. The decomposition stage is performed through basis expansion exploiting orthogonal rational functions in the complex plane. Orthogonal rational bases are generalized to expand linear time-varying systems. To decompose the non-stationary signal, its equivalent time-varying system is sought. The time-varying poles of this system are required to construct appropriate basis functions, and an adaptive data segmentation algorithm is provided for this purpose. The effect of noise is scrutinized analytically and evaluated experimentally to verify the robustness of the new method. The performance of this method in extracting embedded instantaneous frequencies is demonstrated by simulations on both synthetic data and real-world audio signals.
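The decomposition via orthogonal rational bases is specific to the paper, but once a single AM-FM component has been isolated, its instantaneous frequency can be illustrated with a standard analytic-signal baseline (Hilbert transform), shown below; this is a conventional substitute for orientation, not the paper's estimator.

```python
# Baseline instantaneous-frequency estimate for one isolated AM-FM component,
# via the analytic signal; the paper's rational-basis method is not shown.
import numpy as np
from scipy.signal import hilbert

def instantaneous_frequency(component, fs):
    """Return the instantaneous frequency (Hz) of a 1-D component signal."""
    phase = np.unwrap(np.angle(hilbert(component)))
    return np.diff(phase) * fs / (2.0 * np.pi)
```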
We propose an integrative method of recognizing gestures, such as pointing, that accompany speech. Speech generated simultaneously with gestures can assist in the recognition of gestures, and since this occurs in a complementary manner, gestures can also assist in the recognition of speech. Our integrative recognition method uses a probability distribution which expresses the distribution of the time interval between the starting times of gestures and of the corresponding utterances. We evaluate the rate of improvement of the proposed integrative recognition method with a task involving the solution of a geometry problem.
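A minimal sketch of such a fusion rule, with all names and parameter values hypothetical, could rescore paired gesture and speech hypotheses with a Gaussian prior over the interval between their onsets:

```python
# Hypothetical fusion of gesture and speech hypotheses: combine their scores
# with a Gaussian prior over the gesture-onset / utterance-onset interval.
import numpy as np

def joint_score(gesture_score, speech_score, t_gesture, t_speech,
                mu=0.2, sigma=0.3):
    """Scores are log-likelihoods; mu and sigma (seconds) would be estimated
    from data, and the values here are placeholders."""
    dt = t_speech - t_gesture
    log_prior = -0.5 * ((dt - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))
    return gesture_score + speech_score + log_prior
```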
An approach is proposed for creating location-specific audio textures for virtual location-exploration services. The presented approach creates audio textures by processing a small amount of audio recorded at a given location, providing a cost-effective way to produce a versatile audio signal that characterizes the location. The resulting texture is non-repetitive and conserves the location-specific characteristics of the audio scene, without the need to collect large amounts of audio from each location. The method consists of two stages: analysis and synthesis. In the analysis stage, the source audio recording is segmented into homogeneous segments. In the synthesis stage, the audio texture is created by randomly drawing segments from the source audio so that consecutive segments have timbral similarity near the segment boundaries. Results obtained in listening experiments show that there is no statistically significant difference in the audio quality or location-specificity of audio when the created audio textures are compared to excerpts of the original recordings. Therefore, the proposed audio textures could be utilized in virtual location-exploration services. Examples of source signals and audio textures created from them are available at www.cs.tut.fi/~heittolt/audiotexture.
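A minimal sketch of the synthesis stage, assuming per-segment boundary timbre features (e.g., MFCC means) have been computed in the analysis stage, is given below; the similarity threshold and retry logic are illustrative assumptions rather than the authors' exact procedure.

```python
# Sketch of the synthesis stage: draw segments at random, accepting a
# candidate when its opening timbre is close to the closing timbre of the
# previously chosen segment. All thresholds are illustrative.
import random
import numpy as np

def synthesize(segments, boundary_feats, n_out, threshold=1.0, max_tries=100):
    """segments: list of 1-D audio arrays; boundary_feats[i] is a
    (start_feat, end_feat) pair of timbre vectors for segment i."""
    order = [random.randrange(len(segments))]
    while len(order) < n_out:
        prev_end = boundary_feats[order[-1]][1]
        for _ in range(max_tries):  # bounded retries; accept last draw anyway
            cand = random.randrange(len(segments))
            if np.linalg.norm(boundary_feats[cand][0] - prev_end) < threshold:
                break
        order.append(cand)
    return np.concatenate([segments[i] for i in order])
```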
Speech production and speech perception studies were conducted to compare (de)voicing in the Romance languages European Portuguese (EP) and Italian. For the speech production part, velar stops in two positions and four vowel contexts were recorded. The voicing status at 10 consecutive landmarks during stop closure was computed. Results showed that voicing was always maintained during the complete stop closure for Italian, whereas for EP there was strong devoicing for all vowel contexts and positions. Both language and vowel context had a significant effect on voicing during stop closure. The duration values and voicing patterns from the production study were then used as input factors to a follow-up perceptual experiment testing the effects of vowel duration, stop duration, and voicing maintenance on voicing perception by EP and Italian listeners. Perceptual stimuli (VCV) were generated using biomechanical modelling so that they would include physically realistic transitions between phonemes. The consonants were velar stops, with no burst or noise included in the signal. A strong language dependency of the three factors on listeners' voicing distinction was found, with high sensitivity to all three cues for EP listeners and low sensitivity for Italian listeners. For EP stimuli with high voicing maintenance during stop closure, this cue was very strong and overrode the other two acoustic cues. However, for stimuli with low voicing maintenance (i.e. highly devoiced stimuli), the acoustic cues of vowel duration and stop duration took over. Even in the absence of both voicing maintenance during stop closure and a burst, the acoustic cues of vowel duration and stop duration guaranteed a stable voicing distinction in EP. Italian listeners were insensitive to all three acoustic cues examined in this study, giving stable voiced responses throughout all of the varying fully crossed factors. None of the examined acoustic cues appeared to be used by Italian listeners to obtain a robust voicing distinction, thus pointing to the use of other acoustic cues, or combinations of cues, to guarantee a stable voicing distinction in this language.
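As a small illustrative helper for the production analysis (not taken from the paper), the voicing status at 10 equally spaced landmarks inside a stop closure could be sampled from frame-level voicing decisions as follows:

```python
# Illustrative helper (not from the paper): sample frame-level voicing
# decisions at 10 equally spaced landmarks inside the stop closure.
import numpy as np

def voicing_profile(voiced_flags, closure_start, closure_end, n_landmarks=10):
    """voiced_flags: boolean array, one entry per analysis frame; closure
    bounds are inclusive frame indices. Returns the per-landmark status
    and the fraction of landmarks at which voicing is maintained."""
    idx = np.linspace(closure_start, closure_end, n_landmarks).astype(int)
    status = voiced_flags[idx]
    return status, status.mean()
```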