Current Issue: October–December | Volume: 2024 | Issue Number: 4 | Articles: 5
A music appreciation class aims to help students appreciate and evaluate musical works through listening, experiencing, exploring, and evaluating, thereby cultivating their aesthetic perception and musical-cultural literacy. This paper examines the relationship between musical emotion and musical elements, selects traditional music elements that significantly influence listeners' perception of musical emotion for labeling, and builds a dataset. Based on the emotional characteristics of these elements, it then explores the mechanism by which traditional music elements operate in music and art appreciation. Using a coupling model, it also examines how the coupled system of "traditional elements–music empathy level" evolves over time. The results show a significant positive correlation (P = 0.0021) between traditional music elements and students' empathy in music appreciation. They further indicate that the synergy of excellent traditional Chinese music elements in music and art appreciation can significantly enhance students' music appreciation ability and help them form a sound worldview, outlook on life, and set of values.
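The following is an illustrative Python sketch, not the paper's code, showing how the kind of correlation reported above could be computed between per-piece labels of traditional-element salience and students' empathy ratings. All numbers below are hypothetical placeholders.

```python
# Hedged sketch: Pearson correlation between hypothetical traditional-element
# labels and hypothetical empathy ratings (placeholder data, not the study's).
import numpy as np
from scipy.stats import pearsonr

# Assumed per-piece salience of traditional music elements, scaled 0-1
traditional_element_score = np.array([0.2, 0.5, 0.7, 0.9, 0.4, 0.8, 0.6, 0.3])
# Assumed mean student empathy level for the same pieces, scaled 0-1
empathy_level = np.array([0.3, 0.5, 0.6, 0.85, 0.45, 0.75, 0.65, 0.35])

r, p = pearsonr(traditional_element_score, empathy_level)
print(f"Pearson r = {r:.3f}, p = {p:.4f}")  # significance judged against alpha = 0.05
```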
Several attempts at speech brain–computer interfacing (BCI) have been made to decode phonemes, sub-words, words, or sentences from invasive measurements, such as the electrocorticogram (ECoG), during auditory speech perception, overt speech, or imagined (covert) speech. Decoding sentences from covert speech is a challenging task. Sixteen epilepsy patients with intracranially implanted electrodes participated in this study, and ECoGs were recorded during overt and covert speech of eight Japanese sentences, each consisting of three tokens. In particular, a Transformer neural network model was applied to decode text sentences from covert speech after being trained on ECoGs obtained during overt speech. We first evaluated the proposed Transformer model with the same task used for training and testing, and then evaluated its performance when it was trained on the overt task and used to decode covert speech. The model trained on covert speech achieved an average token error rate (TER) of 46.6% for decoding covert speech, whereas the model trained on overt speech achieved a TER of 46.3% (p > 0.05; d = 0.07). The difficulty of collecting training data for covert speech can therefore be addressed by using overt speech, and decoding performance for covert speech may be improved further by employing additional overt speech data.
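As a minimal sketch of the evaluation metric named above, the token error rate can be computed as the token-level edit distance between decoded and reference sentences, normalized by the reference length. The sentences below are hypothetical placeholders, not data from the study.

```python
# Hedged sketch of a token error rate (TER) computation.

def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)]

def token_error_rate(references, hypotheses):
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total = sum(len(r) for r in references)
    return errors / total

# Each reference sentence consists of three tokens, as in the task design;
# the token strings themselves are invented for illustration.
refs = [["watashi", "wa", "hashiru"], ["kare", "ga", "warau"]]
hyps = [["watashi", "wa", "aruku"],   ["kare", "ga", "warau"]]
print(f"TER = {token_error_rate(refs, hyps):.1%}")
```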
In voice analysis, the electroglottographic (EGG) signal has long been recognized as a useful complement to the acoustic signal, but only when the vocal folds are actually in contact, so that the signal has an appreciable amplitude. However, phonation can also occur without vocal fold contact, as in breathy voice, in which case the EGG amplitude is low but not zero. Identifying the transition from non-contacting to contacting phonation is of great interest, because it substantially changes the nature of the vocal fold oscillations; the transition itself, however, is not audible. The magnitude of the cycle-normalized peak derivative of the EGG signal is a convenient indicator of vocal fold contacting, but no current EGG hardware provides a sufficient signal-to-noise ratio in that derivative. We show that the textbook techniques of spectral thresholding and static notch filtering are straightforward to implement, can run in real time, and can mitigate several noise problems in EGG hardware. This can be useful to researchers in vocology.
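The sketch below is an assumed implementation, not the authors' code, of two quantities named in the abstract: a static notch filter applied to the EGG signal and a cycle-normalized peak derivative as a contacting indicator. The sampling rate and the 50 Hz interference frequency are assumptions for illustration; spectral thresholding is omitted for brevity.

```python
# Hedged sketch: static notch filtering of an EGG signal and a
# cycle-normalized peak-derivative measure of vocal fold contacting.
import numpy as np
from scipy.signal import iirnotch, filtfilt

FS = 44100.0      # assumed EGG sampling rate, Hz
F_NOTCH = 50.0    # assumed interference frequency to suppress, Hz

def notch_egg(egg, fs=FS, f0=F_NOTCH, q=30.0):
    """Apply a static notch filter at f0 to the EGG signal (zero-phase)."""
    b, a = iirnotch(f0, q, fs)
    return filtfilt(b, a, egg)

def normalized_peak_degg(egg_cycle):
    """Peak absolute derivative of the EGG within one glottal cycle,
    normalized by that cycle's peak-to-peak EGG amplitude."""
    degg = np.diff(egg_cycle)
    ptp = np.ptp(egg_cycle)
    return float(np.max(np.abs(degg)) / ptp) if ptp > 0 else 0.0
```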
BLC Theory proposes that individual differences (IDs) in language proficiency, in both native and non-native speakers, cannot be adequately mapped onto a single proficiency scale. Instead, IDs are best understood and studied in terms of two fundamentally different dimensions: (1) the cognition of oral language (receptive and productive speech processing) and (2) the cognition of written language (reading and writing). This paper presents an update of BLC Theory, placed under a non-nativist, usage-based, neural-network metatheory of language as a complex system. The paper includes predictions for the absence or presence of IDs in the oral and written domains, separately for native and non-native speakers. The theory predicts that while cognitive factors such as executive functions, non-verbal memory, and intelligence positively affect the acquisition of reading and writing skills in both native and non-native speakers, they do not play a significant role in the acquisition of speech processing in either group. Contrary to folk wisdom, one does not need to be particularly intelligent to learn to understand and produce speech in a non-native language. Attention is given to typological differences between children's home language(s) and the standard language(s) of literacy.
Voice conversion is the task of changing the speaker characteristics of input speech while preserving its linguistic content. It can be used in areas such as entertainment, medicine, and education, and the quality of the converted speech is crucial for voice conversion algorithms to be useful in these applications. Deep learning-based voice conversion algorithms, which have recently shown promising results, generally consist of three modules: a feature extractor, a feature converter, and a vocoder. The feature extractor takes the waveform as input and extracts speech feature vectors for further processing; these feature vectors are later synthesized back into waveforms by the vocoder. The feature converter performs the actual voice conversion, so many previous studies focused on improving this module in isolation and combined it with a separately trained vocoder to synthesize the final waveform. Because the feature converter and the vocoder are trained independently, the output of the converter may not be compatible with the input of the vocoder, which causes performance degradation. Furthermore, most voice conversion algorithms use mel-spectrogram-based speech feature vectors without modification; these feature vectors perform well across a variety of speech-processing areas but could be further optimized for voice conversion. To address these problems, we propose a novel wave-to-wave (wav2wav) voice conversion method that integrates the feature extractor, the feature converter, and the vocoder into a single module and trains the system in an end-to-end manner. We evaluated the effectiveness of the proposed method using the VCC2018 dataset.
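Below is a minimal PyTorch sketch of the end-to-end structure described above: a feature extractor, a feature converter, and a vocoder wrapped in a single module so that one loss updates all three jointly. The sub-network choices, layer sizes, and the placeholder reconstruction loss are hypothetical and do not represent the architecture of the proposed wav2wav model.

```python
# Hedged sketch: three VC modules composed into one end-to-end trainable model.
import torch
import torch.nn as nn

class Wav2WavVC(nn.Module):
    def __init__(self, feat_dim=80, hidden=256):
        super().__init__()
        # waveform -> learned speech feature vectors (stand-in extractor)
        self.extractor = nn.Sequential(
            nn.Conv1d(1, feat_dim, kernel_size=400, stride=160, padding=200),
            nn.ReLU(),
        )
        # source-speaker features -> target-speaker features (stand-in converter)
        self.converter = nn.GRU(feat_dim, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, feat_dim)
        # converted features -> waveform (stand-in for a neural vocoder)
        self.vocoder = nn.ConvTranspose1d(feat_dim, 1, kernel_size=400,
                                          stride=160, padding=120)

    def forward(self, wav):                       # wav: (batch, 1, samples)
        feats = self.extractor(wav)               # (batch, feat_dim, frames)
        conv, _ = self.converter(feats.transpose(1, 2))
        conv = self.proj(conv).transpose(1, 2)    # (batch, feat_dim, frames)
        return self.vocoder(conv)                 # (batch, 1, ~samples)

# A single waveform-level loss propagates gradients through all three modules;
# the self-reconstruction target here is only a placeholder for illustration.
model = Wav2WavVC()
wav = torch.randn(2, 1, 16000)
out = model(wav)
loss = torch.mean((out[..., :16000] - wav) ** 2)
loss.backward()
```

Because the extractor, converter, and vocoder are optimized together, the converter's output is by construction compatible with the vocoder's input, which is the mismatch the end-to-end design is meant to remove.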