Current Issue: January-March 2024, Issue 1 (5 Articles)
The significance of emotion recognition technology continues to grow, and research in this field enables artificial intelligence to understand and react to human emotions accurately. This study aims to enhance the efficacy of emotion recognition from speech by using dimensionality reduction algorithms for visualization, effectively outlining emotion-specific audio features. As a model for emotion recognition, we propose a new architecture that combines a bidirectional long short-term memory (BiLSTM)–Transformer and a 2D convolutional neural network (CNN). The BiLSTM–Transformer processes audio features to capture the sequence of speech patterns, while the 2D CNN handles Mel-spectrograms to capture the spatial details of audio. To validate the model's performance, 10-fold cross-validation is used. The proposed methodology was applied to Emo-DB and RAVDESS, two major databases for emotion recognition from speech, and achieved high unweighted accuracy rates of 95.65% and 80.19%, respectively. These results indicate that the proposed Transformer-based deep learning model, combined with appropriate feature selection, can enhance performance in emotion recognition from speech.
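As a concrete starting point, the following is a minimal PyTorch sketch of the kind of hybrid architecture the abstract describes: a BiLSTM feeding a Transformer encoder for sequential audio features, fused with a small 2D CNN over Mel-spectrograms. All layer sizes, the pooling choices, and the fusion-by-concatenation scheme are illustrative assumptions, not the authors' exact design.

```python
# Hybrid speech-emotion model sketch: BiLSTM -> Transformer for feature
# sequences, 2D CNN for Mel-spectrograms, fused by concatenation.
import torch
import torch.nn as nn

class HybridSER(nn.Module):
    def __init__(self, n_feats=40, n_mels=64, n_classes=7, d_model=128):
        super().__init__()
        # Sequence branch: BiLSTM output size is 2 * (d_model // 2) = d_model
        self.bilstm = nn.LSTM(n_feats, d_model // 2, batch_first=True,
                              bidirectional=True)
        enc_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                               batch_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=2)
        # Spectrogram branch: small 2D CNN over (1, n_mels, time)
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(d_model + 32, n_classes)

    def forward(self, feats, mel):
        # feats: (batch, time, n_feats); mel: (batch, 1, n_mels, time)
        seq, _ = self.bilstm(feats)
        seq = self.transformer(seq).mean(dim=1)   # temporal average pooling
        spa = self.cnn(mel)
        return self.head(torch.cat([seq, spa], dim=1))

model = HybridSER()
logits = model(torch.randn(2, 100, 40), torch.randn(2, 1, 64, 100))
print(logits.shape)  # torch.Size([2, 7])
```

In a full experiment, a model like this would be trained and evaluated fold by fold under 10-fold cross-validation, for instance with scikit-learn's StratifiedKFold splitter.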
In recent years, the use of electroencephalography (EEG) has grown as a tool for diagnosis and brain function monitoring, being a simple and non-invasive method compared with procedures such as histological sampling. Typically, prolonged and repeated stimuli are needed to extract functional brain responses from EEG signals because of the artifacts generated in recordings, which adversely impact stimulus-response analysis. To mitigate the effect of artifacts, correlation analysis (CA) methods are applied in the literature, where the predominant approaches enhance stimulus-response correlations through linear methods such as canonical correlation analysis (CCA). This paper introduces a novel CA framework based on a neural network with a loss function specifically designed to maximize the correlation between EEG signals and speech stimuli. Compared with other deep learning CA approaches (DCCAs) in the literature, this framework uses a single multilayer perceptron (MLP) network instead of one network per modality. To validate the proposed approach, a comparison with linear CCA (LCCA) and DCCA was performed using a dataset containing the EEG traces of subjects listening to speech stimuli. The experimental results show that the proposed method improves the overall Pearson correlation by 10.56% compared with the state-of-the-art DCCA method.
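The core idea of a correlation-maximizing objective can be illustrated with a short sketch: a single MLP projects the EEG, and the training loss is the negative Pearson correlation against a speech representation. The network sizes, the envelope target, and the training loop below are assumptions for demonstration, not the paper's exact configuration.

```python
# Single-network correlation analysis sketch: minimize negative Pearson
# correlation between the MLP-projected EEG and the speech stimulus signal.
import torch
import torch.nn as nn

def neg_pearson(x, y, eps=1e-8):
    # x, y: (batch, time); minimizing -r maximizes the correlation r
    x = x - x.mean(dim=1, keepdim=True)
    y = y - y.mean(dim=1, keepdim=True)
    r = (x * y).sum(dim=1) / (x.norm(dim=1) * y.norm(dim=1) + eps)
    return -r.mean()

mlp = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1))

eeg = torch.randn(8, 500, 64)        # (batch, time, EEG channels)
envelope = torch.randn(8, 500)       # stand-in speech envelope target
opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)
for _ in range(10):                  # toy training loop
    opt.zero_grad()
    proj = mlp(eeg).squeeze(-1)      # project channels -> (batch, time)
    loss = neg_pearson(proj, envelope)
    loss.backward()
    opt.step()
```

This mirrors the single-network design the abstract contrasts with DCCA, where two separate networks (one per view) are trained jointly.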
A quantitative evaluation of musical timbre and its variations is important for the analysis of audio recordings and computer-aided music composition. Using FFT acoustic descriptors and their representation in an abstract timbral space, we analyze variations in a sample of monophonic sounds from chordophones (violin, cello) and aerophones (trumpet, transverse flute, and clarinet). It is concluded that the FFT acoustic descriptors allow us to distinguish timbral variations in musical dynamics, including crescendo and vibrato. Furthermore, using the Random Forest algorithm, it is shown that the FFT acoustic descriptors provide statistically significant classification of musical instruments, instrument families, and dynamics. We also observed that the FFT acoustic descriptors outperformed some of Librosa's timbral features when classifying pitch.
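A hedged sketch of the pipeline the abstract outlines follows: FFT-based spectral descriptors as features, classified with a Random Forest. The particular descriptor set (centroid, spread, rolloff) and the synthetic two-class data are illustrative stand-ins, not the paper's actual descriptors or recordings.

```python
# FFT descriptors + Random Forest sketch on synthetic "instrument" classes.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def fft_descriptors(signal, sr=22050):
    mag = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    p = mag / (mag.sum() + 1e-12)             # normalized magnitude spectrum
    centroid = (freqs * p).sum()              # spectral centroid
    spread = np.sqrt(((freqs - centroid) ** 2 * p).sum())
    rolloff = freqs[np.searchsorted(np.cumsum(p), 0.85)]  # 85% rolloff
    return np.array([centroid, spread, rolloff])

def make_signal(smooth, rng):
    sig = rng.normal(size=2048)
    if smooth:                                # low-pass => lower centroid
        sig = np.convolve(sig, np.ones(8) / 8, mode="same")
    return sig

rng = np.random.default_rng(0)
X = np.array([fft_descriptors(make_signal(k, rng))
              for k in (0, 1) for _ in range(50)])
y = np.repeat([0, 1], 50)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())  # near-perfect on this toy data
```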
Students may have trouble concentrating during a lecture for various reasons, including the educator's accent or the student's auditory difficulties, which can lead to reduced participation and poor performance in class. In this paper, we explore whether incorporating the humanoid robot Pepper can improve the learning experience. Pepper can capture a person's audio; however, the accuracy of the recorded audio is not guaranteed due to various factors. We therefore investigated the limitations of Pepper's speech recognition system, observing the effects of distance, age, gender, and the complexity of statements. We conducted an experiment with eight participants (five female, three male) who spoke provided statements at different distances, and the resulting transcriptions were evaluated using different statistical scores. Because Pepper lacks the functionality to transcribe speech into text, we integrated it with the speech-to-text recognition tool Whisper, which transcribes speech into text that can be displayed on Pepper's screen using its service. The purpose of the study is to develop a system in which Pepper and Whisper act in synergy to bridge the gap between verbal and visual communication in education. Such a system could benefit students, who will better understand the content through the visual representation of the teacher's spoken words, regardless of hearing impairments or accent-related difficulties. The methodology involves recording the participant's speech, transcribing it to text with Whisper, and evaluating the generated text using various statistical scores. We anticipate that the proposed system will enhance students' learning experience, engagement, and immersion in the classroom.
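The Whisper step of such a pipeline is straightforward with the open-source openai-whisper package; the sketch below shows the transcription stage only. The audio file path is a placeholder standing in for a recording captured by Pepper, and forwarding the text to Pepper's tablet display is outside this snippet.

```python
# Minimal Whisper transcription step (pip install openai-whisper).
import whisper

model = whisper.load_model("base")            # small multilingual model
result = model.transcribe("pepper_recording.wav")  # placeholder file path
print(result["text"])                         # text to show on Pepper's screen
```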
Algorithmic music composition has been gaining prominence and recognition as an innovative approach to music education, providing students with opportunities to explore creativity, computational thinking, and musical knowledge. This study investigates the impact of integrating algorithmic music composition in the classroom, examining its influence on student engagement, musical knowledge, and creative expression, as well as its potential to enhance computational thinking skills. A mixed-method case study was conducted in three Basic Music Education classrooms in the north of Portugal, involving 71 participants (68 students and 3 music teachers). The results reveal: (i) both successes and challenges in integrating computational thinking concepts and practices; (ii) pedagogical benefits of integrating programming platforms, where programming concepts overlapped with music learning outcomes; and (iii) a positive impact on participants' programming self-confidence and recognition of programming's importance. Integrating algorithmic music composition in the classroom positively influences student engagement, musical knowledge, and creative expression. Algorithmic techniques provide a novel and engaging platform for students to explore music composition, fostering their creativity, critical thinking, and collaboration skills. Educators can leverage algorithmic music composition as an effective pedagogical approach to enhance music education, allowing students to develop a deeper understanding of music theory while fostering their artistic expression. Future research should contribute to the successful integration of digital technologies in the Portuguese curriculum by further exploring the long-term effects and potential applications of algorithmic music composition in different educational contexts.
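To give a flavor of the kind of exercise such lessons build on, here is a toy algorithmic composition: a random walk over a C-major scale generates a short melody. The note representation and the random-walk rule are illustrative assumptions; the study's classroom platforms are not specified here.

```python
# Toy algorithmic composition: random-walk melody over a C-major scale.
import random

SCALE = ["C4", "D4", "E4", "F4", "G4", "A4", "B4", "C5"]

def random_walk_melody(length=16, seed=42):
    rng = random.Random(seed)
    idx, melody = 3, []                      # start on F4 (index 3)
    for _ in range(length):
        step = rng.choice([-2, -1, 1, 2])    # small melodic intervals
        idx = min(max(idx + step, 0), len(SCALE) - 1)  # stay in range
        melody.append(SCALE[idx])
    return melody

print(" ".join(random_walk_melody()))
```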