Current Issue: April-June | Volume: 2026 | Issue: 2 | Articles: 5
The assessment of Chinese text readability plays a significant role in Chinese language education. Owing to the intrinsic differences between alphabetic languages and Chinese character representations, readability assessment is more challenging for Chinese because of the language's inherent complexity in vocabulary, syntax, and semantics. This article proposes a conceptual analogy between Chinese readability assessment and the rhythm and tempo patterns of music, in which the syntactic structure of a Chinese sentence is transformed into an image. The Chinese Knowledge and Information Processing Tagger (CkipTagger) tool developed by Academia Sinica, Taiwan, is used to decompose a Chinese text into a set of tokens. These tokens are then refined through a user-defined token pool to retain meaningful units. An image carrying part-of-speech (POS) information is generated from the token-syntax alignment, and a discrete cosine transform (DCT) is applied to extract the temporal characteristics of the text. In addition, the study integrates four linguistic features for readability assessment: type-token ratio, average sentence length, total word count, and vocabulary difficulty level. These features are fed into a Support Vector Machine (SVM) classifier, and a bidirectional long short-term memory (Bi-LSTM) network is adopted for quantitative comparison. In the experiments, 774 Chinese texts aligned with the Taiwan Benchmarks for the Chinese Language were selected and graded by Chinese language experts, with equal numbers of basic, intermediate, and advanced texts. The findings indicate that the proposed POS features combined with the linguistic features work well with the SVM classifier, and that its performance matches more complex architectures such as the Bi-LSTM network in Chinese readability assessment.
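A minimal sketch of the feature pipeline this abstract describes, under simplifying assumptions: POS tags are mapped to numeric IDs, a 1-D DCT extracts "tempo-like" coefficients (a simplification of the paper's image-based representation), and the result is concatenated with the four linguistic features before SVM classification. The toy tag inventory and all function names are illustrative, not the authors' implementation.

```python
# Sketch: DCT "tempo" features over a POS sequence + linguistic features -> SVM.
import numpy as np
from scipy.fft import dct
from sklearn.svm import SVC

POS_IDS = {"Na": 0, "VC": 1, "D": 2, "P": 3, "Caa": 4}  # toy tag inventory

def pos_dct_features(pos_tags, n_coeffs=16):
    """DCT coefficients of the POS-ID sequence (zero-padded/truncated)."""
    ids = np.array([POS_IDS.get(t, len(POS_IDS)) for t in pos_tags], dtype=float)
    coeffs = dct(ids, norm="ortho")
    out = np.zeros(n_coeffs)
    out[: min(n_coeffs, len(coeffs))] = coeffs[:n_coeffs]
    return out

def linguistic_features(tokens, sentences, vocab_difficulty):
    """Type-token ratio, average sentence length, total words, difficulty."""
    ttr = len(set(tokens)) / max(len(tokens), 1)
    avg_len = len(tokens) / max(len(sentences), 1)
    return np.array([ttr, avg_len, len(tokens), vocab_difficulty])

# X: one row per text = [DCT coefficients | linguistic features]; y: 0/1/2 level.
# clf = SVC(kernel="rbf").fit(X_train, y_train)
```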
Background and Objectives: Speech-in-noise testing is essential for evaluating functional hearing abilities in clinical practice. Although the Quick Speech-in-Noise test (QuickSIN) is widely used, no equivalent tool existed for European Portuguese. This study aimed to develop a Speech-in-Noise Test for European Portuguese (SiN-EP), linguistically adapted and calibrated for native speakers, to support clinical assessment of speech perception in realistic listening environments. Materials and Methods: The SiN-EP was developed through a multi-stage process. Sentences were drafted to reflect natural speech patterns and reviewed by native speakers for clarity and grammatical accuracy. Selected sentences were recorded by a female native speaker in a controlled acoustic environment and mixed with multi-talker babble at signal-to-noise ratios (SNRs) from 25 to 0 dB. A pre-test in a free-field setting at 65 dB SPL was conducted with fifteen normal-hearing young adults. Participants repeated each sentence, and their responses were analyzed to refine list composition, adjust difficulty, and ensure phonetic balance. Results: Intelligibility decreased systematically as the SNR worsened, with ceiling effects at 25 and 20 dB SNR. At 5 dB SNR, high variability was observed, with set 5 showing disproportionate difficulty and set 14 containing an incomplete sentence; both were removed. At 0 dB SNR, all sets demonstrated the expected low intelligibility. The final test comprises thirteen lists of six sentences each, maintaining stable intelligibility, phonetic representativeness, and consistent difficulty across SNRs. Conclusions: The SiN-EP provides a linguistically appropriate, phonetically balanced, and SNR-calibrated instrument for assessing speech-in-noise perception in European Portuguese. The refinement process improved reliability and list equivalence, filling a critical gap and supporting the test's clinical and research applicability.
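As an illustration of the stimulus-construction step described above, the sketch below mixes a recorded sentence with multi-talker babble at a target SNR using standard RMS-based scaling. The function name and the 5 dB step grid are assumptions for illustration, not the authors' exact procedure.

```python
# Sketch: scale babble so the speech-to-babble power ratio equals a target SNR.
import numpy as np

def mix_at_snr(speech, babble, snr_db):
    """Return speech + babble scaled to the requested SNR in dB."""
    babble = babble[: len(speech)]            # align lengths (assumes babble >= speech)
    p_speech = np.mean(speech ** 2)
    p_babble = np.mean(babble ** 2)
    gain = np.sqrt(p_speech / (p_babble * 10 ** (snr_db / 10)))
    return speech + gain * babble

# e.g. one stimulus per condition, assuming 5 dB steps from 25 to 0 dB SNR:
# mixed = {snr: mix_at_snr(sentence, babble, snr) for snr in (25, 20, 15, 10, 5, 0)}
```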
Background/Objectives: Cochlear implants (CIs) are a common treatment for severe-to-profound hearing loss and provide reasonable speech understanding, at least in quiet situations. However, their limited spectro-temporal resolution restricts sound quality, which is especially crucial for music appraisal. Many CI recipients wear a hearing aid (HA) on the non-implanted ear (bimodal users), which may enhance music perception by adding acoustic fine-structure cues. Since it is unclear how the HA should be fitted in conjunction with the CI to achieve optimal benefit, this study aimed to systematically vary HA fitting parameters and assess their impact on music sound quality in bimodal users. Methods: Thirteen bimodal CI recipients participated in a listening experiment using a master hearing aid that allowed controlled manipulation of HA settings. Participants evaluated three music excerpts (pop with vocals, pop without vocals, classical) using the multiple-stimulus with hidden reference and anchor (MUSHRA) test. To assess the reliability of individual judgments, each participant repeated the test, and responses were analyzed with the eGauge method. Results: Most participants provided reliable and consistent sound quality ratings. Compared to a standard DSL v5.0 prescriptive fitting, modifications to compression settings and low-frequency gain significantly influenced perceived music quality. The effect of low-frequency gain adjustments was especially pronounced for pop music with vocals, indicating stimulus-dependent benefits. Conclusions: The study demonstrates that HA fitting for bimodal CI users can be optimized beyond standard prescriptive rules to enhance music sound quality by increasing low-frequency gain, particularly for vocal-rich pieces. Additionally, the testing method shows promise for clinical application, enabling individualized HA adjustments based on patient-specific listening preferences and thereby fostering personalized audiology care.
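The low-frequency gain manipulation studied above can be illustrated with a rough signal-processing sketch: the band below a cutoff is split off with a Butterworth low-pass and added back with extra gain. Real HA fittings use multi-band compression and prescriptive targets; the cutoff, boost, and filter order here are purely illustrative assumptions.

```python
# Sketch: approximate a low-frequency gain boost on an audio signal.
import numpy as np
from scipy.signal import butter, sosfilt

def boost_low_frequencies(x, fs, cutoff_hz=500.0, boost_db=6.0):
    """Add `boost_db` of gain to the band below `cutoff_hz`."""
    sos = butter(4, cutoff_hz, btype="low", fs=fs, output="sos")
    low = sosfilt(sos, x)                     # isolate the low band
    extra = 10 ** (boost_db / 20) - 1.0       # linear gain added to that band
    return x + extra * low                    # low band boosted, rest unchanged
```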
Segmental models compute likelihood scores in segment units instead of frame units to recognize sequence data. Motivated by promising results in speech recognition and natural language processing, we apply segmental models to sound event detection for the first time and verify their effectiveness against conventional frame-based approaches. The proposed model processes variable-length segments of sound signals by encoding feature vectors with deep learning techniques. The encoded vectors are then embedded to derive representative values for each segment, which are scored to identify the best matches for each input sound signal. Owing to the inherent variation in the lengths and types of input sound signals, segmental models incur high computational and memory costs. To address this issue, our end-to-end model employs a simple segment-scoring function with efficient computation and memory usage. We use a marginal log loss as the cost function when training the segmental model, which eliminates the reliance on strong labels for sound events. Experiments on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 Challenge dataset show that the proposed method achieves a better F-score in sound event detection than conventional convolutional recurrent neural network-based models.
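A minimal PyTorch sketch of the simple segment-scoring idea: encoder frame features are mean-pooled over a candidate segment to a fixed-size embedding and scored per event class with one linear layer. The pooling choice and all names are assumptions rather than the paper's exact architecture.

```python
# Sketch: score a variable-length segment [start, end) of encoder outputs.
import torch
import torch.nn as nn

class SegmentScorer(nn.Module):
    def __init__(self, feat_dim, n_classes):
        super().__init__()
        self.score = nn.Linear(feat_dim, n_classes)

    def forward(self, frames, start, end):
        """frames: (T, feat_dim) encoder outputs; returns per-class scores."""
        seg_emb = frames[start:end].mean(dim=0)  # fixed-size segment embedding
        return self.score(seg_emb)               # one score per event class

# Scores over all candidate segments can be combined with dynamic programming;
# training can marginalize over segmentations (marginal log loss), so only
# weak (clip-level) labels are needed.
```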
Large language models (LLMs) have been increasingly applied in Automatic Speech Recognition (ASR), achieving significant advancements. However, the performance of LLM-based ASR (LLM-ASR) models remains unsatisfactory across domains due to domain shifts in both acoustic and linguistic conditions. To address this challenge, we propose a decoupled two-stage domain adaptation framework that separates the adaptation process into a text-only and an audio-only stage. In the first stage, we leverage abundant text data from the target domain to refine the LLM component, improving its contextual and linguistic alignment with the target domain. In the second stage, we employ a pseudo-labeling method with unlabeled audio data from the target domain and introduce two key enhancements: (1) a decoupled auxiliary Connectionist Temporal Classification (CTC) loss that improves the robustness of the speech encoder under different acoustic conditions; and (2) a synchronous LLM tuning strategy that allows the LLM to continuously learn linguistic alignment from pseudo-labeled transcriptions enriched with domain textual knowledge. Experimental results demonstrate that the proposed methods significantly improve LLM-ASR performance in the target domain, achieving a relative word error rate reduction of 19.2%.
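A hedged sketch of what the second-stage objective could look like: the usual LLM cross-entropy on pseudo-labeled transcriptions plus a decoupled auxiliary CTC loss on the speech-encoder outputs. The `ctc_weight`, tensor shapes, and all names are illustrative assumptions; the paper's exact wiring may differ.

```python
# Sketch: joint stage-2 loss = LLM cross-entropy + weighted auxiliary CTC loss.
import torch
import torch.nn.functional as F

def stage2_loss(llm_logits, target_ids, enc_logits, enc_lens,
                ctc_targets, ctc_target_lens, ctc_weight=0.3):
    # Cross-entropy of the LLM over pseudo-labeled tokens.
    # llm_logits: (B, T, V); target_ids: (B, T) with -100 on padding.
    ce = F.cross_entropy(llm_logits.transpose(1, 2), target_ids,
                         ignore_index=-100)
    # Auxiliary CTC loss computed directly on encoder outputs.
    # enc_logits: (B, T_enc, V) -> log-probs shaped (T_enc, B, V) for ctc_loss.
    log_probs = enc_logits.log_softmax(-1).transpose(0, 1)
    ctc = F.ctc_loss(log_probs, ctc_targets, enc_lens, ctc_target_lens,
                     blank=0, zero_infinity=True)
    return ce + ctc_weight * ctc
```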