N-phoneme model of speech signal recognition based on hidden Markov processes
DOI:
https://doi.org/10.18372/2073-4751.82.20366
Keywords:
speech signals, phoneme speech recognition, hidden Markov processes, influence of interference, speech recognition probability
Abstract
This paper addresses the problem of increasing the probability of recognizing commands and continuous speech in radio engineering and telecommunication devices under the influence of distorting factors by developing new recognition models. Hidden Markov processes are used to give a probabilistic description of the one-, three-, and four-phoneme models of speech signal recognition, which makes it possible to estimate theoretically the recognition probability of each model. Based on a comparative analysis, the four-phoneme model is investigated; it extends the three-phoneme model with one additional state and, unlike the other models considered, increases the recognition probability of speech signals. The probability of recognizing speech signals and commands with the four-phoneme method is established, and its practical application with the developed software is shown to achieve a recognition probability of 98%. The influence of amplitude and phase distortion of the speech signal on the recognition probability is also studied: the probability decreases to 81.7% when amplitude noise is introduced and to 92.3% when phase noise is introduced. A comparative analysis of the four- and three-phoneme models shows that the recognition probability error of the four-phoneme model is 40% lower than that of the three-phoneme model.
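To illustrate the kind of computation a hidden Markov description involves, the following minimal Python sketch scores an observation sequence against two toy left-to-right models, one with three states and one with four, using the scaled forward algorithm. It is not the authors' model or software: the function names (`left_to_right_hmm`, `log_likelihood`, `recognize`), the self-transition probability, the random emission tables, and the discrete observation alphabet are all assumptions chosen only for illustration.

```python
import numpy as np


def left_to_right_hmm(n_states, n_symbols, stay=0.6, seed=0):
    """Build a toy left-to-right HMM (all parameters are hypothetical)."""
    rng = np.random.default_rng(seed)
    # Each state either repeats or advances to the next state.
    A = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        A[i, i] = stay
        A[i, i + 1] = 1.0 - stay
    A[-1, -1] = 1.0  # the last state is absorbing
    # The chain always starts in the first state.
    pi = np.zeros(n_states)
    pi[0] = 1.0
    # Random, strictly positive emission probabilities, normalized per state.
    B = rng.random((n_states, n_symbols)) + 0.1
    B /= B.sum(axis=1, keepdims=True)
    return pi, A, B


def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: returns log P(obs | model)."""
    alpha = pi * B[:, obs[0]]
    scale = alpha.sum()
    log_p = np.log(scale)
    alpha = alpha / scale
    for o in obs[1:]:
        # Propagate one step through the chain, then weight by the emission.
        alpha = (alpha @ A) * B[:, o]
        scale = alpha.sum()
        log_p += np.log(scale)
        alpha = alpha / scale
    return log_p


def recognize(obs, models):
    """Return the name of the model with the highest log-likelihood."""
    scores = {name: log_likelihood(obs, *params) for name, params in models.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    models = {
        "three_state": left_to_right_hmm(3, n_symbols=8, seed=1),
        "four_state": left_to_right_hmm(4, n_symbols=8, seed=2),
    }
    best, scores = recognize([0, 3, 3, 5, 7, 2], models)
    print(best, scores)
```

In a real recognizer the observations would be acoustic feature vectors and the parameters would be trained on speech data; the sketch only shows how adding a state changes the model structure and how competing models are compared by likelihood.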
License
Authors who publish in this journal agree to the following terms:
- Authors retain the right of authorship of their work and grant the journal the right of first publication of the work under the Creative Commons Attribution License, which allows others to freely distribute the published work provided that the authors of the original work and its first publication in this journal are acknowledged.
- Authors may enter into separate, additional agreements for the non-exclusive distribution of the work in the form in which it was published by this journal (for example, depositing it in an institutional repository or publishing it as part of a monograph), provided that the first publication in this journal is acknowledged.
- Journal policy permits and encourages authors to post the manuscript online (for example, in institutional repositories or on personal websites) both before its submission to the editorial office and during editorial processing, since this fosters productive scholarly discussion and has a positive effect on how quickly and how often the published work is cited (see The Effect of Open Access).