ANALYSIS OF THE EFFICIENCY OF THE VOICE IDENTIFICATION SYSTEM BASED ON MFCC AND GMM-SVM UNDER THE INFLUENCE OF INTERFERENCE IN THE COMMUNICATION CHANNEL
Keywords: speech signal, voice identification, short-time energy, zero-crossing rate, adaptive wavelet thresholding, mel-frequency cepstral coefficients, Gaussian mixture model, support vector machine
The article addresses voice identification of a person under the influence of interference in the communication channel of information and telecommunication networks. Such identification is subject to hardware distortions and interference introduced by the equipment and devices used to record, process, and store information. In addition, external acoustic noise is inevitably superimposed on the speech signal and can significantly distort individual informative characteristics. For this reason, identification systems that have demonstrated fairly high efficiency in laboratory conditions may show much lower reliability when analyzing speech that contains external noise. Finally, in a number of tasks identification has to be performed under the very difficult conditions of overlapping voices of several speakers, in particular speakers with similar acoustic characteristics; there has been virtually no research on voice identification capabilities for this most difficult case. In view of this, the main objective of the study is to analyze the effectiveness of a voice identification system based on MFCC and GMM-SVM under the influence of interference in the communication channel of information and telecommunication networks, which makes it possible to quantify the threshold values of noise power at which identification of a person remains correct and at which it becomes false.
The proposed voice identification system is implemented using the following technologies: 1) selection of active speech regions by finding the change in short-time energy and in the number of zero crossings between adjacent frames of the speech signal; 2) adaptive wavelet filtering of the speech signal for noise removal, which requires adaptive generation of micro-local thresholds that reduce the effect of additive noise on the clean speech signal; 3) extraction of recognition features, where mel-frequency cepstral coefficients, based on two key concepts (the cepstrum and the mel scale), are used as informative features of the speech signal in automatic voice identification; 4) classification of the recognition features based on Gaussian mixture models and the support vector machine with the linear Campbell kernel, together with the principal component method with a projection on latent structures, which in combination increases the reliability of identification, manifested as a reduction in Type I and Type II errors. A methodology is proposed for classifying noisy speech signals by mathematically modeling distortions with a resampling algorithm that is based on the discrete Fourier transform and increases the sampling rate by a given integer or fractional factor. The nonlinear distortion coefficient is used as the quantity that characterizes the distortion; it is introduced as the ratio of the root mean square sum of the spectral components of the output speech signal to the root mean square sum of the spectral components of the input speech signal. Mathematical modeling of speech signal distortions made it possible to quantify the magnitude of these distortions, which can be used for correct identification of a person.
This shows that the proposed approach to assessing the effects of distortion can be used to analyze the reliability of voice identification methods.
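As an illustration of two quantities named above, the following minimal sketch computes per-frame short-time energy and zero-crossing counts, their changes between adjacent frames, and a spectral ratio of output to input signal. It is not the authors' implementation. All function names, the frame length, and the hop size are hypothetical, and the "root mean square sum of the spectral components" is assumed here to mean the square root of the sum of squared DFT magnitudes, computed for signals of equal length.

```python
import cmath
import math


def frames(signal, frame_len, hop):
    """Split samples into frames of frame_len taken every hop samples (tail is dropped)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(x * x for x in frame)


def zero_crossings(frame):
    """Number of sign changes between consecutive samples in one frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def adjacent_deltas(values):
    """Change of a per-frame quantity between each pair of adjacent frames."""
    return [b - a for a, b in zip(values, values[1:])]


def dft_magnitudes(signal):
    """Magnitudes of a naive O(N^2) DFT; adequate for short illustrative inputs."""
    n = len(signal)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal)))
            for k in range(n)]


def spectral_ratio(input_signal, output_signal):
    """Assumed distortion measure: root-sum-square of output spectral
    magnitudes divided by that of the input (equal-length signals)."""
    def rss(mags):
        return math.sqrt(sum(m * m for m in mags))
    return rss(dft_magnitudes(output_signal)) / rss(dft_magnitudes(input_signal))


# Toy signal: an alternating (high zero-crossing) frame followed by a constant frame.
toy = [1, -1, 1, -1, 0.5, 0.5, 0.5, 0.5]
toy_frames = frames(toy, frame_len=4, hop=4)
energies = [short_time_energy(f) for f in toy_frames]
crossings = [zero_crossings(f) for f in toy_frames]
energy_deltas = adjacent_deltas(energies)
crossing_deltas = adjacent_deltas(crossings)
```

In a detector of active speech regions, the deltas would typically be compared against thresholds to mark frame boundaries; the thresholds themselves are not specified in the text and are left out of the sketch.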