Security monitoring via sound analysis and voice identification with artificial intelligence
Abstract
The article demonstrates the possibility of monitoring user access through authentication based on voice profiles using Artificial Intelligence. A two-stage approach is proposed for sound analysis and voice recognition using Feed-Forward Neural Networks (FFNNs) and Cascade-Forward Neural Networks (CFNNs). Seven test voice profiles were pre-processed to extract quantitative sound features. The procedure involves registering a set of sound parameters in three categories: acoustic measurements over the entire sound spectrum, measurements up to 100 dBA, and measurements above 100 dBA. The neural architectures were trained with the Scaled Conjugate Gradient (SCG) and Levenberg-Marquardt (LM) algorithms, using different transfer functions in the output layers. In the initial phase of neural training, the entire sound spectrum of registered indicators was used, and high accuracy levels of around 90.0% were reached. Subsequently, the set of informative features was reduced while searching for similar levels of accuracy, in order to limit the computational procedures required in neural training while maintaining the threshold for successful user authentication. In the analysis of neural performance, additional criteria beyond accuracy were used, namely Mean Squared Error (MSE) and Root Mean Squared Error (RMSE). Based on the achieved and analyzed results, a set of four informative features with the highest significance was synthesized: LAE (A-weighted sound exposure level), LAeq (A-weighted equivalent sound level), LAF (A-weighted sound level, fast time constant) and LAS (A-weighted sound level, slow time constant). In subsequent neural training, the Log-sigmoid activation proved unsuitable, with greatly reduced accuracy readings below 58.0% and with MSE and RMSE errors above thresholds of 0.2300 and 0.4800, respectively.
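The evaluation criteria named above can be computed directly from a network's class-probability outputs. The following minimal sketch, with purely illustrative data for seven voice profiles (the scores and sample counts are assumptions, not the article's data), shows how accuracy, MSE and RMSE relate to the softmax outputs and one-hot targets:

```python
import numpy as np

# Hypothetical per-sample softmax scores for 7 voice profiles (illustrative only).
# y_true: correct profile index for each of 10 samples.
y_true = np.array([0, 1, 2, 3, 4, 5, 6, 0, 1, 2])
rng = np.random.default_rng(42)
y_prob = rng.random((10, 7))
y_prob[np.arange(10), y_true] += 2.0          # bias scores toward the true class
y_prob /= y_prob.sum(axis=1, keepdims=True)   # normalize rows to probabilities

# One-hot targets, as used when scoring a classification network's raw outputs
targets = np.eye(7)[y_true]

accuracy = np.mean(np.argmax(y_prob, axis=1) == y_true)
mse = np.mean((y_prob - targets) ** 2)        # Mean Squared Error
rmse = np.sqrt(mse)                           # Root Mean Squared Error

print(f"Accuracy: {accuracy:.3f}, MSE: {mse:.4f}, RMSE: {rmse:.4f}")
```

Accuracy counts only the winning class per sample, while MSE and RMSE penalize how far the full probability vector is from the one-hot target, which is why the article reports them as complementary criteria.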
Positive voice-recognition performance was achieved with the Softmax and Hyperbolic tangent sigmoid activations in the SCG and LM training procedures, with accuracy levels of 98.7% and 96.1% for the FFNN models. Fully correct recognition of the test voice profiles for access and security personalization, with a quantitative equivalent of 100.0% accuracy, was achieved with the Linear transfer function for the Cascade-Forward Neural Networks. The proposed method and the neural models synthesized in the research can be used as units and modules in access control systems with biometric diagnostics and intelligent recognition of employees in company departments, for the electronic storage of classified information and for physical access control.
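The structural difference between the two architectures compared above can be illustrated with a minimal forward pass. In this sketch (untrained, randomly initialized weights; the sizes are assumptions chosen to match the four selected features and seven profiles), the cascade-forward network adds a direct input-to-output connection on top of the feed-forward path, with a linear output transfer as in the best-performing CFNN models:

```python
import numpy as np

# Minimal forward passes contrasting a feed-forward network (FFNN)
# with a cascade-forward network (CFNN). Weights are illustrative, not trained.
rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 8, 7             # 4 sound features -> 7 voice profiles

x = rng.standard_normal(n_in)            # one feature vector (LAE, LAeq, LAF, LAS)
W1 = rng.standard_normal((n_hid, n_in)); b1 = np.zeros(n_hid)
W2 = rng.standard_normal((n_out, n_hid)); b2 = np.zeros(n_out)
Wc = rng.standard_normal((n_out, n_in))  # extra cascade path: input -> output

h = np.tanh(W1 @ x + b1)                 # shared hidden layer (tanh-sigmoid)

y_ffnn = W2 @ h + b2                     # FFNN: output sees only the hidden layer
y_cfnn = W2 @ h + Wc @ x + b2            # CFNN: output also sees the raw input
```

The only difference between the two outputs is the cascade term `Wc @ x`, which lets the output layer use the raw features directly in addition to the hidden representation.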