Adaptive feature fusion-based speaker recognition strategy for disguised speech speakers

Authors

  • Maolin Ma, Criminal Investigation Police University of China, Shenyang, Liaoning, 110854

DOI:

https://doi.org/10.65613/690697

Keywords:

Disguised speech; Speaker recognition; Adaptive feature fusion; Formant (resonance peak); GFCC; Anti-spoofing detection

Abstract

Automatic speaker authentication systems are threatened by disguised speech attacks. Attacks produced by voice conversion and speech synthesis pose the greatest risk, so efficient recognition strategies are needed. This study proposes an adaptive feature fusion-based speaker recognition strategy for disguised speech. It improves the system's ability to detect disguised speech by combining formant (resonance peak) features with GFCC (Gammatone frequency cepstral coefficient) features. The method extracts formant coefficients with the cepstral method, extracts GFCC parameters, fuses the two feature sets through adaptive weighting, and finally uses a Gaussian mixture model (GMM) to classify speech as genuine or spoofed. Experimental results show that on the evaluation set, the average t-DCF of the proposed fused features is only 0.058, significantly better than that of the formant feature (0.131) or the GFCC feature (0.086) used alone. In a white-noise environment (SNR = 20 dB), the average equal error rate (EER) of the fused features is 11.01%, which is 8.66% and 5.57% lower than that of the formant feature and the GFCC feature used alone, respectively. These results indicate that the proposed adaptive feature fusion strategy effectively improves the performance of disguised-speech speaker recognition systems and is more robust, especially in noisy environments.
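The abstract does not state how the adaptive weights are computed. The Python sketch below shows one plausible scheme under stated assumptions. Each feature stream's weight is made inversely proportional to its equal error rate on a development set. The weighted formant and GFCC vectors are then concatenated before GMM modeling. The names `equal_error_rate`, `adaptive_weights`, and `fuse` are hypothetical, and the inverse-EER rule is an assumption, not the author's published method. GMM training and scoring, formant and GFCC extraction, and t-DCF computation are not shown.

```python
"""Hypothetical sketch: EER-driven adaptive weights for fusing two feature streams.

Assumption (not from the source): each stream's weight is inversely
proportional to its development-set EER.
"""
from typing import List, Sequence, Tuple


def equal_error_rate(genuine: Sequence[float], spoof: Sequence[float]) -> float:
    """Return the EER, assuming higher scores mean 'more likely genuine'.

    Every observed score is tried as a threshold. The function returns the
    mean of the false-acceptance and false-rejection rates at the threshold
    where the two rates are closest.
    """
    thresholds = sorted(set(genuine) | set(spoof))
    best_gap, best_eer = float("inf"), 1.0
    for t in thresholds:
        far = sum(s >= t for s in spoof) / len(spoof)     # spoofed trials accepted
        frr = sum(g < t for g in genuine) / len(genuine)  # genuine trials rejected
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer


def adaptive_weights(eer_formant: float, eer_gfcc: float,
                     eps: float = 1e-6) -> Tuple[float, float]:
    """Weight each stream by the inverse of its EER, normalized to sum to 1 (assumed rule)."""
    inv_f = 1.0 / (eer_formant + eps)  # eps avoids division by zero for a perfect stream
    inv_g = 1.0 / (eer_gfcc + eps)
    total = inv_f + inv_g
    return inv_f / total, inv_g / total


def fuse(formant_vec: Sequence[float], gfcc_vec: Sequence[float],
         w_formant: float, w_gfcc: float) -> List[float]:
    """Scale each per-frame feature vector by its weight, then concatenate them."""
    return [w_formant * x for x in formant_vec] + [w_gfcc * x for x in gfcc_vec]
```

For example, `adaptive_weights(0.2, 0.1)` gives the lower-EER stream roughly twice the weight of the other (about 2/3 versus 1/3). Weighting before concatenation keeps a single GMM per class, which fits the abstract's description of feature-level (rather than score-level) fusion.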

Published

2026-03-23

Section

Article