Speech resampling is a typical tampering operation that is often embedded in various speech forgeries, such as splicing, electronic disguising, and quality faking. By analyzing the principle of resampling, we found that, because the interpolation step in resampling is imperfect, the bandwidth of resampled speech becomes inconsistent with its sampling rate, whereas natural speech exhibits no such inconsistency. Based on this observation, a new resampling detection algorithm based on the inconsistency of band energy is proposed. First, according to the sampling rate of the suspected speech, a band-pass Butterworth filter is designed to isolate the residual signal. Then, the logarithmic ratio of band energy is computed between the suspected speech and the filtered speech. Finally, this logarithmic ratio is used to discriminate resampled speech from original speech. Experimental results show that the proposed algorithm effectively detects resampling under various conditions and is robust to MP3 compression.
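The core idea, that upsampled speech carries abnormally little energy above the Nyquist frequency of its source rate, can be illustrated with a minimal sketch. The paper designs a band-pass Butterworth filter; here the band split is approximated with an FFT, and the test signals (white noise, linear-interpolation upsampling) are illustrative assumptions, not the authors' data or exact implementation.

```python
import numpy as np

def band_energy_log_ratio(x, fs, up_factor):
    """Log ratio of energy above the pre-resampling Nyquist to total energy.

    A numpy-only approximation of the band-energy feature: the residual
    band (above fs / (2 * up_factor)) is selected with an FFT split rather
    than the paper's band-pass Butterworth filter (assumption).
    """
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    cutoff = fs / (2.0 * up_factor)          # Nyquist of the source rate
    residual = spec[freqs > cutoff].sum()    # energy the interpolator should have removed
    return float(np.log10(residual / spec.sum() + 1e-12))

rng = np.random.default_rng(0)
fs, up = 16000, 2

# Natural wide-band signal: white noise "recorded" at fs.
natural = rng.standard_normal(fs)

# Resampled signal: noise at fs/up, upsampled by linear interpolation.
# Imperfect interpolation leaves only weak residual energy above the
# old Nyquist, so its log band-energy ratio drops sharply.
low = rng.standard_normal(fs // up)
resampled = np.interp(np.arange(fs), np.arange(fs // up) * up, low)

r_nat = band_energy_log_ratio(natural, fs, up)
r_res = band_energy_log_ratio(resampled, fs, up)
```

Thresholding this ratio separates the two classes in the sketch: the natural signal keeps roughly half its energy above the cutoff, while the upsampled one retains only the interpolation residue.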
Speech is easily leaked imperceptibly. When people use their phones, the personal voice assistant is constantly listening and waiting to be activated, and private content in speech may be maliciously extracted through automatic speech recognition (ASR) by applications on the device. To guarantee that the recognized content is accurate, speech enhancement is used to denoise the input speech. Speech enhancement has developed rapidly along with deep neural networks (DNNs), but adversarial examples can cause DNNs to fail, and this vulnerability can in turn be exploited to protect the privacy of speech. In this work, we propose an adversarial method to degrade speech enhancement systems, which can prevent the malicious extraction of private information from speech. Experimental results show that, after enhancement, the generated adversarial examples have most of the target speech content removed, or replaced with the content of a chosen target utterance. The word error rate (WER) between the recognition results of the enhanced original example and the enhanced adversarial example reaches 89.0%, while the WER of the targeted attack, between the enhanced adversarial example and the target example, is as low as 33.75%. The adversarial perturbation induces far more change in the enhanced output than its own magnitude: the ratio of the difference between the two enhanced examples to the adversarial perturbation exceeds 1.4430. We also investigate transferability between different speech enhancement models; the low transferability of the method ensures that the content of the adversarial example is not damaged, so the useful information can still be extracted by a friendly ASR. This work can thus prevent the malicious extraction of speech content.
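The mechanism behind such an attack, a small input perturbation crafted from the enhancement model's gradient so that the enhanced output changes drastically, can be sketched on a toy model. The linear "enhancer", the energy-minimization objective, and the single FGSM-style step below are all illustrative assumptions; the paper's actual networks and attack objective are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy differentiable "enhancement" model: a fixed linear map W.
# This stands in for a DNN speech enhancer purely for illustration.
n = 64
W = rng.standard_normal((n, n)) / np.sqrt(n)
x = rng.standard_normal(n)              # one clean input frame

def enhance(s):
    return W @ s

# Untargeted degradation: push the enhanced output toward silence by
# minimizing ||enhance(x + delta)||^2 with one signed gradient step
# (an FGSM-style update, used here as a stand-in attack).
grad = 2.0 * W.T @ (W @ x)              # analytic gradient at delta = 0
eps = 0.05                              # perturbation budget (small vs. signal scale)
delta = -eps * np.sign(grad)

orig_out = enhance(x)                   # enhanced original example
adv_out = enhance(x + delta)            # enhanced adversarial example
```

Even this linear toy shows the qualitative effect described above: the bounded perturbation `delta` measurably suppresses the enhanced output, while the input itself is barely changed.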