Voice activity detection in noisy conditions using tiny convolutional neural network

R. S. Vashkevich; E. S. Azarov

doi:10.37661/1816-0301-2020-17-2-36-43

Abstract

The paper investigates the problem of voice activity detection in a noisy audio signal. An extremely compact convolutional neural network with only 385 trainable parameters is proposed. The model requires few computational resources, which allows it to run on compact low-power devices as part of the “internet of things” concept. At the same time, the model provides state-of-the-art voice activity detection accuracy. These properties are achieved by a special convolutional layer that takes into account the harmonic structure of voiced speech. This layer also eliminates redundancy in the model because it is invariant to changes in fundamental frequency. Model performance is evaluated under various noise conditions at different signal-to-noise ratios. The results show that the proposed model achieves higher accuracy than the voice activity detection model from Google's WebRTC framework.
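The abstract does not say how the noisy test signals were prepared. As a minimal sketch of one common way to set up an evaluation at different signal-to-noise ratios, the Python snippet below scales a noise signal so that the speech-to-noise power ratio matches a target value in decibels. The function names (`power`, `mix_at_snr`, `measure_snr_db`) and the toy signals are hypothetical and are not taken from the paper.

```python
import math
import random


def power(signal):
    """Mean power (mean of squared samples) of a sequence of floats."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` to reach the target SNR relative to `speech`, then add.

    Assumes both sequences have the same length and non-zero power.
    Returns the mixture and the gain that was applied to the noise.
    """
    # SNR_dB = 10 * log10(P_speech / P_noise), so the required noise power is:
    target_noise_power = power(speech) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / power(noise))
    mixture = [s + gain * n for s, n in zip(speech, noise)]
    return mixture, gain


def measure_snr_db(speech, noise):
    """SNR in decibels between a speech signal and an (already scaled) noise."""
    return 10.0 * math.log10(power(speech) / power(noise))


# Toy stand-ins (assumptions): a 200 Hz tone as "speech", Gaussian noise.
rng = random.Random(0)
sample_rate = 16000
speech = [math.sin(2 * math.pi * 200 * t / sample_rate) for t in range(sample_rate)]
noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]

for target in (20, 10, 0, -5):
    mixture, gain = mix_at_snr(speech, noise, target)
    achieved = measure_snr_db(speech, [gain * n for n in noise])
    print(f"target {target:>3} dB -> achieved {achieved:.2f} dB")
```

In a real evaluation the tone and the Gaussian noise would be replaced by recorded speech and noise, and the detector's accuracy would be reported separately for each target SNR.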

Keywords

voice activity detector, harmonic signal, convolutional neural network, pitch, speech processing

About the Authors

R. S. Vashkevich

https://github.com/gvashkevich
Belarusian State University of Informatics and Radioelectronics
Belarus

Ryhor S. Vashkevich, M. Sci. (Eng.), Postgraduate Student of the Department of EMU

Minsk

E. S. Azarov

Belarusian State University of Informatics and Radioelectronics
Belarus

Elias S. Azarov, Dr. Sci. (Eng.), Associate Professor, Head of the Department of EMU

Minsk

For citations:

Vashkevich R.S., Azarov E.S. Voice activity detection in noisy conditions using tiny convolutional neural network. Informatics. 2020;17(2):36-43. (In Russ.) https://doi.org/10.37661/1816-0301-2020-17-2-36-43

This work is licensed under a Creative Commons Attribution 4.0 License.

ISSN 1816-0301 (Print)
ISSN 2617-6963 (Online)
