Informatics

ISSN 1816-0301, 2617-6963

UIIP NASB

Research Article

SIGNAL, IMAGE, SPEECH, TEXT PROCESSING AND PATTERN RECOGNITION

PSYCHOACOUSTICALLY MOTIVATED TIME-FREQUENCY DICTIONARY BUILDING FOR UNIVERSAL SCALABLE AUDIOCODER BASED ON THE SPARSE APPROXIMATION

Herasimovich V. Y. (gerasimovich@bsuir.by)

Petrovsky Al. A. (alexey@petrovsky.eu)

Belarusian State University of Informatics and Radioelectronics

2017, no. 4(56), pp. 89–103

12.12.2017

© 2017 Herasimovich V.Y., Petrovsky A.A.

This work is licensed under a Creative Commons Attribution 4.0 License.

https://inf.grid.by/jour/article/view/243

The article considers a method for building a perceptually motivated dictionary of time-frequency functions based on a discrete wavelet packet transform optimized for each frame of the input signal, and its use in a universal scalable real-time audio coder. The relevance of the problem is shown, and considerable attention is paid to psychoacoustic modelling. The following algorithms are described: sparse approximation, perceptual adaptation of the wavelet packet decomposition tree, and the encoding and decoding schemes for the input signal. Results of experimental studies of the developed audio coder are presented, together with a comparison against modern audio compression schemes such as Opus and Vorbis based on the objective quality assessment PEAQ ODG (Objective Difference Grade).
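The sparse approximation step named in the abstract builds on matching pursuit (Mallat and Zhang, 1993, in the reference list). The following is a minimal sketch of the generic greedy loop only, assuming an arbitrary finite dictionary of atoms that are scaled to unit norm. It does not model the authors' frame-optimized, psychoacoustically adapted wavelet packet dictionary or any perceptual weighting, and the function names (`normalize`, `matching_pursuit`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def normalize(atom: Sequence[float]) -> List[float]:
    """Scale an atom to unit Euclidean norm (matching pursuit assumes unit-norm atoms)."""
    norm = math.sqrt(sum(a * a for a in atom))
    if norm == 0.0:
        raise ValueError("dictionary atoms must be non-zero")
    return [a / norm for a in atom]


def matching_pursuit(
    signal: Sequence[float],
    dictionary: Sequence[Sequence[float]],
    max_atoms: int,
    tol: float = 1e-9,
) -> Tuple[List[Tuple[int, float]], List[float]]:
    """Generic greedy matching pursuit over a finite dictionary.

    Returns the selected (atom index, coefficient) pairs and the final residual.
    Hypothetical helper for illustration; not the coder described in the article.
    """
    atoms = [normalize(a) for a in dictionary]
    residual = list(signal)
    selected: List[Tuple[int, float]] = []
    for _ in range(max_atoms):
        # Correlate the residual with every atom and pick the largest |<r, g>|.
        corr = [sum(r * g for r, g in zip(residual, atom)) for atom in atoms]
        best = max(range(len(atoms)), key=lambda k: abs(corr[k]))
        coef = corr[best]
        if abs(coef) <= tol:
            break  # residual is numerically orthogonal to every atom
        selected.append((best, coef))
        # Remove the projection onto the chosen atom; energy drops by coef**2.
        residual = [r - coef * g for r, g in zip(residual, atoms[best])]
        if math.sqrt(sum(r * r for r in residual)) <= tol:
            break  # signal fully represented
    return selected, residual


if __name__ == "__main__":
    # Toy dictionary: the 4-sample standard basis plus a constant atom.
    toy_dict = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 1, 1, 1]]
    sel, res = matching_pursuit([2.0, 2.0, 2.0, 2.0], toy_dict, max_atoms=3)
    print(sel)  # [(4, 4.0)]: one constant atom captures the whole signal
```

Because the dictionary is redundant (five atoms for four samples), the constant signal is represented by a single atom rather than four basis vectors; this sparsity is what a richer time-frequency dictionary aims to exploit. With unit-norm atoms, each step removes exactly the squared coefficient from the residual energy.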

References

Mallat, S. Matching pursuit with time-frequency dictionaries / S. Mallat, Z. Zhang // IEEE Transactions on Signal Processing. – December 1993. – Vol. 41, no. 12. – P. 3397–3415.

Petrovsky, Al. Hybrid signal decomposition based on instantaneous harmonic parameters and perceptually motivated wavelet packets for scalable audio coding / Al. Petrovsky, E. Azarov, A. Petrovsky // Signal Processing (Elsevier). Special Issue «Fourier Related Transforms for Non-Stationary Signals». – June 2011. – Vol. 91, no. 6. – P. 1489–1504.

Ruiz-Reyes, N. Adaptive signal modelling based on sparse approximations for scalable parametric audio coding / N. Ruiz-Reyes, P. Vera-Candeas // IEEE Transactions on Audio, Speech, and Language Processing. – 2010. – Vol. 18, no. 3. – P. 447–460.

Chardon, G. Perceptual matching pursuit with Gabor dictionaries and Time-Frequency Masking / G. Chardon, T. Necciari, P. Balazs // ICASSP’2014. – Florence, Italy, 2014. – P. 3126–3130.

Ravelli, E. Union of MDCT bases for audio coding / E. Ravelli, G. Richard, L. Daudet // IEEE Transactions on Audio, Speech, and Language Processing. – 2008. – Vol. 16, no. 8. – P. 1361–1372.

Mallat, S. A Wavelet Tour of Signal Processing: The Sparse Way. 3rd ed. / S. Mallat. – Burlington, MA : Academic Press, 2008. – 832 p.

Strang, G. Wavelets and Filter Banks / G. Strang, T. Nguyen. – Wellesley, MA : Wellesley-Cambridge Press, 1997. – 520 p.

Petrovsky, Al. Scalable parametric audio coder using sparse approximation with frame-to-frame perceptually optimized wavelet packet based dictionary / Al. Petrovsky, V. Herasimovich, A. Petrovsky // AES 138th Convention. – Warsaw, Poland, 2015. – Paper 9264.

Analyzers of speech and audio signals: methods, algorithms and practice (with MATLAB examples) / ed. by A.A. Petrovsky. – Minsk : Bestprint, 2009. – 456 p. (in Russian)

Daubechies, I. Ten Lectures on Wavelets / I. Daubechies. – Philadelphia, Pennsylvania : Society for Industrial and Applied Mathematics, 1992. – 357 p.

Johnston, J.D. Transform coding of audio signals using perceptual noise criteria / J.D. Johnston // IEEE Journal on Selected Areas in Communications. – February 1988. – Vol. 6, no. 2. – P. 314–323.

Petrovsky, Al.A. Building a psychoacoustic model in the wavelet coefficient domain for perceptual processing of audio and speech signals / Al.A. Petrovsky // Rechevye tekhnologii [Speech Technology]. – 2008. – No. 4. – P. 61–71. (in Russian)

Painter, T. Perceptual coding of digital audio / T. Painter, A. Spanias // Proceedings of the IEEE. – April 2000. – Vol. 88, no. 4. – P. 451–515.

Umapathy, K. Audio signal processing using time-frequency approaches: coding, classification, fingerprinting, and watermarking / K. Umapathy, B. Ghoraani, S. Krishnan // EURASIP Journal on Advances in Signal Processing. – 2010. – Vol. 2010. – P. 1–28.

Goodwin, M. Atomic decompositions of audio signals / M. Goodwin, M. Vetterli // Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. – New Paltz, NY, USA, 1997. – P. 1–4.

Petrovsky, Al.A. Scalable audio/speech coders based on adaptive time-frequency analysis of audio signals / Al.A. Petrovsky, A.A. Petrovsky // Trudy SPIIRAN [SPIIRAS Proceedings]. – 2017. – No. 1(50). – P. 55–92. (in Russian)

Petrovsky, Al. Audio/speech coding using the matching pursuit with frame-based psychoacoustic optimized time-frequency dictionaries and its performance evaluation / Al. Petrovsky, V. Herasimovich, A. Petrovsky // Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA). – Poznan, Poland, 2016. – P. 225–229.

Petrovsky, A. Real-time wavelet packet-based low bit rate audio coding on a dynamic reconfiguration system / A. Petrovsky, D. Krahe, A.A. Petrovsky // AES 114th Convention. – Amsterdam, 2003. – Paper 5778.

ITU-R Recommendation BS.1387-1. Method for objective measurements of perceived audio quality. – 2001.

High-quality, low-delay music coding in the Opus codec / J.-M. Valin [et al.] // AES 135th Convention. – New York, USA, 2013. – Paper 8942.

Voice coding with Opus / K. Vos [et al.] // AES 135th Convention. – New York, USA, 2013. – Paper 8941.

The authors declare no conflicts of interest.