<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "JATS-journalpublishing1-3.dtd">
<article article-type="research-article" dtd-version="1.3" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="ru"><front><journal-meta><journal-id journal-id-type="publisher-id">inform</journal-id><journal-title-group><journal-title xml:lang="ru">Информатика</journal-title><trans-title-group xml:lang="en"><trans-title>Informatics</trans-title></trans-title-group></journal-title-group><issn pub-type="ppub">1816-0301</issn><issn pub-type="epub">2617-6963</issn><publisher><publisher-name>UIIP NASB</publisher-name></publisher></journal-meta><article-meta><article-id custom-type="elpub" pub-id-type="custom">inform-243</article-id><article-categories><subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="ru"><subject>ОБРАБОТКА СИГНАЛОВ, ИЗОБРАЖЕНИЙ, РЕЧИ, ТЕКСТА И РАСПОЗНАВАНИЕ ОБРАЗОВ</subject></subj-group><subj-group subj-group-type="section-heading" xml:lang="en"><subject>SIGNAL, IMAGE, SPEECH, TEXT PROCESSING AND PATTERN RECOGNITION</subject></subj-group></article-categories><title-group><article-title>ПСИХОАКУСТИЧЕСКИ МОТИВИРОВАННОЕ ПОСТРОЕНИЕ СЛОВАРЯ ЧАСТОТНО-ВРЕМЕННЫХ ФУНКЦИЙ УНИВЕРСАЛЬНОГО МАСШТАБИРУЕМОГО АУДИОКОДЕРА НА ОСНОВЕ РАЗРЕЖЕННОЙ АППРОКСИМАЦИИ</article-title><trans-title-group xml:lang="en"><trans-title>PSYCHOACOUSTICALLY MOTIVATED TIME-FREQUENCY DICTIONARY BUILDING FOR UNIVERSAL SCALABLE AUDIOCODER BASED ON THE SPARSE APPROXIMATION</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Герасимович</surname><given-names>В. Ю.</given-names></name><name name-style="western" xml:lang="en"><surname>Herasimovich</surname><given-names>V. 
Y.</given-names></name></name-alternatives><bio xml:lang="ru"/><email xlink:type="simple">gerasimovich@bsuir.by</email><xref ref-type="aff" rid="aff-1"/></contrib><contrib contrib-type="author" corresp="yes"><name-alternatives><name name-style="eastern" xml:lang="ru"><surname>Петровский</surname><given-names>Ал. А.</given-names></name><name name-style="western" xml:lang="en"><surname>Petrovsky</surname><given-names>Al. A.</given-names></name></name-alternatives><bio xml:lang="ru"/><email xlink:type="simple">alexey@petrovsky.eu</email><xref ref-type="aff" rid="aff-1"/></contrib></contrib-group><aff-alternatives id="aff-1"><aff xml:lang="ru"><institution>Белорусский государственный университет информатики и радиоэлектроники</institution></aff><aff xml:lang="en"><institution>Belarusian State University of Informatics and Radioelectronics</institution></aff></aff-alternatives><pub-date pub-type="collection"><year>2017</year></pub-date><pub-date pub-type="epub"><day>12</day><month>12</month><year>2017</year></pub-date><volume>0</volume><issue>4(56)</issue><fpage>89</fpage><lpage>103</lpage><permissions><copyright-statement>Copyright &#x00A9; Герасимович В.Ю., Петровский А.А., 2017</copyright-statement><copyright-year>2017</copyright-year><copyright-holder xml:lang="ru">Герасимович В.Ю., Петровский А.А.</copyright-holder><copyright-holder xml:lang="en">Herasimovich V.Y., Petrovsky A.A.</copyright-holder><license xml:lang="ru" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>Данная работа распространяется под лицензией Creative Commons Attribution 4.0.</license-p></license><license xml:lang="en" license-type="creative-commons-attribution" xlink:href="https://creativecommons.org/licenses/by/4.0/" xlink:type="simple"><license-p>This work is licensed under a Creative Commons Attribution 4.0 License.</license-p></license></permissions><self-uri
xlink:href="https://inf.grid.by/jour/article/view/243">https://inf.grid.by/jour/article/view/243</self-uri><abstract><p>Рассматривается способ построения перцептуально-мотивированного словаря частотно-временных функций на основе оптимизированного для фрейма входного сигнала пакетного дискретного вейвлет-преобразования и использования этого способа в универсальном масштабируемом аудиокодере реального времени. Показывается актуальность данной задачи, большое внимание уделяется психоакустическому моделированию. Описываются такие алгоритмы, как разреженная аппроксимация, перцептуальная адаптация дерева декомпозиции пакетного дискретного вейвлет-преобразования, а также схемы кодирования и декодирования входного сигнала. Приводятся результаты экспериментальных исследований разрабатываемого аудиокодера. Дается его сравнение с современными схемами сжатия звуковой информации, такими как Opus и Vorbis, на базе объективной оценки качества PEAQ – ODG.</p></abstract><trans-abstract xml:lang="en"><p>The article presents a method for building a perceptually motivated dictionary of time-frequency functions, based on a wavelet packet transform optimized for the input signal frame, and its use in a universal scalable real-time audio coder. The relevance of the problem is shown, and particular attention is paid to psychoacoustic modelling. The following algorithms are described: sparse approximation, perceptual adaptation of the wavelet packet decomposition tree, and the input signal encoding and decoding schemes. Experimental results for the developed audio coder are reported, together with a comparison against modern audio compression schemes such as Opus and Vorbis, based on the PEAQ Objective Difference Grade (ODG).</p></trans-abstract></article-meta></front><back><ref-list><title>References</title><ref id="cit1"><label>1</label><citation-alternatives><mixed-citation xml:lang="ru">Mallat, S. Matching pursuit with time-frequency dictionaries / S. Mallat, Z.
Zhang // IEEE Transactions on Signal Processing. – December, 1993. – Vol. 41, no. 12. – P. 3397–3415.</mixed-citation><mixed-citation xml:lang="en">Mallat, S. Matching pursuit with time-frequency dictionaries / S. Mallat, Z. Zhang // IEEE Transactions on Signal Processing. – December, 1993. – Vol. 41, no. 12. – P. 3397–3415.</mixed-citation></citation-alternatives></ref><ref id="cit2"><label>2</label><citation-alternatives><mixed-citation xml:lang="ru">Petrovsky, Al. Hybrid signal decomposition based on instantaneous harmonic parameters and perceptually motivated wavelet packets for scalable audio coding / Al. Petrovsky, E. Azarov, A. Petrovsky // Elsevier, Signal Processing. Special Issue «Fourier Related Transforms for Non-Stationary Signals». – June 2011. – Vol. 91, iss. 6. – P. 1489–1504.</mixed-citation><mixed-citation xml:lang="en">Petrovsky, Al. Hybrid signal decomposition based on instantaneous harmonic parameters and perceptually motivated wavelet packets for scalable audio coding / Al. Petrovsky, E. Azarov, A. Petrovsky // Elsevier, Signal Processing. Special Issue «Fourier Related Transforms for Non-Stationary Signals». – June 2011. – Vol. 91, iss. 6. – P. 1489–1504.</mixed-citation></citation-alternatives></ref><ref id="cit3"><label>3</label><citation-alternatives><mixed-citation xml:lang="ru">Ruiz-Reyes, N. Adaptive signal modelling based on sparse approximations for scalable parametric audio coding / N. Ruiz-Reyes, P. Vera Candeas // IEEE Transactions on Audio, Speech, and Language Processing. – 2010. – Vol. 18, iss. 3. – P. 447–460.</mixed-citation><mixed-citation xml:lang="en">Ruiz-Reyes, N. Adaptive signal modelling based on sparse approximations for scalable parametric audio coding / N. Ruiz-Reyes, P. Vera Candeas // IEEE Transactions on Audio, Speech, and Language Processing. – 2010. – Vol. 18, iss. 3. – P.
447–460.</mixed-citation></citation-alternatives></ref><ref id="cit4"><label>4</label><citation-alternatives><mixed-citation xml:lang="ru">Chardon, G. Perceptual matching pursuit with Gabor dictionaries and Time-Frequency Masking / G. Chardon, T. Necciari, P. Balazs // ICASSP’2014. – Florence, Italy, 2014. – P. 3126–3130.</mixed-citation><mixed-citation xml:lang="en">Chardon, G. Perceptual matching pursuit with Gabor dictionaries and Time-Frequency Masking / G. Chardon, T. Necciari, P. Balazs // ICASSP’2014. – Florence, Italy, 2014. – P. 3126–3130.</mixed-citation></citation-alternatives></ref><ref id="cit5"><label>5</label><citation-alternatives><mixed-citation xml:lang="ru">Ravelli, E. Union of MDCT bases for audio coding / E. Ravelli, G. Richard, L. Daudet // IEEE Transactions on Audio, Speech, and Language Processing. – 2008. – Vol. 16, iss. 8. – P. 1361–1372.</mixed-citation><mixed-citation xml:lang="en">Ravelli, E. Union of MDCT bases for audio coding / E. Ravelli, G. Richard, L. Daudet // IEEE Transactions on Audio, Speech, and Language Processing. – 2008. – Vol. 16, iss. 8. – P. 1361–1372.</mixed-citation></citation-alternatives></ref><ref id="cit6"><label>6</label><citation-alternatives><mixed-citation xml:lang="ru">Mallat, S. A Wavelet Tour of Signal Processing. The Sparse Way; 3rd ed. / S. Mallat. – Burlington, MA : Academic Press, 2008. – 832 p.</mixed-citation><mixed-citation xml:lang="en">Mallat, S. A Wavelet Tour of Signal Processing. The Sparse Way; 3rd ed. / S. Mallat. – Burlington, MA : Academic Press, 2008. – 832 p.</mixed-citation></citation-alternatives></ref><ref id="cit7"><label>7</label><citation-alternatives><mixed-citation xml:lang="ru">Strang, G. Wavelets and Filter Banks / G. Strang, T. Nguyen. – Wellesley, MA : Wellesley-Cambridge Press, 1997. – 520 p.</mixed-citation><mixed-citation xml:lang="en">Strang, G. Wavelets and Filter Banks / G. Strang, T. Nguyen. – Wellesley, MA : Wellesley-Cambridge Press, 1997.
– 520 p.</mixed-citation></citation-alternatives></ref><ref id="cit8"><label>8</label><citation-alternatives><mixed-citation xml:lang="ru">Petrovsky, Al. Scalable parametric audio coder using sparse approximation with frame-to-frame perceptually optimized wavelet packet based dictionary / Al. Petrovsky, V. Herasimovich, A. Petrovsky // AES 138th Convention. – Warsaw, Poland, 2015. – Paper 9264.</mixed-citation><mixed-citation xml:lang="en">Petrovsky, Al. Scalable parametric audio coder using sparse approximation with frame-to-frame perceptually optimized wavelet packet based dictionary / Al. Petrovsky, V. Herasimovich, A. Petrovsky // AES 138th Convention. – Warsaw, Poland, 2015. – Paper 9264.</mixed-citation></citation-alternatives></ref><ref id="cit9"><label>9</label><citation-alternatives><mixed-citation xml:lang="ru">Анализаторы речевых и звуковых сигналов: методы, алгоритмы и практика (с MATLAB-примерами) / под ред. А.А. Петровского. – Минск : Бестпринт, 2009. – 456 с.</mixed-citation><mixed-citation xml:lang="en">Анализаторы речевых и звуковых сигналов: методы, алгоритмы и практика (с MATLAB-примерами) / под ред. А.А. Петровского. – Минск : Бестпринт, 2009. – 456 с.</mixed-citation></citation-alternatives></ref><ref id="cit10"><label>10</label><citation-alternatives><mixed-citation xml:lang="ru">Daubechies, I. Ten Lectures on Wavelets / I. Daubechies. – Philadelphia, Pennsylvania : Society for Industrial and Applied Mathematics, 1992. – 357 p.</mixed-citation><mixed-citation xml:lang="en">Daubechies, I. Ten Lectures on Wavelets / I. Daubechies. – Philadelphia, Pennsylvania : Society for Industrial and Applied Mathematics, 1992. – 357 p.</mixed-citation></citation-alternatives></ref><ref id="cit11"><label>11</label><citation-alternatives><mixed-citation xml:lang="ru">Johnston, J.D. Transform coding of audio signals using perceptual noise criteria / J.D. Johnston // IEEE Journal on Selected Areas in Communications. – February 1988. – Vol. 6, iss. 2. – P.
314–323.</mixed-citation><mixed-citation xml:lang="en">Johnston, J.D. Transform coding of audio signals using perceptual noise criteria / J.D. Johnston // IEEE Journal on Selected Areas in Communications. – February 1988. – Vol. 6, iss. 2. – P. 314–323.</mixed-citation></citation-alternatives></ref><ref id="cit12"><label>12</label><citation-alternatives><mixed-citation xml:lang="ru">Петровский, Ал.А. Построение психоакустической модели в области вейвлет-коэффициентов для перцептуальной обработки звуковых и речевых сигналов / Ал.А. Петровский // Речевые технологии. – 2008. – № 4. – С. 61–71.</mixed-citation><mixed-citation xml:lang="en">Петровский, Ал.А. Построение психоакустической модели в области вейвлет-коэффициентов для перцептуальной обработки звуковых и речевых сигналов / Ал.А. Петровский // Речевые технологии. – 2008. – № 4. – С. 61–71.</mixed-citation></citation-alternatives></ref><ref id="cit13"><label>13</label><citation-alternatives><mixed-citation xml:lang="ru">Painter, T. Perceptual Coding of Digital Audio / T. Painter, A. Spanias // Proceedings of the IEEE. – April 2000. – Vol. 88, iss. 4. – P. 451–515.</mixed-citation><mixed-citation xml:lang="en">Painter, T. Perceptual Coding of Digital Audio / T. Painter, A. Spanias // Proceedings of the IEEE. – April 2000. – Vol. 88, iss. 4. – P. 451–515.</mixed-citation></citation-alternatives></ref><ref id="cit14"><label>14</label><citation-alternatives><mixed-citation xml:lang="ru">Umapathy, K. Audio signal processing using time-frequency approaches: coding, classification, fingerprinting, and watermarking / K. Umapathy, B. Ghoraani, S. Krishnan // EURASIP Journal on Advances in Signal Processing. – 2010. – Vol. 2010. – P. 1–28.</mixed-citation><mixed-citation xml:lang="en">Umapathy, K. Audio signal processing using time-frequency approaches: coding, classification, fingerprinting, and watermarking / K. Umapathy, B. Ghoraani, S. Krishnan // EURASIP Journal on Advances in Signal Processing. – 2010. – Vol. 2010.
– P. 1–28.</mixed-citation></citation-alternatives></ref><ref id="cit15"><label>15</label><citation-alternatives><mixed-citation xml:lang="ru">Goodwin, M. Atomic decompositions of audio signals / M. Goodwin, M. Vetterli // Proceedings of Workshop on Applications of Signal Processing to Audio and Acoustics. – New Paltz, NY, USA, 1997. – P. 1–4.</mixed-citation><mixed-citation xml:lang="en">Goodwin, M. Atomic decompositions of audio signals / M. Goodwin, M. Vetterli // Proceedings of Workshop on Applications of Signal Processing to Audio and Acoustics. – New Paltz, NY, USA, 1997. – P. 1–4.</mixed-citation></citation-alternatives></ref><ref id="cit16"><label>16</label><citation-alternatives><mixed-citation xml:lang="ru">Петровский, Ал.А. Масштабируемые аудиоречевые кодеры на основе адаптивного частотно-временного анализа звуковых сигналов / Ал.А. Петровский, А.А. Петровский // Труды СПИИРАН. – 2017. – № 1(50). – С. 55–92.</mixed-citation><mixed-citation xml:lang="en">Петровский, Ал.А. Масштабируемые аудиоречевые кодеры на основе адаптивного частотно-временного анализа звуковых сигналов / Ал.А. Петровский, А.А. Петровский // Труды СПИИРАН. – 2017. – № 1(50). – С. 55–92.</mixed-citation></citation-alternatives></ref><ref id="cit17"><label>17</label><citation-alternatives><mixed-citation xml:lang="ru">Petrovsky, Al. Audio/speech coding using the matching pursuit with frame-based psychoacoustic optimized time-frequency dictionaries and its performance evaluation / Al. Petrovsky, V. Herasimovich, A. Petrovsky // Signal Processing: Algorithms, Architectures, Arrangement, and Applications (SPA). – Poznan, Poland, 2016. – P. 225–229.</mixed-citation><mixed-citation xml:lang="en">Petrovsky, Al. Audio/speech coding using the matching pursuit with frame-based psychoacoustic optimized time-frequency dictionaries and its performance evaluation / Al. Petrovsky, V. Herasimovich, A. Petrovsky // Signal Processing: Algorithms, Architectures, Arrangement, and Applications (SPA). 
– Poznan, Poland, 2016. – P. 225–229.</mixed-citation></citation-alternatives></ref><ref id="cit18"><label>18</label><citation-alternatives><mixed-citation xml:lang="ru">Petrovsky, A. Real-time wavelet packet-based low bit rate audio coding on a dynamic reconfiguration system / A. Petrovsky, D. Krahe, A.A. Petrovsky // AES 114th Convention. – Amsterdam, 2003. – Paper 5778.</mixed-citation><mixed-citation xml:lang="en">Petrovsky, A. Real-time wavelet packet-based low bit rate audio coding on a dynamic reconfiguration system / A. Petrovsky, D. Krahe, A.A. Petrovsky // AES 114th Convention. – Amsterdam, 2003. – Paper 5778.</mixed-citation></citation-alternatives></ref><ref id="cit19"><label>19</label><citation-alternatives><mixed-citation xml:lang="ru">ITU-R Rec. BS.1387-1, Method for objective measurements of perceived audio quality, 2001.</mixed-citation><mixed-citation xml:lang="en">ITU-R Rec. BS.1387-1, Method for objective measurements of perceived audio quality, 2001.</mixed-citation></citation-alternatives></ref><ref id="cit20"><label>20</label><citation-alternatives><mixed-citation xml:lang="ru">High-quality, low-delay music coding in the Opus codec / J.-M. Valin [et al.] // AES 135th Convention. – NY, USA, 2013. – Paper 8942.</mixed-citation><mixed-citation xml:lang="en">High-quality, low-delay music coding in the Opus codec / J.-M. Valin [et al.] // AES 135th Convention. – NY, USA, 2013. – Paper 8942.</mixed-citation></citation-alternatives></ref><ref id="cit21"><label>21</label><citation-alternatives><mixed-citation xml:lang="ru">Voice coding with Opus / K. Vos [et al.] // AES 135th Convention. – NY, USA, 2013. – Paper 8941.</mixed-citation><mixed-citation xml:lang="en">Voice coding with Opus / K. Vos [et al.] // AES 135th Convention. – NY, USA, 2013. 
– Paper 8941.</mixed-citation></citation-alternatives></ref></ref-list><fn-group><fn fn-type="conflict"><p>The authors declare no conflicts of interest.</p></fn></fn-group></back></article>
