inform

Информатика

Informatics

ISSN 1816-0301, 2617-6963

UIIP NASB

inform-336

Research Article

SIGNAL, IMAGE, SPEECH, TEXT PROCESSING AND PATTERN RECOGNITION

APPLICATION OF INSTANTANEOUS HARMONIC ANALYSIS TO ANTHROPOMORPHIC PROCESSING OF SPEECH SIGNALS

D. S. Likhachov, I. S. Azarov, A. A. Petrovsky

Belarusian State University of Informatics and Radioelectronics, Belarus

2011

05.04.2018

No. 4(32), pp. 59–70

2018

Likhachov D.S., Azarov I.S., Petrovsky A.A.

This work is licensed under a Creative Commons Attribution 4.0 License.

https://inf.grid.by/jour/article/view/336

A method for the parametric description of an audio signal is considered, based on an anthropomorphic interpretation of the signal's frequency components. To obtain the model parameters, instantaneous harmonic analysis is proposed in place of the discrete Fourier transform. The accuracy of the resulting description is evaluated. Experimental results show that signal reconstruction depends strongly on the means used to obtain the time-frequency description, and that the proposed method provides higher reconstruction quality than known methods.

References

Morgan, N. Does ASR have a PHD, or is it just piled higher and deeper? / N. Morgan [Electronic resource]. – Mode of access : http://superlectures.com/icassp2011/lecture.php?id=206&lang=en. – Date of access : 21.10.2011.

A Perceptual Model for Sinusoidal Audio Coding Based on Spectral Integration / S. van de Par [et al.] // EURASIP Journal on Applied Signal Processing. – 2005. – Vol. 2005, № 9. – P. 1292–1304.

Ravindran, S. A Physiologically Inspired Method for Audio Classification / S. Ravindran, K. Schlemmer, D.V. Anderson // EURASIP Journal on Applied Signal Processing. – 2005. – Vol. 2005, № 9. – P. 1374–1381.

Feldbauer, C. Anthropomorphic Coding of Speech and Audio: A Model Inversion Approach / C. Feldbauer, G. Kubin, W.B. Kleijn // EURASIP Journal on Applied Signal Processing. – 2005. – Vol. 2005, № 9. – P. 1334–1349.

Ghitza, O. Auditory Models and Human Performance in Tasks Related to Speech Coding and Speech Recognition / O. Ghitza // IEEE Transactions on Speech and Audio Processing. – 1994. – Vol. 2, № 1. – P. 115–132.

Ivanov, A.V. Analysis of the IHC Adaptation for the Anthropomorphic Speech Processing Systems / A.V. Ivanov, A.A. Petrovsky // EURASIP Journal on Applied Signal Processing. – 2005. – Vol. 2005, № 9. – P. 1323–1333.

Likhachov, D.S. Analysis and synthesis of speech signal coding devices based on anthropomorphic processing and sinusoidal models / D.S. Likhachov, A.A. Petrovsky // Doklady BGUIR. – 2006. – № 3 (15). – P. 35–43. (In Russian)

The Auditory System / Ya.A. Altman [et al.] ; ed. by Ya.A. Altman. – Leningrad : Nauka. – 620 p. (In Russian)

Likhachov, D.S. Improved auditory-based speech coding using psychoacoustic model based on a cochlear filter bank and an average localized synchrony detection / D.S. Likhachov, A.A. Petrovsky // Computer information systems and industrial management applications ; eds. K. Saeed, R. Mosdorf, Z. Sosnowski. – Poland : Bialystok, 2003. – P. 11–19.

Likhachov, D.S. Speech signal compression based on a sinusoidal model with anthropomorphic processing / D.S. Likhachov, A.A. Petrovsky // Speech and Audio Signal Analyzers: Methods, Algorithms and Practice (with MATLAB Examples) ; ed. by Prof. A.A. Petrovsky, D.Sc. (Eng.). – Minsk : Bestprint, 2009. – P. 211–233. (In Russian)

Azarov, I.S. Computing the instantaneous harmonic parameters of a speech signal / I.S. Azarov, A.A. Petrovsky // Rechevye tekhnologii [Speech Technologies]. – 2008. – № 1 (1). – P. 67–77. (In Russian)

Ghitza, O. Adequacy of auditory models to predict internal human representation of speech sounds / O. Ghitza // J. Acoust. Soc. Am. – 1993. – Vol. 93, № 4. – P. 2160–2171.

An anthropomorphic speech processing based on the cochlear model and its application for coding task / A.A. Petrovsky [et al.] // International scientific journal of computing. – 2004. – Vol. 3, № 1. – P. 75–83.

Wan, W.G. A two-dimensional non-linear cochlear model for speech processing: response to pure tones / W.G. Wan, A.A. Petrovsky, C.X. Fan // 6th Intern. Fase-Congress. – Zurich, Switzerland, 1992. – P. 233–236.

Wan, W.G. A new solution for cochlear macromechanics / W.G. Wan, C.X. Fan // Acustica. – Vol. 75. – P. 79–82.

Greenwood, D.D. A cochlear frequency-position function for several species-29 years later / D.D. Greenwood // J. Acoust. Soc. Am. – 1990. – Vol. 87, № 6. – P. 2592–2605.

Petrovsky, A.A. A digital cochlear model as a base of anthropomorphic speech processing / A.A. Petrovsky, D.S. Likhachov // Neural networks and artificial intelligence : proc. of the 3d Intern. Conf., Belarus, Minsk, November 12–14, 2003. – Minsk, 2003. – P. 126–131.

Likhachov, D.S. Anthropomorphic analysis based on the discrete Fourier transform with a non-uniform frequency scale / D.S. Likhachov // Izvestiya Belorusskoi inzhenernoi akademii [Proceedings of the Belarusian Engineering Academy]. – 2005. – № 1 (19)/2. – P. 177–180. (In Russian)

McAulay, R.J. Low-rate speech coding based on the sinusoidal model / R.J. McAulay, T.F. Quatieri // Advances in Speech Signal Processing ; eds. S. Furui, M.M. Sondhi. – N.Y. : Marcel Dekker, 1992. – P. 165–208.

McAulay, R.J. Speech analysis/synthesis based on a sinusoidal representation / R.J. McAulay, T.F. Quatieri // IEEE Trans. on Acoust., Speech and Signal Processing. – 1986. – Vol. ASSP-34. – P. 744–754.

Azarov, I.S. Continuous and discrete harmonic transforms for decomposing a speech signal into periodic and noise components / I.S. Azarov, A.A. Petrovsky // Doklady BGUIR. – 2008. – № 4 (34). – P. 92–105. (In Russian)

Petrovsky, A. Combining advanced sinusoidal and waveform matching models for parametric audio/speech coding / A. Petrovsky, E. Azarov, A. Petrovsky // EUSIPCO 2009 : proc. of the 17th European Signal Processing Conf. – Glasgow, 2009. – P. 436–440.

ITU-T Recommendation P.862, PESQ: an objective method for end-to-end speech quality assessment of narrowband telephone networks and speech codecs, February 2001.

Yang, W. Enhanced Modified Bark Spectral Distortion (EMBSD): an Objective Speech Quality Measure Based on Audible Distortion and Cognition Model (PhD Thesis) / W. Yang [Electronic resource]. – Mode of access : http://www.temple.edu/speech_lab/wonhos_dissertation.pdf. – Date of access : 21.10.2011.

The authors declare that there are no conflicts of interest present.