Phase-Aware Single-Channel Speech Enhancement using Modulation-Domain Kalman Filtering

Researchers: Nikolaos Dionelis and Mike Brookes

Modulation-domain Kalman filtering for noise suppression can incorporate speech phase tracking.

Speech signals are often contaminated by unwanted background acoustic noise. The goal of a speech enhancer is to reduce, and ideally eliminate, this noise without distorting the speech signal. In this project, statistical models of the modulation-domain characteristics of speech and noise signals are developed and used in a speech enhancement algorithm. The algorithm suppresses noise by performing modulation-domain Kalman filtering in the spectral log-amplitude and phase domains.

 

The main original contribution of this project is tracking the speech phase in addition to the speech and noise log-spectra.

The flowchart diagram of the proposed enhancement algorithm is:

Main Flowchart

The main part of the enhancement algorithm is the Kalman filter (KF), which is time-varying and has a linear prediction step and a nonlinear update step. The nonlinear update step uses the complex-valued KF observation, exp(y) exp(jθ), where y is the observed log-amplitude and θ is the observed phase.
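As a rough illustration of the two ingredients named above, the following minimal sketch shows a linear KF prediction step and the construction of a complex-valued observation. It is not the published implementation. The two-element state (speech and noise log-amplitude), the diagonal transition coefficients `a`, the process-noise variances `q`, and the function names are all assumptions made for this example, as is the reading of the second factor of the observation as a unit-modulus phase term.

```python
import cmath
import math


def kf_predict(mean, cov, a, q):
    """Linear KF prediction step for a 2-element state with diagonal dynamics.

    mean: [m_s, m_n], posterior means from the previous frame
    cov:  2x2 posterior covariance as nested lists
    a:    [a_s, a_n], per-element transition coefficients (hypothetical model)
    q:    [q_s, q_n], process-noise variances (hypothetical values)
    """
    # Predicted mean: A @ mean with A = diag(a)
    pred_mean = [a[i] * mean[i] for i in range(2)]
    # Predicted covariance: A P A^T + Q with Q = diag(q)
    pred_cov = [
        [a[i] * cov[i][j] * a[j] + (q[i] if i == j else 0.0) for j in range(2)]
        for i in range(2)
    ]
    return pred_mean, pred_cov


def complex_observation(y, theta):
    """Return exp(y) * exp(j * theta) for log-amplitude y and phase theta."""
    return cmath.rect(math.exp(y), theta)


if __name__ == "__main__":
    m, p = kf_predict([1.0, 2.0], [[1.0, 0.5], [0.5, 2.0]], [0.9, 0.8], [0.1, 0.2])
    print(m, p)
    print(complex_observation(0.0, math.pi / 2))
```

The nonlinear update step is deliberately left out: its form depends on the speech and noise models in the publication listed below.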

 

The speech enhancement algorithm performs modulation-domain Kalman filtering and computes the first two moments of the posterior distribution of the speech and noise spectral log-amplitudes:

KF prior and KF posterior – 1

KF prior and KF posterior – 2

KF prior and KF posterior – 3

KF prior and KF posterior – 4

The four preceding examples are for an SNR of 0 dB, a positive SNR, a positive SNR and a negative SNR, respectively.
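To make "the first two moments of the posterior" concrete, the sketch below computes a posterior mean and covariance numerically on a grid over (s, n). It is only an illustration under stated assumptions: the independent Gaussian prior, the grid limits, and the user-supplied `likelihood` function are hypothetical, and the published algorithm may compute these moments differently.

```python
import math


def weighted_moments(points, weights):
    """Mean vector and 2x2 covariance of weighted 2-D points."""
    total = sum(weights)  # must be non-zero
    mean = [sum(w * p[k] for p, w in zip(points, weights)) / total for k in range(2)]
    cov = [
        [
            sum(w * (p[i] - mean[i]) * (p[j] - mean[j]) for p, w in zip(points, weights))
            / total
            for j in range(2)
        ]
        for i in range(2)
    ]
    return mean, cov


def gaussian_prior(s, n, mean, var):
    """Unnormalised independent Gaussian prior over (s, n) (hypothetical)."""
    return math.exp(-0.5 * ((s - mean[0]) ** 2 / var[0] + (n - mean[1]) ** 2 / var[1]))


def grid_posterior_moments(prior_mean, prior_var, likelihood, lo=-20.0, hi=20.0, steps=161):
    """Posterior mean and covariance, with posterior = prior * likelihood on a grid."""
    step = (hi - lo) / (steps - 1)
    axis = [lo + k * step for k in range(steps)]
    points = [(s, n) for s in axis for n in axis]
    weights = [
        gaussian_prior(s, n, prior_mean, prior_var) * likelihood(s, n) for s, n in points
    ]
    return weighted_moments(points, weights)


if __name__ == "__main__":
    # Hypothetical likelihood that keeps only the half-plane s >= n.
    mean, cov = grid_posterior_moments(
        [0.0, 0.0], [4.0, 4.0], lambda s, n: 1.0 if s >= n else 0.0
    )
    print(mean, cov)
```

With a flat likelihood the function simply recovers the prior moments, which is a convenient sanity check before plugging in an observation model.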

 

The preceding images plot the noise log-power, n, against the speech log-power, s. If we now define u = n - s and v = s + n, then the curvy triangle in the rotated (u, v) domain is:

KF Posterior – Curvy Triangle – 1

KF Posterior – Curvy Triangle – 2

The curvy triangle defines the KF observation constraint region.
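The change of variables above, and one plausible reading of the constraint region, can be sketched as follows. The exact constraint is not stated on this page, so the code assumes it is the triangle inequality on spectral amplitudes: an observed amplitude can never exceed the sum of the speech and noise amplitudes or fall below their absolute difference. It further assumes that s, n and the observation y are natural-log powers, so that each amplitude is exp(x / 2). The function names are hypothetical.

```python
import math


def to_uv(s, n):
    """Map (s, n) to the rotated coordinates u = n - s, v = s + n."""
    return n - s, s + n


def from_uv(u, v):
    """Inverse map: s = (v - u) / 2, n = (v + u) / 2."""
    return (v - u) / 2.0, (v + u) / 2.0


def feasible_sn(s, n, y):
    """Assumed constraint: |a_s - a_n| <= a_y <= a_s + a_n on amplitudes."""
    a_s, a_n, a_y = math.exp(s / 2.0), math.exp(n / 2.0), math.exp(y / 2.0)
    return abs(a_s - a_n) <= a_y <= a_s + a_n


def v_bounds(u, y):
    """Interval of v allowed at a given u under the same assumption.

    Uses a_s + a_n = 2 exp(v/4) cosh(u/4) and |a_s - a_n| = 2 exp(v/4) sinh(|u|/4).
    """
    lower = 2.0 * y - 4.0 * math.log(2.0 * math.cosh(u / 4.0))
    if u == 0.0:
        return lower, math.inf  # equal amplitudes can cancel completely
    upper = 2.0 * y - 4.0 * math.log(2.0 * math.sinh(abs(u) / 4.0))
    return lower, upper


if __name__ == "__main__":
    lo, hi = v_bounds(2.0, 0.0)
    print(lo, hi, feasible_sn(*from_uv(2.0, (lo + hi) / 2.0), 0.0))
```

Under these assumptions the allowed band in v narrows as |u| grows, because both bounds approach 2y - |u|. This sketch is not guaranteed to reproduce the plotted region.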

 

Main Relevant Publication:

1) N. Dionelis and M. Brookes, “Phase-aware single-channel speech enhancement with modulation-domain Kalman filtering,” Accepted, IEEE/ACM Transactions on Audio, Speech, and Language Processing. [Online]. Available: http://dx.doi.org/10.1109/TASLP.2018.2800525

Main Relevant Websites:

1) http://dx.doi.org/10.1109/TASLP.2018.2800525

2) https://github.com/nd1511/Proof-Of-Concept

 

Other Relevant Websites:

1) https://www.commsp.ee.ic.ac.uk/~sap/speech-enhancement-using-modulation-domain-kalman-filtering/

2) https://www.commsp.ee.ic.ac.uk/~sap/people-nikolaos-dionelis/

 

Other Relevant Websites by Other Researchers:

1) https://dx.doi.org/10.1109/TASLP.2017.2786863

2) https://www.commsp.ee.ic.ac.uk/~sap/projects/speech-enhancement-in-modulation-domain/

 

Listening Examples:

According to our results, the proposed enhancement algorithm achieves its best results when the PESQ of the noisy speech signal is approximately 2.0. This corresponds to an SNR of about 15 or 20 dB, depending on the noise type.

The listening examples are in the following order: (1) Noisy Speech, (2) Enhanced Speech, (3) Clean Speech.
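For readers who want to build comparable noisy inputs, the sketch below scales a noise signal so that the mixture has a chosen SNR. It assumes the SNR is the ratio of mean speech power to mean noise power over the whole signal. The listening examples may have been generated with a different level definition (for example an active speech level), so this is an approximation rather than the procedure used here. The function name is hypothetical.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Return speech + gain * noise, with gain chosen to give the target SNR in dB.

    Both inputs are equal-length sequences of samples with non-zero power.
    """
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    # Solve p_speech / (gain**2 * p_noise) = 10 ** (snr_db / 10) for gain.
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * v for s, v in zip(speech, noise)]


if __name__ == "__main__":
    speech = [1.0, -1.0, 1.0, -1.0]
    noise = [0.5, 0.5, -0.5, -0.5]
    print(mix_at_snr(speech, noise, 20.0))
```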

 

Example 1, SNR 20 dB, White Noise, Noisy-Enhanced-Clean:

 

Example 2, SNR 20 dB, White Noise, Noisy-Enhanced-Clean:

 

Example 3, SNR 20 dB, F16 Noise, Noisy-Enhanced-Clean:

 

Example 4, SNR 15 dB, White Noise, Noisy-Enhanced-Clean:

 

Example 5, SNR 15 dB, White Noise, Noisy-Enhanced-Clean:

 

Example 6, SNR 15 dB, F16 Noise, Noisy-Enhanced-Clean:

 

Example 7, SNR 15 dB, White Noise, Noisy-Enhanced-Clean:

 

Example 8, SNR 15 dB, F16 Noise, Noisy-Enhanced-Clean:

 

Example 9, SNR 15 dB, White Noise, Noisy-Enhanced-Clean:

 

Example 10, SNR 15 dB, Babble Noise, Noisy-Enhanced-Clean:

 

Example 11, SNR 15 dB, White Noise, Noisy-Enhanced-Clean:

 

Example 12, SNR 10 dB, F16 Noise, Noisy-Enhanced-Clean:

 

Example 13, SNR 10 dB, White Noise, Noisy-Enhanced-Clean:

 

Example 14, SNR 10 dB, F16 Noise, Noisy-Enhanced-Clean:

 

Example 15, SNR 10 dB, Babble Noise, Noisy-Enhanced-Clean:

 

Example 16, SNR 10 dB, Babble Noise, Noisy-Enhanced-Clean:

 

Example 17, SNR 10 dB, Babble Noise, Noisy-Enhanced-Clean:

 

Example 18, SNR 10 dB, White Noise, Noisy-Enhanced-Clean:

 

Example 19, SNR 10 dB, White Noise, Noisy-Enhanced-Clean:

 

Example 20, SNR 10 dB, White Noise, Noisy-Enhanced-Clean:

 

Example 21, SNR 10 dB, F16 Noise, Noisy-Enhanced-Clean:

 

Example 22, SNR 5 dB, F16 Noise, Noisy-Enhanced-Clean:

 

Example 23, SNR 5 dB, White Noise, Noisy-Enhanced-Clean:

 

Example 24, SNR 5 dB, Babble Noise, Noisy-Enhanced-Clean:

 

Example 25, SNR 5 dB, Babble Noise, Noisy-Enhanced-Clean:

 

Example 26, SNR 5 dB, White Noise, Noisy-Enhanced-Clean:

 

Example 27, SNR 0 dB, White Noise, Noisy-Enhanced-Clean:

 

Example 28, SNR 0 dB, F16 Noise, Noisy-Enhanced-Clean:

 

Example 29, SNR 0 dB, F16 Noise, Noisy-Enhanced-Clean:

 

Example 30, SNR 0 dB, Babble Noise, Noisy-Enhanced-Clean:

 

 

Comparison with the Log-MMSE:

For comparison purposes, we present some Log-MMSE listening examples.

The Log-MMSE listening examples are in the following order: (1) Noisy Speech, (2) Log-MMSE Enhanced Speech.

 

Log-MMSE Example 1, SNR 20 dB, White Noise, Noisy-Enhanced:

 

Log-MMSE Example 2, SNR 20 dB, White Noise, Noisy-Enhanced:

 

Log-MMSE Example 3, SNR 15 dB, White Noise, Noisy-Enhanced:

 

Log-MMSE Example 4, SNR 15 dB, White Noise, Noisy-Enhanced:

 

Log-MMSE Example 5, SNR 10 dB, White Noise, Noisy-Enhanced:

 

Log-MMSE Example 6, SNR 10 dB, White Noise, Noisy-Enhanced:

 

Log-MMSE Example 7, SNR 5 dB, White Noise, Noisy-Enhanced:

 

Log-MMSE Example 8, SNR 5 dB, White Noise, Noisy-Enhanced:

 

Log-MMSE Example 9, SNR 0 dB, White Noise, Noisy-Enhanced:

 

Log-MMSE Example 10, SNR 0 dB, White Noise, Noisy-Enhanced:

 

Log-MMSE Example 11, SNR 0 dB, White Noise, Noisy-Enhanced:

 

Log-MMSE Example 12, SNR 0 dB, Babble Noise, Noisy-Enhanced:

 

Log-MMSE Example 13, SNR 0 dB, Babble Noise, Noisy-Enhanced:

 

Log-MMSE Example 14, SNR 0 dB, F16 Noise, Noisy-Enhanced: