Next: Projection error measure Up: Speech Enhancement Summaries Previous: Maximum-Likelihood spectral amplitude estimator [6]

Minimum-controlled recursive averaging noise estimation

The MCRA approach to noise estimation [2] combines two existing approaches: noise estimation by minimum statistics and estimation by recursive averaging. Unlike the minimum statistics approach, where the spectral estimates are obtained by finding the minima of the smoothed spectrum, MCRA uses the minima only to estimate the probability of speech presence in each band. This probability then controls the adaptation rate of the recursive average.

  1. Smoothing step
    $S(k,l)$: squared magnitude of the STFT of the noisy signal, smoothed in time and frequency
    $k$: frequency index
    $l$: time index
  2. Minimum step
    Initialise $S_{min}(k,0)=S(k,0)$ and $S_{tmp}(k,0)=S(k,0)$
    if(within current window)

    \begin{displaymath}
\begin{array}{lll}
S_{min}(k,l) & = & min\{S_{min}(k,l-1),S(k,l)\}\\
S_{tmp}(k,l) & = & min\{S_{tmp}(k,l-1),S(k,l)\}\\
\end{array}
\end{displaymath}

    else(W frames read)

    \begin{displaymath}
\begin{array}{lll}
S_{min}(k,l) & = & min\{S_{tmp}(k,l-1),S(k,l)\}\\
S_{tmp}(k,l) & = & S(k,l)\\
\end{array}
\end{displaymath}

    end
    Because of the use of the temporary variable, the local minimum is based on a window of at least W frames, but not more than 2W frames.
    The window length W controls the upward bias during ``continuous'' speech: a smaller W means more frequent updates and a greater upward bias.
    It also controls the downward bias when the noise level increases: a larger W gives a greater downward bias, since updates do not occur frequently enough.
    The window length is normally chosen to be 0.5 to 1.5 s.
  3. Normalisation step
    $S_r(k,l)=S(k,l)/S_{min}(k,l)$
  4. Bayes decision rule
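    Compare $S_r(k,l)$ to a threshold $\delta$ and decide that speech is present if $S_r(k,l)>\delta$ (the choice of $\delta$ is not sensitive to the type and intensity of the noise)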
  5. Update conditional signal presence probability
    $p(k,l)=\alpha_p p(k,l-1) + (1-\alpha_p) I(k,l)$, where $I(k,l)=1$ if $S_r(k,l)>\delta$ and $I(k,l)=0$ otherwise (the recursive smoothing utilises the strong correlation of speech presence in consecutive frames)
  6. Recursive estimation

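    \begin{displaymath}
\begin{array}{lll}
\hat{S}_w(k,l) & = & \hat{S}_w(k,l-1)p(k,l) + [\alpha_w \hat{S}_w(k,l-1) + (1-\alpha_w) \vert Z(k,l)\vert^2](1-p(k,l))\\
 & = & \hat{S}_w(k,l-1)\tilde{\alpha}_w(k,l) + (1-\tilde{\alpha}_w(k,l)) \vert Z(k,l)\vert^2\\
\end{array}
\end{displaymath}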

    where $\tilde{\alpha}_w(k,l)=\alpha_w + (1-\alpha_w)p(k,l)$
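
The six steps above can be put together in a short sketch. This is only an illustration under stated assumptions, not the reference implementation of [2]: the function names `smooth_frequency` and `mcra_noise_estimate` are hypothetical, the 3-tap frequency window and the first-order time smoothing constant `alpha_s` are one possible choice for the smoothing step, the initial values of $p$ and $\hat{S}_w$ are assumed, all parameter defaults are illustrative, and $\vert Z(k,l)\vert^2$ is taken to be the periodogram of the noisy signal. W is expressed in frames.

```python
def smooth_frequency(frame):
    """Smooth one periodogram across frequency with a 3-tap window (assumed)."""
    n = len(frame)
    out = []
    for k in range(n):
        acc = 0.0
        norm = 0.0
        for offset, weight in ((-1, 0.25), (0, 0.5), (1, 0.25)):
            if 0 <= k + offset < n:          # renormalise at the band edges
                acc += weight * frame[k + offset]
                norm += weight
        out.append(acc / norm)
    return out


def mcra_noise_estimate(periodograms, W=50, delta=5.0,
                        alpha_s=0.8, alpha_p=0.2, alpha_w=0.95):
    """Return (noise_track, p_track), one list per frame, one value per bin.

    periodograms[l][k] holds |Z(k,l)|^2. All defaults are illustrative.
    """
    S = smooth_frequency(periodograms[0])    # S(k,0)
    S_min = list(S)                          # S_min(k,0) = S(k,0)
    S_tmp = list(S)                          # S_tmp(k,0) = S(k,0)
    p = [0.0] * len(S)                       # assumed: no speech at start
    noise = list(periodograms[0])            # assumed initial noise estimate
    noise_track = [list(noise)]
    p_track = [list(p)]

    for l in range(1, len(periodograms)):
        Z2 = periodograms[l]
        Sf = smooth_frequency(Z2)
        for k in range(len(Z2)):
            # 1. Smoothing step (time smoothing of the frequency-smoothed frame)
            S[k] = alpha_s * S[k] + (1.0 - alpha_s) * Sf[k]
            # 2. Minimum step
            if l % W != 0:                   # within current window
                S_min[k] = min(S_min[k], S[k])
                S_tmp[k] = min(S_tmp[k], S[k])
            else:                            # W frames read
                S_min[k] = min(S_tmp[k], S[k])
                S_tmp[k] = S[k]
            # 3. Normalisation step (small floor avoids division by zero)
            S_r = S[k] / max(S_min[k], 1e-12)
            # 4. Decision rule: indicator of speech presence
            speech = 1.0 if S_r > delta else 0.0
            # 5. Update speech presence probability
            p[k] = alpha_p * p[k] + (1.0 - alpha_p) * speech
            # 6. Recursive estimation with time-varying smoothing factor
            alpha_t = alpha_w + (1.0 - alpha_w) * p[k]
            noise[k] = alpha_t * noise[k] + (1.0 - alpha_t) * Z2[k]
        noise_track.append(list(noise))
        p_track.append(list(p))
    return noise_track, p_track
```

Calling `mcra_noise_estimate(frames)` on a list of periodogram frames returns the noise estimate and the speech presence probability for every frame. When $p(k,l)$ is close to 1 the smoothing factor approaches 1 and the noise estimate is effectively frozen; when $p(k,l)$ is close to 0 the estimate follows the periodogram with smoothing factor $\alpha_w$.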

Vinesh Bhunjun 2004-09-17