RPE Regular Pulse Excitation

Last updated on Jun 16, 2023

RPE, which stands for Regular Pulse Excitation, is a technique for representing the excitation signal in low-bitrate speech coding. Its best-known use is the GSM Full Rate speech codec (GSM 06.10), which combines it with long-term prediction as RPE-LTP and operates at 13 kbit/s.

In speech coding, the goal is to compress the speech signal to a lower data rate while maintaining an acceptable level of speech quality. One of the main components of a speech codec is the excitation signal, which represents the source of the speech sound. The excitation signal is typically modeled as a sequence of pulses that determine the timing and amplitude of the glottal excitation.

The RPE technique builds on linear prediction. During voiced speech the vocal folds produce a quasi-periodic excitation, and most of that periodicity can be removed by a long-term (pitch) predictor. What remains is approximated by a train of equally spaced pulses. Because the spacing is fixed, the encoder only has to transmit the position of the first pulse (the grid offset) and the pulse amplitudes, not the location of every pulse.
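The idea of a regular pulse grid can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name select_rpe_grid is hypothetical, and the default values (a 40-sample block, a spacing of 3, 13 pulses) are assumptions taken from the GSM 06.10 full-rate codec rather than from the text above. The sketch picks the grid offset whose samples carry the most energy.

```python
def select_rpe_grid(residual, step=3, pulses=13):
    """Pick the regular pulse grid that captures the most residual energy.

    residual: one block of prediction residual (40 samples by default).
    Returns (offset, amplitudes): the index of the first pulse and the
    residual values sampled every `step` samples from that index.
    """
    # Number of starting offsets for which `pulses` samples still fit.
    n_offsets = len(residual) - step * (pulses - 1)
    if n_offsets < 1:
        raise ValueError("block too short for the requested grid")

    best_offset, best_energy = 0, -1.0
    for offset in range(n_offsets):
        grid = residual[offset::step][:pulses]
        energy = sum(x * x for x in grid)
        if energy > best_energy:
            best_offset, best_energy = offset, energy

    return best_offset, residual[best_offset::step][:pulses]


# Example: a block whose energy sits on the grid that starts at index 2.
block = [0.0] * 40
block[0], block[2], block[5] = 1.0, 5.0, -4.0
offset, amplitudes = select_rpe_grid(block)
print(offset, amplitudes[:2])  # 2 [5.0, -4.0]
```

With these defaults there are four candidate grids, so the pulse positions for a whole block cost only two bits.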

Here's a step-by-step explanation of the RPE process:

Pitch estimation: The first step is to estimate the pitch period of the speech signal. The pitch period is the inverse of the fundamental frequency, that is, the time between successive vibrations of the vocal folds. It is usually estimated as a lag in samples, by searching for the delay that maximizes the correlation between the current block and the past signal.
Long-term prediction: Once the lag is estimated, a scaled copy of the past signal, delayed by that lag, is subtracted from the current block. The pulse train is then fitted to what remains, over a block (subframe) of fixed length, so its duration does not depend on the actual pitch period of the speaker.
Pulse positioning: A fixed number of pulses is placed on a regular grid within each subframe. The candidate grids differ only in their starting offset, and the encoder picks the offset whose samples capture the most energy of the residual. In GSM 06.10, for example, a 40-sample subframe is decimated by 3, which gives four candidate grids of 13 pulses each.
Pulse amplitude: Once the grid is chosen, the pulse amplitudes are determined. In the general formulation they are found by minimizing a perceptually weighted error between the original and the reconstructed speech. Simplified codecs such as GSM 06.10 instead pass the residual through a weighting filter, take the samples on the chosen grid directly, and quantize them block-adaptively: the largest magnitude in the block is coded as a scale factor, and each pulse is coded relative to it with a few bits.
Encoding and decoding: The grid offset, the scale factor, the quantized amplitudes, and the long-term predictor lag and gain are transmitted as part of the encoded bitstream, together with the short-term filter coefficients. The decoder rebuilds the pulse train by placing the decoded amplitudes on the signaled grid with zeros in between, adds the long-term prediction, and passes the result through the short-term synthesis filter to reconstruct the speech.
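The steps above can be sketched for a single block of samples as follows. This is a simplified illustration under stated assumptions, not the procedure of any particular standard. The function names are hypothetical, and the constants (40-sample subframe, spacing of 3, 13 pulses, 3-bit amplitudes, lags from 40 to 120) follow GSM 06.10. The pitch lag is handled by a long-term predictor, and the short-term (LPC) filter, the weighting filter, and the quantization of gain and scale factor are left out.

```python
SUBFRAME = 40   # samples per block (assumed, GSM 06.10 value)
STEP = 3        # spacing between pulses
PULSES = 13     # pulses per block
LEVELS = 8      # 3-bit uniform amplitude quantizer


def estimate_lag(current, history, min_lag=40, max_lag=120):
    """Lag whose delayed past block correlates best with the current block."""
    best_lag, best_corr = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        start = len(history) - lag
        past = history[start:start + len(current)]
        corr = sum(c * p for c, p in zip(current, past))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag


def encode_subframe(current, history):
    """Encode one block; history needs at least 120 past samples."""
    lag = estimate_lag(current, history)
    past = history[len(history) - lag:][:len(current)]

    # Long-term predictor gain, clamped to [0, 1] for this sketch.
    energy = sum(p * p for p in past)
    corr = sum(c * p for c, p in zip(current, past))
    gain = max(0.0, min(1.0, corr / energy)) if energy else 0.0
    residual = [c - gain * p for c, p in zip(current, past)]

    # Pulse positioning: keep the grid offset with the most energy.
    offsets = range(len(residual) - STEP * (PULSES - 1))
    offset = max(offsets,
                 key=lambda m: sum(x * x for x in residual[m::STEP][:PULSES]))
    pulses = residual[offset::STEP][:PULSES]

    # Pulse amplitude: block-adaptive quantization relative to the peak.
    scale = max(abs(x) for x in pulses)
    codes = [min(LEVELS - 1, int((x / scale + 1.0) * LEVELS / 2)) if scale else 0
             for x in pulses]
    return {"lag": lag, "gain": gain, "offset": offset,
            "scale": scale, "codes": codes}


def decode_subframe(params, history, length=SUBFRAME):
    """Rebuild the block from the decoded parameters and the same history."""
    excitation = [0.0] * length
    for i, code in enumerate(params["codes"]):
        amplitude = ((code + 0.5) * 2 / LEVELS - 1.0) * params["scale"]
        excitation[params["offset"] + STEP * i] = amplitude
    past = history[len(history) - params["lag"]:][:length]
    return [e + params["gain"] * p for e, p in zip(excitation, past)]
```

Each decoded pulse lies within scale / 8 of the original sample on the grid, and all samples between the pulses are reconstructed as zero before the long-term prediction is added back. In a real codec the encoder and decoder would both maintain the history from previously decoded blocks, so that they stay in step.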

By using RPE, speech codecs can represent the excitation signal compactly while maintaining acceptable speech quality. The regular spacing of the pulses means that their positions cost very few bits, since only the offset of the pulse grid has to be signaled, and the quasi-periodic nature of voiced speech makes the excitation predictable from one pitch period to the next.
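To make the efficiency claim concrete, the short calculation below adds up a bit budget for one frame. The figures are the GSM 06.10 full-rate allocation (20 ms frames, four subframes) and are not taken from the text above. The names are illustrative only.

```python
# Bit allocation for one 20 ms frame (GSM 06.10 full-rate figures).
SUBFRAMES_PER_FRAME = 4
BITS_SHORT_TERM = 36  # short-term filter coefficients, sent once per frame
BITS_PER_SUBFRAME = {
    "ltp_lag": 7,                # long-term predictor lag
    "ltp_gain": 2,               # long-term predictor gain
    "grid_offset": 2,            # which of the 4 pulse grids is used
    "block_maximum": 6,          # scale factor for the pulse amplitudes
    "pulse_amplitudes": 13 * 3,  # 13 pulses at 3 bits each
}


def frame_bits():
    """Total number of bits in one encoded frame."""
    return BITS_SHORT_TERM + SUBFRAMES_PER_FRAME * sum(BITS_PER_SUBFRAME.values())


def bitrate(frame_ms=20):
    """Bit rate in bits per second for the given frame duration."""
    return frame_bits() * 1000 / frame_ms


print(frame_bits())  # 260
print(bitrate())     # 13000.0
```

Under this allocation the pulse positions take 8 of the 260 bits in a frame, while the pulse amplitudes take 156.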