POLQA: Perceptual Objective Listening Quality Assessment

POLQA (Perceptual Objective Listening Quality Assessment) is a widely used method for evaluating speech quality. It is designed to provide objective measurements that correlate well with the results of subjective listening tests. Standardized by the ITU Telecommunication Standardization Sector (ITU-T) as Recommendation P.863, POLQA has become an industry standard for assessing the quality of modern telecommunication systems.

The need for objective quality assessment arises from the growing demand for high-quality audio communication in applications such as voice-over-IP (VoIP), video conferencing, and mobile telephony. Earlier objective measures such as PESQ (Perceptual Evaluation of Speech Quality, ITU-T P.862) were developed for narrowband speech; PESQ was later extended to wideband in P.862.2 but does not cover super-wideband signals. POLQA was specifically designed to overcome these limitations and provide accurate quality assessment across narrowband, wideband, and super-wideband speech.

POLQA operates by modeling how the human auditory system perceives speech quality. It compares the degraded speech signal, taken from the output of the system under test, with the original reference signal and calculates a predicted MOS (Mean Opinion Score) on a scale of 1 to 5. In a subjective test, the MOS is obtained by averaging the individual quality scores given by multiple human listeners; POLQA estimates that value without running the test.
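The averaging behind a subjective MOS can be sketched in a few lines. This is a minimal illustration, not part of POLQA itself: the function name, the example ratings, and the normal-approximation confidence interval are all assumptions made for the example.

```python
from math import sqrt

def mean_opinion_score(ratings):
    """Return (MOS, approximate 95% confidence half-width) for 1..5 ratings."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1..5 scale")
    n = len(ratings)
    mos = sum(ratings) / n
    if n < 2:
        return mos, 0.0
    # Sample variance (n - 1 in the denominator), then a
    # normal-approximation interval around the mean.
    variance = sum((r - mos) ** 2 for r in ratings) / (n - 1)
    return mos, 1.96 * sqrt(variance / n)

# Hypothetical ratings from eight listeners for one speech sample.
ratings = [4, 5, 3, 4, 4, 5, 3, 4]
mos, ci95 = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} +/- {ci95:.2f}")
```

The confidence interval is included because a MOS from a small listener panel is itself uncertain, which matters when an objective prediction is later compared against it.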

The key advantage of POLQA is its ability to handle wideband and super-wideband signals, which is essential for assessing modern communication systems that support higher audio bandwidths. It can evaluate the performance of codecs such as G.722 and AMR-WB, which carry wideband speech, and Opus, which supports bandwidths up to fullband.

The POLQA algorithm consists of several processing steps. First, the reference and degraded signals are preprocessed: they are aligned in time and scaled in level so that corresponding portions of the two signals can be compared. Noise and distortion in the degraded signal are not removed at this stage, since they are part of what is being measured. Then, the signals are divided into short overlapping analysis frames a few tens of milliseconds long. For each frame, various psychoacoustic parameters are computed to model the human auditory system's behavior.
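The framing step can be sketched as follows. This is a generic illustration of splitting a signal into overlapping frames, not the framing defined in P.863: the function name, the 20 ms default frame length, and the 50% overlap are assumptions chosen for the example.

```python
def split_into_frames(samples, sample_rate, frame_ms=20.0, overlap=0.5):
    """Split a sequence of samples into fixed-length overlapping frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    if frame_len < 1:
        raise ValueError("frame length must be at least one sample")
    # Hop size: how far the window advances between consecutive frames.
    hop = max(1, int(round(frame_len * (1.0 - overlap))))
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames

# One second of placeholder samples at 8 kHz (narrowband).
signal = list(range(8000))
frames = split_into_frames(signal, sample_rate=8000)
print(len(frames), "frames of", len(frames[0]), "samples")
```

In a full-reference model the same framing would be applied to both the reference and the degraded signal after alignment, so that frame i of one corresponds to frame i of the other.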

These parameters include spectral and temporal characteristics, loudness, and masking effects. Spectral analysis is performed using a filter bank that mimics the critical bands of human hearing. Temporal analysis involves detecting changes in the signal over time, such as rapid amplitude fluctuations or variations in the pitch.
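To make the idea of a critical-band filter bank concrete, the sketch below groups the bins of a power spectrum into bands on the Bark scale, using the Zwicker and Terhardt approximation for converting frequency to Bark. This is a generic psychoacoustic illustration, not POLQA's actual filter bank; the function names and the one-band-per-Bark grouping are assumptions.

```python
import math

def hz_to_bark(freq_hz):
    """Zwicker-Terhardt approximation of the Bark critical-band scale."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))

def bark_band_energies(power_spectrum, sample_rate):
    """Sum power-spectrum bins (0 Hz .. Nyquist) into one band per Bark."""
    n_bins = len(power_spectrum)
    nyquist = sample_rate / 2.0
    n_bands = int(hz_to_bark(nyquist)) + 1
    bands = [0.0] * n_bands
    for k, power in enumerate(power_spectrum):
        # Centre frequency of bin k, assuming bins are evenly spaced
        # from 0 Hz (k = 0) to the Nyquist frequency (k = n_bins - 1).
        freq = k * nyquist / (n_bins - 1) if n_bins > 1 else 0.0
        bands[int(hz_to_bark(freq))] += power
    return bands

# A flat placeholder spectrum with 129 bins at 8 kHz sampling.
spectrum = [1.0] * 129
bands = bark_band_energies(spectrum, sample_rate=8000)
print(len(bands), "bands")
```

Because Bark bands widen with frequency, a flat spectrum puts more bins, and therefore more energy, into the higher bands, which reflects the coarser frequency resolution of hearing at high frequencies.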

Once the psychoacoustic parameters are computed, they are used to calculate a set of quality-related features. These features capture important aspects of speech quality, including the presence of noise, distortions, and overall loudness. Statistical models are then applied to map these features to a predicted MOS score. The mapping models are trained using a large dataset of subjective quality ratings obtained from human listeners.
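The final mapping from features to a score can be pictured with a toy model. The sketch below combines three disturbance features linearly and squashes the result onto the 1 to 5 scale with a logistic function. Everything in it is hypothetical: the feature names, the weights, and the logistic form are chosen only for illustration and are not the trained mapping used in P.863.

```python
import math

# Hypothetical weights: NOT taken from P.863, chosen only for illustration.
# Each feature is assumed to be 0.0 for a clean signal and to grow with
# the amount of degradation.
WEIGHTS = {"noise": 0.8, "distortion": 1.2, "loudness_deviation": 0.5}
BIAS = -2.0

def predict_mos(features):
    """Map disturbance features to a value strictly between 1 and 5."""
    disturbance = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    # Logistic squashing: more disturbance gives a lower score.
    return 1.0 + 4.0 / (1.0 + math.exp(disturbance))

clean = {"noise": 0.0, "distortion": 0.0, "loudness_deviation": 0.0}
degraded = {"noise": 1.5, "distortion": 2.0, "loudness_deviation": 0.5}
print(f"clean: {predict_mos(clean):.2f}, degraded: {predict_mos(degraded):.2f}")
```

In a real model the weights would be fitted to subjective ratings rather than set by hand, which is the training step described above.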

The accuracy of the predicted MOS is validated against subjective ratings collected in listening tests, in which human listeners rate speech samples degraded by impairments such as packet loss, delay variation, and coding artifacts. Because each listening test has its own rating bias, a mapping function between the predicted and the subjective scores is typically fitted for each experiment before agreement is measured, for example by correlation and root-mean-square error.
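The comparison between predicted and subjective scores can be sketched with two common agreement measures, Pearson correlation and root-mean-square error (RMSE), plus a fitted mapping. This is a minimal sketch under stated assumptions: the score lists are made-up numbers, and the first-order (linear) least-squares mapping is a simplification chosen to keep the example short.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def rmse(x, y):
    """Root-mean-square error between two equal-length lists."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

def fit_linear(x, y):
    """Least-squares slope and intercept for y ~ slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

# Made-up per-condition scores, for illustration only.
predicted = [1.8, 2.2, 3.1, 3.3, 4.2, 4.3]
subjective = [1.5, 2.1, 2.8, 3.4, 4.0, 4.4]

slope, intercept = fit_linear(predicted, subjective)
mapped = [slope * p + intercept for p in predicted]

print(f"correlation:          {pearson(predicted, subjective):.3f}")
print(f"RMSE before mapping:  {rmse(predicted, subjective):.3f}")
print(f"RMSE after mapping:   {rmse(mapped, subjective):.3f}")
```

A linear mapping leaves the correlation unchanged but removes offset and scale differences between the experiment and the model, so the RMSE after mapping cannot be larger than the RMSE before it.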

POLQA has been extensively tested and validated in different scenarios and conditions. It has been shown to provide accurate quality assessments for a wide range of speech codecs, network impairments, and acoustic conditions such as background noise. Recommendation ITU-T P.863, together with its application guide P.863.1, provides detailed guidelines for implementing and applying POLQA in different contexts.

In conclusion, POLQA is a robust and reliable method for objectively assessing speech quality. Its ability to handle wideband and super-wideband signals makes it particularly valuable for evaluating modern telecommunication systems. By providing accurate quality measurements, POLQA enables service providers and equipment manufacturers to optimize their systems and improve the audio quality delivered to users.