MUSHRA: MUlti Stimulus test with Hidden Reference and Anchors

Last updated on 12 May 2023

MUSHRA (MUlti Stimulus test with Hidden Reference and Anchors) is a subjective listening test method for evaluating audio quality. It is standardized as ITU-R Recommendation BS.1534 and widely used in audio engineering and psychoacoustics to assess the perceived quality of audio codecs, processing algorithms, or complete audio systems. MUSHRA combines elements of absolute and relative rating: listeners score each stimulus on an absolute quality scale, but they do so while comparing several stimuli side by side against an openly available reference signal.

The primary goal of MUSHRA is to obtain reliable, repeatable subjective ratings of audio quality. It is aimed at systems of intermediate quality, where impairments are clearly audible, which makes it particularly useful for evaluating low-bit-rate audio codecs such as those used in streaming, telecommunication, and multimedia applications. The resulting ratings help researchers and engineers compare processing techniques and make informed decisions during the development and optimization of audio systems.

A MUSHRA test typically involves a controlled listening environment, high-quality playback equipment, and a panel of experienced listeners, selected for their ability to detect audio impairments and rate them consistently. Listeners should receive clear instructions before the test and be familiar with the evaluation procedure.

In MUSHRA, the test stimuli are processed versions of a reference audio signal. The reference represents the highest available quality, typically the original unprocessed, uncompressed recording, and each processed stimulus corresponds to one of the codecs or processing algorithms under evaluation.

To conduct the test, listeners use a graphical user interface (GUI) that presents one trial at a time. It provides a playback button for the labeled reference, a playback button for each unlabeled stimulus, and a slider for rating each stimulus. The sliders use a continuous quality scale from 0 to 100, divided into five equal intervals labeled Bad (0-20), Poor (20-40), Fair (40-60), Good (60-80), and Excellent (80-100).
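The MUSHRA scale runs continuously from 0 to 100 and is divided into five equal intervals labeled Bad, Poor, Fair, Good, and Excellent. As a small illustration, the following Python sketch maps a slider value to the descriptor of the interval it falls in. The function name `scale_label` is hypothetical, and placing a boundary value such as 20 in the higher interval is an arbitrary convention of this sketch.

```python
# Hypothetical helper, not part of any standard tool: maps a MUSHRA slider
# value (0-100) to the verbal descriptor of the interval containing it.
LABELS = ("Bad", "Poor", "Fair", "Good", "Excellent")


def scale_label(score: float) -> str:
    """Return the descriptor for a score on the continuous 0-100 scale."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    # Five equal intervals of width 20. Boundary values (20, 40, ...) fall into
    # the higher interval; 100 is clamped into the top interval "Excellent".
    index = min(int(score // 20), len(LABELS) - 1)
    return LABELS[index]


for value in (12, 35, 50, 79.5, 100):
    print(value, scale_label(value))
```

The verbal labels only guide interpretation; the analysis itself works with the numeric scores.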

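The defining feature of MUSHRA is the use of a hidden reference and anchors. The reference is available to listeners as an explicitly labeled signal, and an identical copy is also hidden among the unlabeled test stimuli. Because the hidden reference cannot be distinguished from the original, listeners are expected to rate it at or near 100, which gives experimenters a built-in check of listener reliability. The anchors are deliberately degraded versions of the reference that are likewise hidden among the stimuli. ITU-R BS.1534-3 specifies two such anchors: the reference low-pass filtered at 3.5 kHz (the low anchor) and at 7 kHz (the mid anchor). Because the anchors contain clearly audible, well-defined impairments, they calibrate the lower part of the scale and make results more comparable across tests.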
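In ITU-R BS.1534 the anchors are low-pass filtered copies of the reference. The sketch below, which assumes NumPy and a floating-point signal with samples along the last axis (mono or multichannel), produces such anchors with an ideal (brick-wall) FFT filter. The brick-wall design and the function name `lowpass_anchor` are assumptions made for brevity; a practical tool would more likely use a steep FIR or IIR filter, since an ideal filter applied to a whole file can smear transients with pre- and post-ringing.

```python
import numpy as np


def lowpass_anchor(reference: np.ndarray, fs: float, cutoff_hz: float) -> np.ndarray:
    """Return a low-pass filtered copy of `reference` for use as an anchor.

    Assumption for this sketch: an ideal brick-wall filter applied in the
    frequency domain over the whole signal. Samples are on the last axis.
    """
    n = reference.shape[-1]
    spectrum = np.fft.rfft(reference, axis=-1)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectrum[..., freqs > cutoff_hz] = 0.0  # remove all content above the cutoff
    return np.fft.irfft(spectrum, n=n, axis=-1)


# Example: a 1 s test signal with a 1 kHz and a 5 kHz tone at 48 kHz.
fs = 48_000
t = np.arange(fs) / fs
reference = 0.5 * np.sin(2 * np.pi * 1_000 * t) + 0.5 * np.sin(2 * np.pi * 5_000 * t)
low_anchor = lowpass_anchor(reference, fs, 3_500)  # 5 kHz tone removed
mid_anchor = lowpass_anchor(reference, fs, 7_000)  # both tones kept
```

In a real test, the anchors would be generated from each test item and level-matched to the other stimuli like any other condition.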

The number of systems under test varies with the evaluation, but each trial is kept to a manageable size: it contains the systems under test, the hidden reference, and the anchors, all presented at once. ITU-R BS.1534 recommends limiting the number of stimuli per trial, because listeners find it harder to compare many signals reliably. The five verbal labels on the rating scale (Bad, Poor, Fair, Good, Excellent) are fixed descriptors of the scale, not test signals; the anchors are audio stimuli that listeners rate like any other.

During a trial, listeners can switch freely and instantly between the reference and any of the stimuli, in any order and as often as they like. They rate each stimulus by comparing it directly with the reference and with the other stimuli. Because the hidden reference and the anchors are among the stimuli being rated, they implicitly mark the top and the lower part of the scale, which reduces the influence of each listener's personal interpretation of the verbal labels.

To avoid systematic bias, the assignment of stimuli to sliders is randomized within each trial, and the order of the trials (test items) is also randomized, usually separately for each listener. Listeners are encouraged to replay the stimuli, or selected passages of them, as often as needed to reach a confident judgment.
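The randomization described above can be sketched in Python as follows. The helper names (`build_trial`, `trial_order`) and the condition name "hidden_reference" are hypothetical. The sketch assumes the open reference has its own fixed button, so only the unlabeled stimuli (systems under test, hidden reference, anchors) are shuffled onto the sliders, and it derives seeds from a per-listener value so that each session can be reproduced later.

```python
import random
from typing import List, Sequence, Tuple, Union

Seed = Union[int, str, None]  # random.Random accepts int and str seeds


def build_trial(conditions: Sequence[str], anchors: Sequence[str],
                seed: Seed = None) -> List[Tuple[int, str]]:
    """Return (slider position, condition name) pairs for one trial.

    The hidden reference is added under the hypothetical name
    "hidden_reference" and shuffled together with the systems under test and
    the anchors, so listeners cannot identify it from its position.
    """
    stimuli = [*conditions, "hidden_reference", *anchors]
    if len(set(stimuli)) != len(stimuli):
        raise ValueError("condition names must be unique")
    random.Random(seed).shuffle(stimuli)  # random assignment to sliders
    return list(enumerate(stimuli, start=1))


def trial_order(items: Sequence[str], seed: Seed = None) -> List[str]:
    """Return the test items in a random order, e.g. one order per listener."""
    order = list(items)
    random.Random(seed).shuffle(order)
    return order


listener_seed = 7  # hypothetical per-listener seed, stored with the results
for item in trial_order(["speech", "castanets", "orchestra"], seed=listener_seed):
    # String seeds are hashed deterministically by random.Random, so the
    # same listener and item always yield the same slider layout.
    print(item, build_trial(["codec_a", "codec_b"], ["anchor_3k5", "anchor_7k"],
                            seed=f"{listener_seed}:{item}"))
```

Keeping the mapping from slider position to condition alongside each rating is what later allows the scores to be attributed to the correct condition.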

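After the test, the ratings are usually post-screened to remove unreliable listeners; for example, ITU-R BS.1534-3 excludes listeners who rate the hidden reference below 90 on more than 15% of the test items. The remaining ratings for each condition are then averaged and reported as a mean score with its 95% confidence interval. This mean score, sometimes informally called a mean opinion score (MOS), represents the perceived quality of each condition. Statistical methods such as analysis of variance (ANOVA) can then be used to determine which quality differences between conditions are significant.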
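Post-screening and the per-condition summary can be sketched in plain Python. The sketch assumes ratings are stored as a nested dictionary (listener, then test item, then condition, mapping to a 0-100 score) with the hidden reference under the hypothetical name "hidden_reference". It applies one screening rule: a listener is excluded if they rate the hidden reference below 90 on more than 15% of the items, as in ITU-R BS.1534-3; any other screening criteria are omitted. Confidence intervals use Student's t distribution, with a small table of critical values and a normal approximation beyond 30 degrees of freedom.

```python
import math
from statistics import mean, stdev

# Two-sided 95% Student-t critical values for df = 1..30.
T_975 = (12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228,
         2.201, 2.179, 2.160, 2.145, 2.131, 2.120, 2.110, 2.101, 2.093, 2.086,
         2.080, 2.074, 2.069, 2.064, 2.060, 2.056, 2.052, 2.048, 2.045, 2.042)


def t_crit(df: int) -> float:
    """Critical t value; falls back to the normal value 1.96 for df > 30."""
    return T_975[df - 1] if df <= len(T_975) else 1.96


def post_screen(ratings, ref="hidden_reference", threshold=90.0, max_fraction=0.15):
    """Keep listeners who rate `ref` below `threshold` on at most
    `max_fraction` of the items they rated."""
    kept = {}
    for listener, items in ratings.items():
        misses = sum(1 for scores in items.values() if scores[ref] < threshold)
        if misses / len(items) <= max_fraction:
            kept[listener] = items
    return kept


def summarize(ratings):
    """Return condition -> (mean, half-width of the 95% confidence interval),
    pooling all ratings of a condition across listeners and items."""
    pooled = {}
    for items in ratings.values():
        for scores in items.values():
            for condition, score in scores.items():
                pooled.setdefault(condition, []).append(score)
    summary = {}
    for condition, values in pooled.items():
        n = len(values)
        if n < 2:
            raise ValueError(f"need at least two ratings for {condition!r}")
        half_width = t_crit(n - 1) * stdev(values) / math.sqrt(n)
        summary[condition] = (mean(values), half_width)
    return summary
```

Typical use is `summarize(post_screen(ratings))`; reporting the confidence interval together with each mean makes it clear which differences between conditions are too small to interpret.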

MUSHRA offers several advantages over other subjective audio quality evaluation methods. The hidden reference and anchors provide built-in checks of listener reliability and a common calibration of the scale, which makes ratings more consistent and easier to compare across tests. Because several stimuli are rated side by side in one trial, MUSHRA is efficient and cost-effective, and it can resolve differences between conditions with fewer listeners than single-stimulus methods. Listener comments collected alongside the ratings can also give insight into which aspects of the audio quality listeners found objectionable.

MUSHRA also has several practical considerations and limitations. One is the selection of anchors. Beyond the standard anchors specified in ITU-R BS.1534 (low-pass filtered versions of the reference), any additional anchors should produce impairments that are meaningful and relevant in the context of the evaluation, so that listeners have a suitable point of comparison across the range of quality they are likely to hear.

The choice of test material is equally important. The test items should be diverse and representative of the conditions or scenarios under evaluation, and they should include critical material that is likely to expose the weaknesses of the systems under test. All conditions should be processed identically apart from the codec or algorithm being tested, so that any quality differences can be attributed to it.

The listening environment plays a significant role in MUSHRA evaluations. It is important to create a controlled environment that minimizes external noise and provides consistent playback conditions. High-quality audio equipment, such as headphones or speakers, should be used to ensure accurate reproduction of the stimuli.

Training and familiarization of the listeners are essential to obtain reliable results. Listeners should be trained to assess and rate audio quality consistently. They should also be familiarized with the MUSHRA procedure, the rating scale, and the specific task they are expected to perform. This helps minimize inter-listener variability and enhances the overall reliability of the test.

MUSHRA is a powerful tool for evaluating audio quality, but it has limitations. One is its reliance on subjective ratings, which vary with individual listeners' preferences and biases. Using experienced listeners, post-screening, and appropriate statistical analysis reduces the impact of these biases, although it cannot remove it entirely. MUSHRA is also intended for intermediate quality: when impairments are small and the systems under test are close to transparent, ITU-R BS.1116 is the recommended method.

Another limitation is that a standard MUSHRA test yields a single overall quality score per stimulus. It does not by itself reveal which attributes, such as spatial reproduction, timbre, or dynamic range, drove a listener's rating. Attribute-specific ratings or other evaluation methods may therefore be needed to complement the MUSHRA results.

In conclusion, MUSHRA is a widely used method for evaluating intermediate audio quality in a range of applications. It combines absolute and relative judgments by letting listeners rate several stimuli side by side against an open reference, with a hidden reference and anchors that check and calibrate the ratings. MUSHRA provides a reliable and efficient means of obtaining subjective ratings of audio quality, allowing researchers and engineers to make informed decisions in the development and optimization of audio systems. However, careful attention should be given to the selection of anchors, test material, and listeners, and to the listening environment, to ensure the validity and reliability of the results.