An Open source Implementation of ITU-T Recommendation P.808 with Validation
The ITU-T Recommendation P.808 provides a crowdsourcing approach for
conducting a subjective assessment of speech quality using the Absolute
Category Rating (ACR) method. We provide an open-source implementation of the
ITU-T Rec. P.808 that runs on the Amazon Mechanical Turk platform. We extended
our implementation to include Degradation Category Ratings (DCR) and Comparison
Category Ratings (CCR) test methods. We also significantly speed up the test
process by integrating the participant qualification step into the main rating
task, rather than using a two-stage qualification and rating solution. We provide
program scripts for creating and executing the subjective test, and for
cleansing and analyzing the answers to avoid operational errors. To validate
the implementation, we compare the Mean Opinion Scores (MOS) collected through
our implementation with MOS values from a standard laboratory experiment
conducted according to ITU-T Rec. P.800. We also evaluate the reproducibility
of the results of subjective speech quality assessment through crowdsourcing
using our implementation. Finally, we quantify the impact of the system
components designed to improve reliability: environmental tests, gold and
trapping questions, rating patterns, and a headset usage test.
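The data-cleansing step can be pictured as a filter applied before averaging ratings into a per-clip MOS. The sketch below is a minimal illustration, not the scripts from the repository: the `Submission` record, its field names, the gold tolerance, and the "straight-lining" rule are hypothetical stand-ins for the environmental tests, gold and trapping questions, rating-pattern check, and headset test named above.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Submission:
    """One crowd worker's answers to a rating task (hypothetical schema)."""
    worker_id: str
    ratings: dict[str, int]  # clip id -> ACR rating on the 1..5 scale
    gold_rating: int  # rating given to a gold clip whose expected score is known
    trap_answer: int  # answer to a trapping question with one correct answer
    passed_headset_test: bool
    passed_environment_test: bool


def is_valid(sub, gold_expected, gold_tolerance, trap_expected):
    """Apply all (assumed) reliability checks to one submission."""
    return (
        abs(sub.gold_rating - gold_expected) <= gold_tolerance
        and sub.trap_answer == trap_expected
        and sub.passed_headset_test
        and sub.passed_environment_test
        # Hypothetical rating-pattern rule: reject identical scores for every clip.
        and len(set(sub.ratings.values())) > 1
    )


def clip_mos(subs, gold_expected, gold_tolerance, trap_expected):
    """Average the ratings of valid submissions into a MOS per clip."""
    per_clip = {}
    for sub in subs:
        if not is_valid(sub, gold_expected, gold_tolerance, trap_expected):
            continue
        for clip, score in sub.ratings.items():
            per_clip.setdefault(clip, []).append(score)
    return {clip: mean(scores) for clip, scores in per_clip.items()}
```

Keeping each check as a separate condition makes it easy to measure how much each one contributes, which is the kind of impact analysis the abstract describes.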
Transformation of Mean Opinion Scores to Avoid Misleading of Ranked based Statistical Techniques
The rank correlation coefficients and the rank-based statistical tests (as
a subset of non-parametric techniques) might be misleading when they are
applied to subjectively collected opinion scores. Those techniques assume that
the data is measured at least at an ordinal level and treat a sequence of
scores as a tied rank only when they have precisely equal numeric values.
In this paper, we show that this definition of tied rank is not suitable for
Mean Opinion Scores (MOS) and might lead to misleading conclusions from
rank-based statistical techniques. Furthermore, we introduce a method to
overcome this issue by transforming the MOS values considering their
Confidence Intervals. The rank correlation coefficients and rank-based
statistical tests can then be safely applied to the transformed values. We also
provide open-source software packages in different programming languages to
facilitate the application of our transformation method in the quality of
experience domain.
Comment: This paper has been accepted for publication in the 2020 Twelfth
International Conference on Quality of Multimedia Experience (QoMEX).
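To make the tied-rank issue concrete, here is a simplified sketch built on a grouping rule chosen for illustration: conditions are sorted by MOS, and a condition whose 95% confidence interval overlaps that of the current group's anchor receives the anchor's MOS. This is not necessarily the exact procedure of the paper or its packages, and overlapping intervals are only a rough proxy for a non-significant difference. `transform_mos` and `average_ranks` are hypothetical names.

```python
def transform_mos(items):
    """items: list of (condition, mos, ci95_half_width).

    Returns {condition: transformed_mos}. Walking in descending MOS order,
    each condition either starts a new group (keeping its own MOS) or, if its
    confidence interval overlaps the group anchor's interval, takes the
    anchor's MOS so that rank-based methods treat the two as tied.
    """
    ordered = sorted(items, key=lambda item: item[1], reverse=True)
    transformed = {}
    anchor_mos = anchor_ci = None
    for name, mos, ci in ordered:
        # Sorted descending, so the intervals overlap iff this condition's
        # upper bound reaches the anchor's lower bound.
        if anchor_mos is not None and mos + ci >= anchor_mos - anchor_ci:
            transformed[name] = anchor_mos
        else:
            anchor_mos, anchor_ci = mos, ci
            transformed[name] = mos
    return transformed


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks
```

For example, with conditions A, B, C, D at MOS 4.10 ± 0.12, 4.05 ± 0.10, 3.40 ± 0.15 and 3.38 ± 0.20, the raw MOS values get four distinct ranks, whereas the transformed values form two tied pairs ranked 3.5, 3.5, 1.5, 1.5.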
Application of Just-Noticeable Difference in Quality as Environment Suitability Test for Crowdsourcing Speech Quality Assessment Task
Crowdsourcing micro-task platforms facilitate subjective media quality
assessment by providing access to a highly scalable, geographically
distributed and demographically diverse pool of crowd workers. Those workers
participate in the experiment remotely from their own working environment,
using their own hardware. In the case of speech quality assessment, preliminary
work showed that environmental noise at the listener's side and the listening
device (loudspeaker or headphone) significantly affect perceived quality, and
consequently the reliability and validity of subjective ratings. Therefore,
ITU-T Rec. P.808 specifies requirements for the listening
environment of crowd workers when assessing speech quality. In this paper, we
propose a new Just Noticeable Difference of Quality (JNDQ) test as a remote
screening method for assessing the suitability of the work environment for
participating in speech quality assessment tasks. In a laboratory experiment,
participants performed this JNDQ test with different listening devices in
different listening environments, including a silent room according to ITU-T
Rec. P.800 and a simulated background noise scenario. Results show a
significant impact of the environment and the listening device on the JNDQ
threshold. Thus, the combination of listening device and background noise needs
to be screened in a crowdsourcing speech quality test. We propose a minimum
threshold of our JNDQ test as an easily applicable screening method for this
purpose.
Comment: This paper has been accepted for publication in the 2020 Twelfth
International Conference on Quality of Multimedia Experience (QoMEX).
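The abstract does not describe how the JNDQ threshold is measured, so the following is only an assumed illustration: a generic 1-up/2-down adaptive staircase (which converges near the 70.7%-correct point) estimates the smallest quality difference a listener detects in a given setup, and the setup passes if that estimate stays at or below a cutoff. The difference units, step size, direction of the pass criterion, and the cutoff of 5.0 in the usage note are hypothetical, not values from the paper.

```python
from typing import Callable


def staircase_threshold(respond: Callable[[float], bool], start: float = 10.0,
                        step: float = 1.0, n_reversals: int = 8,
                        max_trials: int = 200) -> float:
    """Estimate a detection threshold with a 1-up/2-down staircase.

    `respond(level)` returns True when the listener correctly detects a
    quality difference of size `level` (hypothetical units). Two correct
    answers in a row make the task harder (smaller difference); one wrong
    answer makes it easier. The estimate is the mean level at the last six
    reversals (changes of direction).
    """
    level = start
    streak = 0
    last_direction = 0
    reversals = []
    for _ in range(max_trials):
        if respond(level):
            streak += 1
            if streak < 2:
                continue  # wait for a second correct answer at this level
            streak = 0
            direction = -1
        else:
            streak = 0
            direction = 1
        if last_direction and direction != last_direction:
            reversals.append(level)
        last_direction = direction
        level = max(step, level + direction * step)
        if len(reversals) >= n_reversals:
            break
    tail = reversals[-6:] or [level]
    return sum(tail) / len(tail)


def environment_suitable(threshold: float, max_allowed: float) -> bool:
    """Pass the listening setup if its JNDQ estimate is at or below the cutoff."""
    return threshold <= max_allowed
```

With a simulated listener `lambda d: d >= 4.0` the staircase returns 3.5, which passes a cutoff of 5.0; a noisier setup simulated as `lambda d: d >= 8.0` returns 7.5 and fails.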
Multi-dimensional Speech Quality Assessment in Crowdsourcing
Subjective speech quality assessment is the gold standard for evaluating
speech enhancement processing and telecommunication systems. The commonly used
standard ITU-T Rec. P.800 defines how to measure speech quality in lab
environments, and ITU-T Rec. P.808 extended it for crowdsourcing. ITU-T Rec.
P.835 extends P.800 to measure the quality of speech in the presence of noise.
ITU-T Rec. P.804 targets conversation tests and introduces perceptual speech
quality dimensions which are measured during the listening phase of the
conversation. The perceptual dimensions are noisiness, coloration,
discontinuity, and loudness. We create a crowdsourcing implementation of a
multi-dimensional subjective test following the scales from P.804 and extend it
to include reverberation, the speech signal, and overall quality. We show the
tool is both accurate and reproducible. The tool has been used in the ICASSP
2023 Speech Signal Improvement Challenge, and we show the utility of these
speech quality dimensions in this challenge. The tool will be publicly
available as open source at https://github.com/microsoft/P.808
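Per-dimension results are typically reported as a MOS with a confidence interval for each scale. The sketch below assumes each rating is a dictionary keyed by dimension name with scores on a 1 to 5 scale, and uses a normal-approximation 95% interval (1.96 standard errors); the dimension keys and the function name `dimension_mos` are assumptions, not the tool's actual output format.

```python
from math import sqrt
from statistics import mean, stdev

# P.804 scales plus the extensions named above (key names are assumed).
DIMENSIONS = ("noisiness", "coloration", "discontinuity", "loudness",
              "reverberation", "signal", "overall")


def dimension_mos(ratings):
    """ratings: list of dicts mapping dimension name -> score (assumed 1..5).

    Returns {dimension: (mos, ci95_half_width)} using a normal approximation;
    dimensions with fewer than two ratings are skipped.
    """
    result = {}
    for dim in DIMENSIONS:
        scores = [r[dim] for r in ratings if dim in r]
        if len(scores) < 2:
            continue
        half_width = 1.96 * stdev(scores) / sqrt(len(scores))
        result[dim] = (mean(scores), half_width)
    return result
```

For small numbers of ratings per clip, a Student-t quantile would give a somewhat wider and more appropriate interval than 1.96.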
VCD: A Video Conferencing Dataset for Video Compression
Commonly used datasets for evaluating video codecs are all of very high quality
and are not representative of the video typically encountered in video conferencing.
We present the Video Conferencing Dataset (VCD) for evaluating video codecs for
real-time communication, the first such dataset focused on video conferencing.
VCD covers a wide range of camera qualities and of spatial and temporal
information. It includes both desktop and mobile scenarios and two types of
video background processing. We report the compression efficiency of H.264,
H.265, H.266, and AV1 in low-delay settings on VCD and compare it with the
non-video conferencing datasets UVG, MCL-JCV, and HEVC. The results show that
source quality and scenario have a significant effect on the compression
efficiency of all the codecs. VCD enables the evaluation and tuning of codecs
for this important scenario. The VCD is publicly available as an open-source
dataset at https://github.com/microsoft/VCD
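The abstract does not name the metric used to report compression efficiency; a common choice for comparing codecs is the Bjøntegaard delta rate (BD-rate), the average bitrate difference at equal quality. The sketch below assumes that metric, PSNR as the quality measure, and the classic cubic fit of log-bitrate against PSNR; it requires NumPy and is not taken from the VCD repository.

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjøntegaard delta rate in percent (negative: test codec needs fewer bits)."""
    log_a = np.log(np.asarray(rate_anchor, dtype=float))
    log_t = np.log(np.asarray(rate_test, dtype=float))
    # Fit log-bitrate as a cubic polynomial of PSNR for each codec.
    fit_a = np.polyfit(psnr_anchor, log_a, 3)
    fit_t = np.polyfit(psnr_test, log_t, 3)
    # Integrate only over the PSNR range covered by both curves.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    if hi <= lo:
        raise ValueError("rate-distortion curves do not overlap in quality")
    int_a = np.polyval(np.polyint(fit_a), hi) - np.polyval(np.polyint(fit_a), lo)
    int_t = np.polyval(np.polyint(fit_t), hi) - np.polyval(np.polyint(fit_t), lo)
    # Average log-rate difference, converted back to a percentage.
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return float((np.exp(avg_log_diff) - 1) * 100)
```

Called with PSNR points 30, 33, 36 and 39 dB and anchor rates 1000, 1800, 3200 and 6000 kbps, a test codec whose rates are 0.8 times the anchor's returns approximately -20.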
Design optimization of switched reluctance motor for noise reduction
Using the finite element method (FEM) with the ANSYS finite element (FE) package, we introduce an electromagnetic-structural simulation model for the switched reluctance motor (SRM). Since the main cause of noise and vibration in the SRM is the radial force applied to the stator poles, a 2D FE transient analysis is carried out in the electromagnetic model to predict the instantaneous radial force. A 3D FEM modal analysis of the developed structural model then determines the mode shapes and natural frequencies. Using the developed simulation model and an evolutionary algorithm, we propose a design optimization method aimed at reducing the noise of the SRM. To evaluate the proposed method, simulation results are presented for an 8/6 switched reluctance motor.
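FE packages commonly obtain pole forces from the air-gap flux density via the Maxwell stress tensor, whose normal component is $\sigma_r = (B_n^2 - B_t^2)/(2\mu_0)$; the abstract does not state which force-calculation method was used in ANSYS, so this is an assumption. The sketch integrates that force density along a sampled pole arc with the trapezoidal rule and scales the 2D (per-unit-depth) result by the stack length. The function name and the sample values in the usage note are hypothetical.

```python
from math import pi

MU0 = 4e-7 * pi  # vacuum permeability in H/m (classical value)


def radial_force(b_normal, b_tangential, arc_length, stack_length):
    """Radial force (N) on a pole from flux densities (T) sampled along its arc.

    Samples are assumed evenly spaced over `arc_length` (m); the 2D
    force per unit depth is multiplied by `stack_length` (m).
    """
    if len(b_normal) != len(b_tangential) or len(b_normal) < 2:
        raise ValueError("need at least two matching samples")
    # Normal component of the Maxwell stress tensor at each sample (Pa).
    density = [(bn ** 2 - bt ** 2) / (2 * MU0)
               for bn, bt in zip(b_normal, b_tangential)]
    # Trapezoidal rule over evenly spaced samples.
    h = arc_length / (len(density) - 1)
    per_unit_depth = h * (sum(density) - (density[0] + density[-1]) / 2)
    return per_unit_depth * stack_length
```

A uniform normal flux density of 1 T with no tangential component over a 10 mm pole arc and a 100 mm stack gives roughly 398 N. Evaluating this at each time step of the transient analysis yields the instantaneous radial force waveform.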
Cryptanalysis of CRUSH hash structure
In this paper, we present a cryptanalysis of the CRUSH hash structure. Surprisingly, our attack can find a pre-image for an internal message of any desired length, and its time complexity is negligible: we show that finding a pre-image of any length takes O(1) time. In this attack, an adversary can freely find a pre-image of any chosen length for any given message digest. With the same complexity, the attack also finds second pre-images, collisions, and multi-collisions.
We also introduce a stronger variant of the algorithm and show that an adversary can still produce collisions for this stronger variant of the CRUSH hash structure with a time complexity lower than that of a birthday attack.
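For context, the birthday-attack baseline finds a collision in an ideal n-bit hash after about $2^{n/2}$ evaluations. The sketch below demonstrates that generic attack on SHA-256 truncated to a small number of bits; it is not the CRUSH attack from the paper, and the truncation is only there to make the search fast.

```python
import hashlib


def truncated_hash(message: bytes, bits: int) -> int:
    """Top `bits` bits of SHA-256, used as a toy n-bit hash function."""
    digest = hashlib.sha256(message).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def birthday_collision(bits: int):
    """Hash distinct messages until two share a truncated digest.

    Returns (first_message, second_message, number_of_hashes). The expected
    cost is about 2**(bits / 2) hashes; 2**bits + 1 messages guarantee a
    collision by the pigeonhole principle.
    """
    if not 1 <= bits <= 63:
        raise ValueError("bits must be in 1..63 for 8-byte counter messages")
    seen = {}
    for counter in range((1 << bits) + 1):
        message = counter.to_bytes(8, "big")
        digest = truncated_hash(message, bits)
        if digest in seen:
            return seen[digest], message, counter + 1
        seen[digest] = message
    raise RuntimeError("unreachable: pigeonhole guarantees a collision")
```

With `bits=24`, a collision typically appears after a few thousand hashes, consistent with $2^{12} = 4096$; an attack on CRUSH that beats this baseline therefore needs fewer than about $2^{n/2}$ operations for an n-bit digest.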
Full Reference Video Quality Assessment for Machine Learning-Based Video Codecs
Machine learning-based video codecs have made significant progress in the
past few years. A critical need in the development of ML-based video codecs is
an accurate evaluation metric that does not require expensive and slow
subjective tests. We show that existing evaluation metrics that were designed
and trained on DSP-based video codecs are not highly correlated with subjective
opinion when applied to ML-based video codecs, because the artifacts of ML-based
codecs are quite different from those of DSP-based codecs. We provide a new dataset of ML video
codec videos that have been accurately labeled for quality. We also propose a
new full reference video quality assessment (FRVQA) model that achieves a
Pearson Correlation Coefficient (PCC) of 0.99 and a Spearman's Rank Correlation
Coefficient (SRCC) of 0.99 at the model level. We make the dataset and FRVQA
model open source to help accelerate research in ML video codecs, and so that
others can further improve the FRVQA model.
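The reported correlations are computed at the model level, which we read as correlating per-codec-model averages of predicted and subjective scores; that reading, the record layout, and the function names are assumptions. The sketch computes Pearson's correlation directly and Spearman's as the Pearson correlation of average ranks.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def model_level_correlations(records):
    """records: iterable of (codec_model, predicted_score, subjective_mos).

    Averages both scores per codec model, then correlates the averages.
    Returns (PCC, SRCC).
    """
    groups = {}
    for model, predicted, mos in records:
        preds, subj = groups.setdefault(model, ([], []))
        preds.append(predicted)
        subj.append(mos)
    models = sorted(groups)
    pred_avg = [mean(groups[m][0]) for m in models]
    mos_avg = [mean(groups[m][1]) for m in models]
    return pearson(pred_avg, mos_avg), spearman(pred_avg, mos_avg)
```

A perfectly monotone but non-linear relation (for example, predictions 1, 4, 9 against MOS 1, 2, 3) yields an SRCC of 1 while the PCC is about 0.99, which is why both coefficients are usually reported.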
- …