Speech Denoising Using Non-Negative Matrix Factorization with Kullback-Leibler Divergence and Sparseness Constraints
Proceedings of: IberSPEECH 2012 Conference, Madrid, Spain, November 21-23, 2012. A speech denoising method based on Non-Negative Matrix Factorization (NMF) is presented in this paper. With respect to previous related works, this paper makes two contributions. First, our method does not assume a priori knowledge about the nature of the noise. Second, it combines the use of the Kullback-Leibler divergence with sparseness constraints on the activation matrix, improving on the performance of similar techniques that minimize the Euclidean distance and/or do not consider any sparsification. We evaluate the proposed method for both speech enhancement and automatic speech recognition tasks, and compare it to conventional spectral subtraction, showing improvements in speech quality and recognition accuracy, respectively, for different noisy conditions. This work has been partially supported by the Spanish Government grants TSI-020110-2009-103 and TEC2011-26807.
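The abstract names the ingredients (a Kullback-Leibler objective and a sparseness penalty on the activation matrix) but not the exact update rules. The following is a minimal sketch of one common formulation, using multiplicative updates with an L1 penalty on the activations. The function names, the `sparsity` weight, and the column normalization of the basis matrix are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def kl_divergence(V, R, eps=1e-12):
    """Generalized Kullback-Leibler divergence D(V || R) for non-negative arrays."""
    return float(np.sum(V * np.log((V + eps) / (R + eps)) - V + R))


def sparse_kl_nmf(V, rank, sparsity=0.1, n_iter=200, seed=0, eps=1e-12):
    """Factor V (n_freq x n_frames) as W @ H with an L1 penalty on H.

    Hypothetical helper: `sparsity` is the assumed weight of the L1 term.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps   # basis (spectral) vectors
    H = rng.random((rank, n_frames)) + eps  # activations
    ones = np.ones_like(V)
    for _ in range(n_iter):
        # Activation update: the sparsity weight enters the denominator.
        R = W @ H + eps
        H *= (W.T @ (V / R)) / (W.T @ ones + sparsity)
        # Basis update: plain KL multiplicative rule.
        R = W @ H + eps
        W *= ((V / R) @ H.T) / (ones @ H.T + eps)
        # Normalize basis columns to unit sum and rescale H so W @ H is
        # unchanged; otherwise the penalty could be evaded by shrinking H.
        scale = W.sum(axis=0, keepdims=True) + eps
        W /= scale
        H *= scale.T
    return W, H
```

In a denoising setting, `V` would typically be a magnitude spectrogram. How the learned bases are split between speech and noise is specific to the paper and is not reproduced here.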
The Calogero-Sutherland Model and Polynomials with Prescribed Symmetry
The Schrödinger operators with exchange terms for certain
Calogero-Sutherland quantum many body systems have eigenfunctions which factor
into the symmetric ground state and a multivariable polynomial. The polynomial
can be chosen to have a prescribed symmetry (i.e. be symmetric or
antisymmetric) with respect to the interchange of some specified variables. For
four particular Calogero-Sutherland systems we construct an eigenoperator for
these polynomials which separates the eigenvalues and establishes
orthogonality. In two of the cases this involves identifying new operators
which commute with the corresponding Schrödinger operators. In each case we
express a particular class of the polynomials with prescribed symmetry in a
factored form involving the corresponding symmetric polynomials. Comment: LaTeX 2.09, 31 pages
Harnessing AI for Speech Reconstruction using Multi-view Silent Video Feed
Speechreading or lipreading is the technique of understanding and getting
phonetic features from a speaker's visual features such as movement of lips,
face, teeth and tongue. It has a wide range of multimedia applications such as
in surveillance, Internet telephony, and as an aid to a person with hearing
impairments. However, most of the work in speechreading has been limited to
text generation from silent videos. Recently, research has started venturing
into generating (audio) speech from silent video sequences but there have been
no developments thus far in dealing with divergent views and poses of a
speaker. Thus, although multiple camera feeds of a speaker may be available,
these feeds have not yet been used to deal with
different poses. To this end, this paper presents the world's first ever
multi-view speech reading and reconstruction system. This work encompasses the
boundaries of multimedia research by putting forth a model which leverages
silent video feeds from multiple cameras recording the same subject to generate
intelligent speech for a speaker. Initial results confirm the usefulness of
exploiting multiple camera views in building an efficient speech reading and
reconstruction system. It further shows the optimal placement of cameras which
would lead to the maximum intelligibility of speech. Next, it lays out various
innovative applications for the proposed system focusing on its potential
prodigious impact not just in the security arena but in many other multimedia
analytics problems. Comment: 2018 ACM Multimedia Conference (MM '18), October 22--26, 2018, Seoul,
Republic of Korea
Moralizing Postcolonial Consumer Society: Fair Trade in the Netherlands, 1964-1997
Decolonization challenged people across the globe to define their place in a new postcolonial order. This challenge was felt in international political and economic affairs, but it also affected daily lives across the globe. The history of fair trade activism as seen from the Netherlands highlights how citizens in the North grappled to position themselves in a postcolonial consumer society. Interventions by fair trade activists connected debates about the morals of their society to the consequences of decolonization. They reacted to the imbalances of the global market in the wake of decolonization, joining critics from the South in demanding more equitable global relations. It was around this issue of "fair trade" that a transnational coalition of moderate and more radical activists emerged after the 1960s. This coalition held widely dissimilar views regarding the politics of the left and the use of consumer activism. The analysis of their interventions demonstrates that during the postwar era attempts at transforming the global market were inextricably interwoven with visions of a postcolonial order
New single-ended objective measure for non-intrusive speech quality evaluation
This article proposes a new output-based method for non-intrusive assessment of speech quality of voice communication systems and evaluates its performance. The method requires access to the processed (degraded) speech only, and is based on measuring perception-motivated objective auditory distances between the voiced parts of the output speech and appropriately matching references extracted from a pre-formulated codebook. The codebook is formed by optimally clustering a large number of parametric speech vectors extracted from a database of clean speech records. The auditory distances are then mapped into objective Mean Opinion listening quality scores. An efficient data-mining tool known as the self-organizing map (SOM) achieves the required clustering and mapping/reference matching processes. In order to obtain a perception-based, speaker-independent parametric representation of the speech, three domain transformation techniques have been investigated. The first technique is based on a perceptual linear prediction (PLP) model, the second utilises a Bark spectrum (BS) analysis and the third utilises mel-frequency cepstrum coefficients (MFCC). Reported evaluation results show that the proposed method provides high correlation with subjective listening quality scores, yielding accuracy similar to that of the ITU-T P.563 while maintaining a relatively low computational complexity. Results also demonstrate that the method outperforms the PESQ in a number of distortion conditions, such as those of speech degraded by channel impairments.
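The reference-matching step can be illustrated with a small sketch. It assumes each speech frame is already a feature vector (for example PLP, Bark spectrum, or MFCC) and uses a plain nearest-neighbour search with Euclidean distance in place of the self-organizing map and auditory distance described in the abstract. The function name is hypothetical, and the mapping from distance to Mean Opinion Score is not specified in the abstract, so it is left out.

```python
import numpy as np


def mean_nearest_reference_distance(features, codebook):
    """Average distance from each frame to its closest codebook entry.

    features: (n_frames, dim) vectors from the degraded speech.
    codebook: (n_entries, dim) reference vectors built from clean speech.
    Euclidean distance is an assumption standing in for the auditory distance.
    """
    features = np.asarray(features, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # Pairwise differences, shape (n_frames, n_entries, dim).
    diffs = features[:, None, :] - codebook[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    # Keep the best-matching reference per frame, then average over frames.
    return float(dists.min(axis=1).mean())
```

A larger average distance would indicate that the degraded speech lies further from the clean-speech references. A separate, trained mapping would be needed to turn that into a quality score.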
Formal Analysis of Linear Control Systems using Theorem Proving
Control systems are an integral part of almost every engineering and physical
system and thus their accurate analysis is of utmost importance. Traditionally,
control systems are analyzed using paper-and-pencil proof and computer
simulation methods; however, neither of these methods can provide accurate
analysis due to their inherent limitations. Model checking has been widely used
to analyze control systems but the continuous nature of their environment and
physical components cannot be truly captured by a state-transition system in
this technique. To overcome these limitations, we propose to use
higher-order-logic theorem proving for analyzing linear control systems based
on a formalized theory of the Laplace transform method. For this purpose, we
have formalized the foundations of linear control system analysis in
higher-order logic so that a linear control system can be readily modeled and
analyzed. The paper presents a new formalization of the Laplace transform and
the formal verification of its properties that are frequently used in the
transfer function based analysis to judge the frequency response, gain margin
and phase margin, and stability of a linear control system. We also formalize
the active realizations of various controllers, like
Proportional-Integral-Derivative (PID), Proportional-Integral (PI),
Proportional-Derivative (PD), and various active and passive compensators, like
lead, lag and lag-lead. For illustration, we present a formal analysis of an
unmanned free-swimming submersible vehicle using the HOL Light theorem prover. Comment: International Conference on Formal Engineering Methods
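For orientation, the standard textbook forms of the objects named in this abstract are given below. They are the usual definitions, not the paper's HOL Light formalization, and the gain symbols $K_p$, $K_i$, $K_d$ are notation assumed here.

```latex
% Unilateral Laplace transform of a function f(t)
F(s) = \int_{0}^{\infty} f(t)\, e^{-st}\, dt

% Ideal controller transfer functions (assumed gains K_p, K_i, K_d)
C_{\mathrm{PID}}(s) = K_p + \frac{K_i}{s} + K_d\, s, \qquad
C_{\mathrm{PI}}(s)  = K_p + \frac{K_i}{s}, \qquad
C_{\mathrm{PD}}(s)  = K_p + K_d\, s
```

Setting $K_d = 0$ or $K_i = 0$ in the PID form yields the PI and PD controllers respectively, which is why the three are usually treated together.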
A Brave New Internet: Hacking the Narrative of Mark Zuckerberg’s 2021 Introduction of the Metaverse
We are entering an era of ‘techlash’: increasing unease with the hold of large technology companies over our lives, driven by fatalistic feelings of loss of agency. Neither attempts by these companies to address such concerns, such as appointing ethical committees and ombudsmen, nor grassroots initiatives aimed at user empowerment, seem effective in addressing this. This context remains unacknowledged in Mark Zuckerberg’s introduction of the Metaverse on 28 October 2021. We will show, however, that it is still implicitly addressed through its narrative. A far-reaching transformation of the way in which we use the internet is presented as desirable and inescapable, employing an epic narrative mode which values constancy of the individual and their mastery over their surroundings. However, this future is shaped by Zuckerberg and his company: promising agency for all, it is remarkable how little agency is given to the user. We juxtapose this smooth future vision with a counternarrative using the same narrative building stones, but told in a narrative mode distributing agency more equally. Thus, we engage in strategic analysis, exploring how to resist narratives such as the Metaverse’s. We call this method hacking the narrative.