187,923 research outputs found
Unlocking Foundation Models for Privacy-Enhancing Speech Understanding: An Early Study on Low Resource Speech Training Leveraging Label-guided Synthetic Speech Content
Automatic Speech Understanding (ASU) leverages the power of deep learning
models for accurate interpretation of human speech, leading to a wide range of
speech applications that enrich the human experience. However, training a
robust ASU model requires the curation of a large number of speech samples,
creating risks for privacy breaches. In this work, we investigate using
foundation models to assist privacy-enhancing speech computing. Unlike
conventional works focusing primarily on data perturbation or distributed
algorithms, our work studies the possibilities of using pre-trained generative
models to synthesize speech content as training data with just label guidance.
We show that zero-shot learning with training label-guided synthetic speech
content remains a challenging task. On the other hand, our results demonstrate
that the model trained with synthetic speech samples provides an effective
initialization point for low-resource ASU training. This result reveals the
potential to enhance privacy by reducing user data collection but using
label-guided synthetic speech content
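The pretrain-then-fine-tune idea in this abstract can be illustrated with a toy stand-in (a minimal sketch, assuming a simple logistic classifier over synthetic feature vectors in place of the paper's actual ASU model and generated speech; all names, data, and hyperparameters here are illustrative):

```python
import numpy as np

def make_data(n, d=16, seed=0):
    """Toy stand-in for speech features: the synthetic and 'real' sets share
    one underlying labeling rule, mimicking a transferable task."""
    r = np.random.default_rng(seed)
    X = r.normal(size=(n, d))
    w_true = np.linspace(-1.0, 1.0, d)
    y = (X @ w_true > 0).astype(float)
    return X, y

def train_logreg(X, y, w=None, lr=0.1, epochs=200):
    """Gradient-descent logistic regression; `w` is the initialization point."""
    if w is None:
        w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w = w - lr * X.T @ (p - y) / len(y)
    return w

# 1) Pretrain on plentiful label-guided synthetic samples (no user data collected).
X_syn, y_syn = make_data(2000, seed=1)
w_syn = train_logreg(X_syn, y_syn)

# 2) Fine-tune on a tiny "real" low-resource set, starting from the
#    synthetic-pretrained weights as the initialization point.
X_real, y_real = make_data(40, seed=2)
w_ft = train_logreg(X_real, y_real, w=w_syn.copy(), epochs=50)

# Evaluate on held-out "real" data.
X_test, y_test = make_data(500, seed=3)
acc = float(np.mean(((X_test @ w_ft) > 0).astype(float) == y_test))
```

Note that in this toy the synthetic and real tasks are identical by construction, so even zero-shot use of `w_syn` would score well; the abstract's finding is that with a realistic synthetic-to-real gap, zero-shot remains hard while the pretrained weights still help as an initialization.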
Differential Architecture Search in Deep Learning for DNA Splice Site Classification
The data explosion caused by unprecedented advancements in the field of genomics is constantly challenging the conventional methods used in the interpretation of the human genome. In recent years, the demand for robust algorithms has brought huge success to the field of Deep Learning (DL) in solving many difficult tasks in image, speech, and natural language processing by automating the manual process of architecture design.
Copyright, Free Speech, and the Public's Right to Know: How Journalists Think about Fair Use
This study, resulting from long-form interviews with 80 journalists, finds that the journalistic mission is in peril because of a lack of clarity around copyright and fair use. Journalists' professional culture is highly conducive to a robust employment of their free speech rights under the copyright doctrine of fair use, but their actual knowledge of fair use practice is low. Where they have received education on copyright and fair use, it has often been erroneous. Ironically, when they do not know that they are using fair use, they nevertheless do so with a logic and reasoning that accord extremely well with today's courts' interpretation of the law. But when they have to actively decide whether to employ fair use, they often resort to myths and misconceptions, and they sometimes take unnecessary risks. When journalists fail to understand their free speech rights within the framework of fair use, they face expense, delays, and even failure to meet their mission of informing the public when confronting new practices or situations. These consequences are avoidable with a better, shared understanding of fair use within the experience of journalistic practice, whether that practice is original reporting or aggregation, within a large institution or a one-person outfit. Journalists need both to understand fair use and to articulate collectively the principles that govern its employment in order to meet their journalistic mission.
Lexically-guided perceptual learning in speech processing
While listening to spoken language, the perceptual system needs to adapt frequently to changes in talkers, and thus to considerable interindividual variability in the articulation of a given speech sound. This thesis investigated a learning process which allows listeners to use stored lexical representations to modify the interpretation of a speech sound when a talker's articulation of that sound is consistently unclear or ambiguous. The questions addressed in this research concerned the robustness of such perceptual learning, a potential role for sleep, and whether learning is specific to the speech of one talker or, alternatively, generalises to other talkers. A further study aimed to identify the underlying functional neuroanatomy using magnetic resonance imaging methods. The picture that emerged for lexically-guided perceptual learning is that learning occurs very rapidly, is highly specific, and remains remarkably robust both over time and under exposure to speech from other talkers.
Robust semantic analysis for adaptive speech interfaces
The DUMAS project develops speech-based applications that are adaptable to different users and domains. The paper describes the project's robust semantic analysis strategy, which is used both in the generic framework for developing multilingual speech-based dialogue systems (the main project goal) and in the initial test application, a mobile phone-based e-mail interface.
Automatic Detection of Laryngeal Pathology on Sustained Vowels Using Short-Term Cepstral Parameters: Analysis of Performance and Theoretical Justification
The majority of speech signal analysis procedures for automatic detection of laryngeal pathologies rely mainly on parameters extracted from time-domain processing. Moreover, calculation of these parameters often requires prior pitch period estimation; therefore, their validity depends heavily on the robustness of pitch detection. Within this paper, an alternative approach based on cepstral-domain processing is presented which has the advantage of not requiring pitch estimation, thus providing a gain in both simplicity and robustness. While the proposed scheme is similar to solutions based on Mel-frequency cepstral parameters already present in the literature, it has an easier physical interpretation while achieving similar performance standards.
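The cepstral-domain idea can be sketched directly: a real cepstrum is the inverse FFT of the log magnitude spectrum, computed frame by frame with no pitch estimate required (a minimal NumPy sketch with an illustrative synthetic "sustained vowel" frame; the paper's exact parameterization is not reproduced here):

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum: inverse FFT of the log magnitude spectrum.
    Works on a single windowed frame and needs no prior pitch estimate."""
    spectrum = np.fft.rfft(frame * np.hamming(len(frame)))
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # epsilon avoids log(0)
    return np.fft.irfft(log_mag)

# Toy sustained-vowel-like frame: a 200 Hz harmonic source sampled at 8 kHz.
fs, f0 = 8000, 200.0
t = np.arange(512) / fs
frame = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, 6))

cep = real_cepstrum(frame)

# Low-quefrency (short-term) cepstral coefficients describe the smooth
# spectral envelope; a pathology detector would feed these to a classifier.
short_term_params = cep[1:13]
```

The pitch information ends up at higher quefrencies, so the short-term coefficients can be used without ever estimating the pitch period explicitly.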
Robust Processing of Natural Language
Previous approaches to robustness in natural language processing usually
treat deviant input by relaxing grammatical constraints whenever a successful
analysis cannot be provided by "normal" means. This schema implies that
error detection always comes prior to error handling, a behaviour which can
hardly compete with its human model, where many erroneous situations are
treated without even being noticed.
The paper analyses the necessary preconditions for achieving a higher degree
of robustness in natural language processing and suggests a quite different
approach based on a procedure for structural disambiguation. It not only offers
the possibility to cope with robustness issues in a more natural way but
eventually might be suited to accommodate quite different aspects of robust
behaviour within a single framework.
To appear in: Proc. KI-95, 19th German Conference on Artificial Intelligence, Bielefeld (Germany), Lecture Notes in Computer Science, Springer 199
Robust Parsing of Spoken Dialogue Using Contextual Knowledge and Recognition Probabilities
In this paper we describe the linguistic processor of a spoken dialogue
system. The parser receives a word graph from the recognition module as its
input. Its task is to find the best path through the graph. If no complete
solution can be found, a robust mechanism for selecting multiple partial
results is applied. We show how the information content rate of the results can
be improved if the selection is based on an integrated quality score combining
word recognition scores and context-dependent semantic predictions. Results of
parsing word graphs with and without predictions are reported.
To appear in Proceedings of ESCA Workshop on Spoken Dialogue Systems, Denmark, May 30-June
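The integrated quality score described in this abstract can be sketched as a weighted combination of the two evidence sources (a hypothetical scoring function; the weight `alpha`, the log transform, and all numbers are illustrative, not the system's actual formula):

```python
import math

def combined_score(acoustic_logprob, semantic_fit, alpha=0.6):
    """Integrated quality score: weighted combination of the recognizer's
    acoustic log-probability and a context-dependent semantic prediction
    score in (0, 1]. Functional form and weighting are illustrative."""
    return alpha * acoustic_logprob + (1 - alpha) * math.log(semantic_fit + 1e-9)

# Two competing partial paths through a word graph: path_b is slightly
# better acoustically, but path_a fits the dialogue context far better.
path_a = combined_score(acoustic_logprob=-4.2, semantic_fit=0.9)
path_b = combined_score(acoustic_logprob=-3.8, semantic_fit=0.1)
best = max([("a", path_a), ("b", path_b)], key=lambda p: p[1])[0]
```

Here the semantic prediction overrides a small acoustic advantage, which is the effect the abstract reports: selection quality improves when context-dependent predictions are folded into the word recognition scores.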
SCREEN: Learning a Flat Syntactic and Semantic Spoken Language Analysis Using Artificial Neural Networks
In this paper, we describe a so-called screening approach for learning robust
processing of spontaneously spoken language. A screening approach is a flat
analysis which uses shallow sequences of category representations for analyzing
an utterance at various syntactic, semantic and dialog levels. Rather than
using a deeply structured symbolic analysis, we use a flat connectionist
analysis. This screening approach aims at supporting speech and language
processing by using (1) data-driven learning and (2) robustness of
connectionist networks. In order to test this approach, we have developed the
SCREEN system which is based on this new robust, learned and flat analysis.
In this paper, we focus on a detailed description of SCREEN's architecture,
the flat syntactic and semantic analysis, the interaction with a speech
recognizer, and a detailed evaluation analysis of the robustness under the
influence of noisy or incomplete input. The main result of this paper is that
flat representations allow more robust processing of spontaneous spoken
language than deeply structured representations. In particular, we show how the
fault-tolerance and learning capability of connectionist networks can support a
flat analysis for providing more robust spoken-language processing within an
overall hybrid symbolic/connectionist framework.Comment: 51 pages, Postscript. To be published in Journal of Artificial
Intelligence Research 6(1), 199