187,923 research outputs found
Unlocking Foundation Models for Privacy-Enhancing Speech Understanding: An Early Study on Low Resource Speech Training Leveraging Label-guided Synthetic Speech Content
Automatic Speech Understanding (ASU) leverages the power of deep learning
models for accurate interpretation of human speech, leading to a wide range of
speech applications that enrich the human experience. However, training a
robust ASU model requires the curation of a large number of speech samples,
creating risks for privacy breaches. In this work, we investigate using
foundation models to assist privacy-enhancing speech computing. Unlike
conventional works focusing primarily on data perturbation or distributed
algorithms, our work studies the possibilities of using pre-trained generative
models to synthesize speech content as training data with just label guidance.
We show that zero-shot learning with training label-guided synthetic speech
content remains a challenging task. On the other hand, our results demonstrate
that the model trained with synthetic speech samples provides an effective
initialization point for low-resource ASU training. This result reveals the
potential to enhance privacy by reducing user data collection but using
label-guided synthetic speech content
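The pretrain-then-fine-tune idea in this abstract can be illustrated with a toy stand-in (a minimal sketch, assuming a simple logistic classifier over synthetic feature vectors in place of the paper's actual ASU model and generated speech; all names, data, and hyperparameters here are illustrative):

```python
import numpy as np

def make_data(n, d=16, seed=0):
    """Toy stand-in for speech features: the synthetic and 'real' sets share
    one underlying labeling rule, mimicking a transferable task."""
    r = np.random.default_rng(seed)
    X = r.normal(size=(n, d))
    w_true = np.linspace(-1.0, 1.0, d)
    y = (X @ w_true > 0).astype(float)
    return X, y

def train_logreg(X, y, w=None, lr=0.1, epochs=200):
    """Gradient-descent logistic regression; `w` is the initialization point."""
    if w is None:
        w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w = w - lr * X.T @ (p - y) / len(y)
    return w

# 1) Pretrain on plentiful label-guided synthetic samples (no user data collected).
X_syn, y_syn = make_data(2000, seed=1)
w_syn = train_logreg(X_syn, y_syn)

# 2) Fine-tune on a tiny "real" low-resource set, starting from the
#    synthetic-pretrained weights as the initialization point.
X_real, y_real = make_data(40, seed=2)
w_ft = train_logreg(X_real, y_real, w=w_syn.copy(), epochs=50)

# Evaluate on held-out "real" data.
X_test, y_test = make_data(500, seed=3)
acc = float(np.mean(((X_test @ w_ft) > 0).astype(float) == y_test))
```

Note that in this toy the synthetic and real tasks are identical by construction, so even zero-shot use of `w_syn` would score well; the abstract's finding is that with a realistic synthetic-to-real gap, zero-shot remains hard while the pretrained weights still help as an initialization.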
Differential Architecture Search in Deep Learning for DNA Splice Site Classification
The data explosion caused by unprecedented advancements in the field of genomics is constantly challenging the conventional methods used in the interpretation of the human genome. In recent years, the demand for robust algorithms has brought huge success to the field of Deep Learning (DL) in solving many difficult tasks in image, speech, and natural language processing by automating the manual process of architecture design.
Copyright, Free Speech, and the Public's Right to Know: How Journalists Think about Fair Use
This study, resulting from long-form interviews with 80 journalists, finds that the journalistic mission is in peril because of a lack of clarity around copyright and fair use. Journalists' professional culture is highly conducive to a robust employment of their free speech rights under the copyright doctrine of fair use, but their actual knowledge of fair use practice is low. Where they have received education on copyright and fair use, it has often been erroneous. Ironically, when they do not know that they are using fair use, they nevertheless do so with a logic and reasoning that accord extremely well with today's courts' interpretation of the law. But when they have to actively decide whether to employ fair use, they often resort to myths and misconceptions, and they sometimes take unnecessary risks. When journalists fail to understand their free speech rights within the framework of fair use, they face expense, delays, and even failure to meet their mission of informing the public when confronting new practices or situations. These consequences are avoidable with a better, shared understanding of fair use within the experience of journalistic practice, whether that practice is original reporting or aggregation, within a large institution or a one-person outfit. Journalists need both to understand fair use and to articulate collectively the principles that govern its employment in order to meet their journalistic mission.
Lexically-guided perceptual learning in speech processing
While listening to spoken language, the perceptual system needs to adapt frequently to changes in talkers, and thus to considerable interindividual variability in the articulation of a given speech sound. This thesis investigated a learning process which allows listeners to use stored lexical representations to modify the interpretation of a speech sound when a talker's articulation of that sound is consistently unclear or ambiguous. The questions addressed in this research concerned the robustness of such perceptual learning, a potential role for sleep, and whether learning is specific to the speech of one talker or, alternatively, generalises to other talkers. A further study aimed to identify the underlying functional neuroanatomy using magnetic resonance imaging methods. The picture that emerged for lexically-guided perceptual learning is that learning occurs very rapidly, is highly specific, and remains remarkably robust both over time and under exposure to speech from other talkers.
Robust semantic analysis for adaptive speech interfaces
The DUMAS project develops speech-based applications that are adaptable to different users and domains. The paper describes the project's robust semantic analysis strategy, which is used both in the generic framework for developing multilingual speech-based dialogue systems (the main project goal) and in the initial test application, a mobile phone-based e-mail interface.
Automatic Detection of Laryngeal Pathology on Sustained Vowels Using Short-Term Cepstral Parameters: Analysis of Performance and Theoretical Justification
The majority of speech signal analysis procedures for automatic detection of laryngeal pathologies rely mainly on parameters extracted from time-domain processing. Moreover, calculation of these parameters often requires prior pitch period estimation; therefore, their validity depends heavily on the robustness of pitch detection. Within this paper, an alternative approach based on cepstral-domain processing is presented which has the advantage of not requiring pitch estimation, thus providing a gain in both simplicity and robustness. While the proposed scheme is similar to solutions based on Mel-frequency cepstral parameters already present in the literature, it has an easier physical interpretation while achieving similar performance standards.
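The cepstral-domain idea can be sketched directly: a real cepstrum is the inverse FFT of the log magnitude spectrum, computed frame by frame with no pitch estimate required (a minimal NumPy sketch with an illustrative synthetic "sustained vowel" frame; the paper's exact parameterization is not reproduced here):

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum: inverse FFT of the log magnitude spectrum.
    Works on a single windowed frame and needs no prior pitch estimate."""
    spectrum = np.fft.rfft(frame * np.hamming(len(frame)))
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # epsilon avoids log(0)
    return np.fft.irfft(log_mag)

# Toy sustained-vowel-like frame: a 200 Hz harmonic source sampled at 8 kHz.
fs, f0 = 8000, 200.0
t = np.arange(512) / fs
frame = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, 6))

cep = real_cepstrum(frame)

# Low-quefrency (short-term) cepstral coefficients describe the smooth
# spectral envelope; a pathology detector would feed these to a classifier.
short_term_params = cep[1:13]
```

The pitch information ends up at higher quefrencies, so the short-term coefficients can be used without ever estimating the pitch period explicitly.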
Robust Processing of Natural Language
Previous approaches to robustness in natural language processing usually
treat deviant input by relaxing grammatical constraints whenever a successful
analysis cannot be provided by "normal" means. This schema implies that
error detection always comes prior to error handling, a behaviour which can
hardly compete with its human model, where many erroneous situations are
treated without even being noticed.
The paper analyses the necessary preconditions for achieving a higher degree
of robustness in natural language processing and suggests a quite different
approach based on a procedure for structural disambiguation. It not only offers
the possibility to cope with robustness issues in a more natural way but
eventually might be suited to accommodate quite different aspects of robust
behaviour within a single framework.
To appear in: Proc. KI-95, 19th German Conference on Artificial Intelligence, Bielefeld (Germany), Lecture Notes in Computer Science, Springer 199
Robust Parsing of Spoken Dialogue Using Contextual Knowledge and Recognition Probabilities
In this paper we describe the linguistic processor of a spoken dialogue
system. The parser receives a word graph from the recognition module as its
input. Its task is to find the best path through the graph. If no complete
solution can be found, a robust mechanism for selecting multiple partial
results is applied. We show how the information content rate of the results can
be improved if the selection is based on an integrated quality score combining
word recognition scores and context-dependent semantic predictions. Results of
parsing word graphs with and without predictions are reported.
To appear in Proceedings of ESCA Workshop on Spoken Dialogue Systems, Denmark, May 30-June
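The integrated quality score described in this abstract can be sketched as a weighted combination of the two evidence sources (a hypothetical scoring function; the weight `alpha`, the log transform, and all numbers are illustrative, not the system's actual formula):

```python
import math

def combined_score(acoustic_logprob, semantic_fit, alpha=0.6):
    """Integrated quality score: weighted combination of the recognizer's
    acoustic log-probability and a context-dependent semantic prediction
    score in (0, 1]. Functional form and weighting are illustrative."""
    return alpha * acoustic_logprob + (1 - alpha) * math.log(semantic_fit + 1e-9)

# Two competing partial paths through a word graph: path_b is slightly
# better acoustically, but path_a fits the dialogue context far better.
path_a = combined_score(acoustic_logprob=-4.2, semantic_fit=0.9)
path_b = combined_score(acoustic_logprob=-3.8, semantic_fit=0.1)
best = max([("a", path_a), ("b", path_b)], key=lambda p: p[1])[0]
```

Here the semantic prediction overrides a small acoustic advantage, which is the effect the abstract reports: selection quality improves when context-dependent predictions are folded into the word recognition scores.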
SCREEN: Learning a Flat Syntactic and Semantic Spoken Language Analysis Using Artificial Neural Networks
In this paper, we describe a so-called screening approach for learning robust
processing of spontaneously spoken language. A screening approach is a flat
analysis which uses shallow sequences of category representations for analyzing
an utterance at various syntactic, semantic and dialog levels. Rather than
using a deeply structured symbolic analysis, we use a flat connectionist
analysis. This screening approach aims at supporting speech and language
processing by using (1) data-driven learning and (2) robustness of
connectionist networks. In order to test this approach, we have developed the
SCREEN system which is based on this new robust, learned and flat analysis.
In this paper, we focus on a detailed description of SCREEN's architecture,
the flat syntactic and semantic analysis, the interaction with a speech
recognizer, and a detailed evaluation analysis of the robustness under the
influence of noisy or incomplete input. The main result of this paper is that
flat representations allow more robust processing of spontaneous spoken
language than deeply structured representations. In particular, we show how the
fault-tolerance and learning capability of connectionist networks can support a
flat analysis for providing more robust spoken-language processing within an
overall hybrid symbolic/connectionist framework.Comment: 51 pages, Postscript. To be published in Journal of Artificial
Intelligence Research 6(1), 199