566 research outputs found

    A signal representation approach for discrimination between full and empty hazelnuts

    Get PDF
    We apply a sparse signal representation approach to impact acoustic signals to discriminate between empty and full hazelnuts. The impact acoustic signals are recorded by dropping the hazelnut shells on a metal plate. The impact signal is then approximated within a given error limit by choosing codevectors from a special dictionary. This dictionary was generated from sub-dictionaries that are individually generated for the impact signals corresponding to empty and full hazelnut. The number of codevectors selected from each sub-dictionary and the approximation error within initial codevectors are used as classification features and fed to a Linear Discriminant Analysis (LDA). We also compare this algorithm with a baseline approach. This baseline approach uses features which describe the time and frequency characteristics of the given signal that were previously used for empty and full hazelnut separation. Classification accuracies of 98.3% and 96.8% were achieved by the proposed approach and base algorithm respectively. The results we obtained show that sparse signal representation strategy can be used as an alternative classification method for undeveloped hazelnut separation with higher accuracies. © 2007 EURASIP

    Contributions to speech processing and ambient sound analysis

    Get PDF
    We are constantly surrounded by sounds that we continuously exploit to adapt our actions to situations we are facing. Some of the sounds like speech can have a particular structure from which we can infer some information, explicit or not. This is one reason why speech is possibly that is the most intuitive way to communicate between humans. Within the last decade, there has been significant progress in the domain of speech andaudio processing and in particular in the domain of machine learning applied to speech and audio processing. Thanks to these progresses, speech has become a central element in many human to human distant communication tools as well as in human to machine communication systems. These solutions work pretty well on clean speech or under controlled condition. However, in scenarios that involve the presence of acoustic perturbation such as noise or reverberation systems performance tends to degrade severely. In this thesis we focus on processing speech and its environments from an audio perspective. The algorithms proposed here are relying on a variety of solutions from signal processing based approaches to data-driven solutions based on supervised matrix factorization or deep neural networks. We propose solutions to problems ranging from speech recognition, to speech enhancement or ambient sound analysis. The target is to offer a panorama of the different aspects that could improve a speech processing algorithm working in a real environments. We start by describing automatic speech recognition as a potential end application and progressively unravel the limitations and the proposed solutions ending-up to the more general ambient sound analysis.Nous sommes constamment entourés de sons que nous exploitons pour adapter nos actions aux situations auxquelles nous sommes confrontés. Certains sons comme la parole peuvent avoir une structure particulière à partir de laquelle nous pouvons déduire des informations, explicites ou non. C’est l’une des raisons pour lesquelles la parole est peut-être le moyen le plus intuitif de communiquer entre humains. Au cours de la décennie écoulée, des progrès significatifs ont été réalisés dans le domaine du traitement de la parole et du son et en particulier dans le domaine de l’apprentissage automatique appliqué au traitement de la parole et du son. Grâce à ces progrès, la parole est devenue un élément central de nombreux outils de communication à distance d’humain à humain ainsi que dans les systèmes de communication humain-machine. Ces solutions fonctionnent bien sur un signal de parole propre ou dans des conditions contrôlées. Cependant, dans les scénarios qui impliquent la présence de perturbations acoustiques telles que du bruit ou de la réverbération les performances peuvent avoir tendance à se dégrader gravement. Dans cette HDR, nous nous concentrons sur le traitement de la parole et de son environnement d’un point de vue audio. Les algorithmes proposés ici reposent sur une variété de solutions allant des approches basées sur le traitement du signal aux solutions orientées données à base de factorisation matricielle supervisée ou de réseaux de neurones profonds. Nous proposons des solutions à des problèmes allant de la reconnaissance vocale au rehaussement de la parole ou à l’analyse des sons ambiants. L’objectif est d’offrir un panorama des différents aspects qui pourraient être améliorer un algorithme de traitement de la parole fonctionnant dans un environnement réel. Nous commençons par décrire la reconnaissance automatique de la parole comme une application finale potentielle et analysons progressivement les limites et les solutions proposées aboutissant à l’analyse plus générale des sons ambiants

    Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing

    Get PDF
    Linguistic typology aims to capture structural and semantic variation across the world's languages. A large-scale typology could provide excellent guidance for multilingual Natural Language Processing (NLP), particularly for languages that suffer from the lack of human labeled resources. We present an extensive literature survey on the use of typological information in the development of NLP techniques. Our survey demonstrates that to date, the use of information in existing typological databases has resulted in consistent but modest improvements in system performance. We show that this is due to both intrinsic limitations of databases (in terms of coverage and feature granularity) and under-employment of the typological features included in them. We advocate for a new approach that adapts the broad and discrete nature of typological categories to the contextual and continuous nature of machine learning algorithms used in contemporary NLP. In particular, we suggest that such approach could be facilitated by recent developments in data-driven induction of typological knowledge

    Developing Sparse Representations for Anchor-Based Voice Conversion

    Get PDF
    Voice conversion is the task of transforming speech from one speaker to sound as if it was produced by another speaker, changing the identity while retaining the linguistic content. There are many methods for performing voice conversion, but oftentimes these methods have onerous training requirements or fail in instances where one speaker has a nonnative accent. To address these issues, this dissertation presents and evaluates a novel “anchor-based” representation of speech that separates speaker content from speaker identity by modeling how speakers form English phonemes. We call the proposed method Sparse, Anchor-Based Representation of Speech (SABR), and explore methods for optimizing the parameters of this model in native-to-native and native-to-nonnative voice conversion contexts. We begin the dissertation by demonstrating how sparse coding in combination with a compact, phoneme-based dictionary can be used to separate speaker identity from content in objective and subjective tests. The formulation of the representation then presents several research questions. First, we propose a method for improving the synthesis quality by using the sparse coding residual in combination with a frequency warping algorithm to convert the residual from the source to target speaker’s space, and add it to the target speaker’s estimated spectrum. Experimentally, we find that synthesis quality is significantly improved via this transform. Second, we propose and evaluate two methods for selecting and optimizing SABR anchors in native-to-native and native-to-nonnative voice conversion. We find that synthesis quality is significantly improved by the proposed methods, especially in native-to- nonnative voice conversion over baseline algorithms. In a detailed analysis of the algorithms, we find they focus on phonemes that are difficult for nonnative speakers of English or naturally have multiple acoustic states. Following this, we examine methods for adding in temporal constraints to SABR via the Fused Lasso. The proposed method significantly reduces the inter-frame variance in the sparse codes over other methods that incorporate temporal features into sparse coding representations. Finally, in a case study, we examine the use of the SABR methods and optimizations in the context of a computer aided pronunciation training system for building “Golden Speakers”, or ideal models for nonnative speakers of a second language to learn correct pronunciation. Under the hypothesis that the optimal “Golden Speaker” was the learner’s voice, synthesized with a native accent, we used SABR to build voice models for nonnative speakers and evaluated the resulting synthesis in terms of quality, identity, and accentedness. We found that even when deployed in the field, the SABR method generated synthesis with low accentedness and similar acoustic identity to the target speaker, validating the use of the method for building “golden speakers”

    Can humain association norm evaluate latent semantic analysis?

    Get PDF
    This paper presents the comparison of word association norm created by a psycholinguistic experiment to association lists generated by algorithms operating on text corpora. We compare lists generated by Church and Hanks algorithm and lists generated by LSA algorithm. An argument is presented on how those automatically generated lists reflect real semantic relations
    • …
    corecore