608 research outputs found

    Noise cancelling in acoustic voice signals with spectral subtraction

    This End of Degree Project studies noise removal in speech signals, focusing on algorithms based on the spectral subtraction method. A Matlab application was designed and built to remove anything that disturbs the perception of a voice, that is, anything considered noise. Noise removal underlies any later voice processing, such as speech recognition, voice analysis, or storing the clean recording. Four spectral subtraction algorithms were studied: Boll, Berouti, Lockwood & Boudy, and Multiband. This document presents both the theoretical study and its implementation. To make the application usable, a simple and intuitive interface was designed; it shows how the algorithms behave on different voices and with various types of noise. Some of the noises are ideal ones, used in official noise measurements because of their specific mathematical characteristics, while others come from everyday life, such as the noise of a bus. Spectral subtraction requires a Voice Activity Detector (VAD) that recognizes at which moments of the audio a voice is present. Two detectors were studied and implemented: the first decides what counts as voice using a threshold suited to the recording, and the second combines Zero Crossing Rate (ZCR) and energy. Finally, the application's performance was evaluated both objectively and subjectively, the latter by asking several listeners for their opinions, in order to characterize its behaviour across different types of noise, voices, variables, and algorithms. Audiovisual Systems Engineering.
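
The abstract above names two building blocks: a voice activity detector based on Zero Crossing Rate and energy, and a spectral subtraction rule. The sketch below illustrates both on plain Python lists. It is not the project's Matlab code. The function names, the thresholds, the decision rule in `is_speech`, and the default values of `alpha` (over-subtraction factor) and `beta` (spectral floor) are all assumptions chosen for illustration. The subtraction follows the general Berouti-style form, in which a scaled noise power estimate is subtracted from the noisy power spectrum and the result is clamped to a floor.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def is_speech(frame, energy_threshold, zcr_threshold):
    """Assumed rule: flag a frame as speech when energy is high and ZCR is low."""
    return (frame_energy(frame) > energy_threshold
            and zero_crossing_rate(frame) < zcr_threshold)


def subtract_power_spectrum(noisy_power, noise_power, alpha=4.0, beta=0.01):
    """Berouti-style power spectral subtraction for one frame.

    noisy_power and noise_power are per-bin power values (|X|^2).
    alpha and beta are illustrative defaults, not values from the project.
    """
    cleaned = []
    for y, n in zip(noisy_power, noise_power):
        diff = y - alpha * n   # over-subtract the noise estimate
        floor = beta * n       # spectral floor keeps bins from going negative
        cleaned.append(diff if diff > floor else floor)
    return cleaned
```

In a full system the noise power estimate would be updated from frames the detector marks as non-speech, and the cleaned power spectrum would be recombined with the noisy phase before the inverse transform. Those steps are omitted here.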

    A Parametric Sound Object Model for Sound Texture Synthesis

    This thesis deals with the analysis and synthesis of sound textures based on parametric sound objects. An overview is provided about the acoustic and perceptual principles of textural acoustic scenes, and technical challenges for analysis and synthesis are considered. Four essential processing steps for sound texture analysis are identified, and existing sound texture systems are reviewed, using the four-step model as a guideline. A theoretical framework for analysis and synthesis is proposed. A parametric sound object synthesis (PSOS) model is introduced, which is able to describe individual recorded sounds through a fixed set of parameters. The model, which applies to harmonic and noisy sounds, is an extension of spectral modeling and uses spline curves to approximate spectral envelopes, as well as the evolution of parameters over time. In contrast to standard spectral modeling techniques, this representation uses the concept of objects instead of concatenated frames, and it provides a direct mapping between sounds of different length. Methods for automatic and manual conversion are shown. An evaluation is presented in which the ability of the model to encode a wide range of different sounds has been examined. Although there are aspects of sounds that the model cannot accurately capture, such as polyphony and certain types of fast modulation, the results indicate that high quality synthesis can be achieved for many different acoustic phenomena, including instruments and animal vocalizations. In contrast to many other forms of sound encoding, the parametric model facilitates various techniques of machine learning and intelligent processing, including sound clustering and principal component analysis. Strengths and weaknesses of the proposed method are reviewed, and possibilities for future development are discussed.

    "I'm sorry" - an analysis on CEO apologies through YouTube as part of corporate crisis communication. Case study: BlackBerry RIM, Skype and Stratfor

    Objective of the Study: The objective of the study was to determine the elements that constitute a true apology, how it is delivered, and how the elements are connected to each other. Three international case organizations were chosen for the study: BlackBerry RIM, Skype and Stratfor. The thesis explored the delivery of three corporate apologies in order to answer two research questions: "What are the different dimensions of an apology speech act?" and "What is the relationship between the conveyed apology and the intended apology when the medium is YouTube?" Methodology and the Theoretical Framework: The primary data consisted of the three videoed corporate apologies from the CEOs of the organizations in question. Previous material on the CEOs served as secondary data. This qualitative research was carried out through content analysis, speech act analysis and multimodal semiotic analysis. Additionally, GreenBlueRed theory was applied to each part of the analysis in order to code the results and to compare the case examples with each other more effectively. The theoretical framework united different aspects of social media, business communication, nonverbal communication and visual studies in order to create the basis for the true apology. Findings and Conclusion: The dimensions of an apology speech act were found to be offering compensation, corrective action, conciliation and mortification. Two different forms of apology speech acts were also discovered: the formulaic utterance and the apology referring to a specific set of closed settings. YouTube was found to serve as a platform for sharing the statements with ease. A videoed apology enables the organization to portray emotions along with the content, making it possible for the audience to receive the message as it was intended. The speaker should portray empathy in both the content and the delivery of the apology for the audience to perceive it as a true apology. Empathy was found to be the element that defines the relationship between the conveyed apology and the intended apology.

    Inpainting of Missing Audio Signal Samples

    Sparse representations of signals have recently become very popular in signal processing. A sparse representation means that the signal is represented exactly, or approximated very well, by a linear combination of only a few vectors from a chosen representation system. This thesis deals with the use of sparse representations for restoring damaged audio recordings, whether historical or recent. Old recordings in particular suffer from defects such as crackle or noise. Until now, short gaps in audio signals have been repaired by interpolation techniques, especially autoregressive modeling. A few years ago, an algorithm termed Audio Inpainting was introduced; it fills in missing audio samples using sparse representations, relying on greedy algorithms for sparse approximation. This thesis aims to compare the existing interpolation methods with Audio Inpainting. In addition, methods based on l1-relaxation are used for sparse approximation, in both the analysis and the synthesis model; these are mainly proximal algorithms. They treat the coefficients either individually or in relation to their neighbourhood (structured sparsity). Structured sparsity is further used for audio denoising. In the experimental part, each algorithm is evaluated with respect to the parameter settings that give the best trade-off between restoration quality and computation time. All algorithms described in the thesis are compared on practical examples using the objective measures signal-to-noise ratio (SNR) and PEMO-Q. Finally, the success of the restoration of damaged audio signals is assessed.
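
Two ingredients mentioned in this abstract can be sketched briefly. Soft thresholding is the proximal operator of the l1 norm and is the elementwise step at the core of many proximal algorithms. SNR is shown here in its common form, the ratio of reference energy to error energy in decibels. The sketch is illustrative only: it is not the thesis's implementation, the function names are hypothetical, and the thesis may use a different SNR variant (for example, one computed only over the reconstructed gap).

```python
import math


def soft_threshold(coefficients, threshold):
    """Elementwise proximal operator of threshold * ||c||_1.

    Each coefficient is shrunk toward zero by `threshold`; coefficients
    whose magnitude is at most `threshold` become zero.
    """
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coefficients]


def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB between a reference and its estimate."""
    signal = sum(x * x for x in reference)
    error = sum((x - y) ** 2 for x, y in zip(reference, estimate))
    if error == 0.0:
        return math.inf    # perfect reconstruction
    if signal == 0.0:
        return -math.inf   # silent reference, non-zero error
    return 10.0 * math.log10(signal / error)
```

Within an iterative proximal scheme, `soft_threshold` would be applied to the transform coefficients after each gradient step, with the structured variants replacing the per-coefficient magnitude by a neighbourhood-based one.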