563 research outputs found

    Voice Conversion Using K-Histograms and Residual Averaging

    The main goal of a voice conversion system is to modify the voice of a source speaker so that it is perceived as if it had been uttered by another specific speaker. Many approaches found in the literature convert only the features related to the vocal tract of the speaker. Our proposal is to convert those vocal tract characteristics and also to process the signal passing through the vocal cords. Thus, the goal of this work is to obtain better scores in the voice conversion results.
    Fil: Uriz, Alejandro José. Universidad Nacional de Mar del Plata. Facultad de Ingeniería. Departamento de Electrónica. Laboratorio de Comunicaciones; Argentina
    Fil: Agüero, Pablo D. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico CONICET - Mar del Plata
    Fil: Castiñeira Moreira, Jorge. Universidad Nacional de Mar del Plata. Facultad de Ingeniería. Departamento de Electrónica. Laboratorio de Comunicaciones
    Fil: Tulli, J. C. Universidad Nacional de Mar del Plata. Facultad de Ingeniería. Departamento de Electrónica. Laboratorio de Comunicaciones; Argentina
    Fil: González, Esteban Lucio. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico CONICET - Mar del Plata; Argentina
    Fil: Bonafonte, A. Universidad Nacional de Mar del Plata. Facultad de Ingeniería. Departamento de Electrónica. Laboratorio de Comunicaciones; Argentina

    Sound Transformation: Applying Image Neural Style Transfer Networks to Audio Spectrograms

    Image style transfer networks are used to blend images, producing outputs that mix the source images. The process is based on controlled extraction of the style and content aspects of images using pre-trained Convolutional Neural Networks (CNNs). Our interest lies in adapting these image style transfer networks to the task of transforming sounds. Audio signals can be represented as grey-scale images of audio spectrograms. The purpose of our work is to investigate whether audio spectrogram inputs can be used with image neural style transfer networks to produce new sounds. Using musical instrument sounds as source sounds, we apply and compare three existing image neural style transfer networks on the task of sound mixing. Our evaluation shows that all three networks succeed in producing consistent new sounds based on the two source sounds. We use classification models to demonstrate that the new audio signals are consistent and distinguishable from the source instrument sounds. We further apply t-SNE cluster visualisation to the feature maps of the new sounds and the original source sounds, confirming that the new sounds form groups distinct from the source sounds. Our work paves the way to using CNNs for creative and targeted production of new sounds from source sounds with specified source qualities, including pitch and timbre.
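    Treating audio as an image, as described above, amounts to computing a (log-)magnitude spectrogram and rescaling it to grey-scale pixel values. A minimal sketch of that conversion follows; the frame size, hop, and scaling here are arbitrary assumptions, not the settings used in the paper:

```python
import numpy as np

def spectrogram_image(signal, frame_len=512, hop=128):
    """Turn a 1-D audio signal into a grey-scale spectrogram 'image'.

    Frames the signal with a Hann window, takes the magnitude of the
    real FFT of each frame, compresses with log1p, and rescales the
    result to 8-bit pixel values in [0, 255]."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))   # (time, freq)
    log_mag = np.log1p(mag).T                   # (freq, time), image-like
    img = 255.0 * (log_mag - log_mag.min()) / (log_mag.max() - log_mag.min())
    return img.astype(np.uint8)

# a steady 440 Hz tone at 16 kHz shows up as one bright horizontal band
sr = 16000
t = np.arange(sr) / sr
img = spectrogram_image(np.sin(2 * np.pi * 440 * t))
```

    The resulting 2-D array can be fed to an image network exactly like a single-channel photograph; inverting the transformation back to audio additionally requires phase reconstruction, which the sketch omits.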

    Energy Harvesting & Wing Morphing Design Using Piezoelectric Macro Fiber Composites

    Energy harvesting from vibration sources has been a very promising field of research over the last few decades among engineers and scientists, given the necessity of renewable/green energy for the welfare of mankind. Researchers have long tried to utilize the unused vibration energy present in the surroundings and in machinery, harvesting it by means of piezoelectric transduction. After the invention of the piezoceramic Macro Fiber Composite (MFC) by NASA, research in this field grew considerably, owing to the MFC's high efficiency in converting mechanical strain or vibration into useful electrical power and vice versa. Beyond harvesting itself, researchers concentrated on utilizing this harvested energy in daily life, and its application to structural health monitoring was inaugurated. Recent studies showed that the vibration energy harvested from vehicle or aerospace (UAV) structures is sufficient to power an onboard structural health monitoring unit, although feeding this power to other onboard electrical systems remains challenging due to the low and intermittent power generation. Moreover, MFCs can be used as actuators to change the shape of an aircraft wing and enhance its aerodynamic performance, so the application of MFCs to wing morphing design has become popular in recent years. The purpose of this research work is to depict the recent progress and development in energy harvesting and wing morphing research using macro fiber composites and, building on the existing knowledge, to continue that work: the future of this harvested energy, new design concepts, and upcoming challenges along with their possible solutions. This work investigates different configurations of macro fiber composites (MFC) for piezoelectric energy harvesting and their contribution to wing morphing design with enhanced aerodynamics.
    In the first part of this work, a uniform MFC configuration was modeled based on Euler-Bernoulli beam theory. After the governing differential equations of the system were derived, the coupled vibration response and the voltage response were obtained under harmonic base excitation. The prediction of the mathematical model was first verified against a unimorph MFC with a brass substrate from the state of the art, and then validated with MFC unimorphs using three different substrate materials (copper, zinc alloy, and galvanized steel) and thicknesses, for the first time in this type of research. The computational and analytical solutions revealed that, among these three substrates at the same thickness, the maximum peak power at resonance excitation was obtained for the copper substrate. In the second part of the study, (i) a computational analysis was performed and its output compared with real-time data from a wind tunnel experiment, leading to the conclusion that the power output of the MFC increases with the incoming flow velocity for a thin aerofoil made of a copper substrate with two MFCs on its upper surface; and (ii) a wing morphing design was performed for a NACA 0012 aerofoil for the first time, in which macro fiber composite actuators were used to deform the top and bottom surfaces of the aerofoil with a view to recording the enhanced aerodynamic performance of the designed morphing wing. The CFD simulation results were compared with state-of-the-art wind tunnel testing data for a NACA 0014 under identical parameters. The enhanced aerodynamic performance observed for the designed morphing wing can be used for future concepts such as maneuvering the aircraft without ailerons, or for active flow control over the aircraft wing.
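    The coupled vibration and voltage responses mentioned above are commonly written, for a cantilevered piezoelectric harvester in the distributed-parameter modal form, as follows. The symbols here follow the usual textbook convention and are assumptions, not notation taken from this work:

```latex
% modal mechanical equation for the r-th vibration mode
\ddot{\eta}_r(t) + 2\zeta_r\omega_r\,\dot{\eta}_r(t) + \omega_r^2\,\eta_r(t)
    + \chi_r\, v(t) = f_r(t)

% electrical circuit equation across the load resistance R_l
C_p\,\dot{v}(t) + \frac{v(t)}{R_l} = \sum_r \chi_r\,\dot{\eta}_r(t)
```

    Here $\eta_r$ is the modal coordinate, $\zeta_r$ and $\omega_r$ the modal damping ratio and natural frequency, $\chi_r$ the electromechanical coupling term, $C_p$ the effective piezoelectric capacitance, and $f_r$ the modal forcing due to the harmonic base excitation; solving the two equations simultaneously yields both responses.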

    A Parametric Sound Object Model for Sound Texture Synthesis

    This thesis deals with the analysis and synthesis of sound textures based on parametric sound objects. An overview is provided about the acoustic and perceptual principles of textural acoustic scenes, and technical challenges for analysis and synthesis are considered. Four essential processing steps for sound texture analysis are identified, and existing sound texture systems are reviewed, using the four-step model as a guideline. A theoretical framework for analysis and synthesis is proposed. A parametric sound object synthesis (PSOS) model is introduced, which is able to describe individual recorded sounds through a fixed set of parameters. The model, which applies to harmonic and noisy sounds, is an extension of spectral modeling and uses spline curves to approximate spectral envelopes, as well as the evolution of parameters over time. In contrast to standard spectral modeling techniques, this representation uses the concept of objects instead of concatenated frames, and it provides a direct mapping between sounds of different length. Methods for automatic and manual conversion are shown. An evaluation is presented in which the ability of the model to encode a wide range of different sounds has been examined. Although there are aspects of sounds that the model cannot accurately capture, such as polyphony and certain types of fast modulation, the results indicate that high quality synthesis can be achieved for many different acoustic phenomena, including instruments and animal vocalizations. In contrast to many other forms of sound encoding, the parametric model facilitates various techniques of machine learning and intelligent processing, including sound clustering and principal component analysis. Strengths and weaknesses of the proposed method are reviewed, and possibilities for future development are discussed.
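    The spline-envelope idea can be illustrated with a minimal sketch (this is not the PSOS implementation itself; the spectrum, knot count, and spline degree below are arbitrary assumptions). A small fixed set of spline knots yields a compact, smooth description of a magnitude spectrum, analogous to describing an envelope with a fixed set of parameters:

```python
import numpy as np
from scipy.interpolate import splrep, splev

def spline_envelope(magnitude, n_knots=12):
    """Fit a smooth cubic B-spline envelope to a magnitude spectrum.

    A small number of evenly spaced interior knots keeps the fit
    compact: the spline follows the broad spectral shape while
    ignoring fast harmonic ripple."""
    bins = np.arange(len(magnitude), dtype=float)
    knots = np.linspace(bins[1], bins[-2], n_knots)  # interior knots
    tck = splrep(bins, magnitude, t=knots, k=3)      # least-squares fit
    return splev(bins, tck)

# toy "spectrum": a smooth formant-like bump plus harmonic ripple
bins = np.arange(256)
smooth = np.exp(-((bins - 60.0) / 40.0) ** 2)
spectrum = smooth + 0.05 * np.abs(np.sin(bins * 0.8))
env = spline_envelope(spectrum)
```

    Because the envelope is defined by a handful of spline coefficients rather than per-bin values, envelopes of sounds with different lengths or FFT sizes can be compared and interpolated directly, which is the property the object-based representation exploits.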

    Face Image and Video Analysis in Biometrics and Health Applications

    Computer Vision (CV) enables computers and systems to derive meaningful information from acquired visual inputs, such as images and videos, and to make decisions based on the extracted information. Its goal is to acquire, process, analyze, and understand that information by developing theoretical and algorithmic models. Biometrics are distinctive and measurable human characteristics used to label or describe individuals, combining computer vision with knowledge of human physiology (e.g., face, iris, fingerprint) and behavior (e.g., gait, gaze, voice). The face is one of the most informative biometric traits, and many studies have investigated it from the perspectives of disciplines ranging from computer vision and deep learning to neuroscience and biometrics. In this work, we analyze facial characteristics in digital images and videos in the areas of morphing attack and defense, and autism diagnosis. For face morphing attack generation, we proposed a transformer-based generative adversarial network that produces more visually realistic morphing attacks by combining several losses: face matching distance, a facial-landmark-based loss, a perceptual loss, and pixel-wise mean squared error. In the face morphing attack detection study, we designed a fusion-based few-shot learning (FSL) method to learn discriminative features from face images for few-shot morphing attack detection (FS-MAD), and extended the current binary detection to multiclass classification, namely few-shot morphing attack fingerprinting (FS-MAF). In the autism diagnosis study, we developed a discriminative few-shot learning method to analyze hour-long video data and explored the fusion of facial dynamics for facial trait classification of autism spectrum disorder (ASD) at three severity levels. The results show outstanding performance of the proposed fusion-based few-shot framework on the dataset.
    In addition, we explored the possibility of performing face micro-expression spotting and feature analysis on autism video data to classify ASD and control groups. The results indicate the effectiveness of subtle facial expression changes for autism diagnosis.
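    To make the few-shot idea concrete, here is a generic prototypical-style classifier sketch (not the paper's fusion-based method; the feature dimensions and data are toy assumptions). Each class is summarized by the mean of its few labeled support embeddings, and queries are assigned to the nearest prototype:

```python
import numpy as np

def prototype_classify(support, support_labels, query):
    """Classify query feature vectors by nearest class prototype.

    support: (n_support, d) feature vectors with known labels
    query:   (n_query, d) feature vectors to classify
    A class prototype is the mean of that class's support vectors;
    each query gets the label of its closest prototype."""
    classes = np.unique(support_labels)
    protos = np.stack([support[support_labels == c].mean(axis=0)
                       for c in classes])
    # squared Euclidean distance from every query to every prototype
    d2 = ((query[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[np.argmin(d2, axis=1)]

# two well-separated toy classes in a 4-D feature space, 5 shots each
rng = np.random.default_rng(0)
support = np.vstack([rng.normal(0, 0.1, (5, 4)),
                     rng.normal(3, 0.1, (5, 4))])
labels = np.array([0] * 5 + [1] * 5)
query = np.vstack([rng.normal(0, 0.1, (3, 4)),
                   rng.normal(3, 0.1, (3, 4))])
pred = prototype_classify(support, labels, query)
```

    Because only class means are stored, the same routine extends from binary detection to multiclass fingerprinting simply by adding more labeled support classes, which mirrors the FS-MAD to FS-MAF extension described above.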

    Handbook of Digital Face Manipulation and Detection

    This open access book provides the first comprehensive collection of studies dealing with the hot topic of digital face manipulation, such as DeepFakes, Face Morphing, or Reenactment. It combines the research fields of biometrics and media forensics, including contributions from academia and industry. Appealing to a broad readership, introductory chapters provide a comprehensive overview of the topic, addressing readers wishing to gain a brief overview of the state of the art. Subsequent chapters, which delve deeper into various research challenges, are oriented towards advanced readers. Moreover, the book provides a good starting point for young researchers, as well as a reference guide pointing to further literature. Hence, its primary readership is academic institutions and industry currently involved in digital face manipulation and detection. The book could easily be used as a recommended text for courses in image processing, machine learning, media forensics, biometrics, and the general security area.

    Voice source characterization for prosodic and spectral manipulation

    The objective of this dissertation is to study and develop techniques to decompose the speech signal into its two main components: voice source and vocal tract. Our main efforts are on glottal pulse analysis and characterization. We want to explore the utility of this model in different areas of speech processing: speech synthesis, voice conversion and emotion detection, among others. Thus, we study different techniques for prosodic and spectral manipulation. One of our requirements is that the methods should be robust enough to work with the large databases typical of speech synthesis. We use a speech production model in which the glottal flow produced by the vibrating vocal folds passes through the vocal (and nasal) tract cavities and is radiated by the lips. Removing the effect of the vocal tract from the speech signal to obtain the glottal pulse is known as inverse filtering. We use a parametric model of the glottal pulse directly in the source-filter decomposition phase. To validate the accuracy of the parametrization algorithm, we designed a synthetic corpus using LF glottal parameters reported in the literature, complemented with our own results from the vowel database. The results show that our method performs satisfactorily over a wide range of glottal configurations and at different levels of SNR. Our method using the whitened residual compared favorably to the reference, achieving high quality ratings (Good-Excellent). Our fully parametrized system scored lower than the other two, ranking third, but still above the acceptance threshold (Fair-Good). Next, we proposed two methods for prosody modification, one for each of the residual representations described above. The first method used our full parametrization system and frame interpolation to perform the desired changes in pitch and duration.
    The second method used resampling of the residual waveform and a frame selection technique to generate a new sequence of frames to be synthesized. The results showed that both methods are rated similarly (Fair-Good) and that more work is needed to reach quality levels similar to those of the reference methods. As part of this dissertation, we studied the application of our models in three different areas: voice conversion, voice quality analysis and emotion recognition. We included our speech production model in a reference voice conversion system to evaluate the impact of our parametrization on this task. The results showed that the evaluators preferred our method over the original one, rating it with a higher score on the MOS scale. To study voice quality, we recorded a small database of isolated, sustained Spanish vowels in four different phonations (modal, rough, creaky and falsetto). Comparing the results with those reported in the literature, we found them to generally agree with previous findings; some differences existed, but they could be attributed to the difficulty of comparing voice qualities produced by different speakers. At the same time, we conducted experiments in voice quality identification, with very good results. We also evaluated the performance of an automatic emotion classifier based on GMMs using glottal measures. For each emotion, we trained a specific model using different features, comparing our parametrization to a baseline system using spectral and prosodic characteristics. The results were very satisfactory, showing a relative error reduction of more than 20% with respect to the baseline system. The detection accuracy for the different emotions was also high, improving on previously reported results using the same database.
    Overall, we can conclude that the glottal source parameters extracted using our algorithm have a positive impact on automatic emotion classification.
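    Inverse filtering as described above can be sketched with the textbook LPC-based approach (this is a generic illustration, not the dissertation's parametric algorithm; the model order and synthetic signal are arbitrary assumptions). The vocal tract is approximated by an all-pole filter estimated from the speech signal, and applying the inverse of that filter yields the residual, an estimate of the source signal:

```python
import numpy as np

def lpc(x, order):
    """LPC coefficients via the autocorrelation method (Levinson-Durbin).
    Returns a = [1, a1, ..., a_order] such that filtering x with A(z)
    whitens it (removes the all-pole vocal tract resonances)."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a

def inverse_filter(x, a):
    """Apply A(z) to x, leaving the residual (source estimate)."""
    return np.convolve(x, a)[:len(x)]

# synthetic "speech": white excitation through a stable 2-pole resonator
rng = np.random.default_rng(1)
excitation = rng.normal(size=2000)
speech = np.zeros_like(excitation)
for n in range(len(speech)):  # y[n] = e[n] + 1.3 y[n-1] - 0.49 y[n-2]
    speech[n] = (excitation[n]
                 + 1.3 * (speech[n - 1] if n >= 1 else 0.0)
                 - 0.49 * (speech[n - 2] if n >= 2 else 0.0))
a_hat = lpc(speech, order=2)        # should approximate [1, -1.3, 0.49]
residual = inverse_filter(speech, a_hat)
```

    On real speech the estimate is done frame by frame on windowed segments, and the residual is then modeled further (for example with an LF-style glottal parametrization) rather than used directly.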