67 research outputs found

    Voice Cloning Using Artificial Intelligence and Machine Learning: A Review

    This paper presents a thorough method for integrating emotions, text-to-speech conversion, and state-of-the-art voice cloning. The paper focuses on novel background noise adaptation, emotional voice synthesis, and multi-speaker voice cloning for better speech synthesis. The synthesis of emotive voices, multi-speaker voice cloning, and creative methods for modifying background noise to improve speech synthesis quality are among the topics covered in this study. Additionally, the study explores the domain of emotional artificial intelligence by adding a variety of emotions to artificial voices, improving user engagement through sympathetic reactions. The study also looks at how background noise can be altered to change it from a disturbing to a silent, non-disruptive state. This improvement greatly enhances the usability of text-to-speech systems in noisy conditions. By integrating these components, the project makes a substantial contribution to text-to-speech, emotional AI, and voice cloning, creating new avenues for human-computer connection.

    Purging of silence for robust speaker identification in colossal database

    The aim of this work is to develop an effective speaker recognition system for large data sets in noisy environments. The important phases involved in typical identification systems are feature extraction, training, and testing. During the feature extraction phase, the speaker-specific information is processed based on the characteristics of the voice signal. This work proposes effective silence removal methods to achieve accurate recognition in noisy environments. Pitch and pitch-strength parameters are extracted as distinct features from the input speech spectrum. Multi-linear principal component analysis (MPCA) is utilized to minimize the complexity of the parameter matrix. Silence removal using zero crossing rate (ZCR) and endpoint detection algorithm (EDA) methods is applied to the source utterance during the feature extraction phase. These features are useful in the later classification phase, where the identification is made on the basis of support vector machine (SVM) algorithms. Forward looking stochastic (FOLOS) is the efficient large-scale SVM algorithm that has been employed for effective classification among speakers. The evaluation findings indicate that the suggested methods increase performance for large amounts of data in noisy environments.
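
    The ZCR-based silence removal mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the authors' algorithm, so the rule below is an assumption: a frame is kept only if its short-time energy is high and its zero crossing rate is low, which is a common textbook heuristic for voiced speech. The function names, frame length, and thresholds are all hypothetical.

```python
def frame_signal(samples, frame_len, hop):
    """Split a sequence of samples into frames of frame_len, advancing by hop."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def remove_silence(samples, frame_len=200, hop=200,
                   energy_thresh=1e-3, zcr_thresh=0.3):
    """Keep frames that look voiced: high energy and low ZCR.

    The thresholds are illustrative assumptions, not values from the paper.
    This simple rule also discards unvoiced (noise-like) speech frames.
    """
    kept = []
    for frame in frame_signal(samples, frame_len, hop):
        if (short_time_energy(frame) > energy_thresh
                and zero_crossing_rate(frame) < zcr_thresh):
            kept.extend(frame)
    return kept
```

    With non-overlapping frames (hop equal to frame_len), the output is simply the concatenation of the retained frames. A real endpoint detector would typically add smoothing across neighbouring frames so that short pauses inside a word are not cut.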

    Sonic Urbanities: Undoing the Soundscape and Aural History in Kingston, NY

    Senior Project submitted to The Division of Social Studies of Bard College

    Dictionary-based Tensor Canonical Polyadic Decomposition

    To ensure interpretability of extracted sources in tensor decomposition, we introduce in this paper a dictionary-based tensor canonical polyadic decomposition which enforces one factor to belong exactly to a known dictionary. A new formulation of sparse coding is proposed which enables dictionary-based canonical polyadic decomposition of high-dimensional tensors. The benefits of using a dictionary in tensor decomposition models are explored both in terms of parameter identifiability and estimation accuracy. The performance of the proposed algorithms is evaluated on the decomposition of simulated data and the unmixing of hyperspectral images.
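
    For orientation, the model described in this abstract can be written out for a third-order tensor. The notation is an assumption, since the abstract defines no symbols: $\mathcal{T}$ is the data tensor, $R$ the rank, $a_r, b_r, c_r$ the columns of the three factors, and $D$ a known dictionary with $d$ atoms. The constraint states that each column of the first factor is exactly one atom of $D$.

```latex
\[
  \mathcal{T} \;\approx\; \sum_{r=1}^{R} a_r \otimes b_r \otimes c_r,
  \qquad
  a_r = D_{:,\,k_r}, \quad k_r \in \{1, \dots, d\}, \quad r = 1, \dots, R .
\]
```

    Under this reading, estimating the first factor reduces to choosing the indices $k_1, \dots, k_R$, which is where a sparse coding formulation naturally enters.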

    Baltic Sea waves analysis by using chaos theory tools, Computer Systems Engineering: Theory and Applications

    The motivation for this paper was to assess the applicability of a novel approach derived from chaos theory to the description and analysis of the dynamics of the free sea surface, in particular to the phase space reconstruction of the dynamical system from the observed time series. The free sea surface elevation data sets were sampled at the Baltic Coastal Research Station Lubiatowo in Poland. After proper processing of the experimental data, it was found that the sea surface elevations can be described as the result of a four-dimensional process, which appears to be weakly chaotic, characterized by a positive largest Lyapunov exponent and a short prediction horizon. It was confirmed that chaos theory tools may be very promising for diagnosing certain properties of sea waves. Moreover, the paper introduces a new technique for evaluating the average mutual information.
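
    Phase space reconstruction from a scalar time series is usually done by time-delay embedding, and the sketch below shows that standard construction. It is not taken from the paper: the function name and arguments are hypothetical, and the delay is simply passed in. In practice the delay is often chosen near the first minimum of the average mutual information, which is the quantity this abstract refers to.

```python
def delay_embed(series, dim, tau):
    """Build delay vectors [x(t), x(t + tau), ..., x(t + (dim - 1) * tau)].

    series: sequence of scalar observations (e.g. sea surface elevations)
    dim:    embedding dimension (the abstract reports a four-dimensional process)
    tau:    delay in samples (an input here; its selection is not shown)
    """
    n_vectors = len(series) - (dim - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for the requested dim and tau")
    return [[series[t + k * tau] for k in range(dim)]
            for t in range(n_vectors)]
```

    For example, calling delay_embed(elevations, dim=4, tau=5) on a hypothetical list elevations yields one four-component state vector per admissible time index. Quantities such as the largest Lyapunov exponent are then estimated from how nearby vectors in this reconstructed space diverge over time.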

    IVGPR: A New Program for Advanced End-To-End GPR Processing

    Ground penetrating radar (GPR) processing workflows commonly rely on techniques developed particularly for seismic reflection imaging. Although this practice has produced an abundance of reliable results, it is limited to basic applications. As the popularity of GPR continues to surge, a greater number of complex studies demand the use of routines that take into account the unique properties of GPR signals. Such is the case of surveys that examine the material properties of subsurface scatterers. The nature of these complicated tasks has created a demand for GPR-specific processing packages flexible enough to tackle new applications. Unlike seismic processing programs, however, GPR counterparts often offer only a limited number of functions. This work produced a new GPR-specific processing package, dubbed IVGPR, that offers over 60 fully customizable procedures. This program was built using the modern Fortran programming language in combination with serial and parallel optimization practices that allow it to achieve high levels of performance. Among its many functions, IVGPR provides the rare opportunity to apply a three-dimensional single-component vector migration routine. This could be of great value for advanced workflows designed to develop and test new true-amplitude and inversion algorithms. Numerous examples given throughout this work demonstrate the effectiveness of key routines in IVGPR. Additionally, three case studies show end-to-end applications of this program to field records that produced satisfactory results well suited to interpretation.

    Recent Advances in Deep Learning Techniques for Face Recognition

    In recent years, researchers have proposed many deep learning (DL) methods for various tasks, and face recognition (FR) in particular has made an enormous leap using these techniques. Deep FR systems benefit from the hierarchical architecture of the DL methods to learn discriminative face representation. Therefore, DL techniques significantly improve state-of-the-art performance on FR systems and encourage diverse and efficient real-world applications. In this paper, we present a comprehensive analysis of various FR systems that leverage the different types of DL techniques, and for the study, we summarize 168 recent contributions from this area. We discuss the papers related to different algorithms, architectures, loss functions, activation functions, datasets, challenges, improvement ideas, and current and future trends of DL-based FR systems. We provide a detailed discussion of various DL methods to understand the current state of the art, and then we discuss various activation and loss functions for the methods. Additionally, we summarize different datasets used widely for FR tasks and discuss challenges related to illumination, expression, pose variations, and occlusion. Finally, we discuss improvement ideas and current and future trends of FR tasks. Comment: 32 pages and citation: M. T. H. Fuad et al., "Recent Advances in Deep Learning Techniques for Face Recognition," in IEEE Access, vol. 9, pp. 99112-99142, 2021, doi: 10.1109/ACCESS.2021.309613

    FEATURES OF SLEEP APNEA RECOGNITION AND ANALYSIS


    LEFT-HANDEDNESS DETECTION


    Lidar Point Cloud compression, processing and learning for Autonomous Driving

    As technology advances, cities are getting smarter. Smart mobility is the key element in smart cities, and autonomous vehicles (AVs) are an essential part of smart mobility. However, the vulnerability of unmanned vehicles can also affect the value of life and human safety. In this paper, we provide a comprehensive analysis of 3D Point-Cloud (3DPC) processing and learning in terms of development, advancement, and performance for the AV system. 3DPC has recently attracted growing interest due to its extensive applications, such as autonomous driving, computer vision, and robotics. The Light Detection and Ranging (LiDAR) sensor is one of the most significant sensors in an AV; it collects 3DPC data that can accurately capture the outer surfaces of scenes and objects. Learning and processing tools for 3DPC are essential for mapping, perception, and localization in an AV. These tools are intended to serve as the core modules for creating maps, localizing, and perceiving in an AV system. The goal of the study is to establish "what has been tested in AV system so far and what is necessary to make it safer and more practical in AV system." We also provide insights into the open problems that need to be resolved in the future.