67 research outputs found
Voice Cloning Using Artificial Intelligence and Machine Learning: A Review
This paper presents a thorough method for integrating emotions, text-to-speech conversion, and state-of-the-art voice cloning. It covers novel background noise adaptation, emotional voice synthesis, and multi-speaker voice cloning as means of improving speech synthesis quality. Additionally, the study explores the domain of emotional artificial intelligence by adding a variety of emotions to artificial voices, improving user engagement through sympathetic reactions. The study also looks at how background noise can be altered to change it from a disturbing to a silent, non-disruptive state. This improvement greatly enhances the usability of text-to-speech systems in noisy conditions. By integrating these components, the project makes a substantial contribution to text-to-speech, emotional AI, and voice cloning, creating new avenues for human-computer connection.
Purging of silence for robust speaker identification in colossal database
The aim of this work is to develop an effective speaker recognition system for large data sets under noisy environments. The important phases involved in typical identification systems are feature extraction, training and testing. During the feature extraction phase, the speaker-specific information is processed based on the characteristics of the voice signal. In this work, effective methods are proposed for silence removal in order to achieve accurate recognition under noisy environments. Pitch and pitch-strength parameters are extracted as distinct features from the input speech spectrum. Multi-linear principal component analysis (MPCA) is utilized to minimize the complexity of the parameter matrix. Silence removal using zero crossing rate (ZCR) and endpoint detection algorithm (EDA) methods is applied to the source utterance during the feature extraction phase. These features are useful in the later classification phase, where the identification is made on the basis of support vector machine (SVM) algorithms. Forward looking stochastic (FOLOS) is the efficient large-scale SVM algorithm that has been employed for the effective classification among speakers. The evaluation findings indicate that the suggested methods increase the performance for large amounts of data in noisy environments.
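As a rough illustration of the silence-removal step mentioned in this abstract, the following is a minimal sketch of frame-wise filtering based on short-time energy and zero crossing rate. It is not the authors' method: the function names, the frame length, and both thresholds are assumptions chosen only for the example.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def remove_silence(signal, frame_len=160, energy_thr=1e-3, zcr_thr=0.3):
    """Keep only frames that look like speech (hypothetical thresholds).

    A frame is kept if its energy is high (voiced speech), or if its
    energy is moderate and its ZCR is high (unvoiced speech).
    """
    kept = []
    for start in range(0, len(signal), frame_len):
        frame = signal[start:start + frame_len]
        energy = short_time_energy(frame)
        zcr = zero_crossing_rate(frame)
        is_speech = energy >= energy_thr or (
            energy >= 0.1 * energy_thr and zcr >= zcr_thr
        )
        if is_speech:
            kept.extend(frame)
    return kept


# Toy utterance: silence, a 200 Hz tone sampled at 8 kHz, then silence.
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(160)]
signal = [0.0] * 160 + tone + [0.0] * 160
speech = remove_silence(signal)
```

In this toy case only the tone frame survives. Real recordings would need thresholds estimated from the noise floor rather than fixed constants.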
Sonic Urbanities: Undoing the Soundscape and Aural History in Kingston, NY
Senior Project submitted to The Division of Social Studies of Bard College
Dictionary-based Tensor Canonical Polyadic Decomposition
To ensure interpretability of extracted sources in tensor decomposition, we
introduce in this paper a dictionary-based tensor canonical polyadic
decomposition which enforces one factor to belong exactly to a known
dictionary. A new formulation of sparse coding is proposed which enables
dictionary-based canonical polyadic decomposition of high-dimensional tensors. The
benefits of using a dictionary in tensor decomposition models are explored both
in terms of parameter identifiability and estimation accuracy. Performances of
the proposed algorithms are evaluated on the decomposition of simulated data
and the unmixing of hyperspectral images.
Baltic Sea waves analysis by using chaos theory tools, Computer Systems Engineering: Theory and Applications
The motivation for this paper was to assess the applicability of a novel approach derived from chaos theory to the description and analysis of the dynamics of the free sea surface, in particular to the phase space reconstruction of the dynamical system from the observed time series. The free sea surface elevation data sets were sampled at the Baltic Coastal Research Station Lubiatowo in Poland. After proper processing of the experimental data, it was found that the sea surface elevations can be described as the result of a four-dimensional process, which appears to be weakly chaotic, characterized by a positive largest Lyapunov exponent and a short prediction horizon. It was confirmed that chaos theory tools may be very promising for diagnosing certain properties of sea waves. Moreover, a new technique for evaluating the average mutual information is introduced in the paper.
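For readers unfamiliar with phase space reconstruction from a time series, the sketch below shows the standard time-delay embedding construction. It is only an illustration under stated assumptions: the function name `delay_embed`, the delay `tau`, and the toy series are hypothetical, and the embedding dimension of 4 merely echoes the four-dimensional process reported in the abstract.

```python
def delay_embed(series, dim, tau):
    """Build delay vectors (x[i], x[i+tau], ..., x[i+(dim-1)*tau]).

    `dim` is the embedding dimension and `tau` the delay in samples;
    both are inputs here, not values taken from the paper.
    """
    n_vectors = len(series) - (dim - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for this dim and tau")
    return [
        tuple(series[i + k * tau] for k in range(dim))
        for i in range(n_vectors)
    ]


# Toy elevation record (made-up values), embedded in four dimensions.
elevations = [0.0, 0.4, 0.7, 0.5, 0.1, -0.3, -0.6, -0.4, 0.0, 0.3]
vectors = delay_embed(elevations, dim=4, tau=2)
```

In practice the delay is often chosen from the first minimum of the average mutual information, which is the quantity the abstract refers to. That selection step is not shown here.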
IVGPR: A New Program for Advanced End-To-End GPR Processing
Ground penetrating radar (GPR) processing workflows commonly rely on techniques
developed particularly for seismic reflection imaging. Although this practice has produced
an abundance of reliable results, it is limited to basic applications. As the popularity of
GPR continues to surge, a greater number of complex studies demand the use of routines
that take into account the unique properties of GPR signals. Such is the case of surveys
that examine the material properties of subsurface scatterers. The nature of these complicated
tasks has created a demand for GPR-specific processing packages flexible enough
to tackle new applications. Unlike seismic processing programs, however, GPR counterparts
often afford only a limited number of functions. This work produced a new
GPR-specific processing package, dubbed IVGPR, that offers over 60 fully customizable
procedures. This program was built using the modern Fortran programming language in
combination with serial and parallel optimization practices that allow it to achieve high
levels of performance. Within its many functions, IVGPR provides the rare opportunity
to apply a three-dimensional single-component vector migration routine. This could be
of great value for advanced workflows designed to develop and test new true-amplitude
and inversion algorithms. Numerous examples given through this work demonstrate the
effectiveness of key routines in IVGPR. Additionally, three case studies show end-to-end
applications of this program to field records that produced satisfactory results well-suited
for interpretation.
Recent Advances in Deep Learning Techniques for Face Recognition
In recent years, researchers have proposed many deep learning (DL) methods
for various tasks, and face recognition (FR) in particular has made an enormous leap
using these techniques. Deep FR systems benefit from the hierarchical
architecture of the DL methods to learn discriminative face representation.
Therefore, DL techniques significantly improve state-of-the-art performance on
FR systems and encourage diverse and efficient real-world applications. In this
paper, we present a comprehensive analysis of various FR systems that leverage
the different types of DL techniques, and for the study, we summarize 168
recent contributions from this area. We discuss the papers related to different
algorithms, architectures, loss functions, activation functions, datasets,
challenges, improvement ideas, current and future trends of DL-based FR
systems. We provide a detailed discussion of various DL methods to understand
the current state-of-the-art, and then we discuss various activation and loss
functions for the methods. Additionally, we summarize different datasets used
widely for FR tasks and discuss challenges related to illumination, expression,
pose variations, and occlusion. Finally, we discuss improvement ideas, current
and future trends of FR tasks.
Comment: 32 pages and citation: M. T. H. Fuad et al., "Recent Advances in Deep
Learning Techniques for Face Recognition," in IEEE Access, vol. 9, pp.
99112-99142, 2021, doi: 10.1109/ACCESS.2021.309613
Lidar Point Cloud compression, processing and learning for Autonomous Driving
As technology advances, cities are getting smarter. Smart mobility is a key element of smart cities, and Autonomous Vehicles (AV) are an essential part of smart mobility. However, the vulnerability of unmanned vehicles can also affect the value of life and human safety. In this paper, we provide a comprehensive analysis of 3D Point-Cloud (3DPC) processing and learning in terms of development, advancement, and performance for the AV system. 3DPC has recently attracted growing interest due to its extensive applications, such as autonomous driving, computer vision, and robotics. The Light Detection and Ranging (LiDAR) sensor is one of the most significant sensors in AV; it collects 3DPC that can accurately capture the outer surfaces of scenes and objects. Learning and processing tools for 3DPC are essential for map creation, perception, and localization in AV, and are considered among the most essential modules of an AV system. The goal of the study is to know "what has been tested in AV system so far and what is necessary to make it safer and more practical in AV system." We also provide insights into the open problems that need to be resolved in the future.