High-Resolution Speaker Counting In Reverberant Rooms Using CRNN With Ambisonics Features
Speaker counting is the task of estimating the number of people that are
simultaneously speaking in an audio recording. For several audio processing
tasks such as speaker diarization, separation, localization and tracking,
knowing the number of speakers at each time step is a prerequisite, or at least
a strong advantage, and it also enables low-latency processing.
For that purpose, we address the speaker counting problem with a multichannel
convolutional recurrent neural network which produces an estimation at a
short-term frame resolution. We trained the network to predict up to 5
concurrent speakers in a multichannel mixture, with simulated data including
many different conditions in terms of source and microphone positions,
reverberation, and noise. The network can predict the number of speakers with
good accuracy at frame resolution.
Comment: 5 pages, 1 figure
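As a rough illustration of the kind of model the abstract describes, here is a minimal PyTorch sketch of a multichannel CRNN that maps first-order Ambisonics feature maps to per-frame speaker-count posteriors (0 to 5 speakers). The layer sizes, the 4-channel input, and the feature layout are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class SpeakerCountCRNN(nn.Module):
    """Minimal CRNN sketch: multichannel (e.g. 4-channel Ambisonics) feature
    maps in, per-frame speaker-count posteriors (0..5 speakers) out.
    All sizes are illustrative, not the paper's configuration."""

    def __init__(self, in_channels=4, n_freq=64, max_speakers=5):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d((1, 4)),  # pool along frequency only, keeping time resolution
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d((1, 4)),
        )
        self.gru = nn.GRU(64 * (n_freq // 16), 128,
                          batch_first=True, bidirectional=True)
        self.head = nn.Linear(256, max_speakers + 1)  # classes 0..5

    def forward(self, x):                     # x: (batch, channels, time, freq)
        h = self.conv(x)                      # (batch, 64, time, freq // 16)
        h = h.permute(0, 2, 1, 3).flatten(2)  # (batch, time, 64 * freq // 16)
        h, _ = self.gru(h)                    # recurrent modelling over frames
        return self.head(h)                   # per-frame logits: (batch, time, 6)
```

A cross-entropy loss over the per-frame logits would train this as a frame-level classifier, matching the short-term frame resolution the abstract emphasizes.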
Brain Music: A Generative System for Creating Symbolic Music from Affective Neural Responses
This master's thesis presents an innovative multimodal deep learning methodology that combines an emotion classification model with a music generator, aimed at creating music from electroencephalography (EEG) signals, thus delving into the interplay between emotions and music. The results achieve three specific objectives:
First, since the performance of brain-computer interface systems varies significantly among different subjects, an approach based on knowledge transfer among subjects is introduced to enhance the performance of individuals facing challenges in motor imagery-based brain-computer interface systems. This approach combines labeled EEG data with structured information, such as psychological questionnaires, through a "Kernel Matching CKA" method. We employ a deep neural network (Deep&Wide) for motor imagery classification. The results underscore its potential to enhance motor skills in brain-computer interfaces.
Second, we propose an innovative technique called "Labeled Correlation Alignment" (LCA) to sonify neural responses to stimuli represented in unstructured data, such as affective music. This generates musical features based on emotion-induced brain activity. LCA addresses variability among and within subjects through correlation analysis, enabling the creation of acoustic envelopes and the distinction of different sound information. This makes LCA a promising tool for interpreting neural activity and its response to auditory stimuli.
Finally, in another chapter, we develop an end-to-end deep learning methodology for generating MIDI music content (symbolic data) from EEG signals induced by affectively labeled music. This methodology encompasses data preprocessing, feature extraction model training, and a feature matching process using Deep Centered Kernel Alignment, enabling music generation from EEG signals.
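The abstract does not spell out the Deep Centered Kernel Alignment step, but the linear CKA similarity such methods build on is compact enough to sketch. A minimal NumPy version follows; pairing EEG features against musical features row-by-row is an assumption for illustration only.

```python
import numpy as np

def centering(K):
    """Double-center a kernel (Gram) matrix."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def linear_cka(X, Y):
    """Linear CKA between feature matrices X (n, d1) and Y (n, d2) whose rows
    are paired across the same n examples (e.g. EEG features vs. musical
    features for the same trials). Returns a similarity in [0, 1]."""
    Kx = centering(X @ X.T)
    Ky = centering(Y @ Y.T)
    hsic = np.sum(Kx * Ky)  # HSIC estimate, up to a constant factor
    return hsic / (np.linalg.norm(Kx) * np.linalg.norm(Ky))
```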
Together, these achievements represent significant advances in understanding the relationship between emotions and music, as well as in the application of artificial intelligence to music generation from brain signals. They offer new perspectives and tools for musical creation and for research in affective neuroscience. To conduct our experiments, we used public databases such as GigaScience, Affective Music Listening, and the DEAP dataset.
Memory Fusion Network for Multi-view Sequential Learning
Multi-view sequential learning is a fundamental problem in machine learning
dealing with multi-view sequences. In a multi-view sequence, there exist two
forms of interactions between different views: view-specific interactions and
cross-view interactions. In this paper, we present a new neural architecture
for multi-view sequential learning called the Memory Fusion Network (MFN) that
explicitly accounts for both interactions in a neural architecture and
continuously models them through time. The first component of the MFN is called
the System of LSTMs, where view-specific interactions are learned in isolation
through assigning an LSTM function to each view. The cross-view interactions
are then identified using a special attention mechanism called the Delta-memory
Attention Network (DMAN) and summarized through time with a Multi-view Gated
Memory. Through extensive experimentation, MFN is compared to various proposed
approaches for multi-view sequential learning on multiple publicly available
benchmark datasets. MFN outperforms all existing multi-view approaches and all
current state-of-the-art models, setting new state-of-the-art results on these
multi-view datasets.
Comment: AAAI 2018 Oral Presentation
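A heavily simplified PyTorch sketch of the three components may help: one LSTM per view, an elementwise attention over consecutive memories standing in for the Delta-memory Attention Network, and a gated cross-view memory. It is an illustrative reading of the abstract, not the published MFN implementation, and all dimensions are arbitrary.

```python
import torch
import torch.nn as nn

class MemoryFusionSketch(nn.Module):
    """Sketch of the MFN idea: view-specific LSTMs, attention over consecutive
    LSTM memories (the DMAN role), and a gated cross-view memory updated at
    every step. Dimensions and gating details are illustrative only."""

    def __init__(self, view_dims, hidden=32, mem=64):
        super().__init__()
        self.lstms = nn.ModuleList(nn.LSTMCell(d, hidden) for d in view_dims)
        total = hidden * len(view_dims)
        self.dman = nn.Sequential(nn.Linear(2 * total, 2 * total), nn.Softmax(dim=-1))
        self.update = nn.Sequential(nn.Linear(2 * total, mem), nn.Tanh())
        self.gate = nn.Sequential(nn.Linear(2 * total, mem), nn.Sigmoid())

    def forward(self, views):          # views: list of (batch, time, dim_v) tensors
        B, T = views[0].shape[:2]
        hs = [views[0].new_zeros(B, l.hidden_size) for l in self.lstms]
        cs = [views[0].new_zeros(B, l.hidden_size) for l in self.lstms]
        u = views[0].new_zeros(B, self.gate[0].out_features)  # cross-view memory
        for t in range(T):
            prev_c = torch.cat(cs, dim=-1)
            for v, lstm in enumerate(self.lstms):             # view-specific dynamics
                hs[v], cs[v] = lstm(views[v][:, t], (hs[v], cs[v]))
            both = torch.cat([prev_c, torch.cat(cs, dim=-1)], dim=-1)
            attended = both * self.dman(both)                 # highlight memory changes
            g = self.gate(attended)
            u = (1 - g) * u + g * self.update(attended)       # gated memory update
        return torch.cat(hs + [u], dim=-1)                    # final multi-view summary
```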
A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community
In recent years, deep learning (DL), a re-branding of neural networks (NNs),
has risen to the top in numerous areas, such as computer vision (CV), speech
recognition, and natural language processing. Whereas remote sensing (RS)
possesses a number of unique challenges, primarily related to sensors and
applications, inevitably RS draws from many of the same theories as CV; e.g.,
statistics, fusion, and machine learning, to name a few. This means that the RS
community should be aware of, if not at the leading edge of, advancements
like DL. Herein, we provide the most comprehensive survey of state-of-the-art
RS DL research. We also review recent new developments in the DL field that can
be used in DL for RS. Namely, we focus on theories, tools and challenges for
the RS community. Specifically, we focus on unsolved challenges and
opportunities as it relates to (i) inadequate data sets, (ii)
human-understandable solutions for modelling physical phenomena, (iii) Big
Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and
learning algorithms for spectral, spatial and temporal data, (vi) transfer
learning, (vii) an improved theoretical understanding of DL systems, (viii)
high barriers to entry, and (ix) training and optimizing DL models.
Comment: 64 pages, 411 references. To appear in Journal of Applied Remote
Sensing
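Of the listed challenges, transfer learning (vi) is perhaps the easiest to make concrete, and it also bears on inadequate data sets (i). A minimal sketch, assuming PyTorch/torchvision and a hypothetical 10-class land-cover task, reuses a CV-pretrained backbone and retrains only a new RS head:

```python
import torch.nn as nn
from torchvision import models

# Hypothetical transfer-learning setup: adapt an ImageNet-pretrained CNN to a
# remote-sensing scene-classification task with 10 land-cover classes.
# Data loading and the training loop are omitted.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for p in model.parameters():
    p.requires_grad = False            # freeze the CV-pretrained backbone
model.fc = nn.Linear(model.fc.in_features, 10)  # new, trainable RS head
```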
Deep Learning for Audio Signal Processing
Given the recent surge in developments of deep learning, this article
provides a review of the state-of-the-art deep learning techniques for audio
signal processing. Speech, music, and environmental sound processing are
considered side-by-side, in order to point out similarities and differences
between the domains, highlighting general methods, problems, key references,
and potential for cross-fertilization between areas. The dominant feature
representations (in particular, log-mel spectra and raw waveform) and deep
learning models are reviewed, including convolutional neural networks, variants
of the long short-term memory architecture, as well as more audio-specific
neural network models. Subsequently, prominent deep learning application areas
are covered, i.e., audio recognition (automatic speech recognition, music
information retrieval, environmental sound detection, localization and
tracking) and synthesis and transformation (source separation, audio
enhancement, generative models for speech, sound, and music synthesis).
Finally, key issues and future questions regarding deep learning applied to
audio signal processing are identified.
Comment: 15 pages, 2 PDF figures
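Since log-mel spectra are singled out as a dominant representation, a minimal extraction sketch may be useful; it assumes librosa, a hypothetical example.wav, and common (not prescribed) parameter values.

```python
import librosa

# Log-mel spectrogram: 25 ms windows, 10 ms hop at 16 kHz, 64 mel bands.
y, sr = librosa.load("example.wav", sr=16000)     # hypothetical input file
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=400,
                                     hop_length=160, n_mels=64)
log_mel = librosa.power_to_db(mel)                # shape: (64, n_frames)
```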
Silhouette-based gait recognition using Procrustes shape analysis and elliptic Fourier descriptors
This paper presents a gait recognition method that combines spatio-temporal motion characteristics with statistical and physical parameters (referred to as STM-SPP) of a human subject for classification, analysing the shape of the subject's silhouette contours using Procrustes shape analysis (PSA) and elliptic Fourier descriptors (EFDs). STM-SPP uses spatio-temporal gait characteristics and physical parameters of the human body to resolve similar dissimilarity scores between probe and gallery sequences obtained by PSA. A part-based shape analysis using EFDs is also introduced to achieve robustness against carrying conditions. The classification results by PSA and EFDs are combined, with ties in ranking resolved by contour matching based on Hu moments. Experimental results show that STM-SPP outperforms several silhouette-based gait recognition methods.
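The Hu-moment tie-breaking step is stated but not detailed in the abstract; a minimal sketch of one plausible reading follows, using OpenCV's moment routines on two silhouette contours. The log-scaled L1 distance is an illustrative choice, not necessarily the paper's measure.

```python
import cv2
import numpy as np

def hu_distance(contour_a, contour_b):
    """Compare two silhouette contours (as returned by cv2.findContours) via
    their seven Hu moment invariants, which are invariant to translation,
    scale, and rotation."""
    hu_a = cv2.HuMoments(cv2.moments(contour_a)).ravel()
    hu_b = cv2.HuMoments(cv2.moments(contour_b)).ravel()
    # Log transform compresses the moments' very wide dynamic range.
    log_a = -np.sign(hu_a) * np.log10(np.abs(hu_a) + 1e-30)
    log_b = -np.sign(hu_b) * np.log10(np.abs(hu_b) + 1e-30)
    return float(np.abs(log_a - log_b).sum())
```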