49 research outputs found

    Advanced statistical models for the segregation, identification and measurement of coexisting sound sources

    Get PDF
    Long-term monitoring of acoustical environments is gaining popularity thanks to the relevant amount of scientific and engineering insights that it provides. The increasing interest is due to the constant growth of storage capacity and computational power to process large amounts of data. In this perspective, machine learning (ML) provides a broad family of data-driven statistical techniques to deal with large databases. Nowadays, the conventional praxis of sound level meter measurements limits the global description of a sound scene to an energetic point of view. The equivalent continuous level Leq represents the main metric to define an acoustic environment, indeed. Finer analyses involve the use of statistical levels. However, acoustic percentiles are based on temporal assumptions, which are not always reliable. A statistical approach, based on the study of the occurrences of sound pressure levels, would bring a different perspective to the analysis of long-term monitoring. Depicting a sound scene through the most probable sound pressure level, rather than portions of energy, brought more specific information about the activity carried out during the measurements. The statistical mode of the occurrences can capture typical behaviors of specific kinds of sound sources. The present work aims to propose an ML-based method to identify, separate and measure coexisting sound sources in real-world scenarios. It is based on long-term monitoring and is addressed to acousticians focused on the analysis of environmental noise in manifold contexts. The presented method is based on clustering analysis. Two algorithms, Gaussian Mixture Model and K-means clustering, represent the main core of a process to investigate different active spaces monitored through sound level meters. The procedure has been applied in two different contexts: university lecture halls and offices. The proposed method shows robust and reliable results in describing the acoustic scenario and it could represent an important analytical tool for acousticians

    Data-Driven Speech Intelligibility Prediction

    Get PDF

    Attention-Inspired Artificial Neural Networks for Speech Processing: A Systematic Review

    Get PDF
    Artificial Neural Networks (ANNs) were created inspired by the neural networks in the human brain and have been widely applied in speech processing. The application areas of ANN include: Speech recognition, speech emotion recognition, language identification, speech enhancement, and speech separation, amongst others. Likewise, given that speech processing performed by humans involves complex cognitive processes known as auditory attention, there has been a growing amount of papers proposing ANNs supported by deep learning algorithms in conjunction with some mechanism to achieve symmetry with the human attention process. However, while these ANN approaches include attention, there is no categorization of attention integrated into the deep learning algorithms and their relation with human auditory attention. Therefore, we consider it necessary to have a review of the different ANN approaches inspired in attention to show both academic and industry experts the available models for a wide variety of applications. Based on the PRISMA methodology, we present a systematic review of the literature published since 2000, in which deep learning algorithms are applied to diverse problems related to speech processing. In this paper 133 research works are selected and the following aspects are described: (i) Most relevant features, (ii) ways in which attention has been implemented, (iii) their hypothetical relationship with human attention, and (iv) the evaluation metrics used. Additionally, the four publications most related with human attention were analyzed and their strengths and weaknesses were determined

    Selected Papers from the 5th International Electronic Conference on Sensors and Applications

    Get PDF
    This Special Issue comprises selected papers from the proceedings of the 5th International Electronic Conference on Sensors and Applications, held on 15–30 November 2018, on sciforum.net, an online platform for hosting scholarly e-conferences and discussion groups. In this 5th edition of the electronic conference, contributors were invited to provide papers and presentations from the field of sensors and applications at large, resulting in a wide variety of excellent submissions and topic areas. Papers which attracted the most interest on the web or that provided a particularly innovative contribution were selected for publication in this collection. These peer-reviewed papers are published with the aim of rapid and wide dissemination of research results, developments, and applications. We hope this conference series will grow rapidly in the future and become recognized as a new way and venue by which to (electronically) present new developments related to the field of sensors and their applications

    Exploring Animal Behavior Through Sound: Volume 1

    Get PDF
    This open-access book empowers its readers to explore the acoustic world of animals. By listening to the sounds of nature, we can study animal behavior, distribution, and demographics; their habitat characteristics and needs; and the effects of noise. Sound recording is an efficient and affordable tool, independent of daylight and weather; and recorders may be left in place for many months at a time, continuously collecting data on animals and their environment. This book builds the skills and knowledge necessary to collect and interpret acoustic data from terrestrial and marine environments. Beginning with a history of sound recording, the chapters provide an overview of off-the-shelf recording equipment and analysis tools (including automated signal detectors and statistical methods); audiometric methods; acoustic terminology, quantities, and units; sound propagation in air and under water; soundscapes of terrestrial and marine habitats; animal acoustic and vibrational communication; echolocation; and the effects of noise. This book will be useful to students and researchers of animal ecology who wish to add acoustics to their toolbox, as well as to environmental managers in industry and government

    Exploring Animal Behavior Through Sound: Volume 1

    Get PDF
    This open-access book empowers its readers to explore the acoustic world of animals. By listening to the sounds of nature, we can study animal behavior, distribution, and demographics; their habitat characteristics and needs; and the effects of noise. Sound recording is an efficient and affordable tool, independent of daylight and weather; and recorders may be left in place for many months at a time, continuously collecting data on animals and their environment. This book builds the skills and knowledge necessary to collect and interpret acoustic data from terrestrial and marine environments. Beginning with a history of sound recording, the chapters provide an overview of off-the-shelf recording equipment and analysis tools (including automated signal detectors and statistical methods); audiometric methods; acoustic terminology, quantities, and units; sound propagation in air and under water; soundscapes of terrestrial and marine habitats; animal acoustic and vibrational communication; echolocation; and the effects of noise. This book will be useful to students and researchers of animal ecology who wish to add acoustics to their toolbox, as well as to environmental managers in industry and government

    Advanced deep neural networks for speech separation and enhancement

    Get PDF
    Ph. D. Thesis.Monaural speech separation and enhancement aim to remove noise interference from the noisy speech mixture recorded by a single microphone, which causes a lack of spatial information. Deep neural network (DNN) dominates speech separation and enhancement. However, there are still challenges in DNN-based methods, including choosing proper training targets and network structures, refining generalization ability and model capacity for unseen speakers and noises, and mitigating the reverberations in room environments. This thesis focuses on improving separation and enhancement performance in the real-world environment. The first contribution in this thesis is to address monaural speech separation and enhancement within reverberant room environment by designing new training targets and advanced network structures. The second contribution to this thesis is on improving the enhancement performance by proposing a multi-scale feature recalibration convolutional bidirectional gate recurrent unit (GRU) network (MCGN). The third contribution is to improve the model capacity of the network and retain the robustness in the enhancement performance. A convolutional fusion network (CFN) is proposed, which exploits the group convolutional fusion unit (GCFU). The proposed speech enhancement methods are evaluated with various challenging datasets. The proposed methods are assessed with the stateof-the-art techniques and performance measures to confirm that this thesis contributes novel solution

    An AI-based Framework For Parent-child Interaction Analysis

    Get PDF
    The quality of parent-child interactions is foundational to children's social-emotional and cognitive development, as well as their lifelong mental health. The Parent-Child Interaction Teaching Scale (PCITS) is a well-established and effective tool used to measure parent-child interaction quality. It is utilized in both public health settings and basic and applied research studies to identify problem areas within parent-child interactions. However, like other observational measures of parent-child interaction quality, the PCITS can be time-consuming to administer and score, which limits its wider implementation. Therefore, the main objective of this research is to organize a framework for the recognition of behavioural symptoms of the child and parent during interventions. Based on the literature on interactive parent-child behaviour analysis, we categorized PCITS labels into three modalities: language, audio, and video. Some labels have dyadic actors, while others have a single actor (either the parent or child). In addition, within each modality, there are technical issues, considerations, and limitations in terms of artificial intelligence. Hence, we divided the problem into three modalities, proposed models for each modality, and a solution to combine them. Firstly, we proposed a model for recognizing action-related labels (video). These labels are interactive and involve two actors: the parent and the child. We conducted a feature extraction algorithm to produce semantic features passed through a feature selection algorithm to extract the most meaningful semantic features from the video. We chose this method due to its lower data requirement compared to other modalities. Also, because of using 2D video files, the proposed feature extraction and selection algorithms are to handle the occlusion and natural conditions like camera movement, Secondly, we proposed a model for recognizing language- and audio-related labels. These labels represent a single-actor role for the parent, as children are not yet capable of producing meaningful text in the intervention videos. To develop this model, we conducted research on a similar dataset to utilize transfer learning between two problems. Therefore, the second part of this research is associated with working on this text dataset. Third, we focused on multi-modal aspects of the work. We conducted experiments to determine how to integrate the prior work into our model. We also provided an ensemble model, which combined the modalities of language and audio based on the semantic and syntactic characteristics of the text. This ensemble model provides a baseline for developing further models with different aspects and modalities. Finally, we provided a roadmap to support more labels that were not covered in this research due to not reaching enough samples. Our proposed framework includes a labelling system that we developed in the primary stages of the research to gather labelled data. This system also plays a role to be integrated with AI modules to provide auto-recognition of the behavioural labels in parent-child interaction videos to the nurses

    Sensing, interpreting, and anticipating human social behaviour in the real world

    Get PDF
    Low-level nonverbal social signals like glances, utterances, facial expressions and body language are central to human communicative situations and have been shown to be connected to important high-level constructs, such as emotions, turn-taking, rapport, or leadership. A prerequisite for the creation of social machines that are able to support humans in e.g. education, psychotherapy, or human resources is the ability to automatically sense, interpret, and anticipate human nonverbal behaviour. While promising results have been shown in controlled settings, automatically analysing unconstrained situations, e.g. in daily-life settings, remains challenging. Furthermore, anticipation of nonverbal behaviour in social situations is still largely unexplored. The goal of this thesis is to move closer to the vision of social machines in the real world. It makes fundamental contributions along the three dimensions of sensing, interpreting and anticipating nonverbal behaviour in social interactions. First, robust recognition of low-level nonverbal behaviour lays the groundwork for all further analysis steps. Advancing human visual behaviour sensing is especially relevant as the current state of the art is still not satisfactory in many daily-life situations. While many social interactions take place in groups, current methods for unsupervised eye contact detection can only handle dyadic interactions. We propose a novel unsupervised method for multi-person eye contact detection by exploiting the connection between gaze and speaking turns. Furthermore, we make use of mobile device engagement to address the problem of calibration drift that occurs in daily-life usage of mobile eye trackers. Second, we improve the interpretation of social signals in terms of higher level social behaviours. In particular, we propose the first dataset and method for emotion recognition from bodily expressions of freely moving, unaugmented dyads. Furthermore, we are the first to study low rapport detection in group interactions, as well as investigating a cross-dataset evaluation setting for the emergent leadership detection task. Third, human visual behaviour is special because it functions as a social signal and also determines what a person is seeing at a given moment in time. Being able to anticipate human gaze opens up the possibility for machines to more seamlessly share attention with humans, or to intervene in a timely manner if humans are about to overlook important aspects of the environment. We are the first to propose methods for the anticipation of eye contact in dyadic conversations, as well as in the context of mobile device interactions during daily life, thereby paving the way for interfaces that are able to proactively intervene and support interacting humans.Blick, Gesichtsausdrücke, Körpersprache, oder Prosodie spielen als nonverbale Signale eine zentrale Rolle in menschlicher Kommunikation. Sie wurden durch vielzählige Studien mit wichtigen Konzepten wie Emotionen, Sprecherwechsel, Führung, oder der Qualität des Verhältnisses zwischen zwei Personen in Verbindung gebracht. Damit Menschen effektiv während ihres täglichen sozialen Lebens von Maschinen unterstützt werden können, sind automatische Methoden zur Erkennung, Interpretation, und Antizipation von nonverbalem Verhalten notwendig. Obwohl die bisherige Forschung in kontrollierten Studien zu ermutigenden Ergebnissen gekommen ist, bleibt die automatische Analyse nonverbalen Verhaltens in weniger kontrollierten Situationen eine Herausforderung. Darüber hinaus existieren kaum Untersuchungen zur Antizipation von nonverbalem Verhalten in sozialen Situationen. Das Ziel dieser Arbeit ist, die Vision vom automatischen Verstehen sozialer Situationen ein Stück weit mehr Realität werden zu lassen. Diese Arbeit liefert wichtige Beiträge zur autmatischen Erkennung menschlichen Blickverhaltens in alltäglichen Situationen. Obwohl viele soziale Interaktionen in Gruppen stattfinden, existieren unüberwachte Methoden zur Augenkontakterkennung bisher lediglich für dyadische Interaktionen. Wir stellen einen neuen Ansatz zur Augenkontakterkennung in Gruppen vor, welcher ohne manuelle Annotationen auskommt, indem er sich den statistischen Zusammenhang zwischen Blick- und Sprechverhalten zu Nutze macht. Tägliche Aktivitäten sind eine Herausforderung für Geräte zur mobile Augenbewegungsmessung, da Verschiebungen dieser Geräte zur Verschlechterung ihrer Kalibrierung führen können. In dieser Arbeit verwenden wir Nutzerverhalten an mobilen Endgeräten, um den Effekt solcher Verschiebungen zu korrigieren. Neben der Erkennung verbessert diese Arbeit auch die Interpretation sozialer Signale. Wir veröffentlichen den ersten Datensatz sowie die erste Methode zur Emotionserkennung in dyadischen Interaktionen ohne den Einsatz spezialisierter Ausrüstung. Außerdem stellen wir die erste Studie zur automatischen Erkennung mangelnder Verbundenheit in Gruppeninteraktionen vor, und führen die erste datensatzübergreifende Evaluierung zur Detektion von sich entwickelndem Führungsverhalten durch. Zum Abschluss der Arbeit präsentieren wir die ersten Ansätze zur Antizipation von Blickverhalten in sozialen Interaktionen. Blickverhalten hat die besondere Eigenschaft, dass es sowohl als soziales Signal als auch der Ausrichtung der visuellen Wahrnehmung dient. Somit eröffnet die Fähigkeit zur Antizipation von Blickverhalten Maschinen die Möglichkeit, sich sowohl nahtloser in soziale Interaktionen einzufügen, als auch Menschen zu warnen, wenn diese Gefahr laufen wichtige Aspekte der Umgebung zu übersehen. Wir präsentieren Methoden zur Antizipation von Blickverhalten im Kontext der Interaktion mit mobilen Endgeräten während täglicher Aktivitäten, als auch während dyadischer Interaktionen mittels Videotelefonie

    Seamless Multimodal Biometrics for Continuous Personalised Wellbeing Monitoring

    Full text link
    Artificially intelligent perception is increasingly present in the lives of every one of us. Vehicles are no exception, (...) In the near future, pattern recognition will have an even stronger role in vehicles, as self-driving cars will require automated ways to understand what is happening around (and within) them and act accordingly. (...) This doctoral work focused on advancing in-vehicle sensing through the research of novel computer vision and pattern recognition methodologies for both biometrics and wellbeing monitoring. The main focus has been on electrocardiogram (ECG) biometrics, a trait well-known for its potential for seamless driver monitoring. Major efforts were devoted to achieving improved performance in identification and identity verification in off-the-person scenarios, well-known for increased noise and variability. Here, end-to-end deep learning ECG biometric solutions were proposed and important topics were addressed such as cross-database and long-term performance, waveform relevance through explainability, and interlead conversion. Face biometrics, a natural complement to the ECG in seamless unconstrained scenarios, was also studied in this work. The open challenges of masked face recognition and interpretability in biometrics were tackled in an effort to evolve towards algorithms that are more transparent, trustworthy, and robust to significant occlusions. Within the topic of wellbeing monitoring, improved solutions to multimodal emotion recognition in groups of people and activity/violence recognition in in-vehicle scenarios were proposed. At last, we also proposed a novel way to learn template security within end-to-end models, dismissing additional separate encryption processes, and a self-supervised learning approach tailored to sequential data, in order to ensure data security and optimal performance. (...)Comment: Doctoral thesis presented and approved on the 21st of December 2022 to the University of Port
    corecore