2,202 research outputs found

    Sound Event Detection by Exploring Audio Sequence Modelling

    Everyday sounds in real-world environments are a powerful source of information by which humans can interact with their environments. Humans can infer what is happening around them by listening to everyday sounds. At the same time, it is a challenging task for a computer algorithm in a smart device to automatically recognise, understand, and interpret everyday sounds. Sound event detection (SED) is the process of transcribing an audio recording into sound event tags with onset and offset time values. This involves classification and segmentation of sound events in the given audio recording. SED has numerous applications in everyday life, including security and surveillance, automation, healthcare monitoring, multimedia information retrieval, and assisted living technologies. SED is to everyday sounds what automatic speech recognition (ASR) is to speech and automatic music transcription (AMT) is to music. The fundamental questions in designing a sound recognition system are which portion of a sound event the system should analyse, and what proportion of a sound event the system should process in order to claim a confident detection of that particular sound event. While the classification of sound events has improved considerably in recent years, the temporal segmentation of sound events is considered not to have improved to the same extent. The aim of this thesis is to propose and develop methods to improve the segmentation and classification of everyday sound events in SED models. In particular, this thesis explores the segmentation of sound events by investigating audio sequence encoding-based and audio sequence modelling-based methods, in an effort to improve the overall sound event detection performance. In the first phase of this thesis, efforts are put towards improving sound event detection by explicitly conditioning the audio sequence representations of an SED model using sound activity detection (SAD) and onset detection.
To achieve this, we propose multi-task learning-based SED models in which SAD and onset detection are used as auxiliary tasks for the SED task. The next part of this thesis explores self-attention-based audio sequence modelling, which aggregates audio representations based on temporal relations within and between sound events, scored on the basis of the similarity of sound event portions in audio event sequences. We propose SED models that include memory-controlled, adaptive, dynamic, and source separation-induced self-attention variants, with the aim of improving overall sound recognition
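The abstract above describes SED as transcribing audio into event tags with onset and offset times. As a rough illustration only, and not the method proposed in the thesis, the sketch below turns frame-wise activity probabilities for a single sound class into (onset, offset) pairs by simple thresholding. The function name `frames_to_events`, the hop size `hop_s`, and the 0.5 threshold are all assumptions made for this example.

```python
from typing import List, Tuple


def frames_to_events(
    probs: List[float], hop_s: float, threshold: float = 0.5
) -> List[Tuple[float, float]]:
    """Convert frame-wise probabilities for one class into (onset, offset) times.

    Hypothetical helper: a frame counts as active when its probability is at
    or above `threshold`; consecutive active frames are merged into one event.
    """
    events: List[Tuple[float, float]] = []
    onset = None  # index of the first frame of the event currently open
    for i, p in enumerate(probs):
        active = p >= threshold
        if active and onset is None:
            onset = i  # an event starts at this frame
        elif not active and onset is not None:
            events.append((onset * hop_s, i * hop_s))  # event ended before frame i
            onset = None
    if onset is not None:
        # The recording ended while an event was still active.
        events.append((onset * hop_s, len(probs) * hop_s))
    return events


if __name__ == "__main__":
    # Five frames with an assumed hop of 0.5 s.
    print(frames_to_events([0.1, 0.8, 0.9, 0.2, 0.7], hop_s=0.5))
```

With these five frames the function reports two events, one from 0.5 s to 1.5 s and one from 2.0 s to 2.5 s. Practical systems usually add smoothing (for example median filtering) before thresholding, which this sketch omits.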

    Advances and Applications of DSmT for Information Fusion. Collected Works, Volume 5

    This fifth volume on Advances and Applications of DSmT for Information Fusion collects theoretical and applied contributions of researchers working in different fields of applications and in mathematics, and is available in open access. The collected contributions of this volume have either been published or presented in international conferences, seminars, workshops and journals since the dissemination of the fourth volume in 2015, or they are new. The contributions of each part of this volume are chronologically ordered. The first part of this book presents some theoretical advances on DSmT, dealing mainly with modified Proportional Conflict Redistribution Rules (PCR) of combination with degree of intersection, coarsening techniques, interval calculus for PCR thanks to set inversion via interval analysis (SIVIA), rough set classifiers, canonical decomposition of dichotomous belief functions, fast PCR fusion, fast inter-criteria analysis with PCR, and improved PCR5 and PCR6 rules preserving the (quasi-)neutrality of (quasi-)vacuous belief assignment in the fusion of sources of evidence with their Matlab codes.
Because more applications of DSmT have emerged in the years since the appearance of the fourth book of DSmT in 2015, the second part of this volume is about selected applications of DSmT mainly in building change detection, object recognition, quality of data association in tracking, perception in robotics, risk assessment for torrent protection and multi-criteria decision-making, multi-modal image fusion, coarsening techniques, recommender system, levee characterization and assessment, human heading perception, trust assessment, robotics, biometrics, failure detection, GPS systems, inter-criteria analysis, group decision, human activity recognition, storm prediction, data association for autonomous vehicles, identification of maritime vessels, fusion of support vector machines (SVM), Silx-Furtif RUST code library for information fusion including PCR rules, and network for ship classification. Finally, the third part presents interesting contributions related to belief functions in general, published or presented over the years since 2015. These contributions are related to decision-making under uncertainty, belief approximations, probability transformations, new distances between belief functions, non-classical multi-criteria decision-making problems with belief functions, generalization of Bayes theorem, image processing, data association, entropy and cross-entropy measures, fuzzy evidence numbers, negator of belief mass, human activity recognition, information fusion for breast cancer therapy, imbalanced data classification, and hybrid techniques mixing deep learning with belief functions as well

    The 2023 wearable photoplethysmography roadmap

    Photoplethysmography is a key sensing technology which is used in wearable devices such as smartwatches and fitness trackers. Currently, photoplethysmography sensors are used to monitor physiological parameters including heart rate and heart rhythm, and to track activities like sleep and exercise. Yet, wearable photoplethysmography has potential to provide much more information on health and wellbeing, which could inform clinical decision making. This Roadmap outlines directions for research and development to realise the full potential of wearable photoplethysmography. Experts discuss key topics within the areas of sensor design, signal processing, clinical applications, and research directions. Their perspectives provide valuable guidance to researchers developing wearable photoplethysmography technology

    Seamless Multimodal Biometrics for Continuous Personalised Wellbeing Monitoring

    Artificially intelligent perception is increasingly present in the lives of every one of us. Vehicles are no exception, (...) In the near future, pattern recognition will have an even stronger role in vehicles, as self-driving cars will require automated ways to understand what is happening around (and within) them and act accordingly. (...) This doctoral work focused on advancing in-vehicle sensing through the research of novel computer vision and pattern recognition methodologies for both biometrics and wellbeing monitoring. The main focus has been on electrocardiogram (ECG) biometrics, a trait well-known for its potential for seamless driver monitoring. Major efforts were devoted to achieving improved performance in identification and identity verification in off-the-person scenarios, well-known for increased noise and variability. Here, end-to-end deep learning ECG biometric solutions were proposed and important topics were addressed such as cross-database and long-term performance, waveform relevance through explainability, and interlead conversion. Face biometrics, a natural complement to the ECG in seamless unconstrained scenarios, was also studied in this work. The open challenges of masked face recognition and interpretability in biometrics were tackled in an effort to evolve towards algorithms that are more transparent, trustworthy, and robust to significant occlusions. Within the topic of wellbeing monitoring, improved solutions to multimodal emotion recognition in groups of people and activity/violence recognition in in-vehicle scenarios were proposed. Finally, we also proposed a novel way to learn template security within end-to-end models, dispensing with additional separate encryption processes, and a self-supervised learning approach tailored to sequential data, in order to ensure data security and optimal performance. (...) Comment: Doctoral thesis presented and approved on the 21st of December 2022 to the University of Port

    University of Windsor Graduate Calendar 2023 Spring

    https://scholar.uwindsor.ca/universitywindsorgraduatecalendars/1027/thumbnail.jp

    A novel optogenetics-based therapy for obstructive sleep apnoea

    Obstructive sleep apnoea (OSA) is characterised by repeated upper airway narrowing and/or collapse during sleep. Many patients are sub-optimally treated due to poor tolerance or incomplete response to established therapies. We propose a novel optogenetics-based therapy that enables light-stimulation-induced upper airway dilator muscle contractions to maintain airway patency. The primary aims of this thesis were to determine feasibility in a rodent model of OSA, and identify effective optogenetic constructs for activating upper airway muscles. Chapters 2 and 3 outline the development of a novel construct for the expression of light-sensitive proteins (opsins) in upper airway muscles, comparing two promoters and two recombinant adeno-associated virus (rAAV) capsids for optogenetic gene transfer. Results show that a muscle-specific promoter (tMCK) was superior to a non-specific promoter (CAG). With tMCK, opsin expression in the tongue was 470% greater (p=0.013, RM-ANOVA), brainstem expression was abolished, and light stimulation facilitated a 66% increase in muscle activity from that recorded during unstimulated breaths in an acute model of OSA (p<0.001, linear mixed model) (Chapter 2). Moreover, a novel, highly myotropic rAAV serotype, AAVMYO, was superior to a wild-type serotype, AAV9. The AAVMYO serotype driven by tMCK facilitated a further increase in muscle activity with light stimulation to 194% of that recorded during unstimulated breaths (p<0.001, linear mixed model) (Chapter 3). Finally, ultrasound imaging confirmed that the optimised construct was able to generate effective light-induced muscle contractions and airway dilation (Chapter 4). A secondary aim was to advance preclinical trials for the proposed therapy. To this end, a surgical protocol for chronic implantation of light delivery hardware and recording electrodes in rodents was developed (Chapter 5).
The final protocol will allow us to determine the effects of acute and chronic light stimulation on opsin-expressing upper airway muscles during natural sleep. In summary, Chapters 2 to 4 provide proof-of-concept for a non-invasive optogenetics-based OSA therapy. The combination of a muscle-specific promoter and a muscle-specific viral vector presents a novel and highly effective method of inducing light sensitivity in skeletal muscle and facilitating light-evoked airway dilation. Finally, Chapter 5 commences the development of a surgical protocol that will aid ongoing preclinical trials

    Computational Imaging for Phase Retrieval and Biomedical Applications

    In conventional imaging, optimizing hardware is prioritized to enhance image quality directly, and digital signal processing is viewed as supplementary. Computational imaging intentionally distorts images through modulation schemes in illumination or sensing; its reconstruction algorithms then extract the desired object information from the raw data. Co-designing hardware and algorithms reduces demands on hardware and achieves the same or even better image quality. Algorithm design is at the heart of computational imaging, with two main approaches: model-based inverse problem methods and data-driven deep learning methods. This thesis presents research work from both perspectives, with a primary focus on the phase retrieval issue in computational microscopy and the application of deep learning techniques to address biomedical imaging challenges. The first half of the thesis begins with Fourier ptychography, which was employed to overcome chromatic aberration problems in multispectral imaging. Then, we proposed a novel computational coherent imaging modality based on Kramers-Kronig relations, aiming to replace Fourier ptychography as a non-iterative method. While this approach showed promise, it lacks certain essential characteristics of the original Fourier ptychography. To address this limitation, we introduced two additional algorithms to form a whole package scheme. Through comprehensive evaluation, we demonstrated that the combined scheme outperforms Fourier ptychography in achieving high-resolution, large field-of-view, aberration-free coherent imaging. The second half of the thesis shifts focus to deep-learning-based methods. In one project, we optimized the scanning strategy and image processing pipeline of an epifluorescence microscope to address focus issues. Additionally, we leveraged deep-learning-based object detection models to automate cell analysis tasks.
In another project, we predicted the polarity status of mouse embryos from bright field images using adapted deep learning models. These findings highlight the capability of computational imaging to automate labor-intensive processes, and even outperform humans in challenging tasks.

    Audiovisual speech perception in cochlear implant patients

    Hearing with a cochlear implant (CI) is very different from a normal-hearing (NH) experience, as the CI can only provide limited auditory input. Nevertheless, the central auditory system is capable of learning how to interpret such limited auditory input such that it can extract meaningful information within a few months after implant switch-on. The capacity of the auditory cortex to adapt to new auditory stimuli is an example of intra-modal plasticity: changes within a sensory cortical region as a result of altered statistics of the respective sensory input. However, hearing deprivation before implantation and restoration of hearing capacities after implantation can also induce cross-modal plasticity: changes within a sensory cortical region as a result of altered statistics of a different sensory input. Thereby, a preserved cortical region can, for example, support a deprived cortical region, as in the case of CI users, who have been shown to exhibit cross-modal visual-cortex activation for purely auditory stimuli. Before implantation, during the period of hearing deprivation, CI users typically rely on additional visual cues like lip movements for understanding speech. Therefore, it has been suggested that CI users show a pronounced binding of the auditory and visual systems, which may allow them to integrate auditory and visual speech information more efficiently. The projects included in this thesis investigate auditory, and particularly audiovisual, speech processing in CI users. Four event-related potential (ERP) studies approach the matter from different perspectives, each with a distinct focus. The first project investigates how audiovisually presented syllables are processed by CI users with bilateral hearing loss compared to NH controls. Previous ERP studies employing non-linguistic stimuli and studies using different neuroimaging techniques found distinct audiovisual interactions in CI users.
However, the precise timecourse of cross-modal visual-cortex recruitment and enhanced audiovisual interaction for speech-related stimuli is unknown. With our ERP study we fill this gap, and we present differences in the timecourse of audiovisual interactions as well as in cortical source configurations between CI users and NH controls. The second study focuses on auditory processing in single-sided deaf (SSD) CI users. SSD CI patients experience a maximally asymmetric hearing condition, as they have a CI on one ear and a contralateral NH ear. Despite the intact ear, several behavioural studies have demonstrated a variety of beneficial effects of restoring binaural hearing, but there are only a few ERP studies which investigate auditory processing in SSD CI users. Our study investigates whether the side of implantation affects auditory processing and whether auditory processing via the NH ear of SSD CI users works as it does in NH controls. Given the distinct hearing conditions of SSD CI users, the question arises whether there are any quantifiable differences between CI users with unilateral hearing loss and those with bilateral hearing loss. In general, ERP studies on SSD CI users are rather scarce, and there is no study on audiovisual processing in particular. Furthermore, there are no reports on lip-reading abilities of SSD CI users. To this end, in the third project we extend the first study by including SSD CI users as a third experimental group. The study discusses both differences and similarities between CI users with bilateral hearing loss and CI users with unilateral hearing loss as well as NH controls and provides, for the first time, insights into audiovisual interactions in SSD CI users. The fourth project investigates the influence of background noise on audiovisual interactions in CI users and whether a noise-reduction algorithm can modulate these interactions.
It is known that in environments with competing background noise listeners generally rely more strongly on visual cues for understanding speech and that such situations are particularly difficult for CI users. As shown in previous auditory behavioural studies, the recently introduced noise-reduction algorithm "ForwardFocus" can be a useful aid in such cases. However, the questions of whether employing the algorithm is beneficial in audiovisual conditions as well and whether using the algorithm has a measurable effect on cortical processing have not yet been investigated. In this ERP study, we address these questions with an auditory and audiovisual syllable discrimination task. Taken together, the projects included in this thesis contribute to a better understanding of auditory and especially audiovisual speech processing in CI users, revealing distinct processing strategies employed to overcome the limited input provided by a CI. The results have clinical implications, as they suggest that clinical hearing assessments, which are currently purely auditory, should be extended to audiovisual assessments. Furthermore, they imply that rehabilitation including audiovisual training methods may be beneficial for all CI user groups for quickly achieving the most effective CI implantation outcome

    University of Windsor Graduate Calendar 2023 Winter

    https://scholar.uwindsor.ca/universitywindsorgraduatecalendars/1026/thumbnail.jp