1,271 research outputs found

    Sound Event Localization, Detection, and Tracking by Deep Neural Networks

    In this thesis, we present novel sound representations and classification methods for the task of sound event localization, detection, and tracking (SELDT). The human auditory system has evolved to localize multiple sound events, recognize them, and track their motion individually in an acoustic environment. This ability makes humans context-aware and enables them to interact with their surroundings naturally. Developing similar methods for machines will provide an automatic description of social and human activities around them and enable machines to be context-aware in a similar way. Such methods can be employed to assist the hearing impaired to visualize sounds, for robot navigation, and to monitor biodiversity, the home, and cities. A real-life acoustic scene is complex, with multiple sound events that overlap temporally and spatially, including stationary and moving events with varying angular velocities. Additionally, each individual sound event class can vary widely: for a car horn, for example, different cars have different horns, and within the same car model, the duration and temporal structure of the horn sound depend on the driver. Performing SELDT robustly in such overlapping and dynamic sound scenes is challenging for machines. Hence we investigate the SELDT task in this thesis using a data-driven approach based on deep neural networks (DNNs). The sound event detection (SED) task requires detecting the onset and offset times of individual sound events and their corresponding labels. For this task, we propose to use spatial and perceptual features extracted from multichannel audio with two different DNNs: recurrent neural networks (RNNs) and convolutional recurrent neural networks (CRNNs). We show that multichannel audio features improve SED performance for overlapping sound events compared with traditional single-channel audio features. The proposed novel features and methods produced state-of-the-art performance for the real-life SED task and won the IEEE AASP DCASE challenge in two consecutive years, 2016 and 2017. Sound event localization is the task of spatially locating individual sound events. Traditionally, this has been approached using parametric methods. In this thesis, we propose a CRNN for detecting the azimuth and elevation angles of multiple temporally overlapping sound events. This is the first DNN-based method performing localization in the complete azimuth and elevation space. Whereas parametric methods require the number of active sources as prior information, the proposed method learns this information directly from the input data and estimates the sources' respective spatial locations. Further, the proposed CRNN is shown to be more robust than parametric methods in reverberant scenarios. Finally, the detection and localization tasks are performed jointly using a CRNN. This method additionally tracks the spatial location over time, thus producing the SELDT results. This is the first DNN-based SELDT method and is shown to perform on par with stand-alone baselines for SED, localization, and tracking. The proposed SELDT method is evaluated on nine datasets that represent anechoic and reverberant sound scenes, stationary and moving sources with varying velocities, different numbers of overlapping sound events, and different microphone array formats. The results show that the SELDT method can track multiple overlapping sound events that are both spatially stationary and moving.

    DMRN+16: Digital Music Research Network One-day Workshop 2021

    DMRN+16: Digital Music Research Network One-day Workshop 2021, Queen Mary University of London, Tuesday 21st December 2021. Keynote speakers. Keynote 1: Prof. Sophie Scott, Director, Institute of Cognitive Neuroscience, UCL. Title: "Sound on the brain - insights from functional neuroimaging and neuroanatomy". Abstract: In this talk I will use functional imaging and models of primate neuroanatomy to explore how sound is processed in the human brain. I will demonstrate that sound is represented cortically in different parallel streams. I will expand this to show how this can affect the concept of auditory perception, which arguably incorporates multiple kinds of distinct perceptual processes. I will address the roles that subcortical processes play in this, and also the contributions from hemispheric asymmetries. Keynote 2: Prof. Gus Xia, Assistant Professor at NYU Shanghai. Title: "Learning interpretable music representations: from human stupidity to artificial intelligence". Abstract: Gus has been leading the Music X Lab in developing intelligent systems that help people better compose and learn music. In this talk, he will show us the importance of music representation for both humans and machines, and how to learn better music representations via the design of inductive bias. Once we have interpretable music representations, the potential applications are limitless.

    CebuaNER: A New Baseline Cebuano Named Entity Recognition Model

    Although Southeast Asia is one of the most linguistically diverse groups of countries, its computational linguistics and language processing research has struggled to match the level of countries from the Global North. Thus, initiatives such as open-sourcing corpora and developing baseline models for basic language processing tasks are important stepping stones to encourage the growth of research efforts in the field. To answer this call, we introduce CebuaNER, a new baseline model for named entity recognition (NER) in the Cebuano language. Cebuano is the second most-used native language in the Philippines, with over 20 million speakers. To build the model, we collected and annotated over 4,000 news articles, the largest collection of any work in the language, retrieved from online local Cebuano platforms, and used them to train algorithms such as Conditional Random Field and Bidirectional LSTM. Our findings show promising results as a new baseline model, achieving over 70% precision, recall, and F1 across all entity tags, as well as potential efficacy in a crosslingual setup with Tagalog.

    Harmonic Plus Noise Model for Concatenative Speech Synthesis

    This project develops a Harmonic Plus Noise Model (HNM) for concatenative speech synthesis. The software is composed of an analysis part (an off-line process) applied to the initial database and a synthesis part (a real-time process) applied to the HNM database and the prosodic modifications from FESTIVAL. Future work consists of integration into HMM-based speech synthesis.

    Usability engineering of interactive voice responsive (IVR) systems in oral users of Southern Africa

    Includes bibliographical references (p. 96-109). This research study focuses on the feasibility of using the telephone as a tool for information access in the oral communities of Southern Africa. The OpenPhone and BGR systems are used as case studies, and their designs have been influenced by field studies with the targeted users. The OpenPhone project aims to design an Interactive Voice Response (IVR) health information system that enables caregivers of HIV/AIDS-infected children to access relevant care-giving information by telephone in their native language of Setswana in Botswana, Southern Africa. The BGR system allows soccer fans to access the results of recently played matches in the Premier Soccer League (PSL) of South Africa.

    Composing literacy: Exploring how musical aptitude explains technical reading abilities

    Several studies have indicated a connection between musical skills and reading-related abilities. However, the underlying reasons for the connection have been unclear. I studied whether subskills within musical aptitude can explain the relationship between music and reading in 8–11-year-old children (N = 66). The children were tested for three musical aptitude subskills: pitch discrimination, temporal discrimination, and tonal memory. The focus lay on technical reading abilities, namely performance in reading fluency and sentence comprehension in the Finnish primary school reading test. Linear regression models were used to assess whether the subskills, both together and separately, account for the variance in reading performance. The combination of musical aptitude subskills was related to technical reading abilities. Independently of other subskills, tonal memory explained both reading fluency and sentence comprehension, while pitch discrimination explained only reading fluency. The findings support the hypothesis that musical aptitude and reading-related abilities share common mechanisms, such as pitch perception. More extensive research on how musical aptitude and reading are related is needed. Information about the underlying mechanisms could be used to create music interventions to support reading acquisition.

    A Literary Study of Mbube Proverbs

    This paper examines the literary and aesthetic content of the proverbs of the Mbube people of Northern Cross River in Nigeria. Proverbs are pithy statements that embody a community's ethical and philosophical values and are rich in artistic value and performance. A study of these proverbs helps in understanding the people's culture and world view. Mbube proverbs, which are rich in idioms and metaphors, effectively perform the function of inculcating moral values. The paper examines the literary content of Mbube proverbs using the anthropological and cognitive views of proverbs. This approach sees proverbs as a species of metaphor. Through participant observation, about 100 proverbs were collected, taking into consideration the occasions and contexts of performance. The paper concludes that proverbs in Mbube function effectively in deepening and enriching communication, help in fostering communal harmony and cohesion, and, through their poetic content, provide for economy of words. Keywords: proverbs, communication, Mbube culture, aesthetic

    Educational content in the performing arts: tradition and Christianity in Kenya

    Includes bibliographical references (p. 235-263). The performing arts (a combination of music, dance and dramatisation) in the church in Kenya have not received much scholarly attention. These performing arts, as adopted by Christian dance groups in Kenya, have not been fully accepted into Christian circles because of the indigenous and popular music influences that govern them. This study therefore sets out to determine the educational role that the performing arts play in the church in Nairobi, as demonstrated by a Nairobi Christian dance group, the Maximum Miracle Melodies.

    African linguistics across the disciplines

    Since the hiring of its first Africanist linguist Carleton Hodge in 1964, Indiana University’s Department of Linguistics has had a strong and continuing presence in the study of African languages and linguistics through the work of its faculty and of its graduates on the faculties of many other universities. Research on African linguistics at IU has covered some of the major language groups spoken on the African continent. Carleton Hodge’s work on Ancient Egyptian and Hausa, Paul Newman’s work on Hausa and Chadic languages, and Roxanna Ma Newman’s work on Hausa language structure and pedagogy have been some of the most important studies on Afro-Asiatic linguistics. With respect to Niger-Congo languages, the work of Charles Bird on Bambara and the Mande languages, Robert Botne’s work on Bantu structure (especially tense and aspect), Samuel Obeng and Colin Painter’s work on Ghanaian Languages (phonetics, phonology, and pragmatics), Robert Port’s studies on Swahili, and Erhard Voeltz's s