676 research outputs found

    Retraining-free Customized ASR for Enharmonic Words Based on a Named-Entity-Aware Model and Phoneme Similarity Estimation

    End-to-end automatic speech recognition (E2E-ASR) has the potential to improve performance, but a specific issue that needs to be addressed is the difficulty it has in handling enharmonic words: named entities (NEs) with the same pronunciation and part of speech that are spelled differently. This often occurs with Japanese personal names that have the same pronunciation but different Kanji characters. Since such NE words tend to be important keywords, ASR easily loses user trust if it misrecognizes them. To solve these problems, this paper proposes a novel retraining-free customized method for E2E-ASRs based on a named-entity-aware E2E-ASR model and phoneme similarity estimation. Experimental results show that the proposed method improves the target NE character error rate by 35.7% on average relative to the conventional E2E-ASR model when selecting personal names as a target NE. Comment: accepted by INTERSPEECH202

    Biogeographical distributions of trickster animals

    Human language encompasses almost endless potential for meaning, and folklore can theoretically incorporate themes beyond time and space. However, actual distributions of the themes are not always universal and their constraints remain unclear. Here, we specifically focused on zoological folklore and aimed to reveal what restricts the distribution of trickster animals in folklore. We applied the biogeographical methodology to 16 taxonomic categories of trickster (455 data) and real (93 090 848 data) animals obtained from large databases. Our analysis revealed that the distribution of trickster animals was restricted by their presence in the vicinity and, more importantly, the presence of their corresponding real animals. Given that the distributions of real animals are restricted by the annual mean temperature and annual precipitation, these climatic conditions indirectly affect the distribution of trickster animals. Our study, applying biogeographical methods to culture, paves the way to a deeper understanding of the interactions between ecology and culture.

    Real-Time Expression Control System for Wearable Animatronics

    Animatronics is used to give characters expression in many visual works, including movies. With an animatronic mask worn by a performer, truly lifelike expression of the character is possible because the actor's performance is reflected directly. In this research, I propose using an animatronic mask to reflect the character's feelings and expressions in real time through the actor wearing the mask. Conventionally, the actor has to study the movements and facial expressions beforehand to determine whether the character can be played intuitively; I believe that this system lets an actor play a character intuitively. Art and Design Research for Sustainable Development; September 22, 2018. Conference: Tsukuba Global Science Week 2018. Date: September 20-22, 2018. Venue: Tsukuba International Congress Center. Sponsored: University of Tsukuba

    Improvement of DOA Estimation by using Quaternion Output in Sound Event Localization and Detection

    This paper describes an improvement of Direction of Arrival (DOA) estimation performance using quaternion output in the Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 Task 3. DCASE 2019 Task 3 focuses on sound event localization and detection (SELD), a task that estimates the sound source direction simultaneously with conventional sound event detection (SED). In the baseline method, the sound source direction angle is directly regressed. However, the angle is periodic and has discontinuities, which may make learning unstable. Specifically, even though -180 deg and 180 deg are in the same direction, a large loss is calculated. Estimating DOA angles with a classification approach instead of regression can avoid this instability, but it limits the resolution. In this paper, we propose introducing the quaternion, which is a continuous function, into the output layer of the neural network instead of directly estimating the sound source direction angle. This method can be implemented easily by changing only the output of the existing neural network, and thus does not significantly increase the number of parameters in the middle layers. Experimental results show that the proposed method improves DOA estimation without significantly increasing the number of parameters.
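
The wrap-around problem described in this abstract can be illustrated with a small sketch. The abstract does not give the authors' quaternion parameterization or loss, so everything below is an assumption made for illustration: the azimuth is encoded as a unit quaternion for a rotation about the z-axis, and the hypothetical `quaternion_loss` is a sign-invariant distance (since q and -q encode the same rotation). It is not the paper's implementation.

```python
import math

def angle_mse(pred_deg, true_deg):
    """Squared error on raw azimuth angles in degrees (direct regression)."""
    return (pred_deg - true_deg) ** 2

def azimuth_to_quaternion(theta_deg):
    """Unit quaternion (w, x, y, z) for a rotation of theta about the z-axis.

    Assumed encoding for illustration; not taken from the paper.
    """
    half = math.radians(theta_deg) / 2.0
    return (math.cos(half), 0.0, 0.0, math.sin(half))

def quaternion_loss(q1, q2):
    """Sign-invariant distance 1 - <q1, q2>^2 (q and -q are the same rotation)."""
    dot = sum(a * b for a, b in zip(q1, q2))
    return 1.0 - dot * dot

# -180 deg and 180 deg point in the same direction.
raw_loss = angle_mse(-180.0, 180.0)
quat_loss = quaternion_loss(azimuth_to_quaternion(-180.0),
                            azimuth_to_quaternion(180.0))

# Two directions 2 deg apart, on opposite sides of the wrap-around point.
nearby_loss = quaternion_loss(azimuth_to_quaternion(179.0),
                              azimuth_to_quaternion(-179.0))

print(raw_loss, quat_loss, nearby_loss)
```

Under these assumptions, the raw-angle loss is large for two identical directions, while the quaternion-based loss is near zero and stays small for directions that straddle the -180/180 boundary.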

    Robust sound source mapping using three-layered selective audio rays for mobile robots

    © 2016 IEEE. This paper investigates sound source mapping in a real environment using a mobile robot. Our approach is based on audio ray tracing, which integrates occupancy grids and sound source localization using a laser range finder and a microphone array. Previous audio ray tracing approaches rely on all observed rays and grids. As such, observation errors caused by sound reflection, sound occlusion, wall occlusion, sounds at misdetected grids, etc. can significantly degrade the ability to locate sound sources in a map. A three-layered selective audio ray tracing mechanism is proposed in this work. The first layer conducts frame-based unreliable ray rejection (sensory rejection) considering sound reflection and wall occlusion. The second layer introduces triangulation and audio tracing to detect falsely detected sound sources, rejecting audio rays associated with these misdetected sound sources (short-term rejection). A third layer is tasked with rejecting rays using the whole history (long-term rejection) to disambiguate sound occlusion. Experimental results under various situations are presented, which demonstrate the effectiveness of our method.

    From Blurry to Brilliant Detection: YOLOv5-Based Aerial Object Detection with Super Resolution

    The demand for accurate object detection in aerial imagery has surged with the widespread use of drones and satellite technology. Traditional object detection models, trained on datasets biased towards large objects, struggle to perform optimally in aerial scenarios where small, densely clustered objects are prevalent. To address this challenge, we present an innovative approach that combines super-resolution and an adapted lightweight YOLOv5 architecture. We employ a range of datasets, including VisDrone-2023, SeaDroneSee, VEDAI, and NWPU VHR-10, to evaluate our model's performance. Our Super Resolved YOLOv5 architecture features Transformer encoder blocks, allowing the model to capture global and contextual information, leading to improved detection results, especially in high-density, occluded conditions. This lightweight model not only delivers improved accuracy but also ensures efficient resource utilization, making it well-suited for real-time applications. Our experimental results demonstrate the model's superior performance in detecting small and densely clustered objects, underlining the significance of dataset choice and architectural adaptation for this specific task. In particular, the method achieves 52.5% mAP on VisDrone, exceeding top prior works. This approach promises to significantly advance object detection in aerial imagery, contributing to more accurate and reliable results in a variety of real-world applications.