
    Stacked Convolutional and Recurrent Neural Networks for Bird Audio Detection

    This paper studies the detection of bird calls in audio segments using stacked convolutional and recurrent neural networks. Data augmentation by blocks mixing and domain adaptation using a novel method of test mixing are proposed and evaluated with regard to making the method robust to unseen data. The contributions of two kinds of acoustic features (dominant frequency and log mel-band energy) and their combinations are studied in the context of bird audio detection. Our best AUC is 95.5% on five cross-validations of the development data and 88.1% on the unseen evaluation data.
    Comment: Accepted for European Signal Processing Conference 201
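
    A minimal sketch of the stacked convolutional-recurrent idea described above, assuming log mel-band energy frames as input: a small CNN front-end learns local time-frequency patterns and a bidirectional GRU models temporal structure, ending in a clip-level bird/no-bird score. The layer sizes, pooling, and PyTorch framing are illustrative assumptions; the abstract does not specify the paper's exact architecture.

    ```python
    # Sketch of a stacked convolutional-recurrent bird detector (illustrative sizes).
    import torch
    import torch.nn as nn

    class CRNNDetector(nn.Module):
        def __init__(self, n_mels=40, conv_channels=64, rnn_hidden=64):
            super().__init__()
            # Convolutional front-end: local time-frequency patterns.
            self.conv = nn.Sequential(
                nn.Conv2d(1, conv_channels, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d((1, 4)),  # pool along the frequency axis only
            )
            # Recurrent back-end: temporal structure across frames.
            self.rnn = nn.GRU(conv_channels * (n_mels // 4), rnn_hidden,
                              batch_first=True, bidirectional=True)
            self.head = nn.Linear(2 * rnn_hidden, 1)  # clip-level score

        def forward(self, log_mel):               # (batch, time, n_mels)
            x = log_mel.unsqueeze(1)              # -> (batch, 1, time, n_mels)
            x = self.conv(x)                      # -> (batch, C, time, n_mels/4)
            x = x.permute(0, 2, 1, 3).flatten(2)  # -> (batch, time, C * n_mels/4)
            x, _ = self.rnn(x)
            return torch.sigmoid(self.head(x.mean(dim=1)))  # average over time

    model = CRNNDetector()
    scores = model(torch.randn(8, 500, 40))  # 8 clips, 500 frames, 40 mel bands
    print(scores.shape)                      # torch.Size([8, 1])
    ```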

    A Mobile Application Framework to Classify Philippine Currency Images to Audio Labels Using Deep Learning

    This research presents a mobile application framework designed to empower visually impaired individuals in Legazpi City by providing real-time audio feedback for currency identification. Leveraging deep learning techniques, the proposed framework employs a robust model trained on a comprehensive dataset of Philippine currency images. The deep learning model accurately classifies various denominations of bills and coins, enabling an inclusive solution for the visually impaired community. The researcher employed a qualitative approach, including a focus group discussion with respondents chosen through purposive sampling; among them were masseuses, chiropractors, herbal street vendors, and students. The selected participants contributed to the focus group discussion through an online meeting, and an in-depth informal interview was conducted to gather additional information for the development of an architectural framework. The results indicate that implementing this architectural framework would enable these groups to identify money more easily, increasing efficiency and reducing errors in cash transactions. Audio labels are particularly helpful for visually impaired individuals, as they provide an accessible way to handle and identify money independently.
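
    A minimal sketch of the classify-then-speak flow this framework describes: an image classifier predicts the denomination, and the predicted label is mapped to a pre-recorded audio clip for playback. The model file, denomination list, clip paths, and preprocessing below are all hypothetical; the abstract does not specify the model or label set.

    ```python
    # Sketch: map a currency photo to an audio label (all names hypothetical).
    import torch
    from torchvision import transforms
    from PIL import Image

    DENOMINATIONS = ["20 pesos", "50 pesos", "100 pesos",
                     "200 pesos", "500 pesos", "1000 pesos"]
    AUDIO_CLIPS = {label: f"audio/{label.replace(' ', '_')}.wav"
                   for label in DENOMINATIONS}

    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ])

    def classify_to_audio(image_path, model):
        """Classify a currency photo and return (label, audio clip path)."""
        image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
        with torch.no_grad():
            logits = model(image)
        label = DENOMINATIONS[logits.argmax(dim=1).item()]
        return label, AUDIO_CLIPS[label]

    # model = torch.jit.load("currency_model.pt")   # hypothetical trained model
    # label, clip = classify_to_audio("photo.jpg", model)
    # The app would then play `clip` through the phone's speaker.
    ```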

    Identifying patterns of human and bird activities using bioacoustic data

    In general, humans and animals often interact within the same environment at the same time, and human activities may disturb or affect some bird activities. It is therefore important to monitor and study the relationships between human and animal activities. This paper proposes a system that not only automatically classifies human and bird activities using bioacoustic data, but also automatically summarizes patterns of events over time. To perform automatic summarization of acoustic events, a frequency–duration graph (FDG) framework is proposed to summarize the patterns of human and bird activities. The system first pre-processes the raw bioacoustic data and then applies a support vector machine (SVM) model and a multi-layer perceptron (MLP) model to classify human and bird chirping activities before using the FDG framework to summarize the results. Both the SVM and MLP models achieved 98% accuracy on average across several day-long recordings. Three case studies with real data show that the FDG framework correctly determined the patterns of human and bird activities over time and provided both statistical and graphical insight into the relationships between the two.
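
    The classification stage could be prototyped as below with scikit-learn, training an SVM and an MLP on pre-extracted acoustic feature vectors. The random features, three placeholder classes, and hyperparameters are assumptions for illustration; the paper's actual features and model settings are not given in the abstract.

    ```python
    # Sketch: SVM and MLP classifiers on placeholder acoustic features.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(600, 20))    # placeholder per-segment feature vectors
    y = rng.integers(0, 3, size=600)  # placeholder labels: human / bird / other

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    mlp = make_pipeline(StandardScaler(),
                        MLPClassifier(hidden_layer_sizes=(64,),
                                      max_iter=500, random_state=0))
    for name, clf in [("SVM", svm), ("MLP", mlp)]:
        clf.fit(X_tr, y_tr)  # classified segments would then feed the FDG summary
        print(name, "accuracy:", clf.score(X_te, y_te))
    ```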

    BIRB: A Generalization Benchmark for Information Retrieval in Bioacoustics

    The ability of a machine learning model to cope with differences between training and deployment conditions (e.g., distribution shift or generalization to entirely new classes) is crucial for real-world use cases. However, most empirical work in this area has focused on the image domain, with artificial benchmarks constructed to measure individual aspects of generalization. We present BIRB, a complex benchmark centered on the retrieval of bird vocalizations from passively recorded datasets, given focal recordings from a large citizen-science corpus available for training. We propose a baseline system for this collection of tasks using representation learning and nearest-centroid search. Our thorough empirical evaluation and analysis surface open research directions, suggesting that BIRB fills the need for a more realistic and complex benchmark to drive progress on robustness to distribution shift and generalization of ML models.
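
    A minimal sketch of the nearest-centroid baseline named above: class centroids are computed from embeddings of the focal (training) recordings, and a query embedding from a passive recording is ranked against them by distance. The random embeddings and Euclidean metric are placeholders for the learned representation; the benchmark's exact setup is not given in the abstract.

    ```python
    # Sketch: nearest-centroid retrieval over placeholder embeddings.
    import numpy as np

    def class_centroids(embeddings, labels):
        """Mean embedding per class; returns (class ids, centroid matrix)."""
        classes = np.unique(labels)
        return classes, np.stack([embeddings[labels == c].mean(axis=0)
                                  for c in classes])

    def retrieve(query, centroids, classes):
        """Rank classes for one query embedding by Euclidean distance."""
        dists = np.linalg.norm(centroids - query, axis=1)
        return classes[np.argsort(dists)]

    rng = np.random.default_rng(0)
    train_emb = rng.normal(size=(500, 128))    # focal-recording embeddings
    train_lab = rng.integers(0, 10, size=500)  # species labels
    classes, centroids = class_centroids(train_emb, train_lab)
    query = rng.normal(size=128)               # embedding of a passive recording
    print(retrieve(query, centroids, classes)[:3])  # top-3 candidate species
    ```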

    Consecutive Decoding for Speech-to-text Translation

    Speech-to-text translation (ST), which directly translates source-language speech into target-language text, has attracted intensive attention recently. However, combining speech recognition and machine translation in a single model places a heavy burden on the direct cross-modal, cross-lingual mapping. To reduce the learning difficulty, we propose COnSecutive Transcription and Translation (COSTT), an integral approach for speech-to-text translation. The key idea is to generate the source transcript and the target translation text with a single decoder. This benefits model training, since large additional parallel text corpora can be fully exploited to enhance speech translation training. Our method is verified on three mainstream datasets: the Augmented LibriSpeech English-French dataset, the TED English-German dataset, and the TED English-Chinese dataset. Experiments show that the proposed COSTT outperforms previous state-of-the-art methods. The code is available at https://github.com/dqqcasia/st.
    Comment: Accepted by AAAI 2021. arXiv admin note: text overlap with arXiv:2009.0970
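
    One way to picture the single-decoder idea is that the training target concatenates the source transcript and the target translation, so the decoder emits both consecutively. The sketch below shows that sequence layout with illustrative separator tokens; COSTT's actual special tokens and training details are not given in the abstract.

    ```python
    # Sketch: target-sequence layout for consecutive transcription + translation.
    def build_target(transcript_tokens, translation_tokens,
                     sep="<sep>", eos="<eos>"):
        """Concatenate transcript and translation into one decoder target."""
        return transcript_tokens + [sep] + translation_tokens + [eos]

    def split_output(decoded, sep="<sep>", eos="<eos>"):
        """Recover transcript and translation from a decoded token stream."""
        if sep in decoded:
            i = decoded.index(sep)
            transcript, rest = decoded[:i], decoded[i + 1:]
        else:
            transcript, rest = decoded, []
        translation = rest[:rest.index(eos)] if eos in rest else rest
        return transcript, translation

    target = build_target(["the", "cat", "sat"],
                          ["le", "chat", "s'est", "assis"])
    print(target)         # transcript, <sep>, translation, <eos>
    print(split_output(target))
    ```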