7 research outputs found

    An Open-set Recognition and Few-Shot Learning Dataset for Audio Event Classification in Domestic Environments

    The problem of training a deep neural network with a small set of positive samples is known as few-shot learning (FSL). It is widely known that traditional deep learning (DL) algorithms usually perform very well when trained with large datasets. However, in many applications it is not possible to obtain such a large number of samples. In the image domain, typical FSL applications are those related to face recognition. In the audio domain, music fraud or speaker recognition can clearly benefit from FSL methods. This paper deals with the application of FSL to the detection of specific and intentional acoustic events given by different types of sound alarms, such as doorbells or fire alarms, using a limited number of samples. These sounds typically occur in domestic environments where many events corresponding to a wide variety of sound classes take place. Therefore, the detection of such alarms in a practical scenario can be considered an open-set recognition (OSR) problem. To address the lack of a dedicated public dataset for audio FSL, researchers usually modify other available datasets. This paper aims to provide the audio recognition community with a carefully annotated dataset for FSL and OSR comprising 1360 clips from 34 classes divided into pattern sounds and unwanted sounds. To facilitate and promote research in this area, results with two baseline systems (one trained from scratch and another based on transfer learning) are presented.
    Comment: To be submitted to Expert Systems with Applications
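As a rough illustration of the few-shot condition described above, the following sketch draws a fixed number of training clips ("shots") per class and leaves the remaining clips for testing. The function name `few_shot_split`, the toy label list, and the balanced layout of 40 clips per class are assumptions made for this example only; the abstract gives the totals (1360 clips, 34 classes) but not the per-class distribution or the file layout of the dataset.

```python
import random
from collections import defaultdict


def few_shot_split(labels, n_shots, seed=0):
    """Pick n_shots training indices per class; the rest become test indices."""
    rng = random.Random(seed)

    # Group clip indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label][:]
        if len(indices) <= n_shots:
            raise ValueError(f"class {label!r} has too few clips for {n_shots} shots")
        rng.shuffle(indices)
        train.extend(indices[:n_shots])   # the "shots" used for training
        test.extend(indices[n_shots:])    # everything else is held out
    return sorted(train), sorted(test)


# Hypothetical toy labels: 34 classes with 40 clips each (34 * 40 = 1360).
# The balanced layout is an assumption, not a property stated in the abstract.
labels = [f"class_{c:02d}" for c in range(34) for _ in range(40)]
train_idx, test_idx = few_shot_split(labels, n_shots=4)
print(len(train_idx), len(test_idx))  # 136 1224
```

Fixing the seed keeps the split reproducible, which matters when comparing systems across different numbers of shots.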

    San Adrian: A New Bronze Age Site in the North of the Iberian Peninsula

    Bronze Age studies carried out in the Cantabrian Region have traditionally focused on prestige goods and funerary contexts. As a result, the lack of information about daily activities, subsistence strategies, and human settlement on a regional scale is evident in the state of the art. However, research in recent years has produced new discoveries, allowing a reconstruction of some aspects of the economic structure, settlements, material culture and the palaeoenvironment during the Bronze Age. Indeed, besides the funerary practices discovered in 1983 in San Adrian (Parztuergo Nagusia, Gipuzkoa), research has now revealed the presence of Upper Palaeolithic and Early Bronze Age occupations. This paper presents a first characterization of the retrieved evidence and a preliminary evaluation of the archaeological site and its environment. San Adrian is a tunnel-shaped cave located at 1,000 metres a.s.l. in the Aizkorri mountain range, opening a passage beneath the Atlantic-Mediterranean watershed in northern Iberia. The strategic character of this mountain site is demonstrated by the presence of Upper Palaeolithic and Bronze Age occupations, and by the construction of a road passing through it and the fortification of both its entrances in the Middle Ages. The aim of the archaeological survey that began in 2008 was to identify, describe and evaluate the heritage potential of the cave, because previous fieldwork had only yielded surface finds in the side galleries, including a medieval hoard and Bronze Age human remains. The work carried out by our research group at San Adrian includes a series of test pits and the excavation of an area of nine square metres following stratigraphic criteria. So far, we have identified at least two contexts corresponding to Late Upper Palaeolithic and Bronze Age occupations in the cave.
    Fieldwork included the sieving and flotation of sediment and the collection of samples for different types of analysis: palynology, carpology, sedimentology, and radiocarbon dating. The evidence is being studied by a multidisciplinary team according to the expertise required for each topic: palaeobotany and environment, archaeozoology, sedimentology, geology, physical anthropology, prehistoric industries (lithics, pottery and bone) and archaeological and historical documentation. Because of its recent discovery, the Upper Palaeolithic evidence is still under study, but first results on the Bronze Age layers can be presented. The ongoing archaeobotanical and archaeozoological studies reveal the exploitation of domestic plants and fauna complemented by hunting and foraging of wild species. At the same time, the archaeological artefacts and their production sequences show the exploitation of nearby resources on both sides of the mountain range, while prestige goods are absent. This evidence is also used to estimate the regularity of cave occupations and to propose a model of seasonal exploitation of the mountain environment. The results obtained reveal the exploitation of resources from both the Mediterranean and Atlantic basins, and contribute towards an understanding of the daily activities of Bronze Age societies. In addition, the evidence shows the exchange and circulation of everyday products between the Cantabrian region and inland Iberia through networks other than those of prestige goods.

    Open Set Audio Classification Using Autoencoders Trained on Few Data

    Open-set recognition (OSR) is a challenging machine learning problem that appears when classifiers are faced with test instances from classes not seen during training. It can be summarized as the problem of correctly identifying instances from a known class (seen during training) while rejecting any unknown or unwanted samples (those belonging to unseen classes). Another problem arising in practical scenarios is few-shot learning (FSL), which appears when a large number of positive samples is not available for training a recognition system. Taking these two limitations into account, a new dataset for OSR and FSL for audio data was recently released to promote research on solutions that address both. This paper proposes an audio OSR/FSL system divided into three steps: a high-level audio representation, feature embedding using two different autoencoder architectures, and a multi-layer perceptron (MLP) trained on latent space representations to detect known classes and reject unwanted ones. An extensive set of experiments is carried out considering multiple combinations of openness factors (OSR condition) and numbers of shots (FSL condition), showing the validity of the proposed approach and confirming superior performance with respect to a baseline system based on transfer learning.
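The abstract refers to openness factors without defining them. A minimal sketch, assuming the definition commonly used in the open-set recognition literature (one minus the square root of twice the number of training classes divided by the sum of target and testing classes), is given below. Whether the paper uses exactly this formula is an assumption, and the function name `openness` and the example class counts are hypothetical.

```python
import math


def openness(n_train_classes, n_target_classes, n_test_classes):
    """Openness of a recognition task (assumed, commonly used definition).

    Returns 0.0 for a closed-set problem (train, target and test classes
    coincide) and grows towards 1.0 as more unseen classes appear at test time.
    """
    if min(n_train_classes, n_target_classes, n_test_classes) <= 0:
        raise ValueError("class counts must be positive")
    ratio = 2.0 * n_train_classes / (n_target_classes + n_test_classes)
    return 1.0 - math.sqrt(ratio)


# Closed set: the same 10 classes are used for training and testing.
print(openness(10, 10, 10))            # 0.0

# Open set: 10 known classes plus 5 hypothetical unseen classes at test time.
print(round(openness(10, 10, 15), 4))  # 0.1056
```

Sweeping the number of unseen test classes in such a function gives one way to organise experiments over several openness values alongside several numbers of shots.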

    A Comparative Analysis of Residual Block Alternatives for End-to-End Audio Classification

    Residual learning is a learning framework that facilitates the training of very deep neural networks. Residual blocks or units are made up of a set of stacked layers, where the inputs are added back to their outputs with the aim of creating identity mappings. In practice, such identity mappings are accomplished by means of the so-called skip or shortcut connections. However, multiple implementation alternatives arise with respect to where such skip connections are applied within the set of stacked layers making up a residual block. While residual networks for image classification using convolutional neural networks (CNNs) have been widely discussed in the literature, their adoption for 1D end-to-end architectures is still scarce in the audio domain. Thus, the suitability of different residual block designs for raw audio classification is partly unknown. The purpose of this article is to compare, analyze and discuss the performance of several residual block implementations, those most commonly used in image classification problems, within a state-of-the-art CNN-based architecture for end-to-end audio classification using raw audio waveforms. Deep and careful statistical analyses over six different residual block alternatives are conducted, considering two well-known datasets and common input normalization choices. The results show that, while some significant differences in performance are observed among architectures using different residual block designs, the selection of the most suitable residual block can be highly dependent on the input data.
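To make the idea of a skip connection concrete, here is a minimal single-channel 1D sketch in NumPy. It shows only one possible placement (the input is added back after the second convolution and before the final activation) and is not claimed to be any of the six alternatives studied in the article. The function names and the toy kernels are assumptions for illustration, and real end-to-end audio models use learned multi-channel convolutions.

```python
import numpy as np


def conv1d_same(x, kernel):
    """Single-channel 1D convolution whose output has the same length as x.

    Note: np.convolve flips the kernel (true convolution), unlike the
    cross-correlation used by most deep learning libraries.
    """
    return np.convolve(x, kernel, mode="same")


def relu(x):
    return np.maximum(x, 0.0)


def residual_block(x, k1, k2):
    """Compute relu(F(x) + x), where F is conv -> relu -> conv."""
    out = conv1d_same(x, k1)
    out = relu(out)
    out = conv1d_same(out, k2)
    return relu(out + x)  # skip connection: add the block input back


# Toy waveform and hypothetical 3-tap kernels.
x = np.array([0.5, -1.0, 2.0, 0.0, 1.5])
y = residual_block(x, np.array([0.2, 0.5, 0.2]), np.array([-0.1, 0.3, -0.1]))
print(y.shape)  # (5,)

# With all-zero kernels F(x) is zero, so the block reduces to relu(x).
identity_like = residual_block(x, np.zeros(3), np.zeros(3))
print(np.allclose(identity_like, relu(x)))  # True
```

The all-zero case shows why such blocks are easy to optimise: when the stacked layers contribute nothing, the input still passes through the shortcut.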