717 research outputs found

    Comparison of the efficiency of time and frequency domain descriptors for the classification of selected wind instruments

    By analyzing the physical features of an audio signal in the time domain and the frequency domain, it is possible to determine its source and to classify it automatically with appropriate algorithms. Sound indexing deals with the analysis of different classes and sources, including signals from musical instruments. By calculating the values of descriptors and classifying them, we obtain information about the type of instrument and its construction, most often the material from which it was made. The research showed that one composition of the feature vector is used to describe brass instruments and a different one to describe woodwind instruments. In this case, the key feature may be the higher harmonic components in the frequency domain. The experiments attempt to parameterize wind instruments (aerophones) in order to compare the classification effectiveness of time and spectral descriptors. Sounds from a tuba, a flute and a soprano saxophone were used. The sample population for each instrument was 21.
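
The abstract above contrasts time-domain and spectral descriptors without naming them. As a minimal sketch only, the following computes one commonly used descriptor of each kind, zero-crossing rate and spectral centroid, on synthetic sine tones. The choice of these two descriptors, the function names, and the test signals are assumptions made for illustration and are not taken from the paper.

```python
import cmath
import math


def zero_crossing_rate(signal):
    """Time-domain descriptor: fraction of adjacent sample pairs that change sign."""
    crossings = sum(
        1 for a, b in zip(signal, signal[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(signal) - 1)


def magnitude_spectrum(signal):
    """Naive O(n^2) DFT magnitudes for bins 0..n/2 (adequate for short frames)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(signal, sample_rate):
    """Frequency-domain descriptor: magnitude-weighted mean frequency in Hz."""
    mags = magnitude_spectrum(signal)
    total = sum(mags)
    if total == 0:
        return 0.0
    n = len(signal)
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total


def sine(freq, sample_rate, n):
    """Synthetic test tone (hypothetical stand-in for an instrument recording)."""
    return [math.sin(2 * math.pi * freq * t / sample_rate) for t in range(n)]


SAMPLE_RATE = 8000
FRAME = 256
low = sine(250.0, SAMPLE_RATE, FRAME)    # falls exactly on DFT bin 8
high = sine(1000.0, SAMPLE_RATE, FRAME)  # falls exactly on DFT bin 32

zcr_low, zcr_high = zero_crossing_rate(low), zero_crossing_rate(high)
centroid_low = spectral_centroid(low, SAMPLE_RATE)
centroid_high = spectral_centroid(high, SAMPLE_RATE)
```

Both descriptors rise with the pitch of the tone; a feature vector for a classifier would concatenate several such values per recording.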

    Spectrogram classification using dissimilarity space

    In this work, we combine a Siamese neural network and different clustering techniques to generate a dissimilarity space that is then used to train an SVM for automated animal audio classification. The animal audio datasets used are (i) birds and (ii) cat sounds, which are freely available. We exploit different clustering methods to reduce the spectrograms in the dataset to a number of centroids that are used to generate the dissimilarity space through the Siamese network. Once computed, we use the dissimilarity space to generate a vector space representation of each pattern, which is then fed into a support vector machine (SVM) to classify a spectrogram by its dissimilarity vector. Our study shows that the proposed approach based on dissimilarity space performs well on both classification problems without ad-hoc optimization of the clustering methods. Moreover, results show that the fusion of CNN-based approaches applied to the animal audio classification problem works better than the stand-alone CNNs.
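
The dissimilarity-space idea in this abstract can be sketched as follows: reduce the data to a few centroids, then describe each pattern by its dissimilarity to every centroid. This is an illustration under stated assumptions, not the authors' pipeline: Euclidean distance stands in for the learned Siamese-network dissimilarity, plain k-means stands in for the clustering methods, two-dimensional toy points stand in for spectrograms, and the final SVM step is omitted. All names are hypothetical.

```python
import math


def euclidean(a, b):
    """Stand-in dissimilarity (assumption: the paper learns this with a Siamese net)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def k_means(points, k, iterations=10):
    """Plain k-means; the first k points seed the centroids, so it is deterministic."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            mean_vector(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids


def to_dissimilarity_space(pattern, centroids, dissimilarity=euclidean):
    """Represent a pattern as its vector of dissimilarities to each centroid."""
    return [dissimilarity(pattern, c) for c in centroids]


# Toy "patterns" forming two well-separated groups.
patterns = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
centroids = k_means(patterns, k=2)
embedded = [to_dissimilarity_space(p, centroids) for p in patterns]
```

Each entry of `embedded` has one coordinate per centroid; vectors of this kind are what a downstream classifier such as an SVM would be trained on.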

    A survey on artificial intelligence-based acoustic source identification

    The concept of Acoustic Source Identification (ASI), which refers to the process of identifying noise sources, has attracted increasing attention in recent years. ASI technology can be used for surveillance, monitoring, and maintenance applications in a wide range of sectors, such as defence, manufacturing, healthcare, and agriculture. Acoustic signature analysis and pattern recognition remain the core technologies for noise source identification. Manual identification of acoustic signatures, however, has become increasingly challenging as dataset sizes grow. As a result, the use of Artificial Intelligence (AI) techniques for identifying noise sources has become increasingly relevant and useful. In this paper, we provide a comprehensive review of AI-based acoustic source identification techniques. We analyze the strengths and weaknesses of AI-based ASI processes and associated methods proposed by researchers in the literature. Additionally, we present a detailed survey of ASI applications in machinery, underwater applications, environment/event source recognition, healthcare, and other fields. We also highlight relevant research directions.

    Evaluation of MPEG-7-based audio descriptors for animal voice recognition over wireless acoustic sensor networks

    This article belongs to the Special Issue State-of-the-Art Sensors Technology in Spain 2015. Environmental audio monitoring is a huge area of interest for biologists all over the world. This is why several audio monitoring systems have been proposed in the literature, which can be classified into two different approaches: acquisition and compression of all audio patterns in order to send them as raw data to a main server; or specific recognition systems based on audio patterns. The first approach has the drawback that a large amount of information must be stored on a main server. Moreover, this information requires considerable effort to analyze. The second approach has the drawback of poor scalability when new patterns need to be detected. To overcome these limitations, this paper proposes an environmental Wireless Acoustic Sensor Network architecture focused on the use of generic descriptors based on the MPEG-7 standard. These descriptors prove suitable for recognizing different patterns, allowing high scalability. The proposed parameters have been tested to recognize different behaviors of two anuran species that live in Spanish natural parks, the Epidalea calamita and the Alytes obstetricans toads, and show high classification performance. Consejería de Innovación, Ciencia y Empresa, Junta de Andalucía, Spain TIC-570

    Mapping Acoustic and Semantic Dimensions of Auditory Perception

    Auditory categorisation is a function of sensory perception which allows humans to generalise across many different sounds present in the environment and classify them into behaviourally relevant categories. These categories cover not only the variance of acoustic properties of the signal but also a wide variety of sound sources. However, it is unclear to what extent the acoustic structure of sound is associated with, and conveys, different facets of semantic category information. Whether people use such data, and what drives their decisions when both acoustic and semantic information about the sound is available, also remains unknown. To answer these questions, we used existing methods broadly practised in linguistics, acoustics and cognitive science, and bridged these domains by delineating their shared space. Firstly, we took a model-free exploratory approach to examine the underlying structure and inherent patterns in our dataset. To this end, we ran principal components, clustering and multidimensional scaling analyses. At the same time, we mapped the topography of the sound labels' semantic space using corpus-based word-embedding vectors. We then built an LDA model predicting class membership and compared the model-free approach and model predictions with the actual taxonomy. Finally, by conducting a series of web-based behavioural experiments, we investigated whether acoustic and semantic topographies relate to perceptual judgements. This analysis pipeline showed that natural sound categories could be successfully predicted based on the acoustic information alone and that perception of natural sound categories has some acoustic grounding. Results from our studies help to recognise the role of physical sound characteristics and their meaning in the process of sound perception and give an invaluable insight into the mechanisms governing machine-based and human classification.

    Generative Music from Fuzzy Logic and Probability: A Portfolio of Electroacoustic Compositions

    This portfolio of thirteen recorded works was composed as an investigation into the application of generative processes to electroacoustic music, paying particular attention to the use of fuzzy logic and the rule-based constraint of chance events. These works were developed by a rolling process of program design and musical composition, focusing on two areas: the generation and transformation of large groups of sounds within a multi-dimensional parameter space for acousmatic composition (using the author’s software, Audio Spray Gun) and the real-time selection of sounds using audio descriptors, principally for live performance by instrument and electronics. Later stages of the project attempted to unite these processes in two ways: by the agent-based generation of large sound-groups for multichannel audio from live instruments or pre-recorded audio datasets and by the software generation of such groups for fixed-media composition using trajectories and transformations in ‘timbre space’. An accompanying document charts the development of these works with a programme note, technical discussion and performance records for each, along with spectrograms and scores as appropriate. It also describes the programming methods used and discusses the implications and limitations of these approaches, particularly for object-based spatial music and timbre selection

    Classification of Broadcast News Audio Data Employing Binary Decision Architecture

    A novel binary decision architecture (BDA) for the broadcast news audio classification task is presented in this paper. The idea of developing such an architecture came from the fact that an appropriate combination of multiple binary classifiers for the two-class discrimination problem can reduce the misclassification error without a rapid increase in computational complexity. The core element of the classification architecture is a binary decision (BD) algorithm that discriminates between each pair of acoustic classes, using two types of decision functions. The first is a simple rule-based approach in which the final decision is made according to the value of a selected discrimination parameter. The main advantage of this solution is the relatively low processing time needed to classify all acoustic classes. The cost is low classification accuracy. The second employs a support vector machine (SVM) classifier. In this case, the overall classification accuracy depends on finding the optimal parameters for the decision function, resulting in higher computational complexity and better classification performance. The final form of the proposed BDA is created by combining four BD discriminators supplemented by a decision table. The effectiveness of the proposed BDA, using the rule-based approach and the SVM classifier, is compared with the two most popular strategies for multiclass classification, namely binary decision trees (BDT) and the One-Against-One SVM (OAOSVM). Experimental results show that the proposed classification architecture can decrease the overall classification error in comparison with the BDT architecture. On the other hand, an optimization technique for selecting the optimal set of training data is needed in order to outperform the OAOSVM.

    A new methodology for modelling urban soundscapes: a psychometric revisitation of the current standard and a Bayesian approach for individual response prediction

    Measuring how the urban sound environment is perceived by public space users, which is usually referred to as the urban soundscape, is a research field of particular interest for a broad and multidisciplinary scientific community as well as for private and public agencies. A tool to quantify soundscapes would provide much support to urban planning and design, as well as to public healthcare. The soundscape literature still does not show a single strategy for addressing this topic. Soundscape definition, data collection, and analysis tools have recently been standardised and published in three respective ISO (International Organization for Standardization) items. In particular, the third item of the ISO series defines the calculation of the soundscape experience of public space users by means of multiple Likert scales. In this thesis, with regard to the third item of the soundscape ISO series, the standard method of soundscape data analysis is questioned and a correction paradigm is proposed. This thesis questions the assumption of a point-wise superimposition match across the Likert scales used during the soundscape assessment task. To do so, the thesis presents a new method which introduces correction values, or a metric, for adjusting the scales in accordance with common scaling behaviours found across the investigated locations. To validate the results, the outcome of the new metric is used as the target to predict the participants' individual experience of soundscapes. In comparison to the current ISO output, the new correction values achieve better predictability in both linear and non-linear modelling, increasing the accuracy of prediction of individual responses up to 52.6% (8.3% higher than the accuracy obtained with the standard method). Finally, the new metric is used to validate the collection of data samples across several locations on individual questionnaire responses. Models are trained, in an iterative way, on all the locations except the one used during the validation. This procedure provides a strong validating framework for predicting individual subject assessments belonging to locations totally unseen during the model training. The results show that the combination of the new metric with the proposed modelling structure achieves good performance on individual responses across the dataset, with an average accuracy above 54%. A new index for measuring the soundscape is finally introduced, based on the percentage of people agreeing on soundscape pleasantness calculated from the newly proposed metric and achieving an R-squared value of 0.87. The framework introduced is limited by cultural and linguistic factors. Indeed, different corrected metric spaces are expected to be found when data is collected from different countries or urban contexts. The current values found in this thesis are therefore expected to be valid in large British cities and possibly in international hub and capital cities. In these scenarios the corrected metric would provide a more realistic and direction-invariant representation of how the urban soundscape is perceived compared to the current ISO tool, showing that some components in the circumplex model are perceived as softer or stronger according to the dimension. Future research will need to better understand the limitations of this new framework and to extend and compare it across different urban, cultural, and linguistic contexts.
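
The validation scheme described in this abstract, training on all locations except the one held out for validation, can be sketched as a leave-one-location-out split. This is a minimal illustration only: the record layout, the `location` and `response` field names, and the location labels are hypothetical, and no model from the thesis is reproduced here.

```python
def leave_one_location_out(records):
    """Yield (held_out, train, validation) with one location withheld per fold."""
    locations = sorted({r["location"] for r in records})
    for held_out in locations:
        train = [r for r in records if r["location"] != held_out]
        validation = [r for r in records if r["location"] == held_out]
        yield held_out, train, validation


# Hypothetical questionnaire responses collected at three locations.
records = [
    {"location": "A", "response": 4},
    {"location": "A", "response": 2},
    {"location": "B", "response": 5},
    {"location": "C", "response": 3},
    {"location": "C", "response": 1},
]

folds = list(leave_one_location_out(records))
fold_sizes = {held_out: (len(train), len(validation))
              for held_out, train, validation in folds}
```

Because every validation fold contains only responses from a location absent from training, per-fold accuracy estimates how well a model transfers to an unseen location.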