939 research outputs found
Multi-label Ferns for Efficient Recognition of Musical Instruments in Recordings
In this paper we introduce multi-label ferns, and apply this technique for
automatic classification of musical instruments in audio recordings. We compare
the performance of our proposed method to a set of binary random ferns, using
jazz recordings as input data. Our main result is obtaining much faster
classification and higher F-score. We also achieve substantial reduction of the
model size
Adaptive Multi-Class Audio Classification in Noisy In-Vehicle Environment
With ever-increasing number of car-mounted electric devices and their
complexity, audio classification is increasingly important for the automotive
industry as a fundamental tool for human-device interactions. Existing
approaches for audio classification, however, fall short as the unique and
dynamic audio characteristics of in-vehicle environments are not appropriately
taken into account. In this paper, we develop an audio classification system
that classifies an audio stream into music, speech, speech+music, and noise,
adaptably depending on driving environments including highway, local road,
crowded city, and stopped vehicle. More than 420 minutes of audio data
including various genres of music, speech, speech+music, and noise are
collected from diverse driving environments. The results demonstrate that the
proposed approach improves the average classification accuracy up to 166%, and
64% for speech, and speech+music, respectively, compared with a non-adaptive
approach in our experimental settings
Effect of nano black rice husk ash on the chemical and physical properties of porous concrete pavement
Black rice husk is a waste from this agriculture industry. It has been found that majority inorganic element in rice husk is silica. In this study, the effect of Nano from black rice husk ash (BRHA) on the chemical and physical properties of concrete pavement was investigated. The BRHA produced from uncontrolled burning at rice factory was taken. It was then been ground using laboratory mill with steel balls and steel rods. Four different grinding grades of BRHA were examined. A rice husk ash dosage of 10% by weight of binder was used throughout the experiments. The chemical and physical properties of the Nano BRHA mixtures were evaluated using fineness test, X-ray Fluorescence spectrometer (XRF) and X-ray diffraction (XRD). In addition, the compressive strength test was used to evaluate the performance of porous concrete pavement. Generally, the results show that the optimum grinding time was 63 hours. The result also indicated that the use of Nano black rice husk ash ground for 63hours produced concrete with good strengt
Indexing of fictional video content for event detection and summarisation
This paper presents an approach to movie video indexing that utilises audiovisual analysis to detect important and meaningful temporal video segments, that we term events. We consider three event classes, corresponding to dialogues, action sequences, and montages, where the latter also includes musical sequences. These three event classes are intuitive for a viewer to understand and recognise whilst accounting for over 90% of the content of most movies. To detect events we leverage traditional filmmaking principles and map these to a set of computable low-level audiovisual features. Finite state machines (FSMs) are used to detect when temporal sequences of specific features occur. A set of heuristics, again inspired by filmmaking conventions, are then applied to the output of multiple FSMs to detect the required events. A movie search system, named MovieBrowser, built upon this approach is also described. The overall approach is evaluated against a ground truth of over twenty-three hours of movie content drawn from various genres and consistently obtains high precision and recall for all event classes. A user experiment designed to evaluate the usefulness of an event-based structure for both searching and browsing movie archives is also described and the results indicate the usefulness of the proposed approach
Data-Driven Audio Feature Space Clustering for Automatic Sound Recognition in Radio Broadcast News
This is an Open Access article published by World Scientific Publishing Company. It is distributed under the terms of the Creative Commons Attribution 4.0 (CC-BY) License. Further distribution of this work is permitted, provided the original work is properly cited. T. Theodorou, I. Mpoas, A. Lazaridis, N. Fakotakis, 'Data-Driven Audio Feature Space Clustering for Automatic Sound Recognition in Radio Broadcast News', International Journal on Artificial Intelligence Tools, Vol. 26 (2), April 2017, 1750005 (13 pages), DOI: 10.1142/S021821301750005. © The Author(s).In this paper we describe an automatic sound recognition scheme for radio broadcast news based on principal component clustering with respect to the discrimination ability of the principal components. Specifically, streams of broadcast news transmissions, labeled based on the audio event, are decomposed using a large set of audio descriptors and project into the principal component space. A data-driven algorithm clusters the relevance of the components. The component subspaces are used by sound type classifier. This methodology showed that the k-nearest neighbor and the artificial intelligent network provide good results. Also, this methodology showed that discarding unnecessary dimension works in favor on the outcome, as it hardly deteriorates the effectiveness of the algorithms.Peer reviewe
Dual shots detection
The identification of a special kind of acoustic events such as dual gunshots and single gunshots in the traffic background is described in this work. The recognition of dangerous sounds may help to prevent the abnormal or criminal activities that happened near to the public transport stations. Therefore in this paper the methodology of dual shots detection in a noisy background was developed and evaluated. For this purpose, we investigated various feature extraction methods and combinations of different feature sets. These approaches were evaluated by the widely used classification technique based on the Hidden Markov Models
Analysis and recognition of similar environmental sounds
Dissertação apresentada na Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa para obtenção do grau de Mestre em Engenharia InformáticaHumans have the ability to identify sound sources just by hearing a sound. Adapting the same problem to computers is called (automatic) sound recognition. Several sound recognizers have been developed throughout the years. The accuracy provided by these recognizers is influenced by the features they use and the classification method implemented. While there are many approaches in sound feature extraction and in sound classification, most have been used to classify sounds with very different characteristics.
Here, we implemented a similar sound recognizer. This recognizer uses sounds with very similar properties making the recognition process harder. Therefore, we will use both temporal and spectral properties of the sound. These properties will be extracted using the Intrinsic Structures Analysis (ISA) method, which uses Independent Component Analysis and Principal Component Analysis. We will implement the classification method based on k-Nearest Neighbor algorithm.
Here we prove that the features extracted in this way are powerful in sound recognition. We tested our recognizer with several sets of features the ISA method retrieves, and achieved great results. We, finally, did a user study to compare human performance distinguishing similar sounds against our recognizer. The study allowed us to conclude the sounds are in fact really similar and difficult to distinguish and that our recognizer has much more ability than humans to identify them
Speaker segmentation and clustering
This survey focuses on two challenging speech processing topics, namely: speaker segmentation and speaker clustering. Speaker segmentation aims at finding speaker change points in an audio stream, whereas speaker clustering aims at grouping speech segments based on speaker characteristics. Model-based, metric-based, and hybrid speaker segmentation algorithms are reviewed. Concerning speaker clustering, deterministic and probabilistic algorithms are examined. A comparative assessment of the reviewed algorithms is undertaken, the algorithm advantages and disadvantages are indicated, insight to the algorithms is offered, and deductions as well as recommendations are given. Rich transcription and movie analysis are candidate applications that benefit from combined speaker segmentation and clustering. © 2007 Elsevier B.V. All rights reserved
- …