Search CORE

582 research outputs found

A regularity-constrained Viterbi algorithm and its application to the structural segmentation of songs

Author: Bimbot Frédéric
Sargent Gabriel
Vincent Emmanuel
Publication venue: HAL CCSD
Publication date: 24/10/2011
Field of study

International audienceThis paper presents a general approach for the structural segmentation of songs. It is formalized as a cost optimization problem that combines properties of the musical content and prior regularity assumption on the segment length. A versatile implementation of this approach is proposed by means of a Viterbi algorithm, and the design of the costs are discussed. We then present two systems derived from this approach, based on acoustic and symbolic features respectively. The advantages of the regularity constraint are evaluated on a database of 100 popular songs by showing a significant improvement of the segmentation performance in terms of F-measure

INRIA a CCSD electronic archive server

Intelligent system for spoken term detection using the belief combination

Author: Khan Wasiq
Kuru Kaya
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 13/02/2017
Field of study

Spoken Term Detection (STD) can be considered as a sub-part of the automatic speech recognition which aims to extract the partial information from speech signals in the form of query utterances. A variety of STD techniques available in the literature employ a single source of evidence for the query utterance match/mismatch determination. In this manuscript, we develop an acoustic signal processing based approach for STD that incorporates a number of techniques for silence removal, dynamic noise filtration, and evidence combination using Dempster-Shafer Theory (DST). A ‘spectral-temporal features based voiced segment detection’ and ‘energy and zero cross rate based unvoiced segment detection’ are built to remove the silence segments in the speech signal. Comprehensive experiments have been performed on large speech datasets and consequently satisfactory results have been achieved with the proposed approach. Our approach improves the existing speaker dependent STD approaches, specifically the reliability of query utterance spotting by combining the evidences from multiple belief sources

CLoK

Crossref

E-space: Manchester Metropolitan University's Research Repository

Spherical near field acoustic holography with microphones on a rigid sphere:Abstract of paper

Author: Fernandez Grande Efren
Hald Jørgen
Jacobsen Finn
Moreno Guillermo
Publication venue: 'Acoustical Society of America (ASA)'
Publication date: 01/01/2008
Field of study

Crossref

Online Research Database In Technology

Predicting and auralizing acoustics in classrooms

Author: Christensen Claus Lynge
Publication venue
Publication date: 01/01/2005
Field of study

Although classrooms have fairly simple geometries, this type of room is known to cause problems when trying to predict their acoustics using room acoustics computer modeling. Some typical features from a room acoustics point of view are: Parallel walls, low ceilings (the rooms are flat), uneven distribution of absorption, and most of the floor being covered with furniture which at long distances act as scattering elements, and at short distance provide strong specular components. The importance of diffraction and scattering is illustrated in numbers and by means of auralization, using ODEON 8 Beta

Crossref

Online Research Database In Technology

Towards an Integrative Information Society: Studies on Individuality in Speech and Sign

Author: Ojala Stina
Publication venue: Turku Centre for Computer Science
Publication date: 21/05/2011
Field of study

The flow of information within modern information society has increased rapidly over the last decade. The major part of this information flow relies on the individual’s abilities to handle text or speech input. For the majority of us it presents no problems, but there are some individuals who would benefit from other means of conveying information, e.g. signed information flow. During the last decades the new results from various disciplines have all suggested towards the common background and processing for sign and speech and this was one of the key issues that I wanted to investigate further in this thesis. The basis of this thesis is firmly within speech research and that is why I wanted to design analogous test batteries for widely used speech perception tests for signers – to find out whether the results for signers would be the same as in speakers’ perception tests. One of the key findings within biology – and more precisely its effects on speech and communication research – is the mirror neuron system. That finding has enabled us to form new theories about evolution of communication, and it all seems to converge on the hypothesis that all communication has a common core within humans. In this thesis speech and sign are discussed as equal and analogical counterparts of communication and all research methods used in speech are modified for sign. Both speech and sign are thus investigated using similar test batteries. Furthermore, both production and perception of speech and sign are studied separately. An additional framework for studying production is given by gesture research using cry sounds. Results of cry sound research are then compared to results from children acquiring sign language. These results show that individuality manifests itself from very early on in human development. Articulation in adults, both in speech and sign, is studied from two perspectives: normal production and re-learning production when the apparatus has been changed. Normal production is studied both in speech and sign and the effects of changed articulation are studied with regards to speech. Both these studies are done by using carrier sentences. Furthermore, sign production is studied giving the informants possibility for spontaneous speech. The production data from the signing informants is also used as the basis for input in the sign synthesis stimuli used in sign perception test battery. Speech and sign perception were studied using the informants’ answers to questions using forced choice in identification and discrimination tasks. These answers were then compared across language modalities. Three different informant groups participated in the sign perception tests: native signers, sign language interpreters and Finnish adults with no knowledge of any signed language. This gave a chance to investigate which of the characteristics found in the results were due to the language per se and which were due to the changes in modality itself. As the analogous test batteries yielded similar results over different informant groups, some common threads of results could be observed. Starting from very early on in acquiring speech and sign the results were highly individual. However, the results were the same within one individual when the same test was repeated. This individuality of results represented along same patterns across different language modalities and - in some occasions - across language groups. As both modalities yield similar answers to analogous study questions, this has lead us to providing methods for basic input for sign language applications, i.e. signing avatars. This has also given us answers to questions on precision of the animation and intelligibility for the users – what are the parameters that govern intelligibility of synthesised speech or sign and how precise must the animation or synthetic speech be in order for it to be intelligible. The results also give additional support to the well-known fact that intelligibility in fact is not the same as naturalness. In some cases, as shown within the sign perception test battery design, naturalness decreases intelligibility. This also has to be taken into consideration when designing applications. All in all, results from each of the test batteries, be they for signers or speakers, yield strikingly similar patterns, which would indicate yet further support for the common core for all human communication. Thus, we can modify and deepen the phonetic framework models for human communication based on the knowledge obtained from the results of the test batteries within this thesis.Siirretty Doriast

UTUPub

Auralization of an orchestra using multichannel and multisource technique (A)

Author: Rindel Jens Holger
Vigeant Michelle C.
Wang Lily M.
Publication venue
Publication date: 01/01/2006
Field of study

Online Research Database In Technology

A transparency model and its applications for simulation of reflector arrays and sound transmission (A)

Author: Christensen Claus Lynge
Rindel Jens Holger
Publication venue
Publication date: 01/01/2006
Field of study

Online Research Database In Technology

Proceedings of the 6th International Workshop on Folk Music Analysis, 15-17 June, 2016

Author: Beauguitte Pierre
Duggan Bryan
Kelleher John D.
Publication venue: Technological University Dublin
Publication date: 01/06/2016
Field of study

The Folk Music Analysis Workshop brings together computational music analysis and ethnomusicology. Both symbolic and audio representations of music are considered, with a broad range of scientific approaches being applied (signal processing, graph theory, deep learning). The workshop features a range of interesting talks from international researchers in areas such as Indian classical music, Iranian singing, Ottoman-Turkish Makam music scores, Flamenco singing, Irish traditional music, Georgian traditional music and Dutch folk songs. Invited guest speakers were Anja Volk, Utrecht University and Peter Browne, Technological University Dublin

Arrow@TUDublin

Investigating the build-up of precedence effect using reflection masking

Author: Buchholz Jörg
Hartcher-O'Brien Jessica
Publication venue: 'Acoustical Society of America (ASA)'
Publication date: 01/01/2006
Field of study

The auditory processing level involved in the build‐up of precedence [Freyman et al., J. Acoust. Soc. Am. 90, 874–884 (1991)] has been investigated here by employing reflection masked threshold (RMT) techniques. Given that RMT techniques are generally assumed to address lower levels of the auditory signal processing, such an approach represents a bottom‐up approach to the buildup of precedence. Three conditioner configurations measuring a possible buildup of reflection suppression were compared to the baseline RMT for four reflection delays ranging from 2.5–15 ms. No buildup of reflection suppression was observed for any of the conditioner configurations. Buildup of template (decrease in RMT for two of the conditioners), on the other hand, was found to be delay dependent. For five of six listeners, with reflection delay=2.5 and 15 ms, RMT decreased relative to the baseline. For 5‐ and 10‐ms delay, no change in threshold was observed. It is concluded that the low‐level auditory processing involved in RMT is not sufficient to realize a buildup of reflection suppression. This confirms suggestions that higher level processing is involved in PE buildup. The observed enhancement of reflection detection (RMT) may contribute to active suppression at higher processing levels

Online Research Database In Technology

MPG.PuRe

Äänisisällön automaattisen luokittelun menetelmiä

Author: Pohjalainen Jouni
Publication venue: Teknillinen korkeakoulu
Publication date: 01/01/2007
Field of study

This study presents an overview of different methods of digital signal processing and pattern recognition that are frequently applicable to automatic recognition, classification and description of audio content. Moreover, strategies for the combination of the said methods are discussed. Some of the published practical applications from different areas are cited to illustrate the use of the basic methods and the combined recognition strategies. A brief overview of human auditory perception is also given, with emphasis on the aspects that are important for audio recognition.Tässä työssä esitetään yleiskatsaus sellaisiin signaalinkäsittelyn ja hahmontunnistuksen menetelmiin, jotka ovat usein sovellettavissa äänisisällön automaattiseen tunnistamiseen, luokitteluun ja kuvaamiseen. Lisäksi työssä esitetään strategioita mainittujen menetelmien yhdistelyyn ja annetaan näihin ratkaisuihin liittyviä esimerkinomaisia viitteitä kirjallisuudesta löytyviin käytännön sovelluksiin eri sovellusalueilta. Työ sisältää myös suppean esityksen ihmisen kuulon toiminnan pääpiirteistä äänitunnistuksen kannalta

Aaltodoc Publication Archive