27 research outputs found

    Application of sound source separation methods to advanced spatial audio systems

    This thesis belongs to the field of Sound Source Separation (SSS). It addresses the development and evaluation of SSS techniques for the resynthesis of highly realistic sound scenes by means of Wave Field Synthesis (WFS). Because the vast majority of audio recordings are preserved in two-channel stereo format, special up-converters are required to use advanced spatial audio reproduction formats such as WFS. This is because WFS needs the original source signals in order to accurately synthesize the acoustic field inside an extended listening area, so an object-based mix is required. Source separation problems in digital signal processing are those in which several signals have been mixed together and the objective is to recover the original signals. SSS algorithms can therefore be applied to existing two-channel mixtures to extract the different objects that compose the stereo scene. Unfortunately, most stereo mixtures are underdetermined, i.e., there are more sound sources than audio channels. This condition makes the SSS problem especially difficult, and stronger assumptions have to be made, often related to the sparsity of the sources under some signal transformation. This thesis focuses on the application of SSS techniques to spatial sound reproduction, and its contributions fall within these two areas. First, two underdetermined SSS methods are proposed to deal efficiently with the separation of stereo sound mixtures. These techniques are based on a multi-level thresholding segmentation approach, which enables fast, unsupervised separation of sound sources in the time-frequency domain. Although both techniques rely on the same type of clustering, the features considered by each are related to different localization cues, which allow the separation of either instantaneous or real mixtures. Additionally, two post-processing techniques aimed at improving the isolation of the separated sources are proposed. The performance achieved by several SSS methods in the resynthesis of WFS sound scenes is then evaluated by means of listening tests, paying special attention to the change observed in the perceived spatial attributes. Although the estimated sources are distorted versions of the original ones, the masking effects involved in their spatial remixing make artifacts less perceptible, which improves the overall assessed quality. Finally, some novel developments related to the application of time-frequency processing to source localization and enhanced sound reproduction are presented.
    Cobos Serrano, M. (2009). Application of sound source separation methods to advanced spatial audio systems [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8969

    Resynthesis of Acoustic Scenes Combining Sound Source Separation and WaveField Synthesis Techniques

    [EN] Source separation has been a subject of intense research in many signal processing applications, ranging from speech processing to medical image analysis. Applied to spatial audio systems, it can be used to overcome one fundamental limitation in 3D scene resynthesis: the need to have the independent signal of each source available. Wave-field Synthesis (WFS) is a spatial sound reproduction system that can synthesize an acoustic field by means of loudspeaker arrays and can position several sources in space. However, the individual signals corresponding to these sources must be available, and obtaining them is often difficult. In this work, we propose to use sound source separation techniques to obtain separate tracks from stereo and mono mixtures. Several separation methods have been implemented and tested, one of which was developed by the author. Although existing algorithms are far from achieving hi-fi quality, subjective tests show that an optimal separation is not necessary to obtain acceptable results in 3D scene reproduction.
    Cobos Serrano, M. (2007). Resynthesis of Acoustic Scenes Combining Sound Source Separation and WaveField Synthesis Techniques. http://hdl.handle.net/10251/12515

    Maximum a Posteriori Binary Mask Estimation for Underdetermined Source Separation Using Smoothed Posteriors

    Sound source separation has become a topic of intensive research in recent years. The research effort has been especially relevant for the underdetermined case, where a considerable number of sparse methods working in the time-frequency (T-F) domain have appeared. In this context, although binary masking seems to be a preferred choice for source demixing, the estimated masks differ substantially from the ideal ones. This paper proposes a maximum a posteriori (MAP) framework for binary mask estimation. To this end, class-conditional source probabilities according to the observed mixing parameters are modeled via ratios of dependent Cauchy distributions, while source priors are iteratively calculated from the observed histograms. Moreover, spatially smoothed posteriors in the T-F domain are proposed to avoid noisy estimates, showing that the estimated masks are closer to the ideal ones in terms of objective performance measures.
    This work was supported by the Spanish Ministry of Science and Innovation under project TEC2009-14414-C03-01. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Jingdong Chen.
    Cobos Serrano, M.; López Monfort, JJ. (2012). Maximum a Posteriori Binary Mask Estimation for Underdetermined Source Separation Using Smoothed Posteriors. IEEE Transactions on Audio, Speech and Language Processing. 20(7):2059-2064. doi:10.1109/TASL.2012.2195654
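
As a rough illustration of the MAP masking idea in the abstract above, the following Python sketch assigns each T-F bin to the source with the largest smoothed posterior. It is a simplified sketch, not the authors' method: it assumes plain Cauchy likelihoods instead of the ratios of dependent Cauchy distributions used in the paper, fixed priors instead of iteratively estimated ones, and a box filter for the smoothing. The names `cauchy_pdf`, `box_smooth`, `map_binary_masks` and their parameters are hypothetical.

```python
import numpy as np

def cauchy_pdf(x, loc, scale):
    """Cauchy density; a stand-in for the paper's class-conditional model."""
    return scale / (np.pi * (scale ** 2 + (x - loc) ** 2))

def box_smooth(post, size):
    """Average each posterior map over a size x size T-F neighbourhood.

    Assumes an odd `size`; borders are handled by edge replication.
    """
    r = size // 2
    padded = np.pad(post, ((0, 0), (r, r), (r, r)), mode="edge")
    n_freq, n_time = post.shape[1:]
    out = np.zeros_like(post)
    for i in range(size):
        for j in range(size):
            out += padded[:, i:i + n_freq, j:j + n_time]
    return out / size ** 2

def map_binary_masks(feature, locs, scales, priors, smooth=3):
    """Return one boolean mask per source from a (freq, time) feature map.

    `feature` holds the observed mixing parameter of each T-F bin
    (for example a panning or delay estimate); this is an assumption.
    """
    # Unnormalised posterior: prior times class-conditional likelihood.
    post = np.stack([p * cauchy_pdf(feature, m, s)
                     for m, s, p in zip(locs, scales, priors)])
    post /= post.sum(axis=0, keepdims=True)
    # Smooth the posteriors across neighbouring bins before deciding.
    post = box_smooth(post, smooth)
    labels = post.argmax(axis=0)
    return np.stack([labels == k for k in range(len(locs))])
```

With `smooth=1` the decision reduces to a per-bin MAP rule; larger neighbourhoods trade resolution for masks that are less sensitive to isolated outlier bins.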

    Computer-based detection and classification of flaws in citrus fruits

    [EN] In this paper, a system for quality control in citrus fruits is presented. In current citrus manufacturing industries, calliper and color are successfully used for the automatic classification of fruits using vision systems. However, the detection of flaws on the citrus surface is carried out by human inspection. In this work, a computer vision system capable of detecting defects in the citrus peel and classifying the type of flaw is presented. First, a review of citrus diseases was carried out in order to build a database of digitized oranges classified by kind of fault, which is used as a training set. The segmentation of faulty zones is performed by applying the Sobel gradient to the image. Afterwards, color and texture features of the flaw are extracted considering different color spaces, some of them related to high-order statistics. Several techniques have been employed for classification: Euclidean distance to a prototype, nearest neighbor, and k-nearest neighbors. Additionally, a three-layer neural network has been tested and compared, obtaining promising results.
    López Monfort, JJ.; Cobos Serrano, M.; Aguilera Martí, E. (2011). Computer-based detection and classification of flaws in citrus fruits. Neural Computing and Applications. 20(7):975-981. doi:10.1007/s00521-010-0396-2
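
The Sobel-based segmentation step mentioned in the citrus inspection abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it works on a single grayscale channel, uses a fixed threshold, and the names `sobel_magnitude`, `segment_flaws` and `threshold` are hypothetical.

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude of a 2-D grayscale image using 3x3 Sobel kernels."""
    kx = np.array([[-1.0, 0.0, 1.0],
                   [-2.0, 0.0, 2.0],
                   [-1.0, 0.0, 1.0]])
    ky = kx.T
    rows, cols = img.shape
    # Replicate the border so the output has the same shape as the input.
    padded = np.pad(img.astype(float), 1, mode="edge")
    gx = np.zeros((rows, cols))
    gy = np.zeros((rows, cols))
    for i in range(3):
        for j in range(3):
            window = padded[i:i + rows, j:j + cols]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    return np.hypot(gx, gy)

def segment_flaws(img, threshold):
    """Mark pixels whose gradient magnitude exceeds an assumed threshold."""
    return sobel_magnitude(img) > threshold
```

In practice the threshold would have to be tuned on the training images, and the resulting edge map would still need to be turned into closed regions before color and texture features can be extracted.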

    Fast channel estimation in the transformed spatial domain for analog millimeter wave systems

    Fast channel estimation in millimeter-wave (mmWave) systems is a fundamental enabler of high-gain beamforming, which boosts coverage and capacity. The channel estimation stage typically involves an initial beam training process in which a subset of the possible beam directions at the transmitter and receiver is scanned along a predefined codebook. Unfortunately, the high number of transmit and receive antennas deployed in mmWave systems increases the complexity of the beam selection and channel estimation tasks. In this work, we tackle the channel estimation problem in analog systems from a perspective different from that of previous works. In particular, we propose to move the channel estimation problem from the angular domain into the transformed spatial domain, in which estimating the angles of arrival and departure corresponds to estimating the angular frequencies of the paths constituting the mmWave channel. The proposed approach, referred to as the transformed spatial domain channel estimation (TSDCE) algorithm, exhibits robustness to additive white Gaussian noise by combining low-rank approximations and sample autocorrelation functions for each path in the transformed spatial domain. Numerical results evaluate the mean square error of the channel estimation and the direction-of-arrival estimation capability. TSDCE significantly reduces the former while exhibiting a remarkably low computational complexity compared with well-known benchmarking schemes.
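
The link between arrival angle and angular frequency described in the mmWave abstract can be illustrated with a toy Python example. This is only a sketch under strong assumptions and is not the TSDCE algorithm: it considers a single path, a uniform linear array with half-wavelength spacing, and a lag-1 sample autocorrelation estimator. The function name `estimate_angle` is hypothetical.

```python
import numpy as np

def estimate_angle(x):
    """Estimate the arrival angle (radians) of one path from array samples.

    Assumes x[n] = a * exp(1j * pi * sin(theta) * n) plus noise, i.e. a
    uniform linear array with half-wavelength element spacing.
    """
    # Lag-1 sample autocorrelation across antenna elements.
    r1 = np.mean(x[1:] * np.conj(x[:-1]))
    # Its phase is the spatial angular frequency of the path.
    omega = np.angle(r1)
    # Map the angular frequency back to a physical angle.
    return np.arcsin(omega / np.pi)
```

Averaging the lag products over all antenna pairs is what gives this kind of estimator some robustness to additive noise; a multipath channel would need the paths to be separated first.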

    A Parallel Approach to HRTF Approximation and Interpolation Based on a Parametric Filter Model

    "© 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works."
    [EN] Spatial audio-rendering techniques using head-related transfer functions (HRTFs) are currently used in many different contexts such as immersive teleconferencing systems, gaming, or 3-D audio reproduction. Since all these applications usually involve real-time constraints, efficient processing structures for HRTF modeling and interpolation are necessary for providing real-time binaural audio solutions. This letter presents a parametric parallel model that allows us to perform HRTF filtering and interpolation efficiently from an input HRTF dataset. The resulting model, which is an adaptation of a recently proposed modeling technique, not only reduces the size of HRTF datasets significantly, but also allows for simplified interpolation and real-time computation on parallel processors. In order to discuss the suitability of this new model, an implementation on a graphics processing unit is presented.
    This work was supported by the Spanish Ministry of Economy and Competitiveness under Grant TEC2012-37945-C02-02 and FEDER funds and by the UNKP-16-4-III New National Excellence Program of the Hungarian Ministry of Human Capacities. The work of J. A. Belloch was supported by GVA Postdoctoral Contract APOSTD/2016/069.
    Ramos Peinado, G.; Cobos Serrano, M.; Bank, B.; Belloch Rodríguez, JA. (2017). A Parallel Approach to HRTF Approximation and Interpolation Based on a Parametric Filter Model. IEEE Signal Processing Letters. 24(10):1507-1511. https://doi.org/10.1109/LSP.2017.2741724

    Speaker Localization and Detection in Videoconferencing Environments Using a Modified SRP-PHAT Algorithm

    [EN] The Steered Response Power - Phase Transform (SRP-PHAT) algorithm has been shown to be one of the most robust sound source localization approaches operating in noisy and reverberant environments. However, its practical implementation is usually based on a costly fine grid-search procedure, making the computational cost of the method a real issue. In this paper, we introduce an effective strategy that performs a full exploration of the sampled space rather than computing the SRP at discrete spatial positions, increasing its robustness and allowing for a coarser spatial grid that reduces the computational cost of a practical implementation. The modified SRP-PHAT functional has been successfully implemented in a real-time speaker localization system for multiparticipant videoconferencing environments. Moreover, a localization-based speech/non-speech frame discriminator is presented.
    This work was supported by the Ministry of Education and Science under the project TEC2009-14414-C03-01.
    Martí Guerola, A.; Cobos Serrano, M.; Aguilera Martí, E.; López Monfort, JJ. (2011). Speaker Localization and Detection in Videoconferencing Environments Using a Modified SRP-PHAT Algorithm. Waves. 3:40-47. http://hdl.handle.net/10251/57648

    Subjective quality assessment of multichannel audio accompanied with video in representative broadcasting genres

    Immersive broadcasting applications have received a lot of attention in recent years. In this context, advanced HDTV and 3DTV formats are being successfully adopted by the consumer market, with a strong impact on the way traditional broadcasting content is displayed to end users. Together with these advances in video technology, multichannel spatial audio has also experienced a considerable boost within the audiovisual industry. However, the need for specific production tools and loudspeaker setups corresponding to multiple competing audio formats seems to be an important factor affecting their adoption by consumers. Moreover, it is well known that the perceived audio quality is highly influenced by the reproduction context, where the multimodal interaction between audio and video plays a very important role. This paper presents a formal evaluation of the perceived sound quality provided by several spatial audio formats accompanied by video in the context of television broadcasting. Stereo, advanced surround formats and 3D binaural sound are evaluated considering a set of representative broadcasting contents (sports, movies, music and animation) to assess their impact on the perceptual attributes contemplated in the international recommendations.
    The Spanish Ministry of Economy and Competitiveness and FEDER supported this work under the projects TEC2012-37945-C02-01/02.
    Cobos Serrano, M.; López Monfort, JJ.; Navarro Ruiz, JM.; Ramos Peinado, G. (2015). Subjective quality assessment of multichannel audio accompanied with video in representative broadcasting genres. Multimedia Systems. 21(4):363-379. doi:10.1007/s00530-013-0340-2

    A modified SRP-PHAT functional for robust real-time sound source localization with scalable spatial sampling

    The Steered Response Power - Phase Transform (SRP-PHAT) algorithm has been shown to be one of the most robust sound source localization approaches operating in noisy and reverberant environments. However, its practical implementation is usually based on a costly fine grid-search procedure, making the computational cost of the method a real issue. In this letter, we introduce an effective strategy that extends the conventional SRP-PHAT functional with the aim of considering the volume surrounding the discrete locations of the spatial grid. As a result, the modified functional performs a full exploration of the sampled space rather than computing the SRP at discrete spatial positions, increasing its robustness and allowing for a coarser spatial grid. To this end, the Generalized Cross-Correlation (GCC) function corresponding to each microphone pair must be properly accumulated according to the defined microphone setup. Experiments carried out under different acoustic conditions confirm the validity of the proposed approach.
    Manuscript received September 06, 2010; revised October 22, 2010; accepted October 27, 2010. Date of publication November 11, 2010; date of current version December 16, 2010. This work was supported by the Spanish Ministry of Science and Innovation under the project TEC2009-14414-C03-01. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Constantine L. Kotropoulos.
    Cobos Serrano, M.; Martí Guerola, A.; López Monfort, JJ. (2011). A modified SRP-PHAT functional for robust real-time sound source localization with scalable spatial sampling. IEEE Signal Processing Letters. 18:71-74. doi:10.1109/LSP.2010.2091502
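
The GCC function that SRP-PHAT accumulates over microphone pairs can be sketched for a single pair as follows. This Python example is a generic GCC-PHAT delay estimator written under stated assumptions, not the modified functional proposed in the letter: the name `gcc_phat`, its parameters, and the small regularization constant are hypothetical choices.

```python
import numpy as np

def gcc_phat(x1, x2, fs, max_tau=None):
    """Estimate the delay of x2 relative to x1, in seconds, with GCC-PHAT."""
    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(x1) + len(x2)
    spec1 = np.fft.rfft(x1, n=n)
    spec2 = np.fft.rfft(x2, n=n)
    cross = spec2 * np.conj(spec1)
    # PHAT weighting: keep only the phase of the cross-spectrum.
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs
```

A conventional SRP-PHAT search would evaluate such a correlation for every microphone pair at the single delay implied by each grid point; the abstract above describes accumulating the GCC values instead so that the volume around each grid point is covered.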

    An Immersive Multi-Party Conferencing System for Mobile Devices Using 3D Binaural Audio

    [EN] The use of mobile telephony, along with the widespread adoption of smartphones in the consumer market, is gradually displacing traditional telephony. Fixed-line telephone conference calls have been widely employed for carrying out distributed meetings around the world in recent decades. However, the powerful capabilities of modern mobile devices and data networks allow for new conferencing schemes based on immersive communication, one of the fields of greatest commercial and technical interest within the telecommunications industry today. In this context, adding spatial audio features to conventional conferencing systems is a natural way of creating a realistic communication environment. In fact, the human auditory system takes advantage of spatial audio cues to locate, separate and understand multiple speakers when they talk simultaneously. As a result, speech intelligibility is significantly improved if the speakers are simulated to be spatially distributed. This paper describes the development of a new immersive multi-party conference call service for mobile devices (smartphones and tablets) that substantially improves the identification and intelligibility of the participants. Headphone-based audio reproduction and binaural sound processing algorithms allow the user to locate the different speakers within a virtual meeting room. Moreover, the use of a large touch screen helps the user to identify and remember the participants taking part in the conference, with the possibility of changing their spatial location interactively.
    This work has been partially supported by the government of Spain grant TEC-2009-14414-C03-01 and by the new technologies department of Telefónica.
    Aguilera Martí, E.; López Monfort, JJ.; Cobos Serrano, M.; Macià Pina, L.; Martí Guerola, A. (2012). An Immersive Multi-Party Conferencing System for Mobile Devices Using 3D Binaural Audio. Waves. 4:5-14. http://hdl.handle.net/10251/57918