1,658 research outputs found

    Robust Personal Audio Geometry Optimization in the SVD-Based Modal Domain

    © 2014 IEEE. Personal audio generates sound zones in a shared space to provide private and personalized listening experiences with minimized interference between consumers. Regularization has been commonly used to increase the robustness of such systems against potential perturbations in the sound reproduction. However, the performance is limited by the system geometry such as the number and location of the loudspeakers and controlled zones. This paper proposes a geometry optimization method to find the most geometrically robust approach for personal audio amongst all available candidate system placements. The proposed method aims to approach the most 'natural' sound reproduction so that the solo control of the listening zone coincidently accompanies the preferred quiet zone. Being formulated in the SVD-based modal domain, the method is demonstrated by applications in three typical personal audio optimizations, i.e., the acoustic contrast control, the pressure matching, and the planarity control. Simulation results show that the proposed method can obtain the system geometry with better avoidance of 'occlusion,' improved robustness to regularization, and improved broadband equalization
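
For readers unfamiliar with acoustic contrast control, which the abstract above names as one of the three optimizations, the following is a minimal sketch of the standard regularized formulation at a single frequency. It is not the geometry optimization method of the paper. The function names, the matrix shapes, and the random transfer matrices are all assumptions made for illustration.

```python
import numpy as np

def acc_weights(G_bright, G_dark, reg):
    """Loudspeaker weights for regularized acoustic contrast control.

    G_bright, G_dark: (microphones, loudspeakers) complex transfer matrices
    for the listening zone and the quiet zone (hypothetical inputs).
    reg: non-negative regularization weight.
    """
    R_b = G_bright.conj().T @ G_bright
    R_d = G_dark.conj().T @ G_dark
    n = R_d.shape[0]
    # Maximizing q^H R_b q / q^H (R_d + reg*I) q is solved by the principal
    # eigenvector of (R_d + reg*I)^{-1} R_b.
    M = np.linalg.solve(R_d + reg * np.eye(n), R_b)
    vals, vecs = np.linalg.eig(M)
    q = vecs[:, np.argmax(vals.real)]
    return q / np.linalg.norm(q)

def regularized_ratio(G_bright, G_dark, q, reg):
    """Bright-zone energy over regularized dark-zone energy for weights q."""
    e_b = np.vdot(G_bright @ q, G_bright @ q).real
    e_d = np.vdot(G_dark @ q, G_dark @ q).real
    return e_b / (e_d + reg * np.vdot(q, q).real)

# Illustrative use with random (made-up) transfer matrices.
rng = np.random.default_rng(0)
G_b = rng.standard_normal((6, 4)) + 1j * rng.standard_normal((6, 4))
G_d = rng.standard_normal((6, 4)) + 1j * rng.standard_normal((6, 4))
q_acc = acc_weights(G_b, G_d, reg=1e-2)
```

Increasing `reg` trades achievable contrast for lower sensitivity to perturbations in the transfer matrices, which is the robustness role of regularization that the abstract refers to.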

    ModDrop: adaptive multi-modal gesture recognition

    We present a method for gesture detection and localisation based on multi-scale and multi-modal deep learning. Each visual modality captures spatial information at a particular spatial scale (such as motion of the upper body or a hand), and the whole system operates at three temporal scales. Key to our technique is a training strategy which exploits: i) careful initialization of individual modalities; and ii) gradual fusion involving random dropping of separate channels (dubbed ModDrop) for learning cross-modality correlations while preserving uniqueness of each modality-specific representation. We present experiments on the ChaLearn 2014 Looking at People Challenge gesture recognition track, in which we placed first out of 17 teams. Fusing multiple modalities at several spatial and temporal scales leads to a significant increase in recognition rates, allowing the model to compensate for errors of the individual classifiers as well as noise in the separate channels. Furthermore, the proposed ModDrop training technique makes the classifier robust to missing signals in one or several channels, so that it produces meaningful predictions from any number of available modalities. In addition, we demonstrate the applicability of the proposed fusion scheme to modalities of arbitrary nature by experiments on the same dataset augmented with audio. Comment: 14 pages, 7 figure
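
The "random dropping of separate channels" described above can be illustrated with a minimal sketch of per-sample modality dropout. This is an assumed simplification, not the authors' implementation: the function name `moddrop`, the dictionary layout, the modality names, and the single shared drop probability are all hypothetical.

```python
import numpy as np

def moddrop(features, drop_prob, rng):
    """Zero out whole modalities, independently per sample and per modality.

    features: dict mapping a modality name to an array whose first axis is
    the batch axis (hypothetical layout).
    drop_prob: probability of dropping a given modality for a given sample.
    """
    batch = next(iter(features.values())).shape[0]
    out = {}
    for name, x in features.items():
        # One keep/drop decision per sample for this modality.
        keep = rng.random(batch) >= drop_prob
        # Broadcast the decision over all remaining feature axes.
        mask = keep.reshape((batch,) + (1,) * (x.ndim - 1))
        out[name] = x * mask
    return out

# Illustrative use with made-up feature arrays.
rng = np.random.default_rng(0)
batch = {
    "video": np.ones((4, 8)),
    "depth": np.ones((4, 8)),
    "audio": np.ones((4, 3)),
}
dropped = moddrop(batch, drop_prob=0.5, rng=rng)
```

During training, the fused layers would see inputs in which some modalities are entirely zeroed, which is what encourages predictions that remain usable when a channel is missing at test time.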

    Universal Adversarial Perturbations for Speech Recognition Systems

    In this work, we demonstrate the existence of universal adversarial audio perturbations that cause mis-transcription of audio signals by automatic speech recognition (ASR) systems. We propose an algorithm to find a single quasi-imperceptible perturbation which, when added to any arbitrary speech signal, will most likely fool the victim speech recognition model. Our experiments demonstrate the application of our proposed technique by crafting audio-agnostic universal perturbations for the state-of-the-art ASR system, Mozilla DeepSpeech. Additionally, we show that such perturbations generalize to a significant extent across models that are not available during training, by performing a transferability test on a WaveNet-based ASR system. Comment: Published as a conference paper at INTERSPEECH 201

    Filter Optimization for Personal Sound Zones Systems

    Personal Sound Zones (PSZ) systems deliver different sounds to a number of listeners sharing an acoustic space through the use of loudspeakers together with signal processing techniques. These systems have attracted a lot of attention in recent years because of the wide range of applications that would benefit from the generation of individual listening zones, e.g., domestic or automotive audio applications. A key aspect of PSZ systems, at least for low and mid frequencies, is the optimization of the filters used to process the sound signals. Different algorithms have been proposed in the literature for computing those filters, each exhibiting some advantages and disadvantages. In this work, the state-of-the-art algorithms for PSZ systems are reviewed, and their performance in a reverberant environment is evaluated. Aspects such as the acoustic isolation between zones, the reproduction error, the energy of the filters, and the delay of the system are considered in the evaluations. Furthermore, computationally efficient strategies to obtain the filters are studied, and their computational complexity is also compared. The performance and computational evaluations reveal the main limitations of the state-of-the-art algorithms. In particular, the existing solutions cannot offer low computational complexity and, at the same time, good performance for short system delays. 
Thus, a novel algorithm based on subband filtering that mitigates these limitations is proposed for PSZ systems. In addition, the proposed algorithm offers more versatility than the existing algorithms, since different system configurations, such as different filter lengths or sets of loudspeakers, can be used in each subband. The proposed algorithm is experimentally evaluated and tested in a reverberant environment, and its efficacy in mitigating the limitations of the existing solutions is demonstrated. Finally, the effect of the target responses in the optimization is discussed, and a novel approach that is based on windowing the target responses is proposed. The proposed approach is experimentally evaluated in two rooms with different reverberation levels. The evaluation results reveal that an appropriate windowing of the target responses can reduce the interference level between zones. Molés Cases, V. (2022). Filter Optimization for Personal Sound Zones Systems [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/18611