130 research outputs found

    Detecció de les mans al volant durant la conducció (Detection of hands on the steering wheel while driving)

    We are currently in a transition between traditional driving and autonomous driving. Moving fully from one to the other requires an intermediate stage in which automation elements are added to traditional driving. In particular, this project detects hands on the steering wheel in different environments, which contributes to driving safety, because a person's reaction time to a risky situation increases when both hands are not on the wheel. The project builds on work carried out the previous year by two UAB students, which also detected hands but with little automation. It therefore aims to improve the computer vision detection system using several filtering methods: background subtraction, skin colour filtering, morphological operations, blur (smoothing) filtering and area filtering. Finally, the data are analysed.
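
    As an illustration of the last filtering step named in the abstract (area filtering), the following is a minimal sketch in plain Python. It is not the project's implementation: the function name `filter_by_area`, the 4-connectivity rule, the `min_area` threshold and the toy mask are all assumptions made for this example.

```python
from collections import deque


def filter_by_area(mask, min_area):
    """Return a copy of a binary mask that keeps only 4-connected blobs
    whose pixel count is at least min_area (hypothetical helper)."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 0 or seen[r][c]:
                continue
            # Breadth-first search collects one connected blob.
            blob = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Blobs below the threshold are treated as noise and dropped.
            if len(blob) >= min_area:
                for y, x in blob:
                    out[y][x] = 1
    return out


# Toy mask: one 4-pixel blob (kept) and two isolated pixels (dropped).
mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
]
cleaned = filter_by_area(mask, min_area=3)
```

    In a pipeline like the one described, a step of this kind would typically run after the colour and morphological filters, so that only hand-sized regions remain.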

    Evaluation of Tacotron Based Synthesizers for Spanish and Basque

    In this paper, we describe the implementation and evaluation of Text to Speech synthesizers based on neural networks for Spanish and Basque. Several voices were built, all of them using a limited amount of data. The system applies Tacotron 2 to compute mel-spectrograms from the input sequence, followed by WaveGlow as a neural vocoder to obtain the audio signals from the spectrograms. The limited amount of training data leads to synthesis errors in some sentences. To automatically detect those errors, we developed a new method that finds the sentences that have lost alignment during inference. To mitigate the problem, we implemented guided attention, providing the system with the explicit duration of the phonemes. The resulting system was evaluated to assess its robustness, quality and naturalness with both objective and subjective measures. The results reveal the capacity of the system to produce good-quality, natural audio. This work was funded by the Basque Government (Project refs. PIBA 2018-035, IT-1355-19). This work is part of the project Grant PID 2019-108040RB-C21 funded by MCIN/AEI/10.13039/501100011033.

    New sulfur-phosphine ligands derived from sugars: synthesis and application in palladium-catalyzed allylic alkylation and in rhodium asymmetric hydrogenation

    An efficient route to mixed phosphine/thioglycoside ligands of type IV starting from glucose pentaacetate is reported. In only five steps the key epoxide 6 has been obtained in high yield and its structure determined by X-ray analysis. The ring opening of the tert-butyl 4,6-O-benzylidene-2,3-anhydro-1-thio-β-D-allopyranoside 6 with diphenylphosphinyl lithium afforded the desired ligand as a single diastereoisomer. The prepared compounds act as bidentate ligands, as shown by X-ray analysis of the Rh(I)-complex 12. Preliminary results on the behaviour of these ligands in Pd(0)-catalyzed allylic alkylation and in Rh(I)-catalyzed enamide hydrogenation are also reported. Funding: Dirección General de Investigaciones Científicas y Técnicas CTQ2006-15515-CO2-01 and CTQ2007-61185; Junta de Andalucía P06-FQM-01852 and P07-FQM-2774; Fundación Ramón Areces.

    An on-line VAD based on Multi-Normalisation Scoring (MNS) of observation likelihoods

    Preprint of the article published online on 31 May 2018. Voice activity detection (VAD) is an essential task in expert systems that rely on oral interfaces. The VAD module detects the presence of human speech and separates speech segments from silences and non-speech noises. The most popular current on-line VAD systems are based on adaptive parameters which seek to cope with varying channel and noise conditions. The main disadvantages of this approach are the need for some initialisation time to properly adjust the parameters to the incoming signal and uncertain performance in the case of poor estimation of the initial parameters. In this paper we propose a novel on-line VAD based only on previous training which does not introduce any delay. The technique is based on a strategy that we have called Multi-Normalisation Scoring (MNS). It consists of obtaining a vector of multiple observation likelihood scores from normalised mel-cepstral coefficients previously computed from different databases. A classifier is then used to label the incoming observation likelihood vector. Encouraging results have been obtained with a Multi-Layer Perceptron (MLP). This technique can generalise to unseen noise levels and types. A validation experiment with two current standard ITU-T VAD algorithms demonstrates the good performance of the method. Indeed, lower classification error rates are obtained for non-speech frames, while results for speech frames are similar. This work was partially supported by the EU (ERDF) under grant TEC2015-67163-C2-1-R (RESTORE) (MINECO/ERDF, EU) and by the Basque Government under grant KK-2017/00043 (BerbaOla).

    New sulfur-phosphine ligands derived from sugars: synthesis and application in palladium-catalyzed allylic alkylation and in rhodium asymmetric hydrogenation

    14 pages, 4 figures, 5 schemes, 2 tables. An efficient route to mixed phosphine/thioglycoside ligands of type IV starting from glucose pentaacetate is reported. In only five steps the key epoxide 6 has been obtained in high yield and its structure determined by X-ray analysis. The ring opening of the tert-butyl 4,6-O-benzylidene-2,3-anhydro-1-thio-β-D-allopyranoside 6 with diphenylphosphinyl lithium afforded the desired ligand as a single diastereoisomer. The prepared compounds act as bidentate ligands, as shown by X-ray analysis of the Rh(I)-complex 12. Preliminary results on the behaviour of these ligands in Pd(0)-catalyzed allylic alkylation and in Rh(I)-catalyzed enamide hydrogenation are also reported. We thank the Dirección General de Investigación Científica y Técnica (grants No. CTQ2006-15515-CO2-01 and CTQ2007-61185), the Junta de Andalucía (grants P06-FQM-01852 and P07-FQM-2774) and the Fundación Ramón Areces for financial support, and Mr M. Rudkowski for performing preliminary experimental work. Peer reviewed.

    Modelo de duración para conversión de texto a voz en euskera (Duration model for text-to-speech conversion in Basque)

    This paper presents the modelling of phone durations in standard Basque, for use in a text-to-speech system. The statistical modelling was done using binary regression trees and a large corpus containing 57,300 phones. Several prediction experiments were performed, testing different sets of predicting factors. The duration predictions obtained with this model have an RMSE of 22.23 ms. This work was partially funded by the Ministerio de Ciencia y Tecnología (TIC2000-1005-C03-03 and TIC2000-1669-C04-03).
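
    For readers unfamiliar with the error measure quoted in this abstract, the sketch below shows how a root-mean-square error over predicted and observed durations is computed. The function name `rmse` and the three duration values are invented for illustration; they are not taken from the paper's corpus or model.

```python
import math


def rmse(predicted_ms, observed_ms):
    """Root-mean-square error between two equal-length lists of durations (ms)."""
    if len(predicted_ms) != len(observed_ms) or not observed_ms:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted_ms, observed_ms)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical durations in milliseconds, for illustration only.
predicted = [60.0, 85.0, 110.0]
observed = [57.0, 89.0, 110.0]
error = rmse(predicted, observed)  # square root of (9 + 16 + 0) / 3
```

    Because the errors are squared before averaging, a few large mispredictions weigh more heavily on the RMSE than many small ones.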

    Enrichment of Oesophageal Speech: Voice Conversion with Duration-Matched Synthetic Speech as Target

    Pathological speech such as Oesophageal Speech (OS) is difficult to understand due to the presence of undesired artefacts and lack of normal healthy speech characteristics. Modern speech technologies and machine learning enable us to transform pathological speech to improve intelligibility and quality. We have used a neural network based voice conversion method with the aim of improving the intelligibility and reducing the listening effort (LE) of four OS speakers of varying speaking proficiency. The novelty of this method is the use of synthetic speech matched in duration with the source OS as the target, instead of parallel aligned healthy speech. We evaluated the converted samples from this system using a collection of Automatic Speech Recognition systems (ASR), an objective intelligibility metric (STOI) and a subjective test. ASR evaluation shows that the proposed system had significantly better word recognition accuracy compared to unprocessed OS, and baseline systems which used aligned healthy speech as the target. There was an improvement of at least 15% on STOI scores indicating a higher intelligibility for the proposed system compared to unprocessed OS, and a higher target similarity in the proposed system compared to baseline systems. The subjective test reveals a significant preference for the proposed system compared to unprocessed OS for all OS speakers, except one who was the least proficient OS speaker in the data set. This project was supported by funding from the European Union’s H2020 research and innovation programme under the MSCA GA 675324 (the ENRICH network: www.enrich-etn.eu (accessed on 25 June 2021)), and the Basque Government (PIBA_2018_1_0035 and IT355-19).

    Proline-coated gold nanoparticles as a highly efficient nanocatalyst for the enantioselective direct aldol reaction in water

    Reported is an efficient approach to the synthesis of water-soluble proline-coated gold nanoparticles through a place exchange reaction between pentanethiolate-stabilized gold nanoparticles and a proline-tethered amphiphilic thiol. Preliminary studies show that the nanocatalyst is highly active in an enamine-type aldolisation, leading to the desired product with nearly perfect diastereoselectivity and enantioselectivity using water as an innocuous solvent. Funding: Ministerio de Economía y Competitividad CTQ2010-21755-CO2-00; Junta de Andalucía P07-FQM-277

    Intelligibility and Listening Effort of Spanish Oesophageal Speech

    Communication is a huge challenge for oesophageal speakers, be it for interactions with fellow humans or with digital voice assistants. We aim to quantify these communication challenges (both human-human and human-machine interactions) by measuring intelligibility and Listening Effort (LE) of Oesophageal Speech (OS) in comparison to Healthy Laryngeal Speech (HS). We conducted two listening tests (one web-based, the other in laboratory settings) to collect these measurements. Participants performed a sentence recognition and LE rating task in each test. Intelligibility, calculated as Word Error Rate, showed significant correlation with self-reported LE ratings. Speaker type (healthy or oesophageal) had a major effect on intelligibility and effort. More LE was reported for OS compared to HS even when OS intelligibility was close to HS. Listeners familiar with OS reported less effort when listening to OS compared to nonfamiliar listeners. However, such advantage of familiarity was not observed for intelligibility. Automatic speech recognition scores were higher for OS compared to HS. This project was supported by funding from the EU's H2020 research and innovation programme under the MSCA GA 675324 (the ENRICH network: www.enrich-etn.eu), the Spanish Ministry of Economy and Competitiveness with FEDER support (RESTORE project, TEC2015-67163-C2-1-R) and the Basque Government (DL4NLP KK-2019/00045, PIBA_2018_1_0035 and IT355-19).
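
    The abstract reports intelligibility as Word Error Rate. As a rough illustration of that metric, the sketch below computes WER as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate`, the whitespace tokenisation and the example sentences are assumptions for this sketch; the study's own scoring and text normalisation are not described in the abstract.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one substitution ("sat" -> "sit") and one deleted "the".
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

    Lower values mean that listeners (or a recogniser) reproduced the sentence more accurately. WER can exceed 1.0 when the response contains many inserted words.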
