
    Von Mises-Fisher models in the total variability subspace for language recognition

    Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. I. Lopez-Moreno, D. Ramos, J. Gonzalez-Dominguez, and J. Gonzalez-Rodriguez, "Von Mises-Fisher models in the total variability subspace for language recognition", IEEE Signal Processing Letters, vol. 18, no. 12, pp. 705-708, October 2011.

    This letter proposes a new modeling approach for the Total Variability subspace in a language recognition task. Motivated by previous work in directional statistics, von Mises-Fisher distributions are used to assign language-conditioned probabilities to language data, which are assumed to be spherically distributed in this subspace. The two proposed methods use kernel density functions or finite mixture models of such distributions. Experiments conducted on NIST LRE 2009 show that the proposed techniques significantly outperform the baseline cosine distance approach in most of the considered experimental conditions, including different speech conditions, durations, and the presence of unseen languages.

    This work was supported by the Ministerio de Ciencia e Innovación under FPI Grant TEC2009-14719-C02-01 and Cátedra UAM-Telefónica.
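    As a rough illustration of the kernel-density variant, the sketch below scores a length-normalized vector against each language's training vectors with von Mises-Fisher kernels, alongside a cosine baseline. It is a minimal sketch under stated assumptions, not the letter's implementation: all kernels share one concentration κ (`kappa`), so the vMF normalizing constant C_d(κ) is identical for every language and is dropped; language priors are equal; and the cosine baseline is assumed to compare against the mean direction of each language's training vectors. All function names and the toy two-dimensional data are hypothetical, and the finite-mixture variant is not shown.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Project a vector onto the unit hypersphere (length normalization)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_score(x: Vector, train: List[Vector]) -> float:
    """Assumed baseline: cosine similarity between x and the language's mean vector."""
    dim = len(x)
    mean = [sum(v[i] for v in train) / len(train) for i in range(dim)]
    return dot(normalize(x), normalize(mean))


def vmf_kde_score(x: Vector, train: List[Vector], kappa: float) -> float:
    """Log vMF kernel density at x, up to the shared constant log C_d(kappa).

    Each training vector is the mean direction of one kernel; all kernels use
    the same concentration kappa, so log C_d(kappa) cancels across languages.
    """
    u = normalize(x)
    exponents = [kappa * dot(u, normalize(mu)) for mu in train]
    top = max(exponents)  # log-sum-exp trick for numerical stability
    total = sum(math.exp(e - top) for e in exponents)
    return top + math.log(total) - math.log(len(train))


def classify(x: Vector, models: Dict[str, List[Vector]], kappa: float) -> str:
    """Pick the language whose vMF kernel density is highest at x (equal priors)."""
    return max(models, key=lambda lang: vmf_kde_score(x, models[lang], kappa))
```

    For example, with hypothetical models `{"eng": [[1.0, 0.1], [0.9, 0.2]], "spa": [[-0.1, 1.0], [0.2, 0.9]]}` and `kappa=10.0`, `classify([1.0, 0.0], models, 10.0)` picks `"eng"`. A larger κ sharpens each kernel, so the score approaches that of the single nearest training direction; a smaller κ smooths it toward the average cosine similarity.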

    ATVS-UAM NIST LRE 2009 System Description

    Official contribution of the National Institute of Standards and Technology; not subject to copyright in the United States.

    ATVS-UAM submits a fast, light and efficient single system. The use of a task-adapted nonspeech-recognition-based VAD (in addition to the NIST conversation labels) and gender-dependent total variability compensation allows our submitted system to obtain excellent development results on SRE08 data with exceptional computational efficiency. To test the influence of the VAD on the evaluation results, an otherwise equivalent contrastive system has been submitted in which only the ATVS VAD labels are replaced with the publicly contributed BUT ones. In all submitted systems, two gender-independent calibrations have been trained, one with telephone-only data and one with microphone data (mic-tel, tel-mic or mic-mic). The submitted systems have been designed for English speech in an application-independent way, with all outputs interpretable as calibrated likelihood ratios so that they can be properly evaluated with Cllr. Sample development results on English SRE08 data are 0.53% (male) and 1.11% (female) EER on tel-tel data (optimistic, as all English speakers in SRE08 are included in the total variability matrices), rising to between 3.5% EER (tel-tel) and 5.1% EER (tel-mic) in pessimistic cross-validation experiments (25% of the test speakers totally excluded from the development data in each cross-validation set). The submitted system is extremely light in computational resources, running 77 times faster than real time. Moreover, once VAD and feature extraction (the heaviest components of our system) are performed, training and testing run respectively 5300 and 2950 times faster than real time.
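    The system outputs calibrated likelihood ratios evaluated with Cllr. The sketch below computes the standard log-likelihood-ratio cost from target and non-target scores given as natural-log likelihood ratios. It is a minimal illustration, not ATVS code, and the function names are hypothetical. A system that always outputs LR = 1 (no information) gets Cllr = 1 bit; lower is better.

```python
import math
from typing import Sequence


def softplus_bits(x: float) -> float:
    """Numerically stable log2(1 + exp(x))."""
    return (max(x, 0.0) + math.log1p(math.exp(-abs(x)))) / math.log(2.0)


def cllr(target_llrs: Sequence[float], nontarget_llrs: Sequence[float]) -> float:
    """Log-likelihood-ratio cost in bits; scores are natural-log LRs."""
    # Target trials are penalized for low LRs: log2(1 + 1/LR) = log2(1 + exp(-llr)).
    c_tar = sum(softplus_bits(-s) for s in target_llrs) / len(target_llrs)
    # Non-target trials are penalized for high LRs: log2(1 + LR).
    c_non = sum(softplus_bits(s) for s in nontarget_llrs) / len(nontarget_llrs)
    return 0.5 * (c_tar + c_non)
```

    The two averages weight target and non-target trials equally, whatever their counts, so the measure does not depend on the proportion of each trial type in the evaluation set.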

    Frame-by-frame language identification in short utterances using deep neural networks

    This is the author's version of a work that was accepted for publication in Neural Networks. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Neural Networks, vol. 64 (2015), DOI 10.1016/j.neunet.2014.08.006.

    This work addresses the use of deep neural networks (DNNs) for automatic language identification (LID), focusing on short test utterances. Motivated by their recent success in acoustic modelling for speech recognition, we adapt DNNs to the problem of identifying the language of a given utterance from short-term acoustic features. We show that DNNs are particularly suitable for real-time LID applications, owing to their capacity to emit a language identification posterior at each new frame of the test utterance. We then analyse different aspects of the system, such as the amount of required training data, the number of hidden layers, the relevance of contextual information and the effect of test utterance duration. Finally, we propose several methods to combine frame-by-frame posteriors. Experiments are conducted on two different datasets: the public NIST Language Recognition Evaluation 2009 (3 s task) and a much larger corpus (of 5 million utterances) known as Google 5M LID, obtained from different Google services. Reported results show relative improvements of DNNs over the i-vector system of 40% on the LRE09 3 s task and 76% on Google 5M LID.
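    The DNN emits a posterior at every frame, and these frame-by-frame posteriors are then combined, but the specific combination rules are not given. The sketch below shows two generic rules, averaging posteriors and averaging log posteriors, plus a running decision updated after each frame as in a real-time setting. These rules, the `eps` floor, and all names are assumptions for illustration, not necessarily the methods proposed in the paper; the per-frame posteriors would come from the DNN's softmax output.

```python
import math
from typing import List, Sequence

Frames = Sequence[Sequence[float]]  # one posterior vector per frame


def mean_posterior(frames: Frames) -> List[float]:
    """Average the per-frame posterior vectors."""
    n = len(frames)
    return [sum(f[k] for f in frames) / n for k in range(len(frames[0]))]


def mean_log_posterior(frames: Frames, eps: float = 1e-10) -> List[float]:
    """Average the per-frame log posteriors (eps guards against log(0))."""
    n = len(frames)
    return [sum(math.log(max(f[k], eps)) for f in frames) / n
            for k in range(len(frames[0]))]


def running_decisions(frames: Frames, languages: Sequence[str],
                      eps: float = 1e-10) -> List[str]:
    """Decide after every frame using the log posteriors accumulated so far."""
    totals = [0.0] * len(languages)
    decisions = []
    for f in frames:
        for k, p in enumerate(f):
            totals[k] += math.log(max(p, eps))
        # The running sum and the running mean share the same argmax.
        best = max(range(len(languages)), key=lambda k: totals[k])
        decisions.append(languages[best])
    return decisions
```

    Averaging log posteriors heavily penalizes any frame that gives the correct language a near-zero probability, whereas averaging posteriors is more tolerant of such outlier frames.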

    Automatic language identification using deep neural networks

    Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. I. López-Moreno, J. González-Domínguez, P. Oldrich, D. R. Martínez, J. González-Rodríguez, "Automatic language identification using deep neural networks", IEEE International Conference on Acoustics, Speech, and Signal Processing ICASSP, Florence (Italy), 2014.

    This work studies the use of deep neural networks (DNNs) to address automatic language identification (LID). Motivated by their recent success in acoustic modelling, we adapt DNNs to the problem of identifying the language of a given spoken utterance from short-term acoustic features. The proposed approach is compared to state-of-the-art i-vector based acoustic systems on two different datasets: Google 5M LID corpus and NIST LRE 2009. Results show that LID can benefit greatly from DNNs, especially when a large amount of training data is available. We found relative improvements of up to 70% in Cavg over the baseline system.
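    Results are reported in Cavg, the average detection cost used in NIST LRE. The sketch below follows the closed-set form as commonly defined in the LRE evaluation plans, with C_miss = C_FA = 1 and P_target = 0.5 as assumed defaults; the open-set condition adds an out-of-set term that is omitted here, so the official LRE 2009 plan should be checked before relying on it. `p_fa[(t, n)]` is assumed to be the rate at which segments of non-target language n are accepted as target language t. The relative-improvement helper computes (baseline - new) / baseline, the usual meaning of such percentages. All names are hypothetical.

```python
from typing import Dict, Tuple


def cavg_closed_set(p_miss: Dict[str, float],
                    p_fa: Dict[Tuple[str, str], float],
                    c_miss: float = 1.0, c_fa: float = 1.0,
                    p_target: float = 0.5) -> float:
    """Closed-set average detection cost over all target languages."""
    langs = list(p_miss)
    n = len(langs)
    # The non-target prior is split evenly among the other N-1 languages.
    p_nontarget = (1.0 - p_target) / (n - 1)
    total = 0.0
    for target in langs:
        cost = c_miss * p_target * p_miss[target]
        cost += sum(c_fa * p_nontarget * p_fa[(target, other)]
                    for other in langs if other != target)
        total += cost
    return total / n


def relative_improvement(baseline: float, new: float) -> float:
    """Fractional cost reduction of a new system relative to a baseline."""
    return (baseline - new) / baseline
```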

    Coupled-oscillator model to analyze the interaction between a quartz resonator and trapped ions

    The novel application of a piezoelectric quartz resonator to the detection of trapped ions has enabled the observation of the quartz-ion interaction under nonequilibrium conditions, opening new perspectives for high-sensitivity motional frequency measurements of radioactive particles. Energized quartz crystals have (long) decay-time constants on the order of milliseconds, permitting the coherent detection of charged particles within short time scales. In this paper we develop a detailed model governing the interaction between trapped ⁴⁰Ca⁺ ions and a quartz resonator connected to a low-noise amplifier. We apply this model to experimental data and extract the ions' reduced-cyclotron frequency in our 7-T Penning trap setup. We also obtain an upper limit for the coupling constant g with the present quartz-amplifier-trap (QAT) configuration. The study of the reduced-cyclotron frequency is especially important for the use of this resonator in precision Penning-trap mass spectrometry. The sensitivity can be improved by increasing the quality factor of the QAT configuration, which in turn will move the system towards the strong-coupling regime.
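    For intuition about the coupling constant g and the strong-coupling regime, the sketch below computes the complex normal-mode frequencies of two linearly coupled, damped harmonic modes, standing in here for the quartz resonator and the ion motion. This is a generic textbook two-mode model in the rotating-wave approximation, not the paper's quartz-amplifier-trap model. The parameter names are hypothetical, and ω1, ω2, the energy decay rates γ1, γ2, and g are all assumed to be in the same angular-frequency units. Real parts give the mode frequencies; negative imaginary parts give half the mode decay rates.

```python
import cmath
from typing import Tuple


def normal_modes(w1: float, w2: float, gamma1: float, gamma2: float,
                 g: float) -> Tuple[complex, complex]:
    """Eigenvalues of the 2x2 coupled-mode matrix [[a, g], [g, b]].

    a = w1 - i*gamma1/2 and b = w2 - i*gamma2/2 are the uncoupled complex
    frequencies (gamma = energy decay rate of each mode).
    """
    a = complex(w1, -gamma1 / 2)
    b = complex(w2, -gamma2 / 2)
    mean = (a + b) / 2
    root = cmath.sqrt(((a - b) / 2) ** 2 + g ** 2)
    return mean + root, mean - root


def splits_at_resonance(gamma1: float, gamma2: float, g: float) -> bool:
    """At w1 == w2 the mode frequencies separate only if g > |gamma1 - gamma2| / 4."""
    return g > abs(gamma1 - gamma2) / 4
```

    Without damping and at resonance, the modes split by exactly 2g; far from resonance they stay close to the uncoupled frequencies. Lower losses (a higher quality factor) make a given g more visible relative to the linewidths, which is the sense in which the system moves towards strong coupling.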

    On the use of high-level information in speaker and language recognition

    Actas de las IV Jornadas de Tecnología del Habla (JTH 2006).

    Automatic speaker recognition systems have been largely dominated by acoustic-spectral systems, relying on proper modelling of the short-term vocal tract of speakers. However, there is scientific and intuitive evidence that speaker-specific information is embedded in the speech signal in multiple short- and long-term characteristics. In this work, a multilevel speaker recognition system combining acoustic, phonotactic and prosodic subsystems is presented and assessed using NIST 2005 Speaker Recognition Evaluation data. For language recognition, the NIST 2005 Language Recognition Evaluation was selected to measure the performance of high-level language recognition systems.

    Genetic study of the hepcidin gene (HAMP) promoter and functional analysis of the c.-582A > G variant

    Background: Hepcidin acts as the main regulator of iron homeostasis through regulation of intestinal absorption and macrophage release. Hepcidin deficiency causes iron overload, whereas its overproduction is associated with anaemia of chronic disease. The aims of the study were to identify genetic variants in the hepcidin gene (HAMP) promoter, to assess the associations between the variants found and iron status parameters, and to study functionally the role of the most frequent variant in HAMP expression.

    Results: Sequencing of the HAMP promoter in 103 healthy individuals revealed two genetic variants: c.-153C > T, with a frequency of 0.014 for allele T, which is known to reduce hepcidin expression, and c.-582A > G, with a frequency of 0.218 for allele G. In an additional group of 224 individuals, the c.-582A > G genotype showed no association with serum iron, transferrin or ferritin levels. In cells of the human hepatoma cell line HepG2, the c.-582G HAMP promoter variant decreased transcriptional activity by 20% compared with the c.-582A variant when cotransfected with luciferase reporter constructs and a plasmid expressing upstream stimulatory factor 1 (USF1), and by 12-14% when cotransfected with a plasmid expressing upstream stimulatory factor 2 (USF2).

    Conclusions: The c.-582A > G HAMP promoter variant is not associated with serum iron, transferrin or ferritin levels in the healthy population. In vitro, allele G produced only a small reduction in gene transactivation compared with allele A, so the effect of the variant on hepcidin levels in vivo is likely negligible. Finally, the c.-153C > T variant showed a frequency high enough to be considered when genetic analysis is performed in iron overload patients.

    This work was supported by a grant from the Fondo de Investigaciones Sanitarias del Instituto de Salud Carlos III (PI052249 to LL) and Xunta de Galicia (PGIDIT06PXIC9101136PN).

    Multilevel and session variability compensated language recognition: ATVS-UAM systems at NIST LRE 2009

    Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. J. Gonzalez-Dominguez, I. Lopez-Moreno, J. Franco-Pedroso, D. Ramos, D. T. Toledano, and J. Gonzalez-Rodriguez, "Multilevel and Session Variability Compensated Language Recognition: ATVS-UAM Systems at NIST LRE 2009", IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 6, pp. 1084-1093, December 2010.

    This work presents the systems submitted by the ATVS Biometric Recognition Group to the 2009 Language Recognition Evaluation (LRE'09), organized by NIST. The new challenges in this LRE edition can be summarized by three main differences with respect to past evaluations. First, the number of languages to be recognized expanded to 23, from 14 in 2007 and 7 in 2005. Second, data variability increased through the inclusion of telephone speech excerpts extracted from Voice of America (VOA) radio broadcasts over the Internet, in addition to Conversational Telephone Speech (CTS). Third, the volume of data grew, with up to 2 terabytes of speech data for development, an order of magnitude more than in past evaluations. LRE'09 thus required participants to develop robust systems able not only to face the session variability problem successfully but also to do so with reasonable computational resources. The ATVS participation consisted of state-of-the-art acoustic and high-level systems focusing on these issues. Furthermore, the problem of finding a proper combination and calibration of the information obtained at different levels of the speech signal was extensively explored in this submission.

    In this work, two original contributions were developed. The first was the application of a session variability compensation scheme based on Factor Analysis (FA) in the statistics domain to an SVM-supervector (SVM-SV) approach. The second was the use of a novel backend based on anchor models to fuse individual systems prior to one-vs-all calibration via logistic regression. Results on both development and evaluation corpora show the robustness and excellent performance of the submitted systems, exemplified by our system ranking 2nd in the 30-second open-set condition with remarkably scarce computational resources.

    This work has been supported by the Spanish Ministry of Education under project TEC2006-13170-C02-01. Javier Gonzalez-Dominguez also thanks the Spanish Ministry of Education for supporting his doctoral research under project TEC2006-13141-C03-03. Special thanks are given to Dr. David Van Leeuwen from TNO Human Factors (Utrecht, The Netherlands) for his strong collaboration, valuable discussions and ideas. The authors also thank Dr. Patrick Lucey for his final support with the (non-target) Australian English review of the manuscript.
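    The anchor-model fusion backend is not reproduced here. As a related illustration of the last step of a language-detection backend, the sketch below converts per-language log-likelihoods, assumed to be already calibrated, into one-vs-all detection log-likelihood ratios by treating the non-target hypothesis as an equal-weight mixture of the remaining N-1 languages. This flat non-target prior is a common NIST LRE convention assumed here rather than taken from the paper, and the function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def one_vs_all_llrs(loglik: Dict[str, float]) -> Dict[str, float]:
    """Detection LLR per language from per-language log-likelihoods.

    Non-target likelihood = average likelihood of the other N-1 languages.
    """
    langs = list(loglik)
    n = len(langs)
    llrs = {}
    for target in langs:
        others = [loglik[lang] for lang in langs if lang != target]
        log_nontarget = logsumexp(others) - math.log(n - 1)
        llrs[target] = loglik[target] - log_nontarget
    return llrs
```

    With only two languages, each detection LLR reduces to the plain difference of the two log-likelihoods; with more languages, a target must beat the average of all competitors, not just the strongest one.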

    Assessment of air management strategies to improve the transient response of advanced gasoline engines operating under high EGR conditions

    Advanced gasoline engines may lead the medium-term future of the passenger vehicle market, in both conventional and hybrid powertrains. Downsizing with turbocharging is the most widespread way to improve fuel economy in gasoline engines. Exhaust gas recirculation (EGR) is also proven to reduce fuel consumption, but extracting its maximum benefit requires operating with high EGR rates. This can compromise transient engine operation because of the greater dependence on the turbocharger. This research evaluates the influence of EGR on the transient response of a turbocharged gasoline engine and, mainly, the potential of three air management strategies to accelerate that response. To this end, tip-in maneuvers at 1500 rpm (6-12 bar BMEP) were tested and simulated. The three strategies are: reducing EGR dilution by closing the EGR valve simultaneously with the throttle opening, using a pressurized air tank (PAT), and installing an electric supercharger in series at the compressor outlet. Engine tests show that the torque response with EGR is 2 s slower than without EGR. 1D modeling results reveal that the PAT connected to the intake manifold provides the fastest response, and the electric supercharger guarantees an excellent tradeoff between fuel consumption and torque response.

    Galindo, J.; Climent, H.; De La Morena, J.; González-Domínguez, D.; Guilain, S. (2023). Assessment of air management strategies to improve the transient response of advanced gasoline engines operating under high EGR conditions. Energy. 262. https://doi.org/10.1016/j.energy.2022.12558626

    A linguistically-motivated speaker recognition front-end through session variability compensated cepstral trajectories in phone units

    Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. J. González-Rodríguez, J. González-Domínguez, J. Franco-Pedroso, D. Ramos, "A linguistically-motivated speaker recognition front-end through session variability compensated cepstral trajectories in phone units" in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto (Japan), 2012, pp. 4389-4392.

    In this paper, a new linguistically-motivated front-end is presented, showing major performance improvements from the use of session variability compensated cepstral trajectories in phone units. Extending our recent work on temporal contours in linguistic units (TCLU), we have combined the potential of these unit-dependent trajectories with the ability of feature-domain factor analysis techniques to compensate for session variability effects, resulting in consistent and discriminant phone-dependent trajectories across different recording sessions. Evaluating on the NIST SRE04 English-only 1s1s task, we report EERs as low as 5.40% from the trajectories in a single phone, with 29 different phones each producing an EER below 10%, and with excellent per-unit calibration performance. The combination of different units shows significant complementarity, yielding an EER of 1.63% (100×DCF = 0.732) from a simple sum fusion of the 23 best phones, or 0.68% (100×DCF = 0.304) when fusing them through logistic regression.

    Supported by MEC grant PR-2010-123, MICINN project TEC09-14179, ForBayes project CCG10-UAM/TIC-5792 and Cátedra UAM-Telefónica.
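    Unlike a simple sum, logistic-regression fusion learns one weight per subsystem plus an offset. The sketch below fits such a linear fusion, fused score = w·s + b, by plain batch gradient descent on the mean logistic loss. It is a minimal illustration, not the authors' tooling; the learning rate, epoch count, and function names are hypothetical. It also assumes balanced target and non-target training trials; otherwise the log prior odds of the training set would have to be subtracted from the output for it to behave as a log-likelihood ratio.

```python
import math
from typing import List, Sequence, Tuple


def train_lr_fusion(scores: Sequence[Sequence[float]], labels: Sequence[int],
                    lr: float = 0.1, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit w, b for fused = w.s + b by gradient descent on the mean logistic loss.

    labels: 1 for target trials, 0 for non-target trials.
    """
    dim, n = len(scores[0]), len(scores)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for s, y in zip(scores, labels):
            z = sum(wi * si for wi, si in zip(w, s)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - y  # sigmoid(z) - label
            grad_w = [g + err * si for g, si in zip(grad_w, s)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def fuse(s: Sequence[float], w: Sequence[float], b: float) -> float:
    """Apply the learned linear fusion to one trial's subsystem scores."""
    return sum(wi * si for wi, si in zip(w, s)) + b
```

    Once trained on development trials, `fuse` maps each trial's per-phone scores to a single score; a simple sum fusion corresponds to fixing every weight at 1 and the offset at 0.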