
    Template-based ASR using Posterior features and synthetic references: comparing different TTS systems

    In recent work, using phone class-conditional posterior probabilities (posterior features) directly as features has given successful results in template-based ASR systems. In this paper, motivated by the high quality of current text-to-speech systems and the robustness of posterior features to undesired variability, we investigate the use of synthetic speech to generate reference templates. Using synthetic speech in template-based ASR not only addresses the issue of in-domain data collection but also allows the vocabulary to be expanded. On 75- and 600-word task-independent and speaker-independent setups of the Phonebook corpus, we show the feasibility of this approach by investigating different synthetic voices produced by an HTS-based synthesizer trained on two different databases. Our study shows that synthetic speech templates can yield performance comparable to that of natural speech templates, especially with synthetic voices that have high intelligibility

    Objective Speech Intelligibility Assessment through Comparison of Phoneme Class Conditional Probability Sequences

    Assessment of speech intelligibility is important for the development of speech systems, such as telephony systems and text-to-speech (TTS) systems. Existing approaches to the automatic assessment of intelligibility in telephony typically compare a reference speech signal to a degraded copy, which requires that both signals be from the same speaker. In this paper, we propose a novel approach that does not have such a requirement, making it possible to also evaluate TTS systems and recent very low bit rate codecs that may modify speaker characteristics. More specifically, our approach is based on comparing sequences of phoneme class conditional probabilities. We show the potential of our approach on low bit rate telephony conditions, and compare it against subjective TTS intelligibility scores from the 2011 Blizzard Challenge
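    The abstract above does not say how the two probability sequences are compared. As a minimal sketch only, assuming a dynamic time warping (DTW) alignment with a symmetric Kullback-Leibler local cost (a common choice for posterior features, but an assumption here, not the authors' stated method), the comparison could look like the following. The function names and the toy two-class posteriors are hypothetical.

```python
import math


def symmetric_kl(p, q, eps=1e-10):
    """Symmetric KL divergence between two discrete distributions."""
    # Floor and renormalise so that log() never sees a zero.
    p = [max(x, eps) for x in p]
    q = [max(x, eps) for x in q]
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    kl_pq = sum(a * math.log(a / b) for a, b in zip(p, q))
    kl_qp = sum(b * math.log(b / a) for a, b in zip(p, q))
    return 0.5 * (kl_pq + kl_qp)


def dtw_distance(ref, test):
    """Length-normalised DTW cost between two posterior sequences.

    ref and test are lists of frames; each frame is a list of
    phoneme class probabilities (assumed to sum to one).
    """
    n, m = len(ref), len(test)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = symmetric_kl(ref[i - 1], test[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # skip a reference frame
                                 cost[i][j - 1],      # skip a test frame
                                 cost[i - 1][j - 1])  # match both frames
    return cost[n][m] / (n + m)


# Hypothetical two-class posteriors, for illustration only.
reference = [[0.9, 0.1], [0.1, 0.9]]
stretched = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]  # same content, slower
swapped = [[0.1, 0.9], [0.9, 0.1]]                # different content

print(dtw_distance(reference, stretched))
print(dtw_distance(reference, swapped))
```

    Under these assumptions, a lower cost means the test sequence follows the reference phoneme pattern more closely; because the cost is computed on phoneme posteriors rather than on the waveform, the two signals need not come from the same speaker.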

    Directed evolution of ancestral and modern enzymes

    Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Facultad de Ciencias, Departamento de Biología Molecular. Defense date: 18-10-2019. Access to the full text of this thesis is embargoed until 18-04-2021.
    Ancestral sequence reconstruction (ASR) and resurrection (i.e. functional expression in a heterologous host) allow enzymes with different properties to be uncovered, while their combination with directed evolution may lead to the development of a new generation of biocatalysts. In this Doctoral Thesis we have explored the combination of these powerful methods using two different enzyme systems, Rubisco and laccase, as blueprints. In the first chapter we reconstructed and resurrected (in Escherichia coli) Precambrian Rubisco nodes, which were evolved in parallel with the extant Rubisco counterpart. An in vitro dual high-throughput screening (HTS) method was set up to identify thermostable and functional variants after applying a palette of directed evolution strategies. A stronger tolerance to mutational loads, improved expression and different kinetic behavior were some of the traits that stood out in the Precambrian enzyme. In particular, the evolved ancestral clone B2 stood out as a case study of this elusive protein because of its alternative performance with respect to the classical equilibrium of Rubisco kinetic constants. In the second chapter we focused ASR and directed evolution on basidiomycete laccases. First, ancestral nodes from the Paleozoic were reconstructed and resurrected in Saccharomyces cerevisiae. The resurrected enzymes showed higher heterologous expression and a broader pH stability profile than the modern, laboratory-evolved counterpart. The most promising ancestral node was subjected to structure-guided evolution for the oxidation of β-diketones, an unusual type of redox mediator capable of initiating the polymerization of vinyl monomers. The final chapter of the Thesis deals with consensus design, a long-standing protein engineering method to increase stability without compromising activity. We applied an in-house consensus method to stabilize a laboratory-evolved high-redox potential laccase. Multiple sequence alignments were carried out and computationally refined by applying relative entropy and mutual information thresholds. Through this approach, an ensemble of 20 consensus mutations was identified, 18 of which were consensus ancestral mutations. After analyzing potential epistasis by site-directed recombination in vivo, the best mutant was characterized and displayed dramatically improved thermostability, kinetic values and secretion levels.
    This Doctoral Thesis was carried out thanks to funding received through a research personnel training (FPI) grant from the Ministerio de Economía y Competitividad (BES-2014-068887) within the national projects DEWRY (BIO2013-43407-R) and LIGNOLUTION (BIO2016-79106-R)

    "Can you hear me now?": Automatic assessment of background noise intrusiveness and speech intelligibility in telecommunications

    This thesis deals with signal-based methods that predict how listeners perceive speech quality in telecommunications. Such tools, called objective quality measures, are of great interest in the telecommunications industry to evaluate how new or deployed systems affect the end-user quality of experience. Two widely used measures, ITU-T Recommendations P.862 “PESQ” and P.863 “POLQA”, predict the overall listening quality of a speech signal as it would be rated by an average listener, but do not provide further insight into the composition of that score. This is in contrast to modern telecommunication systems, in which components such as noise reduction or speech coding process speech and non-speech signal parts differently. Therefore, there has been a growing interest in objective measures that assess different quality features of speech signals, allowing for a more nuanced analysis of how these components affect quality. In this context, the present thesis addresses the objective assessment of two quality features: background noise intrusiveness and speech intelligibility. The perception of background noise is investigated with newly collected datasets, including signals that go beyond the traditional telephone bandwidth, as well as Lombard (effortful) speech. We analyze listener scores for noise intrusiveness, and their relation to scores for perceived speech distortion and overall quality. We then propose a novel objective measure of noise intrusiveness that uses a sparse representation of noise as a model of high-level auditory coding. The proposed approach is shown to yield results that highly correlate with listener scores, without requiring training data. With respect to speech intelligibility, we focus on the case where the signal is degraded by strong background noises or very low bit-rate coding. 
Considering that listeners use prior linguistic knowledge in assessing intelligibility, we propose an objective measure that works at the phoneme level and performs a comparison of phoneme class-conditional probability estimations. The proposed approach is evaluated on a large corpus of recordings from public safety communication systems that use low bit-rate coding, and further extended to the assessment of synthetic speech, showing its applicability to a large range of distortion types. The effectiveness of both measures is evaluated with standardized performance metrics, using corpora that follow established recommendations for subjective listening tests

    Deep Spoken Keyword Spotting: An Overview

    Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams and has become a fast-growing technology thanks to the paradigm shift introduced by deep learning a few years ago. This has allowed the rapid embedding of deep KWS in a myriad of small electronic devices with different purposes like the activation of voice assistants. Prospects suggest a sustained growth in terms of social use of this technology. Thus, it is not surprising that deep KWS has become a hot research topic among speech scientists, who constantly look for KWS performance improvement and computational complexity reduction. This context motivates this paper, in which we conduct a literature review into deep spoken KWS to assist practitioners and researchers who are interested in this technology. Specifically, this overview has a comprehensive nature by covering a thorough analysis of deep KWS systems (which includes speech features, acoustic modeling and posterior handling), robustness methods, applications, datasets, evaluation metrics, performance of deep KWS systems and audio-visual KWS. The analysis performed in this paper allows us to identify a number of directions for future research, including directions adopted from automatic speech recognition research and directions that are unique to the problem of spoken KWS

    Evolution of CRISPR-associated endonucleases as inferred from resurrected proteins

    Clustered regularly interspaced short palindromic repeats (CRISPR)-associated Cas9 is an effector protein that targets invading DNA and plays a major role in the prokaryotic adaptive immune system. Although Streptococcus pyogenes CRISPR–Cas9 has been widely studied and repurposed for applications including genome editing, its origin and evolution are poorly understood. Here, we investigate the evolution of Cas9 from resurrected ancient nucleases (anCas) in extinct firmicutes species that last lived 2.6 billion years before the present. We demonstrate that these ancient forms were much more flexible in their guide RNA and protospacer-adjacent motif requirements compared with modern-day Cas9 enzymes. Furthermore, anCas portrays a gradual palaeoenzymatic adaptation from nickase to double-strand break activity, exhibits high levels of activity with both single-stranded DNA and single-stranded RNA targets and is capable of editing activity in human cells. Prediction and characterization of anCas with a resurrected protein approach uncovers an evolutionary trajectory leading to functionally flexible ancient enzymes.
    This work has been supported by grant nos. PID2019-109087RB-I00 (to R.P.-J.) and RTI2018-101223-B-I00 and PID2021-127644OB-I00 (to L.M.) from the Spanish Ministry of Science and Innovation. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 964764 (to R.P.-J.). The content presented in this document represents the views of the authors, and the European Commission has no liability in respect to the content. We acknowledge financial support from the Spanish Foundation for the Promotion of Research of Amyotrophic Lateral Sclerosis. A.F. acknowledges Spanish Center for Biomedical Network Research on Rare Diseases (CIBERER) intramural funds (no. ER19P5AC756/2021). F.J.M.M. acknowledges research support by Conselleria d’Educació, Investigació, Cultura i Esport from Generalitat Valenciana, research project nos. PROMETEO/2017/129 and PROMETEO/2021/057. M.M. acknowledges funding from CIBERER (grant no. ER19P5AC728/2021). The work has received funding from the Regional Government of Madrid (grant no. B2017/BMD3721 to M.A.M.-P.) and from Instituto de Salud Carlos III, co-funded with the European Regional Development Fund ‘A way to make Europe’ within the National Plans for Scientific and Technical Research and Innovation 2017–2020 and 2021–2024 (nos. PI17/1659, PI20/0429 and IMP/00009; to M.A.M.-P.). B.P.K. was supported by an MGH ECOR Howard M. Goodman Award and NIH P01 HL142494

    Objective Intelligibility Assessment of Text-to-Speech Systems Through Utterance Verification

    Objective assessment of synthetic speech intelligibility can be a useful tool for the development of text-to-speech (TTS) systems, as it provides a reproducible and inexpensive alternative to subjective listening tests. In a recent work, it was shown that the intelligibility of synthetic speech could be assessed objectively by comparing two sequences of phoneme class conditional probabilities, corresponding to instances of synthetic and human reference speech, respectively. In this paper, we build on those findings to propose a novel approach that formulates objective intelligibility assessment as an utterance verification problem using hidden Markov models, thereby alleviating the need for human reference speech. Specifically, given each text input to the TTS system, the proposed approach automatically verifies the words in the output synthetic speech signal and estimates an intelligibility score based on word recall statistics. We evaluate the proposed approach on the 2011 Blizzard Challenge data, and show that the estimated scores and the subjective intelligibility scores are highly correlated (Pearson’s |R| = 0.94)
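    The abstract above reports a score based on word recall and its Pearson correlation with subjective scores. As a rough illustration only, assuming the verifier returns one accept/reject decision per intended word (the abstract does not give the exact statistic), the two quantities could be computed as below. All values are toy numbers invented for the example and are not results from the paper.

```python
import math


def word_recall(verified):
    """Fraction of intended words that the verifier accepted.

    verified is a list of booleans, one per word of the input text
    (a hypothetical interface to the utterance verifier).
    """
    return sum(1 for v in verified if v) / len(verified)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy verifier decisions for one utterance (hypothetical).
decisions = [True, True, False, True]
print(word_recall(decisions))

# Toy per-system scores (hypothetical): estimated recall versus a
# subjective error measure that falls as recall rises.
estimated = [0.60, 0.75, 0.90]
subjective_error = [0.40, 0.25, 0.10]
print(abs(pearson_r(estimated, subjective_error)))
```

    Taking the absolute value mirrors the |R| notation in the abstract: an objective score can correlate negatively with a subjective error rate and still be a good predictor.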