108 research outputs found

    Emotion identification from text using semantic disambiguation

    This paper presents a text-based emotion identification system implemented by means of a language-independent architecture. The system includes several natural language processing tasks besides an affective keyword dictionary. Its main novelty is the incorporation of a semantic disambiguation module, which considers the meaning of each word within the sentence before labelling it emotionally. The experiments show the improvement achieved on a corpus of English headlines.

    Statistical prediction of spectral discontinuities of speech in concatenative synthesis

    The estimation of spectral discontinuities is one of the major problems in concatenative speech synthesis. This paper introduces a methodology based on analyzing the statistical behaviour of objective measures on natural concatenations. The main goal is to define an automatic process for selecting the most appropriate measures to use as concatenation cost, in order to generate synthetic speech that is as natural as possible. The paper describes both the objective and subjective results that validate the proposal.

    Conversión de texto en habla multidominio basada en selección de unidades con ajuste subjetivo de pesos y marcado robusto de pitch

    The final purpose of any text-to-speech (TTS) system is the generation of perfectly natural synthetic speech from any input text. Historically, two strategies have been followed in pursuit of this goal: general purpose TTS synthesis (GP-TTS), which prioritizes the flexibility of the application at the expense of synthetic speech quality; and limited domain TTS synthesis (LD-TTS), which prioritizes high quality by restricting the scope of the input text. At present, the most widely used strategy for developing TTS systems is corpus-based or unit selection TTS (US-TTS) synthesis. Although the quality of US-TTS systems is quite good in general, several open issues are still being investigated. This PhD thesis introduces several contributions to US-TTS systems in order to improve, on the one hand, the naturalness of GP-TTS systems and, on the other, the flexibility of LD-TTS systems. To address the former, a new technique is introduced for efficiently incorporating human perception into the unit selection process by subjectively tuning the weights of the cost function that guides unit selection, while controlling user fatigue and consistency. Moreover, a new method for improving the reliability of automatic speech corpus labelling is described, specifically a generic pitch mark filtering algorithm, since pitch marking is an essential issue in corpus-based TTS systems. The latter problem is addressed by multi-domain TTS (MD-TTS) synthesis, which follows the LD-TTS approach and aims at synthetic speech quality equivalent to that of LD-TTS systems while improving flexibility by considering different domains (speaking styles, emotions, topics, etc.) for synthesis. In this context, the MD-TTS system needs to know, at run time, which domain or domains are the most suitable for synthesizing the input text with the highest synthetic speech quality. To that end, the MD-TTS system adds to the classic TTS architecture a text classification module adapted to the particular needs of MD-TTS. Finally, all the proposals are evaluated objectively (by means of classic and new measures) and/or subjectively (by means of perceptual tests) in order to validate the improvements achieved by the methods developed in the US-TTS framework, as a further step towards high quality and flexible text-to-speech synthesis systems.

    CIBERER : Spanish national network for research on rare diseases: A highly productive collaborative initiative

    Other funding: Instituto de Salud Carlos III (ISCIII); Ministerio de Ciencia e Innovación. CIBER (Center for Biomedical Network Research; Centro de Investigación Biomédica En Red) is a public national consortium created in 2006 under the umbrella of the Spanish National Institute of Health Carlos III (ISCIII). This innovative research structure comprises 11 specific areas dedicated to the main public health priorities in the National Health System. CIBERER, the thematic area of CIBER focused on rare diseases (RDs), currently consists of 75 research groups belonging to universities, research centers, and hospitals across the entire country. CIBERER's mission is to be a center that prioritizes and favors collaboration and cooperation between biomedical and clinical research groups, with special emphasis on the genetic, molecular, biochemical, and cellular aspects of RD research. This research is the basis for providing new tools for the diagnosis and therapy of low-prevalence diseases, in line with the objectives of the International Rare Diseases Research Consortium (IRDiRC), thus favoring translational research between the scientific environment of the laboratory and the clinical setting of health centers. In this article, we review CIBERER's 15-year journey and summarize the main results obtained in terms of internationalization, scientific production, contributions toward the discovery of new therapies and novel genes associated with diseases, cooperation with patients' associations, and many other topics related to RD research.

    Description of Anomalous Noise Events for Reliable Dynamic Traffic Noise Mapping in Real-Life Urban and Suburban Soundscapes

    Traffic noise is one of the main pollutants in urban and suburban areas. European authorities have driven several initiatives to study, prevent and reduce the effects of the population's exposure to traffic noise. Recent technological advances have allowed the dynamic computation of noise levels by means of Wireless Acoustic Sensor Networks (WASN) such as that developed within the European LIFE DYNAMAP project. Those WASN should be capable of detecting and discarding sound sources unrelated to road traffic noise, denoted as anomalous noise events (ANE), in order to generate reliable noise level maps. Due to the local, occasional and diverse nature of ANE, some works have opted to build ANE databases artificially, at the cost of misrepresentation. This work presents the production and analysis of a real-life environmental audio database recorded in two urban and suburban areas and specifically conceived for the collection of anomalous noise events. A total of 9 h 8 min of labelled audio data is obtained, differentiating among road traffic noise, background city noise and ANE. After the boundaries of the ANE samples are delimited manually, their acoustic salience is automatically computed as a contextual signal-to-noise ratio (SNR). The analysis of the real-life environmental database shows a high diversity of ANEs in terms of occurrences, durations and SNRs, and confirms both the expected differences between the urban and suburban soundscapes in terms of occurrences and SNRs, and the rare nature of ANE.

    Characterization of a WASN-Based Urban Acoustic Dataset for the Dynamic Mapping of Road Traffic Noise

    Road Traffic Noise (RTN) is one of the main pollutants in urban and suburban areas, negatively affecting the quality of life of their inhabitants. In the context of the European LIFE DYNAMAP project, two Wireless Acoustic Sensor Networks (WASN) have been deployed to monitor RTN: one in District 9 of Milan, and another along the A90 motorway of Rome. Since the dynamic mapping system should be able to identify and remove those Anomalous Noise Events (ANEs) unrelated to regular road traffic (e.g., sirens, horns, speech, and doors), an Anomalous Noise Event Detector (ANED) has been included in the dynamic noise mapping pipeline to avoid biasing the computation of the equivalent RTN levels. After deploying the 24 low-cost acoustic sensor networks in both pilot areas, WASN-based acoustic datasets were built to adapt the previous version of the ANED algorithm to run in real-operation conditions. In this work, we describe the preliminary results of the analysis of the 154 h WASN-based urban acoustic dataset obtained from the city of Milan in terms of the main characteristics of ANEs. The results confirm the unbalanced nature of the problem (83.7% of the data corresponds to RTN) and show that the urban WASN-based dataset contains a larger number of ANEs, with higher local predominance, than was observed in the previous expert-based recording campaign. This underlines the importance of accurately modeling the urban acoustic environment to train the ANED properly.