954 research outputs found

    Evolution At the Surface of Euclid:Elements of A Long Infinity in Motion Along Space

    Get PDF
    It is modernly debated whether application of the free will has potential to cause harm to nature. Power possessed to the discourse, sensory/perceptual, physical influences on life experience by the slow moving machinery of change is a viral element in the problems of civilization; failed resolution of historical paradox involving mind and matter is a recurring source of problems. Reference is taken from the writing of Euclid in which a oneness of nature as an indivisible point of thought is made prerequisite in criteria of interpretation to demonstrate that contemporary scientific methodologies alternately ensue from the point of empirically centered induction. A qualification for conceptualizations is proposed that involves a physically describable form bound to energy in addition to contemporary notions of energy bound to form and a visually based mathematical-physical form is elaborated and discussed with respect to biological and natural processes

    Synthesising prosody with insufficient context

    Get PDF
    Prosody is a key component in human spoken communication, signalling emotion, attitude, information structure, intention, and other communicative functions through perceived variation in intonation, loudness, timing, and voice quality. However, the prosody in text-to-speech (TTS) systems is often monotonous and adds no additional meaning to the text. Synthesising prosody is difficult for several reasons: I focus on three challenges. First, prosody is embedded in the speech signal, making it hard to model with machine learning. Second, there is no clear orthography for prosody, meaning it is underspecified in the input text and making it difficult to directly control. Third, and most importantly, prosody is determined by the context of a speech act, which TTS systems do not, and will never, have complete access to. Without the context, we cannot say if prosody is appropriate or inappropriate. Context is wide ranging, but state-of-the-art TTS acoustic models only have access to phonetic information and limited structural information. Unfortunately, most context is either difficult, expensive, or impos- sible to collect. Thus, fully specified prosodic context will never exist. Given there is insufficient context, prosody synthesis is a one-to-many generative task: it necessitates the ability to produce multiple renditions. To provide this ability, I propose methods for prosody control in TTS, using either explicit prosody features, such as F0 and duration, or learnt prosody representations disentangled from the acoustics. I demonstrate that without control of the prosodic variability in speech, TTS will produce average prosody—i.e. flat and monotonous prosody. This thesis explores different options for operating these control mechanisms. Random sampling of a learnt distribution of prosody produces more varied and realistic prosody. Alternatively, a human-in-the-loop can operate the control mechanism—using their intuition to choose appropriate prosody. To improve the effectiveness of human-driven control, I design two novel approaches to make control mechanisms more human interpretable. Finally, it is important to take advantage of additional context as it becomes available. I present a novel framework that can incorporate arbitrary additional context, and demonstrate my state-of- the-art context-aware model of prosody using a pre-trained and fine-tuned language model. This thesis demonstrates empirically that appropriate prosody can be synthesised with insufficient context by accounting for unexplained prosodic variation

    Linguistic Variation Issues: Case and Agreement in Northern Russian Participial Constructions

    Get PDF
    This study offers a novel approach to a longstanding problem in Slavic Linguistics, the formal representation of the Northern Russian participial constructions in -n(o)/-t(o). Unlike previous works, the methodological stance adopted by the author focuses on singling out all the relevant patterns of variation and on pursuing a unified explanation for them. The key to the solution of the puzzle is the idea that the participial affix -n-/-t- and the agreement inflections are not just pieces of morphology inserted post-syntactically, but true heads that enter the computation and are able to manipulate the argumental roles of the verb and to check the EPP. The author’s proposal is properly framed in the context of current debate on interlanguage variation

    Técnicas de personalización de voces sintéticas para su uso por personas con discapacidad oral

    Get PDF
    151 p.Esta tesis presenta avances realizados en la personalización de voces sintéticas que emplean los sistemas de conversión de texto a voz utilizados por personas con alguna discapacidad oral. Se presenta un nuevo algoritmo de adaptación de locutor para voces sintéticas basadas en síntesis estadístico paramétrica. Este algoritmo hace uso únicamente de fragmentos vocálicos para imitar la voz del locutor objetivo y se ha demostrado que es robusto frente a la escasez de datos y que tiene un desempeño similar a otros algoritmos del estado del arte.También se describe el diseño e implementación de un banco de voces en el cual cualquier persona puede realizar grabaciones de su voz real para generar una voz sintética que posteriormente puede ser empleada por otro usuario. De esta manera las personas pueden ¿donar¿ su voz.Por último, se presenta una metodología que hace uso de diversas medidas objetivas de evaluación de señales de voz para puntuar la calidad de las voces disponibles en el banco de voces

    Efficient Approaches for Voice Change and Voice Conversion Systems

    Get PDF
    In this thesis, the study and design of Voice Change and Voice Conversion systems are presented. Particularly, a voice change system manipulates a speaker’s voice to be perceived as it is not spoken by this speaker; and voice conversion system modifies a speaker’s voice, such that it is perceived as being spoken by a target speaker. This thesis mainly includes two sub-parts. The first part is to develop a low latency and low complexity voice change system (i.e. includes frequency/pitch scale modification and formant scale modification algorithms), which can be executed on the smartphones in 2012 with very limited computational capability. Although some low-complexity voice change algorithms have been proposed and studied, the real-time implementations are very rare. According to the experimental results, the proposed voice change system achieves the same quality as the baseline approach but requires much less computational complexity and satisfies the requirement of real-time. Moreover, the proposed system has been implemented in C language and was released as a commercial software application. The second part of this thesis is to investigate a novel low-complexity voice conversion system (i.e. from a source speaker A to a target speaker B) that improves the perceptual quality and identity without introducing large processing latencies. The proposed scheme directly manipulates the spectrum using an effective and physically motivated method – Continuous Frequency Warping and Magnitude Scaling (CFWMS) to guarantee high perceptual naturalness and quality. In addition, a trajectory limitation strategy is proposed to prevent the frame-by-frame discontinuity to further enhance the speech quality. The experimental results show that the proposed method outperforms the conventional baseline solutions in terms of either objective tests or subjective tests

    Pathway to Future Symbiotic Creativity

    Full text link
    This report presents a comprehensive view of our vision on the development path of the human-machine symbiotic art creation. We propose a classification of the creative system with a hierarchy of 5 classes, showing the pathway of creativity evolving from a mimic-human artist (Turing Artists) to a Machine artist in its own right. We begin with an overview of the limitations of the Turing Artists then focus on the top two-level systems, Machine Artists, emphasizing machine-human communication in art creation. In art creation, it is necessary for machines to understand humans' mental states, including desires, appreciation, and emotions, humans also need to understand machines' creative capabilities and limitations. The rapid development of immersive environment and further evolution into the new concept of metaverse enable symbiotic art creation through unprecedented flexibility of bi-directional communication between artists and art manifestation environments. By examining the latest sensor and XR technologies, we illustrate the novel way for art data collection to constitute the base of a new form of human-machine bidirectional communication and understanding in art creation. Based on such communication and understanding mechanisms, we propose a novel framework for building future Machine artists, which comes with the philosophy that a human-compatible AI system should be based on the "human-in-the-loop" principle rather than the traditional "end-to-end" dogma. By proposing a new form of inverse reinforcement learning model, we outline the platform design of machine artists, demonstrate its functions and showcase some examples of technologies we have developed. We also provide a systematic exposition of the ecosystem for AI-based symbiotic art form and community with an economic model built on NFT technology. Ethical issues for the development of machine artists are also discussed

    Application-driven visual computing towards industry 4.0 2018

    Get PDF
    245 p.La Tesis recoge contribuciones en tres campos: 1. Agentes Virtuales Interactivos: autónomos, modulares, escalables, ubicuos y atractivos para el usuario. Estos IVA pueden interactuar con los usuarios de manera natural.2. Entornos de RV/RA Inmersivos: RV en la planificación de la producción, el diseño de producto, la simulación de procesos, pruebas y verificación. El Operario Virtual muestra cómo la RV y los Co-bots pueden trabajar en un entorno seguro. En el Operario Aumentado la RA muestra información relevante al trabajador de una manera no intrusiva. 3. Gestión Interactiva de Modelos 3D: gestión online y visualización de modelos CAD multimedia, mediante conversión automática de modelos CAD a la Web. La tecnología Web3D permite la visualización e interacción de estos modelos en dispositivos móviles de baja potencia.Además, estas contribuciones han permitido analizar los desafíos presentados por Industry 4.0. La tesis ha contribuido a proporcionar una prueba de concepto para algunos de esos desafíos: en factores humanos, simulación, visualización e integración de modelos
    • …
    corecore