954 research outputs found
Evolution At the Surface of Euclid: Elements of A Long Infinity in Motion Along Space
It is currently debated whether the exercise of free will has the potential to cause harm to nature. Power possessed to the discourse, sensory/perceptual, physical influences on life experience by the slow-moving machinery of change is a viral element in the problems of civilization; the failed resolution of the historical paradox involving mind and matter is a recurring source of problems. Reference is taken from the writing of Euclid, in which a oneness of nature as an indivisible point of thought is made a prerequisite in the criteria of interpretation, to demonstrate that contemporary scientific methodologies alternately ensue from the point of empirically centered induction. A qualification for conceptualizations is proposed that involves a physically describable form bound to energy, in addition to contemporary notions of energy bound to form, and a visually based mathematical-physical form is elaborated and discussed with respect to biological and natural processes.
Synthesising prosody with insufficient context
Prosody is a key component in human spoken communication, signalling emotion, attitude, information structure, intention, and other communicative functions through perceived variation in intonation, loudness, timing, and voice quality. However, the prosody in text-to-speech (TTS) systems is often monotonous and adds no additional meaning to the text. Synthesising prosody is difficult for several reasons; I focus on three challenges. First, prosody is embedded in the speech signal, making it hard to model with machine learning. Second, there is no clear orthography for prosody, meaning it is underspecified in the input text and difficult to control directly. Third, and most importantly, prosody is determined by the context of a speech act, which TTS systems do not, and will never, have complete access to. Without the context, we cannot say if prosody is appropriate or inappropriate. Context is wide ranging, but state-of-the-art TTS acoustic models only have access to phonetic information and limited structural information. Unfortunately, most context is either difficult, expensive, or impossible to collect. Thus, fully specified prosodic context will never exist. Given there is insufficient context, prosody synthesis is a one-to-many generative task: it necessitates the ability to produce multiple renditions. To provide this ability, I propose methods for prosody control in TTS, using either explicit prosody features, such as F0 and duration, or learnt prosody representations disentangled from the acoustics. I demonstrate that without control of the prosodic variability in speech, TTS will produce average prosody, i.e. flat and monotonous prosody.
This thesis explores different options for operating these control mechanisms. Random sampling of a learnt distribution of prosody produces more varied and realistic prosody. Alternatively, a human-in-the-loop can operate the control mechanism, using their intuition to choose appropriate prosody. To improve the effectiveness of human-driven control, I design two novel approaches to make control mechanisms more human interpretable. Finally, it is important to take advantage of additional context as it becomes available. I present a novel framework that can incorporate arbitrary additional context, and demonstrate my state-of-the-art context-aware model of prosody using a pre-trained and fine-tuned language model. This thesis demonstrates empirically that appropriate prosody can be synthesised with insufficient context by accounting for unexplained prosodic variation.
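The claim that uncontrolled prosodic variability leads to "average", flat prosody can be illustrated with a toy example. The sketch below is not the thesis's model: it assumes a hypothetical `synth_contour` function that generates an F0 contour with a single pitch accent at a random position, then compares the frame-wise mean of many such renditions with one sampled rendition. All names and parameter values are illustrative assumptions.

```python
import math
import random
import statistics


def synth_contour(peak_pos, n_frames=50, base_hz=120.0, excursion_hz=40.0):
    """Toy F0 contour: flat baseline plus one pitch accent centred at peak_pos (0..1)."""
    centre = peak_pos * (n_frames - 1)
    width = n_frames / 10.0
    return [
        base_hz + excursion_hz * math.exp(-((t - centre) / width) ** 2)
        for t in range(n_frames)
    ]


def f0_range(contour):
    """Pitch range of a contour in Hz (max minus min)."""
    return max(contour) - min(contour)


rng = random.Random(0)

# Many plausible renditions of the "same text": the accent position varies
# because the context that would determine it is assumed to be unobserved.
renditions = [synth_contour(rng.random()) for _ in range(200)]

# Frame-wise average, a stand-in for a model that predicts the mean rendition.
mean_contour = [statistics.fmean(frame) for frame in zip(*renditions)]

# One rendition drawn at random, a stand-in for sampling from a distribution.
sampled = rng.choice(renditions)

print(f"range of averaged contour: {f0_range(mean_contour):.1f} Hz")
print(f"range of sampled contour:  {f0_range(sampled):.1f} Hz")
```

In this toy setting the averaged contour has a much narrower pitch range than any single rendition, which is one way to picture why sampling or explicit control is needed to avoid monotonous output.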
Linguistic Variation Issues: Case and Agreement in Northern Russian Participial Constructions
This study offers a novel approach to a longstanding problem in Slavic Linguistics, the formal representation of the Northern Russian participial constructions in -n(o)/-t(o). Unlike previous works, the methodological stance adopted by the author focuses on singling out all the relevant patterns of variation and on pursuing a unified explanation for them. The key to the solution of the puzzle is the idea that the participial affix -n-/-t- and the agreement inflections are not just pieces of morphology inserted post-syntactically, but true heads that enter the computation and are able to manipulate the argumental roles of the verb and to check the EPP. The author’s proposal is properly framed in the context of the current debate on interlanguage variation.
Técnicas de personalización de voces sintéticas para su uso por personas con discapacidad oral
151 p. This thesis presents advances in the personalization of the synthetic voices used by text-to-speech systems for people with an oral disability. A new speaker adaptation algorithm is presented for synthetic voices based on statistical parametric synthesis. The algorithm uses only vocalic segments to imitate the target speaker's voice, and it has been shown to be robust to data scarcity and to perform similarly to other state-of-the-art algorithms. The design and implementation of a voice bank is also described, in which anyone can record their natural voice to generate a synthetic voice that can later be used by another user. In this way, people can "donate" their voice. Finally, a methodology is presented that uses several objective speech-signal evaluation measures to score the quality of the voices available in the voice bank.
Efficient Approaches for Voice Change and Voice Conversion Systems
This thesis presents the study and design of Voice Change and Voice Conversion
systems. In particular, a voice change system manipulates a speaker’s voice so that it is
perceived as not having been spoken by that speaker, whereas a voice conversion system
modifies a speaker’s voice such that it is perceived as being spoken by a target speaker.
This thesis has two main parts. The first part develops a low-latency, low-complexity
voice change system (comprising frequency/pitch scale modification and formant scale
modification algorithms) that can run on 2012-era smartphones with very
limited computational capability. Although some low-complexity voice change algorithms
have been proposed and studied, real-time implementations are very rare. According to the
experimental results, the proposed voice change system achieves the same quality as the
baseline approach but requires much less computation and satisfies the
real-time requirement. Moreover, the proposed system has been implemented in C
and was released as a commercial software application. The second part of this
thesis investigates a novel low-complexity voice conversion system (i.e. from a source
speaker A to a target speaker B) that improves perceptual quality and identity without
introducing large processing latencies. The proposed scheme directly manipulates the
spectrum using an effective and physically motivated method, Continuous Frequency
Warping and Magnitude Scaling (CFWMS), to guarantee high perceptual naturalness and
quality. In addition, a trajectory limitation strategy is proposed to prevent frame-by-frame
discontinuities and further enhance speech quality. The experimental results show that the
proposed method outperforms the conventional baseline solutions in both objective
and subjective tests.
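The abstract does not give the details of CFWMS, so the following is only a generic sketch of what frequency warping combined with magnitude scaling can look like on a single magnitude spectrum. The function name `warp_and_scale`, the constant warping factor `alpha`, the single `gain`, and the use of linear interpolation are all assumptions made for illustration, not the thesis's method.

```python
def warp_and_scale(magnitudes, alpha, gain=1.0):
    """Warp the frequency axis by a constant factor and scale the magnitudes.

    Output bin k takes its value from input position k / alpha (linearly
    interpolated), so alpha > 1 moves spectral peaks towards higher bins.
    This is a toy stand-in, not the CFWMS algorithm.
    """
    n = len(magnitudes)
    out = []
    for k in range(n):
        src = k / alpha                 # fractional source bin
        if src > n - 1:
            out.append(0.0)             # no source energy beyond the last bin
            continue
        lo = int(src)
        hi = min(lo + 1, n - 1)
        frac = src - lo
        out.append(gain * ((1 - frac) * magnitudes[lo] + frac * magnitudes[hi]))
    return out


# Toy spectrum with a single "formant" peak at bin 10.
spectrum = [0.0] * 64
spectrum[10] = 1.0

warped = warp_and_scale(spectrum, alpha=1.5, gain=2.0)
peak_bin = warped.index(max(warped))
print(peak_bin, max(warped))
```

With `alpha=1.5` the peak moves from bin 10 to bin 15, and `gain=2.0` doubles its height. A practical system would apply a frequency-dependent warping function and gain per frame; the trajectory limitation the abstract mentions would then constrain how those parameters change from frame to frame.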
Pathway to Future Symbiotic Creativity
This report presents a comprehensive view of our vision for the development
path of human-machine symbiotic art creation. We propose a classification
of creative systems into a hierarchy of 5 classes, showing the pathway of
creativity evolving from a human-mimicking artist (Turing Artist) to a Machine
Artist in its own right. We begin with an overview of the limitations of
Turing Artists, then focus on the top two levels of systems, Machine Artists,
emphasizing machine-human communication in art creation. In art creation,
machines need to understand humans' mental states, including desires,
appreciation, and emotions, and humans need to understand machines' creative
capabilities and limitations. The rapid development of immersive environments
and their further evolution into the new concept of the metaverse enable symbiotic
art creation through unprecedented flexibility of bi-directional communication
between artists and art manifestation environments. By examining the latest
sensor and XR technologies, we illustrate a novel way of collecting art data
that forms the basis of a new form of human-machine bidirectional
communication and understanding in art creation. Based on such communication
and understanding mechanisms, we propose a novel framework for building future
Machine artists, which comes with the philosophy that a human-compatible AI
system should be based on the "human-in-the-loop" principle rather than the
traditional "end-to-end" dogma. By proposing a new form of inverse
reinforcement learning model, we outline the platform design of machine
artists, demonstrate its functions and showcase some examples of technologies
we have developed. We also provide a systematic exposition of the ecosystem for
AI-based symbiotic art form and community with an economic model built on NFT
technology. Ethical issues for the development of machine artists are also
discussed.
Application-driven visual computing towards industry 4.0 2018
245 p. The thesis gathers contributions in three fields: 1. Interactive Virtual Agents: autonomous, modular, scalable, ubiquitous, and attractive to the user. These IVAs can interact with users in a natural way. 2. Immersive VR/AR Environments: VR in production planning, product design, process simulation, testing, and verification. The Virtual Operator shows how VR and co-bots can work together in a safe environment. In the Augmented Operator, AR shows relevant information to the worker in a non-intrusive way. 3. Interactive 3D Model Management: online management and visualization of multimedia CAD models through automatic conversion of CAD models to the Web. Web3D technology enables visualization of and interaction with these models on low-power mobile devices. In addition, these contributions have made it possible to analyze the challenges posed by Industry 4.0. The thesis has contributed a proof of concept for some of those challenges: in human factors, simulation, visualization, and model integration.