26 research outputs found

    New method for mathematical modelling of human visual speech

    Audio-visual speech recognition and visual speech synthesisers are used as interfaces between humans and machines. Such interactions rely on the analysis and synthesis of both audio and visual information, which humans use for face-to-face communication. Currently, there is no global standard for describing these interactions, nor is there a standard mathematical tool for describing lip movements. Furthermore, the visual lip movement for each phoneme is considered in isolation rather than as a continuation from one phoneme to the next. Consequently, there is no globally accepted standard method for representing lip movement during articulation. This thesis addresses these issues by designing a group of words transcribed by mathematical formulas, thereby introducing the concept of a visual word, allocating signatures to visual words and finally building a visual speech vocabulary database. In addition, visual speech information has been analysed in a novel way by considering both lip movements and the phonemic structure of the English language. To extract the visual data, three visual features on the lip have been chosen; these are located on the outer upper lip, the outer lower lip and the corner of the lip. The visual data extracted during articulation is called the visual speech sample set. The final visual data is obtained after processing the visual speech sample sets to correct experimental artefacts such as head tilting, which occurred during articulation and visual data extraction. Barycentric Lagrange Interpolation (BLI) formulates the visual speech sample sets into visual speech signals. The visual word is defined in this work and consists of the variation of the three visual features. Further processing that relates the visual speech signals to the uttered word leads to the allocation of signatures that represent the visual word. This work suggests that the visual word signature can be used as a ‘visual word barcode’, a ‘digital visual word’ or a ‘2D/3D representation’. 
The 2D version of the visual word provides a unique signature that allows the words being uttered to be identified. Identification of visual words has also been performed using a technique called ‘volumetric representations of the visual words’. Furthermore, the effect of altering the amplitudes and the sampling rate for BLI has been evaluated, and the performance of BLI in reconstructing the visual speech sample sets has been considered. Finally, BLI has been compared with a signal reconstruction approach using RMSE and correlation coefficients. The results show that BLI is the more reliable method for the purpose of this work, according to Section 7.7.
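
The abstract names Barycentric Lagrange Interpolation, RMSE and correlation coefficients but gives no formulas. As a rough illustration only, the sketch below applies the standard barycentric form of Lagrange interpolation to a made-up lip-feature trajectory and scores the reconstruction with RMSE and the Pearson correlation coefficient. The function names, the frame times and the lip-height values are all assumptions for illustration; nothing here is taken from the thesis, and its actual processing pipeline may differ.

```python
import math


def barycentric_weights(nodes):
    """Weights w_j = 1 / prod_{k != j} (x_j - x_k) for distinct nodes."""
    weights = []
    for j, xj in enumerate(nodes):
        prod = 1.0
        for k, xk in enumerate(nodes):
            if k != j:
                prod *= xj - xk
        weights.append(1.0 / prod)
    return weights


def bli_eval(nodes, values, weights, x):
    """Evaluate the interpolant at x using the second barycentric formula."""
    num = 0.0
    den = 0.0
    for xj, yj, wj in zip(nodes, values, weights):
        if x == xj:
            # Exactly on a sample point: return the sample itself.
            return yj
        term = wj / (x - xj)
        num += term * yj
        den += term
    return num / den


def rmse(a, b):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))


def pearson(a, b):
    """Pearson correlation coefficient (assumes neither input is constant)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical sample set: one lip feature tracked over five video frames.
frame_times = [0.0, 0.1, 0.2, 0.3, 0.4]   # seconds (made-up values)
lip_height = [2.0, 3.5, 5.0, 4.0, 2.5]    # arbitrary units (made-up values)

weights = barycentric_weights(frame_times)

# Reconstruct the samples and compare them with the originals.
reconstructed = [bli_eval(frame_times, lip_height, weights, t) for t in frame_times]
print("RMSE:", rmse(lip_height, reconstructed))
print("Correlation:", pearson(lip_height, reconstructed))

# Evaluate between frames to obtain a continuous signal value.
print("Value at t=0.25:", bli_eval(frame_times, lip_height, weights, 0.25))
```

The barycentric form is used here because the weights depend only on the sample times, so they can be computed once and reused for every evaluation point. With equally spaced samples, high-degree polynomial interpolation can oscillate near the ends of the interval, which is one reason an evaluation of sampling rate such as the one the abstract mentions would matter.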

    Hidden Markov model based visual speech recognition

    Ph.D. (Doctor of Philosophy)

    Using sentence context and mouth cues to aid speech comprehension: an electroencephalographic study on Cochlear Implant users

    The research project presented in this thesis explores the electrophysiological correlates of linguistic prediction and audio-visual speech processing in deaf people with a cochlear implant (CI) and in people with normal hearing, in order to identify possible group differences. We implement an experimental paradigm in which participants observe audio-visual speech stimuli that vary in the predictability of the last word of the sentence (i.e. the target) and in the visibility of mouth articulatory movements. During the procedure, we record the electroencephalographic (EEG) signal in order to compare the experimental conditions in terms of neural oscillations and the Event-Related Potential (ERP) response to the target word. We also administer linguistic tests to participants in order to relate behavioural performance to the electrophysiological results. The thesis presents a theoretical overview of prediction in language comprehension, the neural correlates of prediction and audio-visual speech integration, and previous studies exploring these processes in CI users. It then presents the methods used in the experiment and preliminary data from a subgroup of participants with a CI.

    Animation and Interaction of Responsive, Expressive, and Tangible 3D Virtual Characters

    This thesis is framed within the field of 3D character animation. Virtual characters are used in many human-computer interaction applications such as video games and serious games. Within these virtual worlds they move and act in ways similar to humans, controlled by users through some form of interface or by artificial intelligence. This work addresses the challenges of developing smoother movements and more natural behaviors, and of driving motions in real time, intuitively and accurately. The interaction between virtual characters and intelligent objects is also explored. Together, these lines of research contribute to creating more responsive, expressive, and tangible virtual characters. Navigation within virtual worlds uses locomotion such as walking and running. To achieve maximum realism, actors' movements are captured and used to animate virtual characters. This is the philosophy of motion graphs: a structure that embeds movements, where a continuous motion stream is generated by concatenating motion pieces. However, locomotion synthesis using motion graphs involves a tradeoff between the number of possible transitions between different kinds of locomotion and the quality of these transitions, meaning how smoothly one pose leads into the next. To overcome this drawback, we propose the method of progressive transitions using Body Part Motion Graphs (BPMGs). This method deals with partial movements, and generates specific, synchronized transitions for each body part (group of joints) within a window of time. Therefore, the connectivity within the system is not tied to the similarity between global poses, which allows us to find more and better-quality transition points while increasing the speed of response and execution of these transitions compared with the standard motion graph method. Secondly, beyond achieving faster transitions and smoother movements, virtual characters also interact with each other and with users by speaking. 
This interaction requires the creation of gestures appropriate to the speech they reproduce. Gestures are the nonverbal language that accompanies spoken language. The credibility of virtual characters when speaking is linked to the naturalness of their movements and their synchrony with the voice, particularly its intonation. Consequently, we analyzed the relationship between gestures and speech, and the generation of gestures according to that speech. We defined intensity indicators for both gestures (GSI, Gesture Strength Indicator) and speech (PSI, Pitch Strength Indicator). We studied the relationship in time and intensity of these cues in order to establish synchronicity and intensity rules. Later, we adapted these rules to select gestures appropriate to the speech input (tagged text from the speech signal) in the Gesture Motion Graph (GMG). The evaluation of the resulting animations shows the importance of relating the intensity of speech and gestures, beyond time synchronization, to generate believable animations. Subsequently, we present BodySpeech, a system for the automatic generation of gestures and facial animation from a speech signal. This system also includes animation improvements such as greater reuse of the input data and more flexible time synchronization, and new features such as editing the style of the output animations. In addition, facial animation also takes speech intonation into account. Finally, we have moved virtual characters from virtual environments to the physical world in order to explore their interaction possibilities with real objects. To this end, we present AvatARs, virtual characters that have a tangible representation and are integrated into reality through augmented reality apps on mobile devices. Users manipulate a physical object to control the animation: with it they select and configure animations, and it also serves as a support for the virtual character's representation. 
Then, we explored the interaction of AvatARs with intelligent physical objects like the Pleo social robot. Pleo is used to assist hospitalized children in therapy or simply for playing. Despite its benefits, there is a lack of emotional relationship and interaction between the children and Pleo, which eventually makes children lose interest. This is why we have created a mixed reality scenario where Vleo (an AvatAR in the form of Pleo; the virtual element) and Pleo (the real element) interact naturally. This scenario has been tested, and the results indicate that AvatARs enhance children's motivation to play with Pleo, opening a new horizon in the interaction between virtual characters and robots.

    Developing an Affect-Aware Rear-Projected Robotic Agent

    Social (or sociable) robots are designed to interact with people in a natural and interpersonal manner. They are becoming an integrated part of our daily lives and have achieved positive outcomes in several applications such as education, health care, quality of life, and entertainment. Despite significant progress towards the development of realistic social robotic agents, a number of problems remain to be solved. First, current social robots either lack the ability to have deep social interaction with humans, or they are very expensive to build and maintain. Second, current social robots have yet to reach the full emotional and social capabilities necessary for rich and robust interaction with human beings. To address these problems, this dissertation presents the development of a low-cost, flexible, affect-aware rear-projected robotic agent (called ExpressionBot) that is designed to support verbal and non-verbal communication between the robot and humans, with the goal of closely modeling the dynamics of natural face-to-face communication. The developed robotic platform uses state-of-the-art character animation technologies to create an animated human face (an avatar) that is capable of showing facial expressions, realistic eye movement, and accurate visual speech, and then projects this avatar onto a face-shaped translucent mask. The mask and the projector are then rigged onto a neck mechanism that can move like a human head. Since an animation is projected onto a mask, the robotic face is a highly flexible research tool that is mechanically simple and low-cost to design, build and maintain compared with mechatronic and android faces. The results of our comprehensive Human-Robot Interaction (HRI) studies illustrate the benefits and value of the proposed rear-projected robotic platform over a virtual agent with the same animation displayed on a 2D computer screen. 
The results indicate that ExpressionBot is well accepted by users, with some advantages in conveying facial expressions more accurately and in the perception of mutual eye gaze contact. To improve the social capabilities of the robot and create an expressive and empathic (affect-aware) social agent that is capable of interpreting users' emotional facial expressions, we developed a new Deep Neural Network (DNN) architecture for Facial Expression Recognition (FER). The proposed DNN was initially trained on seven well-known publicly available databases, and obtained results significantly better than, or comparable to, traditional convolutional neural networks and other state-of-the-art methods in both accuracy and learning time. Since the performance of an automated FER system depends heavily on its training data, and the eventual goal of the proposed robotic platform is to interact with users in an uncontrolled environment, a database of facial expressions in the wild (called AffectNet) was created by querying different search engines with emotion-related keywords. AffectNet contains more than 1M images with faces and 440,000 manually annotated images with facial expressions, valence, and arousal. Two DNNs were trained on AffectNet to classify the facial expression images and predict the values of valence and arousal. Various evaluation metrics show that our deep neural network approaches trained on AffectNet can perform better than conventional machine learning methods and available off-the-shelf FER systems. We then integrated this automated FER system into the spoken dialog of our robotic platform to extend and enrich the capabilities of ExpressionBot beyond spoken dialog and create an affect-aware robotic agent that can measure and infer users' affect and cognition. Three social/interaction aspects (task engagement, being empathic, and likability of the robot) are measured in an experiment with the affect-aware robotic agent. 
The results indicate that users rated our affect-aware agent to be as empathic and likable as a robot in which the user's affect is recognized by a human (WoZ). In summary, this dissertation presents the development and HRI studies of a perceptive, expressive, conversational, rear-projected, life-like robotic agent (ExpressionBot, also known as Ryan) that models natural face-to-face communication between a human and an empathic agent. The results of our in-depth human-robot interaction studies show that this robotic agent can serve as a model for creating the next generation of empathic social robots.

    High-quality face capture, animation and editing from monocular video

    Digitization of virtual faces in movies requires complex capture setups and extensive manual work to produce superb animations and video-realistic editing. This thesis pushes the boundaries of the digitization pipeline by proposing automatic algorithms for high-quality 3D face capture and animation, as well as photo-realistic face editing. These algorithms reconstruct and modify faces in 2D videos recorded in uncontrolled scenarios and illumination. In particular, advances in three main areas offer solutions for the lack of depth and overall uncertainty in video recordings. First, contributions in capture include model-based reconstruction of detailed, dynamic 3D geometry that exploits optical and shading cues, multilayer parametric reconstruction of accurate 3D models in unconstrained setups based on inverse rendering, and regression-based 3D lip shape enhancement from high-quality data. Second, advances in animation are video-based face reenactment based on robust appearance metrics and temporal clustering, performance-driven retargeting of detailed facial models in sync with audio, and the automatic creation of personalized controllable 3D rigs. Finally, advances in plausible photo-realistic editing are dense face albedo capture and mouth interior synthesis using image warping and 3D teeth proxies. High-quality results attained on challenging application scenarios confirm the contributions and show great potential for the automatic creation of photo-realistic 3D faces.

    IberSPEECH 2020: XI Jornadas en Tecnología del Habla and VII Iberian SLTech

    IberSPEECH2020 is a two-day event, bringing together the best researchers and practitioners in speech and language technologies in Iberian languages to promote interaction and discussion. The organizing committee has planned a wide variety of scientific and social activities, including technical paper presentations, keynote lectures, presentations of projects, laboratory activities, recent PhD theses, discussion panels, a round table, and awards for the best thesis and papers. The program of IberSPEECH2020 includes a total of 32 contributions, distributed among 5 oral sessions, a PhD session, and a projects session. To ensure the quality of all the contributions, each submitted paper was reviewed by three members of the scientific review committee. All the papers in the conference will be accessible through the International Speech Communication Association (ISCA) Online Archive. Paper selection was based on the scores and comments provided by the scientific review committee, which includes 73 researchers from different institutions (mainly from Spain and Portugal, but also from France, Germany, Brazil, Iran, Greece, Hungary, Czech Republic, Ukraine, Slovenia). Furthermore, extended versions of selected papers will be published as a special issue of the Journal of Applied Sciences, “IberSPEECH 2020: Speech and Language Technologies for Iberian Languages”, published by MDPI with fully open access. In addition to regular paper sessions, the IberSPEECH2020 scientific program features the following activities: the ALBAYZIN evaluation challenge session. Red Española de Tecnologías del Habla. Universidad de Valladoli

    Augmented Reality

    Augmented Reality (AR) is a natural development from virtual reality (VR), which was developed several decades earlier. AR complements VR in many ways. Because the user can see both real and virtual objects simultaneously, AR is far more intuitive, but it is not completely free from human factors and other restrictions. AR applications take less time and effort to build because the entire virtual scene and environment need not be constructed. In this book, several new and emerging application areas of AR are presented and divided into three sections. The first section contains applications in outdoor and mobile AR, such as construction, restoration, security and surveillance. The second section deals with AR in medicine, biology, and the human body. The third and final section contains a number of new and useful applications in daily living and learning.