An RNN-based Quantized F0 Model with Multi-tier Feedback Links for Text-to-Speech Synthesis

Authors: Takaki Shinji, Wang Xin, Yamagishi Junichi
Publication venue: 'International Speech Communication Association'
Publication date: 24/08/2017
Crossref

Edinburgh Research Explorer

Descripción automática de imágenes

Author: Pallarés Font de Mora Pablo
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 20/10/2021
The purpose of this work is the study, implementation, and development of Deep Learning systems for the automatic generation of image descriptions, or Image Captioning. This field combines Natural Language Processing (NLP) and Computer Vision (CV). Before implementation, an analysis was carried out of the different approaches used to tackle this task, the available corpora in [Image - Caption/s] format, and the architectures or models used. Following this analysis, the task was initially approached in the most common way: based on language models, with an Encoder-Decoder architecture. To this end, the descriptions (captions) are encoded in a vector space of Word2Vec embeddings, and the images are encoded using convolutional neural networks (CNNs). With this encoded information, the Decoder learns a language model with recurrent neural networks (RNNs) capable of generating descriptions. The implementations in this work were developed in Python, using the open-source library TensorFlow, which is oriented to training Machine Learning models, and the high-level framework Keras.

Pallarés Font De Mora, P. (2021). Descripción automática de imágenes. Universitat Politècnica de València. http://hdl.handle.net/10251/175035

TFG

RiuNet