6 research outputs found

    Comparison between rule-based and data-driven natural language processing algorithms for Brazilian Portuguese speech synthesis

    Due to the exponential growth in the use of computers, personal digital assistants, and smartphones, the demand for Text-to-Speech (TTS) systems has grown considerably in recent years. An important part of these systems is the Text Analysis block, which converts the input text into the linguistic specifications used to generate the final speech waveform. The Natural Language Processing (NLP) algorithms in this block are crucial to the quality of the synthesized speech; they are responsible for tasks such as Grapheme-to-Phoneme Conversion, Syllabification, and Stress Determination. For Brazilian Portuguese (BP), solutions for the algorithms in the Text Analysis block have focused on rule-based approaches. These algorithms perform well for BP but have many disadvantages. On the other hand, there is still no research evaluating and analyzing the performance of data-driven approaches, which reach state-of-the-art results for complex languages such as English. In this work, we therefore compare different data-driven and rule-based approaches for the NLP algorithms of a TTS system. Moreover, we propose, as a novel application, the use of Sequence-to-Sequence models to solve the Syllabification and Stress Determination problems. In summary, we show that data-driven algorithms can achieve state-of-the-art performance for the NLP algorithms in the Text Analysis block of a BP TTS system.
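    As a rough illustration of the Sequence-to-Sequence idea mentioned in the abstract, the sketch below shows a character-level encoder-decoder that maps a word's graphemes to its syllabified form. This is a minimal, hypothetical example (model sizes, vocabulary, and training setup are assumptions), not the thesis's implementation.

```python
# Hypothetical character-level Seq2Seq syllabifier: e.g. "computador" -> "com-pu-ta-dor".
import torch
import torch.nn as nn

class Seq2SeqSyllabifier(nn.Module):
    def __init__(self, vocab_size, emb_size=64, hidden_size=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_size)
        self.encoder = nn.GRU(emb_size, hidden_size, batch_first=True)
        self.decoder = nn.GRU(emb_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, src_ids, tgt_in_ids):
        # Encode the grapheme sequence of the input word.
        _, hidden = self.encoder(self.embedding(src_ids))
        # Decode the syllabified sequence with teacher forcing
        # (ground-truth previous output characters fed at each step).
        dec_out, _ = self.decoder(self.embedding(tgt_in_ids), hidden)
        return self.out(dec_out)  # logits over output characters

# Toy usage with placeholder ids: batch of 2 words, padded sequences.
model = Seq2SeqSyllabifier(vocab_size=40)
src = torch.randint(0, 40, (2, 12))
tgt = torch.randint(0, 40, (2, 14))
logits = model(src, tgt[:, :-1])               # decoder input: target shifted right
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 40), tgt[:, 1:].reshape(-1))
loss.backward()
```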

    Off-Policy Reinforcement Learning with Loss Function Weighted by Temporal Difference Error

    Training agents via off-policy deep reinforcement learning (RL) requires a large memory, called replay memory, that stores past experiences used for learning. These experiences are sampled, uniformly or non-uniformly, to create the batches used for training. When calculating the loss function, off-policy algorithms assume that all samples are of the same importance. In this paper, we hypothesize that training can be enhanced by assigning each experience a different importance, based on its temporal-difference (TD) error, directly in the training objective. We propose a novel method that introduces a weighting factor for each experience when calculating the loss function at the learning stage. In addition to improving convergence speed when used with uniform sampling, the method can be combined with prioritization methods for non-uniform sampling. Combining the proposed method with prioritization methods improves sampling efficiency while increasing the performance of TD-based off-policy RL algorithms. The effectiveness of the proposed method is demonstrated by experiments in six environments of the OpenAI Gym suite. The experimental results show that the proposed method achieves a 33%-76% reduction in convergence time in three environments, and an 11% increase in returns together with a 3%-10% increase in success rate in the other three environments.
    Comment: to be submitted to an AI conference
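    The core idea, weighting each sampled transition's loss term by its TD error, can be sketched as below. This is a hedged illustration of the concept only; the function name, the normalization, and the exact weighting scheme are assumptions, not the paper's formulation.

```python
# Sketch of a TD-error-weighted loss for a DQN-style off-policy update.
import torch

def weighted_td_loss(q_values, target_q_values, eps=1e-6):
    """q_values, target_q_values: 1-D tensors for one sampled batch."""
    td_error = target_q_values - q_values
    # Assign larger weight to experiences with larger absolute TD error.
    weights = td_error.abs().detach() + eps
    weights = weights / weights.sum()            # normalize over the batch
    return (weights * td_error.pow(2)).sum()     # weighted squared TD error

# Toy usage: 4 transitions in the batch.
q = torch.tensor([1.0, 0.5, 2.0, 0.1], requires_grad=True)
q_target = torch.tensor([1.2, 0.4, 3.0, 0.1])
loss = weighted_td_loss(q, q_target)
loss.backward()
```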

    A Lightweight Domain Adaptive Absolute Pose Regressor Using Barlow Twins Objective

    Identifying the camera pose for a given image is a challenging problem with applications in robotics, autonomous vehicles, and augmented/virtual reality. Lately, learning-based methods have proven effective for absolute camera pose estimation. However, these methods are not accurate when generalizing to different domains. In this paper, a domain adaptive training framework for absolute pose regression is introduced. In the proposed framework, the scene image is augmented into different domains using generative methods, and parallel branches are trained with the Barlow Twins objective. The parallel branches leverage a lightweight CNN-based absolute pose regressor architecture. Further, the efficacy of incorporating spatial and channel-wise attention in the regression head for rotation prediction is investigated. Our method is evaluated on two datasets, Cambridge Landmarks and 7Scenes. The results demonstrate that, even while using roughly 24 times fewer FLOPs, 12 times fewer activations, and 5 times fewer parameters than MS-Transformer, our approach outperforms all CNN-based architectures and achieves performance comparable to transformer-based architectures. Our method ranks 2nd and 4th on the Cambridge Landmarks and 7Scenes datasets, respectively. In addition, for augmented domains not encountered during training, our approach significantly outperforms MS-Transformer. Furthermore, our domain adaptive framework achieves better performance than a single-branch model trained with the identical CNN backbone on all instances of the unseen distribution.
    Comment: [draft-v1] 18 pages, 8 figures, and 10 tables
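    The Barlow Twins objective referenced in the abstract can be illustrated as follows: features from the two parallel branches (original and domain-augmented views of the same scene) are driven to a cross-correlation matrix close to the identity. This is a minimal sketch under assumed dimensions and the standard lambda value; it is not the paper's exact implementation.

```python
# Barlow Twins-style loss on embeddings from two parallel branches.
import torch

def barlow_twins_loss(z_a, z_b, lambd=5e-3):
    """z_a, z_b: (batch, dim) features from the two branches."""
    n, d = z_a.shape
    # Standardize each feature dimension across the batch.
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-6)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-6)
    c = (z_a.T @ z_b) / n                        # cross-correlation matrix (d, d)
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return on_diag + lambd * off_diag

# Toy usage with random 256-D embeddings for 8 image pairs.
z1, z2 = torch.randn(8, 256), torch.randn(8, 256)
print(barlow_twins_loss(z1, z2))
```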

    Power Management of Nanogrid Cluster with P2P Electricity Trading Based on Future Trends of Load Demand and PV Power Production

    This paper presents the power management of nanogrid clusters assisted by a novel peer-to-peer (P2P) electricity trading scheme. In our work, the imbalance of power consumption among clusters is mitigated by the proposed P2P trading method. For power management of individual clusters, multi-objective optimization simultaneously minimizing total power consumption, the portion of grid power consumption, and the total delay incurred by scheduling is attempted. A renewable power source, a photovoltaic (PV) system, is adopted for each cluster as a secondary source. The temporal surplus of self-supplied PV power of one cluster can be sold through P2P trading to another cluster (or clusters) experiencing a temporal power shortage. The cluster in temporal shortage of electric power buys the PV power to reduce peak load and total delay. In P2P trading, a cooperative game model is used for buyers and sellers to maximize their welfare. To increase P2P trading efficiency, future trends of load demand and PV power production are considered in the power management of each cluster to resolve the instantaneous imbalance between load demand and PV power production. To this end, a gated recurrent unit (GRU) network is used to forecast future load demand and future PV power production. Simulations verify the effectiveness of the proposed P2P trading for nanogrid clusters.
    Comment: This article is submitted for publication in Sustainable Cities and Society
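    The forecasting component described in the abstract could look roughly like the sketch below: a GRU that predicts the next-step load demand and PV production of a cluster from a window of past observations, which the power manager would then use for scheduling and trading decisions. The input features, window length, and architecture here are assumptions, not the paper's model.

```python
# Illustrative GRU forecaster for next-step [load demand, PV power] per cluster.
import torch
import torch.nn as nn

class DemandPVForecaster(nn.Module):
    def __init__(self, hidden_size=64):
        super().__init__()
        # Input features per time step: [load_demand, pv_power]
        self.gru = nn.GRU(input_size=2, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 2)   # next-step [load, pv]

    def forward(self, history):
        # history: (batch, window_length, 2)
        _, h_n = self.gru(history)
        return self.head(h_n[-1])               # (batch, 2)

# Toy usage: 24-step history of load and PV for 4 clusters.
model = DemandPVForecaster()
history = torch.rand(4, 24, 2)
next_step = model(history)                      # forecast for the next interval
```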

    Pose Estimation Utilizing a Gated Recurrent Unit Network for Visual Localization

    Lately, pose estimation based on learning-based Visual Odometry (VO) methods, where raw image data are provided as the input of a neural network to obtain 6 Degrees of Freedom (DoF) information, has been intensively investigated. Despite recent advances, learning-based VO methods still perform worse than classical VO, which comprises feature-based and direct methods. In this paper, a new pose estimation method is proposed with the help of a Gated Recurrent Unit (GRU) network trained on pose data acquired by an accurate sensor. The historical trajectory data of the yaw angle are provided to the GRU network to obtain the yaw angle at the current time step. The proposed method can easily be combined with other VO methods to enhance the overall performance via an ensemble of predicted results. Pose estimation using the proposed method is especially advantageous in cornering sections, which often introduce estimation errors. Performance is improved by reconstructing the rotation matrix using a yaw angle that is the fusion of the yaw angles estimated from the proposed GRU network and other VO methods. The KITTI dataset is used to train the network. On average, over the KITTI sequences, performance improves by as much as 1.426% in terms of translation error and 0.805 deg/100 m in terms of rotation error.
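    A rough sketch of the idea in the abstract is given below: a GRU predicts the current yaw from a window of past yaw values, and the prediction is fused with the yaw from a conventional VO method before the yaw part of the rotation matrix is rebuilt. The network size, the simple weighted-average fusion, and the axis convention are assumptions made for illustration only.

```python
# GRU-based yaw prediction fused with a VO yaw estimate, then rebuilt as a rotation.
import numpy as np
import torch
import torch.nn as nn

class YawGRU(nn.Module):
    def __init__(self, hidden_size=32):
        super().__init__()
        self.gru = nn.GRU(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, yaw_history):              # (batch, window, 1)
        _, h_n = self.gru(yaw_history)
        return self.head(h_n[-1]).squeeze(-1)    # predicted current yaw, (batch,)

def fuse_and_rebuild(yaw_gru, yaw_vo, alpha=0.5):
    """Blend the two yaw estimates and return the rotation about the vertical axis."""
    yaw = alpha * yaw_gru + (1 - alpha) * yaw_vo
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

# Toy usage: 10 past yaw values (radians) for one trajectory.
model = YawGRU()
history = torch.zeros(1, 10, 1)
yaw_pred = model(history).item()
R = fuse_and_rebuild(yaw_pred, yaw_vo=0.02)
```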