5 research outputs found

Multilingual video dubbing—a technology review and current challenges

Author: Dan Bigioi
Peter Corcoran
Publication venue: Frontiers Media S.A.
Publication date: 01/09/2023

The proliferation of multilingual content on today’s streaming services has created a need for automated multilingual dubbing tools. In this article, current state-of-the-art approaches are discussed with reference to recent works in automatic dubbing and the closely related field of talking head generation. A taxonomy of papers within both fields is presented, and the main challenges of both speech-driven automatic dubbing and talking head generation are outlined, together with proposals for future research to tackle these issues.

Directory of Open Access Journals

Synthetic Speaking Children -- Why We Need Them and How to Make Them

Author: Bigioi Dan
Corcoran Peter
Farooq Muhammad Ali
Jain Rishabh
Yao Wang
Yiwere Mariam
Publication date: 08/11/2023

Contemporary Human-Computer Interaction (HCI) research relies primarily on neural network models for machine vision and speech understanding of a system user. Such models require extensively annotated training datasets for optimal performance, and when building interfaces for users from a vulnerable population such as young children, GDPR introduces significant complexities in data collection, management, and processing. Motivated by the training needs of an Edge AI smart toy platform, this research explores the latest advances in generative neural technologies and provides a working proof of concept of a controllable data generation pipeline for speech-driven facial training data at scale. In this context, we demonstrate how StyleGAN2 can be fine-tuned to create a gender-balanced dataset of children's faces. This dataset includes a variety of controllable factors such as facial expressions, age variations, facial poses, and even speech-driven animations with realistic lip synchronization. By combining generative text-to-speech models for child voice synthesis and a 3D landmark-based talking heads pipeline, we can generate highly realistic, entirely synthetic, talking child video clips. These video clips can provide valuable and controllable synthetic training data for neural network models, bridging the gap when real data is scarce or restricted due to privacy regulations.

Comment: Presented at SpeD 2

arXiv.org e-Print Archive

Corporate Governance in Emerging Economies: The Case of Romania

Author: Adrian Doru BIGIOI
Liliana FELEAGĂ
Niculae FELEAGĂ
Voicu Dan DRAGOMIR
Publication venue: General Association of Economists from Romania
Publication date: 01/09/2011

In Romania, corporate governance emerged in the early 2000s. The delay is explained by the difficult steps taken toward political, legal, economic, and social reform. In recent years, however, the corporate governance environment in Romania has changed. Transparency and accountability have become key factors not only for shareholders, but also for investors, buyers, suppliers, and other stakeholders. In this context, it is worth considering, based on statistical data, the degree of development of corporate governance in Romania. The selected indicators are linked to attributes of the board of directors, in particular board structure, size, independence, frequency of meetings, and other factors. The data come from official information published by companies listed on the Bucharest Stock Exchange (BSE). The results will be compared with those of other case studies of emerging countries and with European best practice.

Directory of Open Access Journals

Corporate Governance in Emerging Economies: The Case of Romania

Author: Adrian Doru BIGIOI
Liliana FELEAGĂ
Niculae FELEAGĂ
Voicu Dan DRAGOMIR

Research Papers in Economics

Speech Driven Video Editing via an Audio-Conditioned Diffusion Model

Author: Basak Shubhajit
Bigioi Dan
Corcoran Peter
Jordan Hugh
McDonnell Rachel
Stypułkowski Michał
Zięba Maciej
Publication date: 11/05/2023

Taking inspiration from recent developments in visual generative tasks using diffusion models, we propose a method for end-to-end speech-driven video editing using a denoising diffusion model. Given a video of a talking person and a separate speech recording, the lip and jaw motions are re-synchronized without relying on intermediate structural representations such as facial landmarks or a 3D face model. We show this is possible by conditioning a denoising diffusion model on audio mel spectral features to generate synchronized facial motion. Proof-of-concept results are demonstrated on both single-speaker and multi-speaker video editing, providing a baseline model on the CREMA-D audiovisual dataset. To the best of our knowledge, this is the first work to demonstrate and validate the feasibility of applying end-to-end denoising diffusion models to the task of audio-driven video editing.

Comment: 8 pages, code and project page available here: https://danbigioi.github.io/DiffusionVideoEditing

arXiv.org e-Print Archive