494 research outputs found

    TripleNet: A Low Computing Power Platform of Low-Parameter Network

    Owing to the strong performance of deep learning in computer vision, convolutional neural network (CNN) architectures have become the main backbone for computer vision tasks. With the widespread use of mobile devices, neural network models designed for platforms with low computing power are gradually receiving attention. This paper proposes TripleNet, a lightweight convolutional neural network improved from HarDNet and ThreshNet that inherits the small memory usage and low power consumption of both models. TripleNet combines three different convolutional layers into a new architecture, which has fewer parameters than either HarDNet or ThreshNet. We verify the design on image classification with the CIFAR-10 and SVHN datasets using HarDNet, ThreshNet, and the proposed TripleNet. Experimental results show that, compared with HarDNet, TripleNet reduces parameters by 66% and increases accuracy by 18%; compared with ThreshNet, it reduces parameters by 37% and increases accuracy by 5%. Comment: 4 pages, 2 figures
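
    A minimal PyTorch sketch of the general idea of stacking three different convolution types in one lightweight block. The specific layer choices (pointwise, depthwise, standard 3x3) and all names are assumptions for illustration, not the authors' actual TripleNet configuration.

# Hypothetical sketch, loosely inspired by the abstract's description of
# combining three different convolutional layers; not the paper's architecture.
import torch
import torch.nn as nn

class TripleConvBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # Three different convolution types: cheap channel mixing, cheap
        # spatial mixing, then a standard convolution.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.depthwise = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1,
                                   groups=out_ch, bias=False)
        self.standard = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.act(self.pointwise(x))
        x = self.act(self.depthwise(x))
        return self.act(self.bn(self.standard(x)))

if __name__ == "__main__":
    block = TripleConvBlock(3, 32)
    out = block(torch.randn(1, 3, 32, 32))  # CIFAR-10-sized input
    print(out.shape)                        # torch.Size([1, 32, 32, 32])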

    Probabilistic Speech-Driven 3D Facial Motion Synthesis: New Benchmarks, Methods, and Applications

    We consider the task of animating 3D facial geometry from a speech signal. Existing works are primarily deterministic, focusing on learning a one-to-one mapping from speech to 3D face meshes on small datasets with limited speakers. While these models can achieve high-quality lip articulation for speakers in the training set, they are unable to capture the full and diverse distribution of 3D facial motions that accompany speech in the real world. Importantly, the relationship between speech and facial motion is one-to-many, containing both inter-speaker and intra-speaker variations and necessitating a probabilistic approach. In this paper, we identify and address key challenges that have so far limited the development of probabilistic models: the lack of datasets and metrics suitable for training and evaluating them, and the difficulty of designing a model that generates diverse results while remaining faithful to a strong conditioning signal such as speech. We first propose large-scale benchmark datasets and metrics suitable for probabilistic modeling. Then, we demonstrate a probabilistic model that achieves both diversity and fidelity to speech, outperforming other methods across the proposed benchmarks. Finally, we showcase useful applications of probabilistic models trained on these large-scale datasets: we can generate diverse speech-driven 3D facial motion that matches unseen speaker styles extracted from reference clips, and our synthetic meshes can be used to improve the performance of downstream audio-visual models.
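
    The one-to-many claim implies that sampling the model several times on the same audio should yield different but plausible motions. Below is a toy sketch of that sampling interface; the decoder, feature dimensions, and vertex count are hypothetical placeholders, not the paper's model.

# Toy sketch of sampling diverse facial motions for one speech input.
# All module names and sizes are assumptions for illustration.
import torch
import torch.nn as nn

class SpeechToMotionDecoder(nn.Module):
    def __init__(self, audio_dim=128, latent_dim=32, num_vertices=5023):
        super().__init__()
        # Maps concatenated [audio feature, latent code] to per-frame vertex offsets.
        self.net = nn.Sequential(
            nn.Linear(audio_dim + latent_dim, 512),
            nn.ReLU(),
            nn.Linear(512, num_vertices * 3),
        )
        self.latent_dim = latent_dim

    def sample(self, audio_feats: torch.Tensor, num_samples: int = 3):
        # audio_feats: (T, audio_dim) frame-level speech features.
        T = audio_feats.shape[0]
        motions = []
        for _ in range(num_samples):
            # One latent "style" code per clip, shared across frames.
            z = torch.randn(1, self.latent_dim).expand(T, -1)
            motions.append(self.net(torch.cat([audio_feats, z], dim=-1)))
        return torch.stack(motions)  # (num_samples, T, num_vertices * 3)

decoder = SpeechToMotionDecoder()
samples = decoder.sample(torch.randn(100, 128), num_samples=3)
print(samples.shape)  # torch.Size([3, 100, 15069])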

    A social networking approach for mobile innovation in emerging countries

    Thesis (S.M. in Engineering and Management)--Massachusetts Institute of Technology, Engineering Systems Division, System Design and Management Program, February 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 121-122). Addressing global challenges, the MIT NextLab course engages students, industry partners, entrepreneurs, and the next billion mobile subscribers to develop innovative mobile services that improve the quality of life in emerging countries. In three years, NextLab teams developed and deployed 29 projects in 14 countries, and five teams founded their own ventures after perceiving strong demand from the vast number of mobile users in the developing world. However, the number and scale of NextLab projects are limited by the schedule and location of an academic course. The focus of this thesis is to research and develop a social networking platform that replicates the success of the NextLab course and reaches out to more participants around the world. In this document, I use a social analysis framework to identify the social processes among stakeholders in a typical NextLab project, specify possible social failures, and research possible solutions. I also review the NextLab projects from 2008 and 2009 and develop the NextLab Project Development Process (NLPDP), which highlights the 12 critical stages of a NextLab project. Finally, I propose the NextLab 2.0 Community, which integrates the social networking solutions and the NextLab Project Development Process. A case study of the mobile logistics (m-Logistics) project demonstrates how the proposed solution facilitates collaboration and communication for a large, cross-country mobile innovation project. A number of recommendations are also discussed for further research. by Jen-Hao Yang. S.M. in Engineering and Management

    Text is All You Need: Personalizing ASR Models using Controllable Speech Synthesis

    Adapting generic speech recognition models to specific individuals is a challenging problem due to the scarcity of personalized data. Recent works have proposed boosting the amount of training data using personalized text-to-speech synthesis. Here, we ask two fundamental questions about this strategy: when is synthetic data effective for personalization, and why is it effective in those cases? To address the first question, we adapt a state-of-the-art automatic speech recognition (ASR) model to target speakers from four benchmark datasets representative of different speaker types. We show that ASR personalization with synthetic data is effective in all cases, but particularly when (i) the target speaker is underrepresented in the global data, and (ii) the capacity of the global model is limited. To address the second question of why personalized synthetic data is effective, we use controllable speech synthesis to generate speech with varied styles and content. Surprisingly, we find that the text content of the synthetic data, rather than style, is important for speaker adaptation. These results lead us to propose a data selection strategy for ASR personalization based on speech content. Comment: ICASSP 202
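
    A rough sketch of the content-based data-selection idea the abstract points to: rank candidate sentences by overlap with the target speaker's vocabulary, then synthesize them with a personalized TTS voice for ASR fine-tuning. All names here (select_texts, personal_tts) are hypothetical stand-ins, not the paper's components.

# Hypothetical sketch of content-driven data selection for ASR personalization.
from dataclasses import dataclass

@dataclass
class Utterance:
    text: str
    audio: bytes  # waveform placeholder

def select_texts(candidate_texts, target_vocab, top_k=100):
    """Rank candidate sentences by overlap with the target speaker's vocabulary,
    reflecting the finding that text content matters more than speaking style."""
    def overlap(t):
        words = set(t.lower().split())
        return len(words & target_vocab) / max(len(words), 1)
    return sorted(candidate_texts, key=overlap, reverse=True)[:top_k]

def build_personalized_set(candidate_texts, target_vocab, personal_tts):
    """Synthesize the selected texts with a speaker-adapted TTS voice."""
    selected = select_texts(candidate_texts, target_vocab)
    return [Utterance(text=t, audio=personal_tts(t)) for t in selected]

# Usage with dummy stand-ins (a real setup would call a controllable TTS system
# and feed the resulting utterances into ASR fine-tuning):
fake_tts = lambda text: b""
data = build_personalized_set(
    ["please schedule my physiotherapy appointment", "the weather is nice"],
    target_vocab={"physiotherapy", "appointment", "schedule"},
    personal_tts=fake_tts,
)
print([u.text for u in data])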