DART: A Lightweight Quality-Suggestive Data-to-Text Annotation Tool
We present a lightweight annotation tool, the Data AnnotatoR Tool (DART), for
the general task of labeling structured data with textual descriptions. The
tool is implemented as an interactive application that reduces human efforts in
annotating large quantities of structured data, e.g. in the format of a table
or tree structure. By using a backend sequence-to-sequence model, our system
iteratively analyzes the annotated labels in order to better sample unlabeled
data. In a simulation experiment on annotating large quantities of structured
data, DART is shown to reduce the total number of annotations needed by
combining active learning with automatic suggestion of relevant labels.
Comment: Accepted to COLING 2020 (selected as an outstanding paper).
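The abstract does not include the sampling loop itself; as a rough illustration, a minimal sketch of the kind of uncertainty-driven annotation loop it describes could look like the following, where the backend model, the confidence heuristic, and the human-in-the-loop callback are illustrative assumptions rather than DART's actual API.

```python
# Hypothetical sketch of an active-learning annotation loop in the spirit of
# DART: a seq2seq model scores unlabeled records, the least-confident ones are
# sent to a human annotator together with a suggested label, and the model is
# retrained on the growing labeled set. All names here are placeholders.

import heapq
import random


class Seq2SeqModel:
    """Placeholder for the backend sequence-to-sequence model."""

    def train(self, pairs):                      # pairs: [(record, text), ...]
        pass

    def confidence(self, record):
        """Pseudo-confidence that the model can describe `record` (stub)."""
        return random.random()

    def suggest(self, record):
        """Suggested textual description for `record` (stub)."""
        return f"description of {record}"


def ask_human(record, suggestion):
    """Stand-in for the interactive UI: the annotator edits or accepts the suggestion."""
    return suggestion


def annotate(unlabeled, rounds=5, batch_size=10):
    model, labeled = Seq2SeqModel(), []
    for _ in range(rounds):
        # Sample the records the model is least confident about.
        batch = heapq.nsmallest(batch_size, unlabeled, key=model.confidence)
        for record in batch:
            labeled.append((record, ask_human(record, model.suggest(record))))
            unlabeled.remove(record)
        model.train(labeled)                     # retrain before the next round
    return labeled
```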
FoleyGen: Visually-Guided Audio Generation
Recent advancements in audio generation have been spurred by the evolution of
large-scale deep learning models and expansive datasets. However, the task of
video-to-audio (V2A) generation remains a challenge, principally because of
the intricate relationship between high-dimensional visual and auditory data
and the difficulty of temporal synchronization. In
this study, we introduce FoleyGen, an open-domain V2A generation system built
on a language modeling paradigm. FoleyGen leverages an off-the-shelf neural
audio codec for bidirectional conversion between waveforms and discrete tokens.
The generation of audio tokens is facilitated by a single Transformer model,
which is conditioned on visual features extracted from a visual encoder. A
prevalent problem in V2A generation is the misalignment of generated audio with
the visible actions in the video. To address this, we explore three novel
visual attention mechanisms. We further undertake an exhaustive evaluation of
multiple visual encoders, each pretrained on either single-modal or multi-modal
tasks. Experimental results on the VGGSound dataset show that our proposed
FoleyGen outperforms previous systems across all objective metrics and human
evaluations.
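A minimal PyTorch-style sketch of the pipeline the abstract describes is given below: a decoder-only Transformer predicts discrete codec tokens conditioned on visual features, here by simple prefixing. The module names, dimensions, and the prefix-conditioning choice are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch of a FoleyGen-style video-to-audio language model:
# visual features condition an autoregressive Transformer over audio-codec
# tokens; a neural codec (not shown) maps tokens back to a waveform.

import torch
import torch.nn as nn


class VideoToAudioLM(nn.Module):
    def __init__(self, vocab_size=1024, d_model=512, n_layers=8, n_heads=8, visual_dim=768):
        super().__init__()
        self.visual_proj = nn.Linear(visual_dim, d_model)  # map visual features into the LM space
        self.token_emb = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, visual_feats, audio_tokens):
        # visual_feats: (B, T_v, visual_dim) from a visual encoder
        # audio_tokens: (B, T_a) discrete indices from a neural audio codec
        prefix = self.visual_proj(visual_feats)            # condition by prefixing visual features
        x = torch.cat([prefix, self.token_emb(audio_tokens)], dim=1)
        causal = torch.triu(torch.full((x.size(1), x.size(1)), float("-inf")), diagonal=1)
        h = self.transformer(x, mask=causal)
        return self.head(h[:, prefix.size(1):])            # next-token logits for audio positions


model = VideoToAudioLM()
logits = model(torch.randn(2, 16, 768), torch.randint(0, 1024, (2, 100)))
print(logits.shape)  # torch.Size([2, 100, 1024])
```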
Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition
Transformer-based models excel in speech recognition. Existing efforts to
optimize Transformer inference, typically for long-context applications, center
on simplifying attention score calculations. However, streaming speech
recognition models usually process a limited number of tokens each time, making
attention score calculation less of a bottleneck. Instead, the bottleneck lies
in the linear projection layers of multi-head attention and feedforward
networks, constituting a substantial portion of the model size and contributing
significantly to computation, memory, and power usage.
To address this bottleneck, we propose folding attention, a technique
targeting these linear layers, significantly reducing model size and improving
memory and power efficiency. Experiments on on-device Transformer-based
streaming speech recognition models show that folding attention reduces model
size (and corresponding memory consumption) by up to 24% and power consumption
by up to 23%, all without compromising model accuracy or adding computation overhead.
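As a back-of-the-envelope illustration of why the linear projections dominate in the streaming setting, the sketch below counts multiply-accumulates for one Transformer layer; the dimensions, chunk size, and cache length are assumptions chosen only to convey scale, not measurements from the paper.

```python
# Rough MAC count for one Transformer layer when only a few tokens are
# processed per streaming step: the QKVO projections and the feed-forward
# network dwarf the attention-score computation.

d_model, d_ff = 512, 2048
tokens_per_step, cache_len = 4, 64   # small chunk plus a short left-context cache

proj_macs = tokens_per_step * 4 * d_model * d_model                  # Q, K, V and output projections
ffn_macs = tokens_per_step * 2 * d_model * d_ff                      # two feed-forward matmuls
score_macs = 2 * tokens_per_step * (tokens_per_step + cache_len) * d_model  # QK^T and weighted sum of V

print(f"projections:      {proj_macs:,} MACs")    # 4,194,304
print(f"feed-forward:     {ffn_macs:,} MACs")     # 8,388,608
print(f"attention scores: {score_macs:,} MACs")   # 278,528
```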
Economic and Patrimonial Violence in Relation to the Omission of Family Maintenance Payments
The objective of this research is to show how economic and patrimonial violence
bears on the omission of family maintenance payments. A basic-type methodology
is used, since the aim is to obtain new knowledge, with a qualitative approach
and a phenomenological design, because the facts are studied within a context
and according to the experiences of the participants. The interview-guide
technique was applied and, from the answers to the questions, we obtained our
results, which indicate that the figure of economic and patrimonial violence is
entirely separate from the crime of omission of family maintenance, since the
former depends on a context of subordination and power, while the latter is
merely the breach of a judicial ruling. We conclude that, within the crime of
omission of family maintenance, the gender perspective is not relevant and does
not bear on the crime as such, and that criminal liability for economic and
patrimonial violence is still far from being easily identified by justice
operators, since Law N° 30364 only conceptualizes the different forms of
aggression in general terms.
Not All Weights Are Created Equal: Enhancing Energy Efficiency in On-Device Streaming Speech Recognition
Power consumption plays an important role in on-device streaming speech
recognition, as it has a direct impact on the user experience. This study
delves into how weight parameters in speech recognition models influence the
overall power consumption of these models. We discovered that the impact of
weight parameters on power consumption varies, influenced by factors including
how often they are invoked and their placement in memory. Armed with this
insight, we developed design guidelines aimed at optimizing on-device speech
recognition models. These guidelines focus on minimizing power use without
substantially affecting accuracy. Our method, which employs targeted
compression based on the varying sensitivities of weight parameters,
demonstrates superior performance compared to state-of-the-art compression
methods. It achieves a reduction in energy usage of up to 47% while maintaining
similar model accuracy and improving the real-time factor.
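The abstract does not spell out the compression recipe; a hedged sketch of one sensitivity-guided scheme in this spirit is shown below, where each weight tensor's sensitivity is estimated from the loss increase when it alone is quantized, and the less sensitive tensors receive the more aggressive bit-width. The quantizer, bit-widths, and budget are illustrative assumptions, not the paper's method.

```python
# Sketch: per-tensor sensitivity probing followed by mixed-bit-width
# fake-quantization. `eval_loss` is assumed to return a scalar validation loss.

def fake_quantize(w, bits):
    """Uniform symmetric fake-quantization of a weight tensor."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return (w / scale).round().clamp(-qmax - 1, qmax) * scale


def compress_by_sensitivity(model, eval_loss, low_bits=4, high_bits=8, budget=0.5):
    """Quantize the least-sensitive fraction `budget` of weight tensors to `low_bits`."""
    baseline = eval_loss(model)
    sensitivity = {}
    for name, param in model.named_parameters():
        if param.dim() < 2:                      # skip biases and norm parameters
            continue
        original = param.data.clone()
        param.data = fake_quantize(param.data, low_bits)
        sensitivity[name] = eval_loss(model) - baseline   # loss increase when only this tensor is quantized
        param.data = original
    ranked = sorted(sensitivity, key=sensitivity.get)     # least sensitive first
    aggressive = set(ranked[: int(len(ranked) * budget)])
    for name, param in model.named_parameters():
        if name in sensitivity:
            param.data = fake_quantize(param.data, low_bits if name in aggressive else high_bits)
    return model
```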
Stack-and-Delay: a new codebook pattern for music generation
In language modeling based music generation, a generated waveform is
represented by a sequence of hierarchical token stacks that can be decoded
either in an auto-regressive manner or in parallel, depending on the codebook
patterns. In particular, flattening the codebooks represents the highest
quality decoding strategy, while being notoriously slow. To this end, we
propose a novel stack-and-delay decoding strategy that improves upon flat
pattern decoding, with generation four times faster than vanilla flat
decoding. This brings the inference time close to that of the
delay decoding strategy, and allows for faster inference on GPU for small batch
sizes. For the same inference efficiency budget as the delay pattern, we show
that the proposed approach performs better in objective evaluations, almost
closing the gap with the flat pattern in terms of quality. The results are
corroborated by subjective evaluations which show that samples generated by the
new model are slightly more often preferred to samples generated by the
competing model given the same text prompts.
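For readers unfamiliar with codebook patterns, the short sketch below contrasts the two baselines the abstract refers to, the flat and delay patterns, for K codebook levels over T frames (MusicGen-style notation); the exact stack-and-delay schedule proposed in the paper is not reproduced here.

```python
# Flat decoding serializes every codebook level of every frame into T*K
# autoregressive steps, while the delay pattern offsets level k by k steps so
# that up to K tokens (one per level) can be emitted at each of T+K-1 steps.

K, T = 4, 6  # codebook levels, audio frames

flat = [[(t, k)] for t in range(T) for k in range(K)]           # T*K steps, 1 token per step
delay = [[(s - k, k) for k in range(K) if 0 <= s - k < T]       # T+K-1 steps, up to K tokens per step
         for s in range(T + K - 1)]

print(len(flat), "flat steps vs.", len(delay), "delay steps")   # 24 vs. 9
for step, tokens in enumerate(delay[:4]):
    print(f"delay step {step}: (frame, level) pairs decoded -> {tokens}")
```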
Low-Resource Cross-Lingual Adaptive Training for Nigerian Pidgin
Developing effective spoken language processing systems for low-resource languages poses several challenges due to the lack of parallel data and limited resources for fine-tuning models. In this work, we aim to improve both text classification and translation for Nigerian Pidgin (Naija) by collecting a large-scale parallel English-Pidgin corpus, and we further propose a cross-lingual adaptive training framework that includes both continual and task-adaptive training to adapt a base pre-trained model to low-resource languages. Our studies show that English pre-trained language models serve as a stronger prior than multilingual language models on English-Pidgin tasks, with up to 2.38 BLEU improvement, and demonstrate that augmenting orthographic data and using task-adaptive training with back-translation can have a significant impact on model performance.
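As a small illustration of the back-translation step mentioned above, the sketch below augments an English-Pidgin parallel corpus with synthetic pairs produced by a reverse (Pidgin-to-English) translator; the translator callable and the toy data are stand-ins, not the paper's models or corpus.

```python
# Back-translation augmentation: monolingual Pidgin sentences are translated
# into English by a reverse model, and the synthetic (english, pidgin) pairs
# are appended to the parallel training data for the English -> Pidgin model.

from typing import Callable, List, Tuple


def back_translate(
    monolingual_pidgin: List[str],
    pidgin_to_english: Callable[[str], str],
    parallel: List[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Return the parallel corpus extended with synthetic (english, pidgin) pairs."""
    synthetic = [(pidgin_to_english(pg), pg) for pg in monolingual_pidgin]
    return parallel + synthetic


# Toy usage with a stub translator; a real system would call a trained model here.
augmented = back_translate(
    ["How you dey?"],
    pidgin_to_english=lambda s: "<english translation of: " + s + ">",
    parallel=[("Good morning", "Gud morin")],
)
print(augmented)
```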