18 research outputs found
Negotiating TESOL Discourses and EFL Teaching Contexts in China: Identities and Practices of International Graduates of a TESOL Program
This article reports on a study of how the discourses circulating in a TESOL program housed in a Canadian university materially shape the professional identities and practices that international graduates of the program negotiate and develop in their local professional contexts in China. The principal researcher and two of the study participants discuss pedagogical values salient among program graduates and explore the complexities that accompany professional identity negotiation. The article offers recommendations for how TESOL programs can afford EFL teachers the possibility of constructing hybrid professional identities and dwelling comfortably in a "third space" as educational practitioners in a globalized world.
Omni-Dimensional Dynamic Convolution
Learning a single static convolutional kernel in each convolutional layer is
the common training paradigm of modern Convolutional Neural Networks (CNNs).
Instead, recent research in dynamic convolution shows that learning a linear
combination of convolutional kernels weighted with their input-dependent
attentions can significantly improve the accuracy of light-weight CNNs, while
maintaining efficient inference. However, we observe that existing works endow
convolutional kernels with the dynamic property through one dimension
(regarding the convolutional kernel number) of the kernel space, but the other
three dimensions (regarding the spatial size, the input channel number and the
output channel number for each convolutional kernel) are overlooked. Inspired
by this, we present Omni-dimensional Dynamic Convolution (ODConv), a more
generalized yet elegant dynamic convolution design, to advance this line of
research. ODConv leverages a novel multi-dimensional attention mechanism with a
parallel strategy to learn complementary attentions for convolutional kernels
along all four dimensions of the kernel space at any convolutional layer. As a
drop-in replacement of regular convolutions, ODConv can be plugged into many
CNN architectures. Extensive experiments on the ImageNet and MS-COCO datasets
show that ODConv brings solid accuracy boosts for various prevailing CNN
backbones, both light-weight and large, e.g., absolute top-1 improvements of
3.77%–5.71% for the MobileNetV2 family and 1.86%–3.72% for the ResNet family
on the ImageNet dataset. Intriguingly, thanks to its improved feature
learning ability, ODConv with even one single kernel can compete with or
outperform existing dynamic convolution counterparts with multiple kernels,
substantially reducing extra parameters. Furthermore, ODConv is also superior
to other attention modules for modulating the output features or the
convolutional weights.
Comment: Spotlight paper at ICLR 2022. Code and models are available at https://github.com/OSVAI/ODConv
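To make the mechanism concrete, here is a minimal sketch of an omni-dimensional dynamic convolution layer in PyTorch. The class name, attention-head layout, reduction ratio, and initialization are illustrative assumptions, not the authors' implementation; the official code at the repository above is the reference.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ODConv2dSketch(nn.Module):
    """Illustrative omni-dimensional dynamic convolution (not the official code)."""

    def __init__(self, in_ch, out_ch, k=3, num_kernels=4, reduction=16):
        super().__init__()
        self.k, self.num_kernels = k, num_kernels
        # Bank of candidate kernels: (num_kernels, out_ch, in_ch, k, k).
        self.weight = nn.Parameter(
            torch.randn(num_kernels, out_ch, in_ch, k, k) * 0.02)
        hidden = max(in_ch // reduction, 4)
        self.fc = nn.Linear(in_ch, hidden)
        # One attention head per dimension of the kernel space.
        self.attn_spatial = nn.Linear(hidden, k * k)       # spatial positions
        self.attn_in = nn.Linear(hidden, in_ch)            # input channels
        self.attn_out = nn.Linear(hidden, out_ch)          # output channels
        self.attn_kernel = nn.Linear(hidden, num_kernels)  # kernel number

    def forward(self, x):
        b, c, h, w = x.shape
        ctx = F.relu(self.fc(x.mean(dim=(2, 3))))  # global-average-pooled context
        a_sp = torch.sigmoid(self.attn_spatial(ctx)).view(b, 1, 1, 1, self.k, self.k)
        a_in = torch.sigmoid(self.attn_in(ctx)).view(b, 1, 1, c, 1, 1)
        a_out = torch.sigmoid(self.attn_out(ctx)).view(b, 1, -1, 1, 1, 1)
        a_k = torch.softmax(self.attn_kernel(ctx), dim=1).view(b, -1, 1, 1, 1, 1)
        # Modulate the kernel bank along all four dimensions, then sum over
        # the kernel-number dimension to form one kernel per sample.
        w_dyn = (self.weight.unsqueeze(0) * a_sp * a_in * a_out * a_k).sum(dim=1)
        # Grouped-conv trick: fold the batch into groups so each sample is
        # convolved with its own dynamically assembled kernel.
        x = x.reshape(1, b * c, h, w)
        w_dyn = w_dyn.reshape(-1, c, self.k, self.k)
        out = F.conv2d(x, w_dyn, padding=self.k // 2, groups=b)
        return out.reshape(b, -1, h, w)

layer = ODConv2dSketch(32, 64)
print(layer(torch.randn(2, 32, 16, 16)).shape)  # torch.Size([2, 64, 16, 16])
```

All four attentions are computed from a single pooled context vector, so the added inference cost is a few small linear layers per convolution.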
Objective Bayesian analysis for the generalized exponential distribution
In this paper, we consider objective Bayesian inference of the generalized
exponential distribution using the independence Jeffreys prior and validate the
propriety of the posterior distribution under a family of structured priors. We
propose an efficient sampling algorithm via the generalized ratio-of-uniforms
method to draw samples for making posterior inference. We carry out simulation
studies to assess the finite-sample performance of the proposed Bayesian
approach. Finally, a real-data application is provided for illustrative
purposes.
Comment: 13 pages, 5 figures, 2 tables
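As a concrete illustration of the sampling step, below is a minimal sketch of a generalized ratio-of-uniforms sampler for the generalized exponential density. The grid-based bound search, the choice r = 1, and the parameter values are assumptions for illustration; the paper's algorithm and tuning may differ.

```python
import numpy as np

def ge_density(x, alpha, lam):
    """Generalized exponential density on x > 0."""
    return alpha * lam * np.exp(-lam * x) * (1.0 - np.exp(-lam * x)) ** (alpha - 1.0)

def gros_sample(f, n, r=1.0, x_grid=None, rng=None):
    """Generalized ratio-of-uniforms: if (U, V) is uniform on
    {(u, v): 0 < u <= f(v / u**r) ** (1 / (r + 1))}, then X = V / U**r has
    density proportional to f. Enclose that region in a box and rejection-sample."""
    rng = np.random.default_rng() if rng is None else rng
    x_grid = np.linspace(1e-6, 50.0, 200_000) if x_grid is None else x_grid
    fx = f(x_grid)
    a = fx.max() ** (1.0 / (r + 1.0))              # bound on u
    b = (x_grid * fx ** (r / (r + 1.0))).max()     # bound on v (support x > 0)
    out = []
    while len(out) < n:
        u = rng.uniform(1e-12, a, size=4 * n)      # avoid u == 0
        v = rng.uniform(0.0, b, size=4 * n)
        x = v / u ** r
        keep = u <= f(x) ** (1.0 / (r + 1.0))
        out.extend(x[keep].tolist())
    return np.array(out[:n])

samples = gros_sample(lambda x: ge_density(x, alpha=2.0, lam=1.5), n=10_000)
print(samples.mean(), samples.var())  # sanity check against analytic moments
```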
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
We present LLaMA-Adapter, a lightweight adaption method to efficiently
fine-tune LLaMA into an instruction-following model. Using 52K self-instruct
demonstrations, LLaMA-Adapter only introduces 1.2M learnable parameters upon
the frozen LLaMA 7B model, and costs less than one hour for fine-tuning on 8
A100 GPUs. Specifically, we adopt a set of learnable adaption prompts, and
prepend them to the input text tokens at higher transformer layers. Then, a
zero-init attention mechanism with zero gating is proposed, which adaptively
injects the new instructional cues into LLaMA, while effectively preserving its
pre-trained knowledge. With efficient training, LLaMA-Adapter generates
high-quality responses, comparable to Alpaca with fully fine-tuned 7B
parameters. Furthermore, our approach can be simply extended to multi-modal
input, e.g., images, for image-conditioned LLaMA, which achieves superior
reasoning capacity on ScienceQA. We release our code at
https://github.com/ZrrSkywalker/LLaMA-Adapter.
Comment: Work in Progress. Code is available at https://github.com/ZrrSkywalker/LLaMA-Adapter
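The zero-init attention idea can be sketched in a few lines of PyTorch: learnable adaption prompts contribute through a separate attention pathway whose output is scaled by a gate initialized at zero, so the frozen model's behavior is exactly preserved at the start of fine-tuning. The single-head simplification, the omitted causal masking, and the dimension names below are assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

class ZeroInitAdaptionAttention(nn.Module):
    """Single-head sketch of zero-init attention with zero gating."""

    def __init__(self, dim, prompt_len=10):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(prompt_len, dim) * 0.02)
        self.gate = nn.Parameter(torch.zeros(1))   # the zero-init gate
        self.q = nn.Linear(dim, dim, bias=False)
        self.k = nn.Linear(dim, dim, bias=False)
        self.v = nn.Linear(dim, dim, bias=False)
        self.scale = dim ** -0.5

    def forward(self, x):
        # x: (batch, seq, dim) hidden states in a higher layer of the frozen
        # model; causal masking is omitted for brevity.
        bsz = x.size(0)
        q, k_tok, v_tok = self.q(x), self.k(x), self.v(x)
        p = self.prompt.unsqueeze(0).expand(bsz, -1, -1)
        k_pr, v_pr = self.k(p), self.v(p)
        # Original token-to-token pathway, untouched.
        attn_tok = torch.softmax(q @ k_tok.transpose(1, 2) * self.scale, dim=-1)
        out = attn_tok @ v_tok
        # Prompt pathway: its contribution is scaled by tanh(gate), which is
        # exactly zero at initialization, so new instructional cues are
        # injected gradually without disturbing pre-trained knowledge.
        attn_pr = torch.softmax(q @ k_pr.transpose(1, 2) * self.scale, dim=-1)
        return out + torch.tanh(self.gate) * (attn_pr @ v_pr)
```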
Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification
Recent progress in large language models (LLMs) like GPT-4 and PaLM-2 has
brought significant advancements in addressing math reasoning problems. In
particular, OpenAI's latest version of GPT-4, known as GPT-4 Code Interpreter,
shows remarkable performance on challenging math datasets. In this paper, we
explore the effect of code on enhancing LLMs' reasoning capability by
introducing different constraints on the Code Usage Frequency of GPT-4
Code Interpreter. We found that its success can be largely attributed to its
powerful skills in generating and executing code, evaluating the output of code
execution, and rectifying its solution when receiving unreasonable outputs.
Based on this insight, we propose a novel and effective prompting method,
explicit code-based self-verification (CSV), to further
boost the mathematical reasoning potential of GPT-4 Code Interpreter. This
method employs a zero-shot prompt on GPT-4 Code Interpreter to encourage it to
use code to self-verify its answers. In instances where the verification state
registers as "False", the model automatically amends its solution,
analogous to our approach of rectifying errors during a mathematics
examination. Furthermore, we recognize that the states of the verification
result indicate the confidence of a solution, which can improve the
effectiveness of majority voting. With GPT-4 Code Interpreter and CSV, we
achieve an impressive zero-shot accuracy on the MATH dataset (53.9% → 84.3%).
Comment: Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification
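A hedged sketch of the CSV loop with verification-weighted voting is given below. `query_model` is a hypothetical stand-in for a GPT-4 Code Interpreter-style endpoint, and the prompt text and vote weights are illustrative, not the paper's exact prompts or scheme.

```python
from collections import Counter

CSV_PROMPT = (
    "Solve the problem step by step, then write and run code to verify your "
    "answer. If the verification result is False, revise the solution and "
    "verify again. Report the final answer and the final verification state."
)

def query_model(prompt: str) -> dict:
    """Hypothetical stand-in for a GPT-4 Code Interpreter call; assumed to
    return {'answer': str, 'verified': bool}. Wire in a real client here."""
    raise NotImplementedError

def csv_solve(problem: str, num_samples: int = 16) -> str:
    """Sample several CSV solutions and take a verification-weighted vote."""
    votes = Counter()
    for _ in range(num_samples):
        result = query_model(f"{CSV_PROMPT}\n\nProblem: {problem}")
        # Verified answers carry more weight, using the verification state as
        # a confidence signal; these weights are illustrative.
        votes[result["answer"]] += 2.0 if result["verified"] else 1.0
    return votes.most_common(1)[0][0]
```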
LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model
How to efficiently transform large language models (LLMs) into instruction
followers has recently become a popular research direction, while training LLMs for
multi-modal reasoning remains less explored. Although the recent LLaMA-Adapter
demonstrates the potential to handle visual inputs with LLMs, it still cannot
generalize well to open-ended visual instructions and lags behind GPT-4. In
this paper, we present LLaMA-Adapter V2, a parameter-efficient visual
instruction model. Specifically, we first augment LLaMA-Adapter by unlocking
more learnable parameters (e.g., norm, bias and scale), which distribute the
instruction-following ability across the entire LLaMA model besides adapters.
Secondly, we propose an early fusion strategy to feed visual tokens only into
the early LLM layers, contributing to better visual knowledge incorporation.
Thirdly, a joint training paradigm of image-text pairs and
instruction-following data is introduced by optimizing disjoint groups of
learnable parameters. This strategy effectively alleviates the interference
between the two tasks of image-text alignment and instruction following and
achieves strong multi-modal reasoning with only a small-scale image-text and
instruction dataset. During inference, we incorporate additional expert models
(e.g., captioning/OCR systems) into LLaMA-Adapter to further enhance its image
understanding capability without incurring training costs. Compared to the
original LLaMA-Adapter, our LLaMA-Adapter V2 can perform open-ended multi-modal
instructions by merely introducing 14M parameters over LLaMA. The newly
designed framework also exhibits stronger language-only instruction-following
capabilities and even excels in chat interactions. Our code and models are
available at https://github.com/ZrrSkywalker/LLaMA-Adapter.
Comment: Code and models are available at https://github.com/ZrrSkywalker/LLaMA-Adapter
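The parameter-unlocking and disjoint-group ideas can be sketched as follows in PyTorch. The module-type and name matching are assumptions about the model's structure (LLaMA's RMSNorm, for instance, would need its own check), and the paper's added per-layer scale factors are omitted for brevity; this is not the released code.

```python
import torch.nn as nn

def unlock_bias_and_norm(model: nn.Module):
    """Freeze everything, then re-enable normalization and bias parameters,
    spreading a small trainable capacity across the whole model."""
    for p in model.parameters():
        p.requires_grad = False
    for module in model.modules():
        if isinstance(module, nn.LayerNorm):
            for p in module.parameters():
                p.requires_grad = True
        elif isinstance(module, nn.Linear) and module.bias is not None:
            module.bias.requires_grad = True

def split_param_groups(model: nn.Module, visual_keyword: str = "visual"):
    """Disjoint trainable groups: visual-projection parameters are optimized
    on image-text pairs, the remaining unlocked parameters on
    instruction-following data, so the two tasks do not interfere."""
    visual, language = [], []
    for name, p in model.named_parameters():
        if p.requires_grad:
            (visual if visual_keyword in name else language).append(p)
    return visual, language
```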
JourneyDB: A Benchmark for Generative Image Understanding
While recent advancements in vision-language models have had a transformative
impact on multi-modal comprehension, the extent to which these models possess
the ability to comprehend generated images remains uncertain. Compared to real
data, synthetic images span a wider range of content and style, and are
therefore harder for these models to fully grasp. In light of this challenge,
we introduce a comprehensive
dataset, referred to as JourneyDB, that caters to the domain of generative
images within the context of multi-modal visual understanding. Our meticulously
curated dataset comprises 4 million distinct and high-quality generated images,
each paired with the corresponding text prompts that were employed in their
creation. Furthermore, we introduce an external subset with
results of another 22 text-to-image generative models, which makes JourneyDB a
comprehensive benchmark for evaluating the comprehension of generated images.
On our dataset, we have devised four benchmarks to assess the performance of
generated image comprehension in relation to both content and style
interpretation. These benchmarks encompass prompt inversion, style retrieval,
image captioning, and visual question answering. Lastly, we evaluate the
performance of state-of-the-art multi-modal models when applied to the
JourneyDB dataset, providing a comprehensive analysis of their strengths and
limitations in comprehending generated content. We anticipate that the proposed
dataset and benchmarks will facilitate further research in the field of
generative content understanding. The dataset is publicly available at
https://journeydb.github.io.
Comment: Accepted to the Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023)
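For orientation, here is a hedged sketch of what a JourneyDB-style record and a prompt-inversion evaluation loop might look like. The field names and the pluggable similarity metric are assumptions for illustration; consult https://journeydb.github.io for the actual schema and protocol.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class GeneratedImageRecord:
    image_path: str   # a generated image
    prompt: str       # the text prompt used to create it

def evaluate_prompt_inversion(
    records: Iterable[GeneratedImageRecord],
    predict_prompt: Callable[[str], str],      # model under evaluation
    similarity: Callable[[str, str], float],   # e.g., a text-similarity score
) -> float:
    """Average similarity between predicted and ground-truth creation prompts."""
    scores = [similarity(predict_prompt(r.image_path), r.prompt) for r in records]
    return sum(scores) / len(scores)
```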
PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning
With the emergence of a spectrum of high-end mobile devices, many
applications that formerly required desktop-level computation capability are
being transferred to these devices. However, executing the inference of Deep
Neural Networks (DNNs) is still challenging considering high computation and
storage demands, specifically, if real-time performance with high accuracy is
needed. Weight pruning of DNNs has been proposed, but existing schemes represent two
extremes in the design space: non-structured pruning is fine-grained, accurate,
but not hardware friendly; structured pruning is coarse-grained,
hardware-efficient, but with higher accuracy loss. In this paper, we introduce
a new dimension, fine-grained pruning patterns inside the coarse-grained
structures, revealing a previously unknown point in the design space. With the
higher accuracy enabled by fine-grained pruning patterns, the unique insight is
to use the compiler to re-gain and guarantee high hardware efficiency. In other
words, our method achieves the best of both worlds, and is desirable across
theory/algorithm, compiler, and hardware levels. The proposed PatDNN is an
end-to-end framework to efficiently execute DNNs on mobile devices with the help
of a novel model compression technique (pattern-based pruning based on extended
ADMM solution framework) and a set of thorough architecture-aware compiler- and
code generation-based optimizations (filter kernel reordering, compressed
weight storage, register load redundancy elimination, and parameter
auto-tuning). Evaluation results demonstrate that PatDNN outperforms three
state-of-the-art end-to-end DNN frameworks, TensorFlow Lite, TVM, and Alibaba
Mobile Neural Network, with speedups of up to 44.5x, 11.4x, and 7.1x, respectively,
with no accuracy compromise. Real-time inference of representative large-scale
DNNs (e.g., VGG-16, ResNet-50) can be achieved using mobile devices.
Comment: To be published in the Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2020)
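The core idea of fine-grained patterns inside coarse-grained structures can be illustrated with a greedy magnitude-based sketch: every 3x3 kernel keeps the four weights selected by whichever predefined pattern preserves the most magnitude. The pattern set and the greedy selection rule below are illustrative; PatDNN derives its patterns from an extended ADMM formulation rather than this rule.

```python
import torch

# Each pattern keeps 4 of the 9 positions in a 3x3 kernel (1 = keep).
PATTERNS = torch.tensor([
    [[1, 1, 0], [1, 1, 0], [0, 0, 0]],
    [[0, 1, 1], [0, 1, 1], [0, 0, 0]],
    [[0, 0, 0], [1, 1, 0], [1, 1, 0]],
    [[0, 0, 0], [0, 1, 1], [0, 1, 1]],
    [[0, 1, 0], [1, 1, 1], [0, 0, 0]],
    [[0, 0, 0], [1, 1, 1], [0, 1, 0]],
], dtype=torch.float32)  # (num_patterns, 3, 3)

def apply_pattern_pruning(weight: torch.Tensor) -> torch.Tensor:
    """Prune a conv weight of shape (out_ch, in_ch, 3, 3) so that every kernel
    follows one of the predefined patterns, chosen greedily by magnitude."""
    o, i, kh, kw = weight.shape
    kernels = weight.reshape(-1, kh, kw)                        # (o*i, 3, 3)
    # Magnitude each pattern would retain, per kernel: (o*i, num_patterns).
    scores = (kernels.abs().unsqueeze(1) * PATTERNS).sum(dim=(2, 3))
    masks = PATTERNS[scores.argmax(dim=1)]                      # (o*i, 3, 3)
    return (kernels * masks).reshape(o, i, kh, kw)

pruned = apply_pattern_pruning(torch.randn(16, 8, 3, 3))
print((pruned != 0).float().mean())  # ~4/9 of the weights remain
```

Because every kernel then follows one of a handful of known layouts, a compiler can specialize code paths per pattern, which is the hardware-efficiency lever the abstract describes.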