Flatness-Aware Prompt Selection Improves Accuracy and Sample Efficiency
With the growing capabilities of large language models, prompting has become
the dominant way to access them. This has motivated the development of
strategies for automatically selecting effective language prompts. In this
paper, we introduce prompt flatness, a new metric to quantify the expected
utility of a language prompt. This metric is inspired by flatness
regularization in statistical learning, which quantifies the robustness of a
model to perturbations of its parameters. We provide theoretical foundations
for this metric and its relationship with other prompt selection metrics,
providing a comprehensive understanding of existing methods. Empirically, we
show that combining prompt flatness with existing metrics improves both
performance and sample efficiency. Our metric outperforms previous prompt
selection metrics, with an average gain of 5% in accuracy and 10% in Pearson
correlation across six classification benchmarks.
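As a rough illustration of the flatness idea described above, a flatness score can be estimated as the expected loss increase under small random parameter perturbations. This is a minimal sketch under that assumption; the paper's exact estimator, and how it is applied to prompt scoring, may differ.

```python
import numpy as np

def flatness(loss_fn, params, sigma=0.01, n_samples=64, seed=0):
    """Estimate flatness as the expected loss increase under small
    Gaussian parameter perturbations (hypothetical sketch)."""
    rng = np.random.default_rng(seed)
    base = loss_fn(params)
    deltas = [loss_fn(params + sigma * rng.standard_normal(params.shape)) - base
              for _ in range(n_samples)]
    return float(np.mean(deltas))

# Toy example: a sharp quadratic bowl vs. a flat one, at their shared minimum.
sharp = flatness(lambda w: 100.0 * np.sum(w**2), np.zeros(3))
flat = flatness(lambda w: 0.1 * np.sum(w**2), np.zeros(3))
assert sharp > flat  # the flatter loss surface yields the lower score
```

A lower score here corresponds to a flatter (more robust) region, matching the intuition that flatness reflects insensitivity to parameter perturbations.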
Multilingual Coreference Resolution in Multiparty Dialogue
Existing multiparty dialogue datasets for coreference resolution are nascent,
and many challenges are still unaddressed. We create a large-scale dataset,
Multilingual Multiparty Coref (MMC), for this task based on TV transcripts. Due
to the availability of gold-quality subtitles in multiple languages, we propose
reusing the annotations to create silver coreference data in other languages
(Chinese and Farsi) via annotation projection. On the gold (English) data,
off-the-shelf models perform relatively poorly on MMC, suggesting that MMC has
broader coverage of multiparty coreference than prior datasets. On the silver
data, we find success both in using it for data augmentation and in training
from scratch, which effectively simulates the zero-shot cross-lingual setting.
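The annotation-projection step can be illustrated with a toy sketch that maps source-language mention spans to target-language spans through a word alignment. All names here are hypothetical; MMC's actual pipeline must handle subtitle segmentation and alignment noise with far more care.

```python
def project_spans(spans, alignment):
    """Map source-token spans (start, end inclusive) to target-token
    spans via a word alignment dict src_idx -> tgt_idx.
    Hypothetical sketch of annotation projection."""
    projected = []
    for start, end in spans:
        tgt = sorted(alignment[i] for i in range(start, end + 1) if i in alignment)
        if tgt:  # drop spans with no aligned target tokens
            projected.append((tgt[0], tgt[-1]))
    return projected

# English mention at tokens 0-2, partially aligned to target tokens 3-4.
assert project_spans([(0, 2)], {0: 3, 2: 4}) == [(3, 4)]
```

Spans whose tokens have no alignment are simply dropped, one simple way silver data can end up smaller but cleaner than a naive projection.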
Extraneousness-Aware Imitation Learning
Visual imitation learning provides an effective framework to learn skills
from demonstrations. However, the quality of the provided demonstrations
usually significantly affects the ability of an agent to acquire desired
skills. Standard visual imitation learning therefore assumes near-optimal
demonstrations, which are expensive or sometimes prohibitively difficult to
collect. Previous works propose learning from noisy demonstrations; however,
the noise is usually assumed to follow a context-independent distribution such
as a uniform or Gaussian distribution. In this paper, we consider another crucial
yet underexplored setting -- imitation learning with task-irrelevant yet
locally consistent segments in the demonstrations (e.g., wiping sweat while
cutting potatoes in a cooking tutorial). We argue that such noise is common in
real-world data and term these segments "extraneous". To tackle this problem, we
introduce Extraneousness-Aware Imitation Learning (EIL), a self-supervised
approach that learns visuomotor policies from third-person demonstrations with
extraneous subsequences. EIL learns action-conditioned observation embeddings
in a self-supervised manner and retrieves task-relevant observations across
visual demonstrations while excluding the extraneous ones. Experimental results
show that EIL outperforms strong baselines and learns policies comparable to
those trained with perfect demonstrations on both simulated and real-world
robot control tasks. The project page can be found at
https://sites.google.com/view/eil-website.
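The retrieval idea behind EIL can be sketched in miniature: frames whose embeddings have no close match in a reference demonstration are treated as extraneous and excluded. This is a hypothetical toy in raw embedding space, not the paper's learned action-conditioned embedding.

```python
import numpy as np

def filter_extraneous(demo, reference, thresh=0.5):
    """Keep indices of `demo` frames whose embedding has a close match
    in `reference`; distant frames are treated as extraneous.
    Hypothetical sketch of EIL's retrieval step."""
    kept = []
    for i, frame in enumerate(demo):
        dists = np.linalg.norm(reference - frame, axis=1)
        if dists.min() < thresh:
            kept.append(i)
    return kept

task = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])   # reference trajectory
demo = np.array([[0.1, 0.0], [5.0, 5.0], [1.9, 0.1]])   # middle frame is extraneous
assert filter_extraneous(demo, task) == [0, 2]
```

The learned embedding in EIL plays the role that Euclidean distance plays here: task-relevant frames land near each other across demonstrations, while extraneous segments do not.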
MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel Communication-Computation Pipelining on Multi-GPU Platforms
The increasing size of input graphs for graph neural networks (GNNs)
highlights the demand for using multi-GPU platforms. However, existing
multi-GPU GNN systems optimize the computation and communication individually
based on the conventional practice of scaling dense DNNs. For irregularly
sparse and fine-grained GNN workloads, such solutions miss the opportunity to
jointly schedule and optimize computation and communication operations for
high performance. To this end, we propose MGG, a novel system design
to accelerate full-graph GNNs on multi-GPU platforms. The core of MGG is its
novel dynamic software pipeline to facilitate fine-grained
computation-communication overlapping within a GPU kernel. Specifically, MGG
introduces GNN-tailored pipeline construction and GPU-aware pipeline mapping to
facilitate workload balancing and operation overlapping. MGG also incorporates
an intelligent runtime design with analytical modeling and optimization
heuristics to dynamically improve the execution performance. Extensive
evaluation reveals that MGG outperforms state-of-the-art full-graph GNN systems
across various settings: on average 4.41X, 4.81X, and 10.83X faster than DGL,
MGG-UVM, and ROC, respectively.
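The core principle of overlapping communication with computation can be illustrated with a toy double-buffered pipeline. This is far simpler than MGG's fine-grained intra-kernel scheme on GPUs, but it shows why overlap hides fetch latency: while one chunk is being processed, the next is already in flight.

```python
from concurrent.futures import ThreadPoolExecutor

def pipeline(chunks, fetch, compute):
    """Overlap `fetch` (stand-in for remote-neighbor communication) with
    `compute` via double buffering -- a toy illustration of
    communication-computation pipelining, not MGG's actual kernel design."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch, chunks[0])
        for nxt in chunks[1:]:
            data = pending.result()
            pending = pool.submit(fetch, nxt)  # prefetch next chunk...
            results.append(compute(data))      # ...while computing this one
        results.append(compute(pending.result()))
    return results

out = pipeline([1, 2, 3], fetch=lambda c: c * 10, compute=lambda d: d + 1)
assert out == [11, 21, 31]
```

MGG applies the same idea at a much finer granularity, interleaving neighbor fetches and aggregation inside a single GPU kernel rather than across coarse host-side chunks.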
Mind2Web: Towards a Generalist Agent for the Web
We introduce Mind2Web, the first dataset for developing and evaluating
generalist agents for the web that can follow language instructions to complete
complex tasks on any website. Existing datasets for web agents either use
simulated websites or only cover a limited set of websites and tasks, thus not
suitable for generalist web agents. With over 2,000 open-ended tasks collected
from 137 websites spanning 31 domains and crowdsourced action sequences for the
tasks, Mind2Web provides three necessary ingredients for building generalist
web agents: 1) diverse domains, websites, and tasks, 2) use of real-world
websites instead of simulated and simplified ones, and 3) a broad spectrum of
user interaction patterns. Based on Mind2Web, we conduct an initial exploration
of using large language models (LLMs) for building generalist web agents. While
the raw HTML of real-world websites is often too large to be fed to LLMs, we
show that first filtering it with a small LM significantly improves the
effectiveness and efficiency of LLMs. Our solution demonstrates a decent level
of performance, even on websites or entire domains the model has never seen
before, but substantial room remains for improvement towards truly
generalizable agents. We open-source our dataset, model implementation, and
trained models (https://osu-nlp-group.github.io/Mind2Web) to facilitate further
research on building a generalist agent for the web.
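The small-LM filtering step can be sketched as pruning candidate DOM elements to the top-k by a relevance score before prompting the large model. Everything here is hypothetical, including the `relevance` callable, which stands in for a learned ranker.

```python
def prune_elements(elements, relevance, k=2):
    """Keep the k highest-scoring DOM elements (by a small model's
    relevance score) before handing them to a large LM.
    Hypothetical sketch of small-LM filtering."""
    ranked = sorted(range(len(elements)),
                    key=lambda i: relevance(elements[i]), reverse=True)
    keep = sorted(ranked[:k])  # preserve original document order
    return [elements[i] for i in keep]

els = ["<nav>...</nav>", "<button id='buy'>Buy</button>", "<input name='qty'>"]
score = {els[0]: 0.1, els[1]: 0.9, els[2]: 0.7}  # toy stand-in scores
assert prune_elements(els, score.get) == [els[1], els[2]]
```

Keeping the survivors in document order matters: the large model still sees a coherent, if sparse, slice of the page rather than a relevance-sorted jumble.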
Analysis and Fault-Tolerant Control for Dual-Three-Phase PMSM Based on Virtual Healthy Model
Dual-three-phase permanent magnet synchronous machines (DTP-PMSMs) are valued for their fault-tolerant capability. However, complex modeling, high copper loss, and torque ripple under postfault operation limit their wider application. In this article, a fault-tolerant control (FTC) strategy is developed for DTP-PMSMs under an open-phase fault (OPF) with straightforward modeling and smooth output torque. A virtual healthy DTP-PMSM model, in which the coordinate transformation, the modulation strategy, and the controller structure remain unchanged under the OPF, is adopted in the proposed FTC scheme, and the current references are derived as sinusoidal waveforms with minimum copper loss. The inaccurate transmission of control signals under the OPF is also addressed. Comprehensive theoretical analysis shows that the relationship between the controller output voltage and the actual stator voltage must be considered in the proposed FTC strategy; otherwise, distortion in torque and current will be introduced. Voltage compensation is used to offset this voltage difference and ensure smooth torque output. In addition, a quasi-proportional-resonant controller is designed to further suppress the residual torque ripple. The proposed strategy does not entail complex implementation or a heavy computational burden. Simulation and experimental results validate the analysis and the effectiveness of the proposed strategy.
Observation of giant nonreciprocal charge transport from quantum Hall edge states of single surface in topological insulator
Symmetry breaking in quantum materials is of great importance and leads to
novel nonreciprocal charge transport. The topological insulator system provides
a unique platform to study nonreciprocal charge transport due to the exotic
surface state. However, the effect is typically small in magnitude because the
contributions from the top and bottom surfaces of a topological insulator are
usually opposite.
Here, we report the observation of giant nonreciprocal charge transport
mediated by the quantum Hall state in intrinsic topological insulator
Sn-Bi1.1Sb0.9Te2S devices, which is attributed to the coexistence of quantum
Hall states and Dirac surface states. A giant nonreciprocal coefficient of up
to 2.26*10^5 A^-1 is found, because only a single surface of topological
insulator contributes to the nonreciprocal charge transport. Our work not only
reveals the intrinsic properties of nonreciprocal charge transport in
topological insulators, but also paves the way for future electronic devices.
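For a sense of what the reported coefficient measures, a nonreciprocal coefficient gamma is commonly extracted from the resistance difference between opposite current directions, assuming the usual parameterization R(I) = R0 * (1 + gamma * I). This is a hedged toy calculation with made-up numbers, not the paper's second-harmonic measurement procedure.

```python
def nonreciprocal_coefficient(r_fwd, r_rev, r0, current):
    """Estimate gamma (in 1/A) from forward/reverse resistances,
    assuming R(I) = R0 * (1 + gamma * I), so that
    R(+I) - R(-I) = 2 * R0 * gamma * I. Hypothetical sketch."""
    return (r_fwd - r_rev) / (2.0 * r0 * current)

# Toy numbers: a 1% resistance asymmetry measured at 1 uA.
gamma = nonreciprocal_coefficient(r_fwd=101.0, r_rev=99.0, r0=100.0, current=1e-6)
assert abs(gamma - 1e4) < 1e-3  # gamma ~ 1e4 A^-1 for this toy case
```

Against this scale, the reported 2.26*10^5 A^-1 indicates a very strong current-direction asymmetry, consistent with a single surface dominating the transport.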
Large Exchange Bias Effect and Coverage-Dependent Interfacial Coupling in CrI3/MnBi2Te4 van der Waals Heterostructures
Igniting interface magnetic ordering of magnetic topological insulators by
building a van der Waals heterostructure can help to reveal novel quantum
states and design functional devices. Here, we observe an interesting exchange
bias effect, indicating successful interfacial magnetic coupling, in
CrI3/MnBi2Te4 ferromagnetic insulator/antiferromagnetic topological insulator
(FMI/AFM-TI) heterostructure devices. The devices originally exhibit a negative
exchange bias field, which decays with increasing temperature and is unaffected
by the back-gate voltage. When we change the device configuration to be
half-covered by CrI3, the exchange bias becomes positive with a very large
exchange bias field exceeding 300 mT. Such sensitive manipulation is explained
by the competition between the FM and AFM coupling at the interface of CrI3 and
MnBi2Te4, pointing to coverage-dependent interfacial magnetic interactions. Our
work will facilitate the development of topological and antiferromagnetic
devices.