Automatic Extraction of Commonsense LocatedNear Knowledge

Author: Lin Bill Yuchen
Xu Frank F.
Zhu Kenny Q.
Publication date: 01/01/2018

The LocatedNear relation is a kind of commonsense knowledge describing two physical objects that are typically found near each other in real life. In this paper, we study how to automatically extract such relationships through a sentence-level relation classifier and by aggregating the scores of entity pairs from a large corpus. We also release two benchmark datasets for evaluation and future research. Comment: Accepted by ACL 2018. A preliminary version is presented on AKBC@NIPS'1
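The aggregation step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which aggregation function is used, so plain summation over unordered entity pairs is an assumption here, and the function name and sample scores are hypothetical.

```python
from collections import defaultdict


def aggregate_pair_scores(sentence_scores):
    """Sum sentence-level classifier confidences per unordered entity pair.

    sentence_scores: iterable of ((entity_a, entity_b), confidence) items,
    one per sentence in which the pair co-occurs. Summation is an assumed
    aggregation choice, not necessarily the one used in the paper.
    """
    totals = defaultdict(float)
    for (a, b), score in sentence_scores:
        key = tuple(sorted((a, b)))  # treat (cup, table) and (table, cup) alike
        totals[key] += score
    # Highest aggregated score first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical per-sentence confidences.
scores = [
    (("cup", "table"), 0.9),
    (("table", "cup"), 0.8),
    (("cup", "moon"), 0.1),
]
ranking = aggregate_pair_scores(scores)
```

Pairs that the classifier supports in many sentences rise to the top of `ranking`, which is the intuition behind scoring at corpus level rather than per sentence.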

arXiv.org e-Print Archive

Crossref

Incorporating External Knowledge through Pre-training for Natural Language to Code Generation

Author: Jiang Zhengbao
Neubig Graham
Vasilescu Bogdan
Xu Frank F.
Yin Pengcheng
Publication date: 01/01/2020

Open-domain code generation aims to generate code in a general-purpose programming language (such as Python) from natural language (NL) intents. Motivated by the intuition that developers usually retrieve resources on the web when writing code, we explore the effectiveness of incorporating two varieties of external knowledge into NL-to-code generation: automatically mined NL-code pairs from the online programming QA forum StackOverflow and programming language API documentation. Our evaluations show that combining the two sources with data augmentation and retrieval-based data re-sampling improves on the current state of the art by up to 2.2% absolute BLEU score on the code generation testbed CoNaLa. The code and resources are available at https://github.com/neulab/external-knowledge-codegen. Comment: Accepted by ACL 202

arXiv.org e-Print Archive

Crossref

Hierarchical Prompting Assists Large Language Model on Web Navigation

Author: Lo Robert
Sridhar Abishek
Xu Frank F.
Zhou Shuyan
Zhu Hao
Publication date: 29/10/2023

Large language models (LLMs) struggle to process complicated observations in interactive decision making tasks. To alleviate this issue, we propose a simple hierarchical prompting approach. Diverging from previous prompting approaches that always put the full observation (e.g. a web page) into the prompt, we propose to first construct an action-aware observation, which is more condensed and relevant, with a dedicated SUMMARIZER prompt. The ACTOR prompt then predicts the next action based on the summarized observation. While our method has broad applicability, we particularly demonstrate its efficacy in the complex domain of web navigation, where a full observation often contains redundant and irrelevant information. Our approach outperforms the previous state-of-the-art prompting mechanics by 6.2% on task success rate, demonstrating its potential on interactive decision making tasks with long observation traces. Comment: EMNLP 2023 Findings; Natural Language Reasoning and Structured Explanations Workshop at ACL 202
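The two-stage flow described above (SUMMARIZER first, then ACTOR) can be sketched as follows. This is only an illustration under stated assumptions: the prompt wording, the function name `hierarchical_step`, and the `llm` callable are hypothetical stand-ins, not the authors' prompts or code.

```python
from typing import Callable, List


def hierarchical_step(
    llm: Callable[[str], str],
    objective: str,
    observation: str,
    history: List[str],
) -> str:
    """One decision step: condense the observation, then choose an action."""
    # Stage 1: SUMMARIZER sees the full observation and returns a condensed,
    # action-aware view of it.
    summarizer_prompt = (
        "SUMMARIZER\n"
        f"Objective: {objective}\n"
        f"Previous actions: {history}\n"
        f"Observation: {observation}\n"
        "Keep only the parts relevant to the next action."
    )
    summary = llm(summarizer_prompt)

    # Stage 2: ACTOR sees only the summary, never the raw observation.
    actor_prompt = (
        "ACTOR\n"
        f"Objective: {objective}\n"
        f"Previous actions: {history}\n"
        f"Summarized observation: {summary}\n"
        "Next action:"
    )
    return llm(actor_prompt)


def fake_llm(prompt: str) -> str:
    """Canned stand-in for a real model call, for demonstration only."""
    if prompt.startswith("SUMMARIZER"):
        return "Search box [id=3] is visible."
    return "type [3] [red shoes]"


action = hierarchical_step(
    fake_llm, "buy red shoes", "<html>...long page...</html>", []
)
```

Keeping the raw observation out of the ACTOR prompt is the point of the design: the action prompt stays short even when the page is long.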

arXiv.org e-Print Archive

Elimination of the numerical Cerenkov instability for spectral EM-PIC codes

Author: Decyk Viktor K.
Fiuza F.
Fonseca Ricardo A.
Lu Wei
Mori Warren B.
Silva Luis O.
Tsung Frank S.
Vieira Jorge
Xu Xinlu
Yu Peicheng
Publication venue: 'Elsevier BV'
Publication date: 01/01/2015

When using an electromagnetic particle-in-cell (EM-PIC) code to simulate a relativistically drifting plasma, a violent numerical instability known as the numerical Cerenkov instability (NCI) occurs. The NCI is due to the unphysical coupling of electromagnetic waves on a grid to wave-particle resonances, including aliased resonances, i.e., $\omega + 2\pi\mu/\Delta t = (k_1 + 2\pi\nu_1/\Delta x_1)v_0$, where $\mu$ and $\nu_1$ refer to the time and space aliases and the plasma is drifting relativistically at velocity $v_0$ in the $\hat{1}$-direction. Recent studies have shown that an EM-PIC code which uses a spectral field solver and a low pass filter can eliminate the fastest growing modes of the NCI. Based on these studies, a new spectral PIC code for studying laser wakefield acceleration (LWFA) in the Lorentz boosted frame was developed. However, we show that for parameters of relevance for LWFA simulations in the boosted frame, a relativistically drifting plasma is susceptible to a host of additional unstable modes with lower growth rates, and that these modes appear when the fastest growing unstable modes are filtered out. We show that these modes are most easily identified as the coupling between modes which are purely transverse (EM) and purely longitudinal (Langmuir) in the rest frame of the plasma for specific time and space aliases. We rewrite the dispersion relation of the drifting plasma for a general field solver and obtain analytic expressions for the location and growth rate of each unstable mode, i.e., for each time- and space-aliased resonance. We show for the spectral solver that when the fastest growing mode is eliminated, a new mode at the fundamental resonance ($\mu=\nu_1=0$) can be seen. (Please check the whole abstract in the paper). Comment: 36 pages, 12 figure
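The aliased resonance condition quoted in the abstract can be solved for the frequency and evaluated directly. The short sketch below does only that; the function name and the sample grid parameters are hypothetical, and the values are in arbitrary units.

```python
import math


def resonance_omega(k1, mu, nu1, dt, dx1, v0):
    """Frequency on the (mu, nu1) aliased wave-particle resonance line.

    Rearranges  omega + 2*pi*mu/dt = (k1 + 2*pi*nu1/dx1) * v0  for omega.
    mu and nu1 are the integer time and space aliases, dt and dx1 the time
    step and cell size, and v0 the drift velocity along the 1-direction.
    """
    return (k1 + 2.0 * math.pi * nu1 / dx1) * v0 - 2.0 * math.pi * mu / dt


# Fundamental resonance (mu = nu1 = 0) reduces to omega = k1 * v0.
omega_fundamental = resonance_omega(k1=1.0, mu=0, nu1=0, dt=0.1, dx1=0.2, v0=0.5)

# First space and time alias with the same hypothetical grid parameters.
omega_aliased = resonance_omega(k1=1.0, mu=1, nu1=1, dt=0.1, dx1=0.2, v0=0.5)
```

Setting both aliases to zero recovers the fundamental resonance named at the end of the abstract; nonzero aliases shift the line by multiples of $2\pi/\Delta t$ and $2\pi v_0/\Delta x_1$.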

arXiv.org e-Print Archive

Repositório Institucional do ISCTE-IUL

HiGitClass: Keyword-Driven Hierarchical Classification of GitHub Repositories

Author: Han Jiawei
Li Qi
Li Sha
Meng Yu
Wang Xuan
Xu Frank F.
Zhang Yu
Publication date: 13/11/2021

GitHub has become an important platform for code sharing and scientific exchange. With the massive number of repositories available, there is a pressing need for topic-based search. Even though the topic label functionality has been introduced, the majority of GitHub repositories do not have any labels, impeding the utility of search and topic-based analysis. This work targets the automatic repository classification problem as keyword-driven hierarchical classification. Specifically, users only need to provide a label hierarchy with keywords as supervision. This setting is flexible, adaptive to the users' needs, accounts for the different granularity of topic labels, and requires minimal human effort. We identify three key challenges of this problem, namely (1) the presence of multi-modal signals; (2) supervision scarcity and bias; (3) supervision format mismatch. In recognition of these challenges, we propose the HiGitClass framework, comprising three modules: heterogeneous information network embedding; keyword enrichment; topic modeling and pseudo document generation. Experimental results on two GitHub repository collections confirm that HiGitClass is superior to existing weakly-supervised and dataless hierarchical classification methods, especially in its ability to integrate both structured and unstructured data for repository classification. Comment: 10 pages; Accepted to ICDM 2019; Some typos fixe
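To make the supervision format concrete, the sketch below shows a small label hierarchy with keywords and a naive top-down keyword matcher. It is deliberately not HiGitClass: it uses none of the three modules named in the abstract, and the labels, keywords, and function name are hypothetical.

```python
# Hypothetical label hierarchy: each node carries a few user-supplied keywords.
hierarchy = {
    "machine-learning": {
        "keywords": ["learning", "neural"],
        "children": {
            "nlp": {"keywords": ["text", "language"], "children": {}},
            "vision": {"keywords": ["image", "detection"], "children": {}},
        },
    },
    "bioinformatics": {"keywords": ["genome", "protein"], "children": {}},
}


def classify(text, nodes):
    """Walk the hierarchy top-down, picking the child with most keyword hits.

    A naive keyword-count baseline for illustration only; it stops as soon
    as no label at the current level matches any token.
    """
    tokens = text.lower().split()
    path = []
    while nodes:
        scores = {
            label: sum(tokens.count(k) for k in node["keywords"])
            for label, node in nodes.items()
        }
        best = max(scores, key=scores.get)
        if scores[best] == 0:
            break  # no keyword evidence at this level
        path.append(best)
        nodes = nodes[best]["children"]
    return path


readme = "Neural models for language and text classification"
label_path = classify(readme, hierarchy)
```

A matcher this simple fails whenever a repository description avoids the exact keywords, which is one face of the supervision scarcity the abstract lists as a challenge.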

arXiv.org e-Print Archive