27 research outputs found
Find a Reasonable Ending for Stories: Does Logic Relation Help the Story Cloze Test?
Natural language understanding is a challenging problem that covers a wide range of tasks. While previous methods generally train each task separately, we consider combining cross-task features to enhance task performance. In this paper, we incorporate logic information into the Story Cloze Test (SCT) with the help of the Natural Language Inference (NLI) task. Previous work on SCT considered various kinds of semantic information, such as sentiment and topic, but lacked the logical relations between sentences, which are an essential element of stories. We therefore propose to extract logic information over the course of the story to improve understanding of the whole story. The logic information is modeled with the help of the NLI task. Experimental results demonstrate the strength of the logic information.
Comment: Student Abstract in AAAI-2020
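A minimal sketch of how such an NLI-based logic signal could be fused with a base ending scorer (the scorer functions below are hypothetical placeholders, not the paper's model):

```python
# Hypothetical late-fusion sketch: mix a base Story Cloze plausibility score
# with an NLI entailment score between the story context and each candidate
# ending, then rank the endings by the combined score.
from typing import Callable, List, Tuple

def rank_endings(
    context: str,
    candidates: List[str],
    plausibility: Callable[[str, str], float],    # hypothetical base SCT scorer
    nli_entailment: Callable[[str, str], float],  # hypothetical NLI scorer: entailment of ending given context
    logic_weight: float = 0.5,
) -> List[Tuple[str, float]]:
    scored = [
        (
            ending,
            (1.0 - logic_weight) * plausibility(context, ending)
            + logic_weight * nli_entailment(context, ending),
        )
        for ending in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```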
Reformulating Sequential Recommendation: Learning Dynamic User Interest with Content-enriched Language Modeling
Recommender systems are essential for online applications, and sequential
recommendation has become particularly prevalent due to its expressive ability
to capture dynamic user interests. However, previous sequential modeling
methods still have limitations in capturing contextual information. The primary
reason for this issue is that language models often lack an understanding of
domain-specific knowledge and item-related textual content. To address this
issue, we adopt a new sequential recommendation paradigm and propose LANCER,
which leverages the semantic understanding capabilities of pre-trained language
models to generate personalized recommendations. Our approach bridges the gap
between language models and recommender systems, resulting in more human-like
recommendations. We demonstrate the effectiveness of our approach through
experiments on several benchmark datasets, showing promising results and
providing valuable insights into the influence of our model on sequential
recommendation tasks. Furthermore, our experimental code is publicly available.
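The abstract leaves the concrete interface between the language model and the recommender unspecified; the following sketch only illustrates the general idea of content-enriched sequential recommendation with a pretrained LM (the prompt wording and the `lm_score` function are assumptions, not LANCER's design):

```python
# Illustrative sketch: turn an interaction history into a text prompt that a
# pretrained language model could score, then rank candidate items by a
# hypothetical lm_score(prompt, continuation) likelihood function.
from typing import Callable, Dict, List, Tuple

def build_prompt(history: List[int], item_text: Dict[int, str]) -> str:
    lines = [f"{i + 1}. {item_text[item_id]}" for i, item_id in enumerate(history)]
    return (
        "A user interacted with the following items in order:\n"
        + "\n".join(lines)
        + "\nThe next item the user is most likely to enjoy is:"
    )

def rank_candidates(
    prompt: str,
    candidates: Dict[int, str],
    lm_score: Callable[[str, str], float],  # hypothetical LM score of a continuation
) -> List[Tuple[int, float]]:
    ranked = [(item_id, lm_score(prompt, text)) for item_id, text in candidates.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```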
Code-Aware Prompting: A study of Coverage Guided Test Generation in Regression Setting using LLM
Testing plays a pivotal role in ensuring software quality, yet conventional
Search Based Software Testing (SBST) methods often struggle with complex
software units, achieving suboptimal test coverage. Recent works using large
language models (LLMs) for test generation have focused on improving generation
quality through optimizing the test generation context and correcting errors in
model outputs, but use fixed prompting strategies that prompt the model to
generate tests without additional guidance. As a result, LLM-generated test suites still suffer from low coverage. In this paper, we present SymPrompt,
a code-aware prompting strategy for LLMs in test generation. SymPrompt's
approach is based on recent work that demonstrates LLMs can solve more complex
logical problems when prompted to reason about the problem in a multi-step
fashion. We apply this methodology to test generation by deconstructing the test-suite generation process into a multi-stage sequence, each stage of which is driven by a specific prompt aligned with the execution paths of the method under test and exposes relevant type and dependency focal context to the
model. Our approach enables pretrained LLMs to generate more complete test
cases without any additional training. We implement SymPrompt using the
TreeSitter parsing framework and evaluate it on a benchmark of challenging methods from open-source Python projects. SymPrompt improves correct test generations
by a factor of 5 and bolsters relative coverage by 26% for CodeGen2. Notably,
when applied to GPT-4, SymPrompt improves coverage by over 2x compared to
baseline prompting strategies.
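As a rough illustration of path-aligned, multi-stage prompting (the paper builds on the tree-sitter parser; this sketch uses Python's standard ast module instead, and the prompt wording is invented):

```python
# Simplified sketch: approximate the execution paths of a method by its branch
# conditions, then emit one test-generation prompt per path, prepending the
# focal context (types, dependencies) the model should see.
import ast
from typing import List

def branch_conditions(source: str) -> List[str]:
    # Collect textual if-conditions as a rough stand-in for execution paths.
    tree = ast.parse(source)
    return [ast.unparse(node.test) for node in ast.walk(tree) if isinstance(node, ast.If)]

def path_prompts(source: str, focal_context: str) -> List[str]:
    prompts = []
    for cond in branch_conditions(source) or ["the default path"]:
        prompts.append(
            f"{focal_context}\n\n"
            f"Write a unit test for the method below that exercises the case where "
            f"`{cond}` holds:\n\n{source}"
        )
    return prompts
```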
Token Alignment via Character Matching for Subword Completion
Generative models, widely utilized in various applications, can often
struggle with prompts corresponding to partial tokens. This struggle stems from
tokenization, where partial tokens fall out of distribution during inference,
leading to incorrect or nonsensical outputs. This paper examines a technique to
alleviate this tokenization artifact in text completion for generative models,
maintaining performance even in regular non-subword cases. The method, termed
token alignment, involves backtracking to the last complete tokens and ensuring
the model's generation aligns with the prompt. This approach showcases marked
improvement across many partial token scenarios, including nuanced cases like
space-prefix and partial indentation, with only a minor time increase. The
technique and analysis detailed in this paper contribute to the continuous
advancement of generative models in handling partial inputs, bearing relevance
for applications like code completion and text autocompletion.
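A minimal sketch of the token-alignment idea as described (the tokenizer interface and vocabulary structure here are assumptions): back off to the last complete token boundary, then only allow generated tokens whose characters are consistent with the dangling prompt text.

```python
# Hedged sketch: (1) drop the trailing token(s) of the encoded prompt so that
# generation restarts from a complete token boundary, and (2) restrict the
# first generated tokens to those that character-match the removed text.
from typing import Callable, Dict, List, Tuple

def backtrack(
    prompt: str,
    encode: Callable[[str], List[int]],   # hypothetical tokenizer encode
    decode: Callable[[List[int]], str],   # hypothetical tokenizer decode
    n_backoff: int = 1,
) -> Tuple[List[int], str]:
    ids = encode(prompt)
    kept = ids[: max(len(ids) - n_backoff, 0)]
    remainder = prompt[len(decode(kept)):]  # characters the model must reproduce
    return kept, remainder

def allowed_next_tokens(remainder: str, vocab: Dict[str, int]) -> List[int]:
    # A token is allowed if it could extend the remainder (starts with it) or
    # is itself a prefix of the remainder; anything else contradicts the prompt.
    return [
        token_id for token, token_id in vocab.items()
        if token.startswith(remainder) or remainder.startswith(token)
    ]
```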
Plant biomass allocation and driving factors of grassland revegetation in a Qinghai-Tibetan Plateau chronosequence
Biomass allocation is a key factor in understanding how ecosystems respond to changing environmental conditions. The role of soil chemistry in above- and belowground plant biomass allocation in restoring grasslands is still incompletely characterized. This has led to two competing hypotheses for biomass allocation: optimal partitioning, under which plants allocate biomass preferentially to optimize resource use; and the isometric hypothesis, which postulates that biomass allocation between roots and shoots is fixed. Here we tested these hypotheses over a chronosequence of alpine grasslands undergoing restoration on the Qinghai-Tibetan Plateau, ranging from severely degraded grassland to sites with 18 years of revegetation, with an intact grassland as a reference. A high proportion of biomass was allocated to the roots in the revegetated grasslands, and more biomass to shoots in the degraded and intact grasslands. The grasslands gradually decreased their root-to-shoot ratio as revegetation continued, with the lowest value in year 18 of revegetation. Our results showed that aboveground biomass (AGB) was positively related to available phosphorus (P) and soil moisture and negatively related to bulk density, while belowground biomass (BGB) was positively affected by total P and negatively by nitrate nitrogen (N). The trade-off between them was positively associated with available P and nitrate-N, and soil nutrient availability was more strongly linked to increased AGB relative to BGB. Our study indicates that biomass allocation is highly variable during revegetation from degraded grassland and is linked with soil properties, thus supporting the optimal partitioning hypothesis.
An integrated chromatin accessibility and transcriptome landscape of human pre-implantation embryos
Early human embryonic development involves extensive changes in chromatin structure and transcriptional activity. Here the authors present LiCAT-seq, a method enabling simultaneous profiling of chromatin accessibility and gene expression with ultra-low input of cells, and map the chromatin accessibility and transcriptome landscapes of human pre-implantation embryos.
Deconvolution of single-cell multi-omics layers reveals regulatory heterogeneity
Heterogeneity in gene expression and epigenetic states exists across individual cells. Here, the authors develop scCAT-seq, a technique for simultaneously performing ATAC-seq and RNA-seq within the same single cell.