177 research outputs found

Enhancing Phrase Representation by Information Bottleneck Guided Text Diffusion Process for Keyphrase Extraction

Author: Luo Yuanzhen
Zhou Feng
Zhou Qingyu
Publication date: 16/08/2023

Keyphrase extraction (KPE) is an important task in Natural Language Processing for many scenarios, which aims to extract keyphrases that are present in a given document. Many existing supervised methods treat KPE as sequential labeling, span-level classification, or generative tasks. However, these methods lack the ability to utilize keyphrase information, which may result in biased results. In this study, we propose Diff-KPE, which leverages the supervised Variational Information Bottleneck (VIB) to guide the text diffusion process for generating enhanced keyphrase representations. Diff-KPE first generates the desired keyphrase embeddings conditioned on the entire document and then injects the generated keyphrase embeddings into each phrase representation. A ranking network and VIB are then optimized together with rank loss and classification loss, respectively. This design of Diff-KPE allows us to rank each candidate phrase by utilizing both the information of keyphrases and the document. Experiments show that Diff-KPE outperforms existing KPE methods on a large open-domain keyphrase extraction benchmark, OpenKP, and a scientific domain dataset, KP20K. Comment: 10 pages, 2 figures

arXiv.org e-Print Archive

Environmental impact of the tourism industry in China: analyses based on multiple environmental factors using novel Quantile Autoregressive Distributed Lag model

Author: Aziz Noshaba
Jamal Abdul
Luo Yuting
Zhang Qingyu
Zhu Shengdong
Publication venue: Taylor and Francis Group and Juraj Dobrila University of Pula, Faculty of economics and tourism Dr. Mijo Mirković
Publication date: 01/01/2022

This study examines the impact of tourism on China’s environmental quality under the framework of the Environmental Kuznets Curve. In this study, tourism is measured by the number of tourist arrivals, and environmental pollution is measured by three proxies: carbon emissions, atmospheric particulate matter, and greenhouse gases. The study additionally controls for trade openness effects using annual data from 1995 to 2018. Based on the asymmetric behavior of environmental variables, the study applies the Quantile Autoregressive Distributed Lag model, which helps to integrate both dynamic trends and non-linearity. The findings confirm the validity of the Environmental Kuznets Curve in the long run and reveal that tourist arrivals reduce carbon emissions, atmospheric particulate matter, and greenhouse gases in the long run, but in the short-run dynamics, tourist arrivals only reduce carbon emissions. Similarly, trade openness increases carbon emissions, atmospheric particulate matter, and greenhouse gases at initial quantiles in the long run. In contrast, in the short run, trade openness reduces atmospheric particulate matter and greenhouse gases. These results imply that the emissions-mitigating (contributing) effects of tourism and trade vary across lower and higher quantiles. In conclusion, the findings suggest that the government should take effective measures to implement appropriate strategies required to sustain tourism and trade in China.

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

Knowledge Graph Reasoning over Entities and Numerical Values

Author: Bai Jiaxin
Li Zheng
Luo Chen
Song Yangqiu
Yin Bing
Yin Qingyu
Publication date: 02/06/2023

A complex logic query in a knowledge graph refers to a query expressed in logic form that conveys a complex meaning, such as "Where did the Canadian Turing award winner graduate from?" Knowledge graph reasoning-based applications, such as dialogue systems and interactive search engines, rely on the ability to answer complex logic queries as a fundamental task. In most knowledge graphs, edges are typically used to describe either the relationships between entities or their associated attribute values. An attribute value can be in categorical or numerical format, such as dates, years, sizes, etc. However, existing complex query answering (CQA) methods simply treat numerical values in the same way as they treat entities. This can lead to difficulties in answering certain queries, such as "Which Australian Pulitzer award winner was born before 1927?" and "Which drug is a pain reliever and has fewer side effects than Paracetamol?" In this work, inspired by the recent advances in numerical encoding and knowledge graph reasoning, we propose numerical complex query answering. In this task, we introduce new numerical variables and operations to describe queries involving numerical attribute values. To address the difference between entities and numerical values, we also propose the framework of Number Reasoning Network (NRN) for alternatively encoding entities and numerical values into separate encoding structures. During the numerical encoding process, NRN employs a parameterized density function to encode the distribution of numerical values. During the entity encoding process, NRN uses established query encoding methods for the original CQA problem. Experimental results show that NRN consistently improves various query encoding methods on three different knowledge graphs and achieves state-of-the-art results.

arXiv.org e-Print Archive

Three-dimensional regulation of transcription

Author: Fei Wang
Jun Cao
Qianlan Xu
Qingyu Cheng
Xiaoyuan Song
Yan Wu
Yan Zhang
Zhengyu Luo
Publication venue: Springer Nature
Publication date: 01/01/2015

Springer - Publisher Connector

IterAlign: Iterative Constitutional Alignment of Large Language Models

Author: Chen Xiusi
Li Ruirui
Li Zheng
Luo Chen
Nag Sreyashi
Wang Wei
Wen Hongzhi
Yin Qingyu
Publication date: 27/03/2024

With the rapid development of large language models (LLMs), aligning LLMs with human values and societal norms to ensure their reliability and safety has become crucial. Reinforcement learning with human feedback (RLHF) and Constitutional AI (CAI) have been proposed for LLM alignment. However, these methods require either heavy human annotations or explicitly pre-defined constitutions, which are labor-intensive and resource-consuming. To overcome these drawbacks, we study constitution-based LLM alignment and propose a data-driven constitution discovery and self-alignment framework called IterAlign. IterAlign leverages red teaming to unveil the weaknesses of an LLM and automatically discovers new constitutions using a stronger LLM. These constitutions are then used to guide self-correction of the base LLM. Such a constitution discovery pipeline can be run iteratively and automatically to discover new constitutions that specifically target the alignment gaps in the current LLM. Empirical results on several safety benchmark datasets and multiple base LLMs show that IterAlign successfully improves truthfulness, helpfulness, harmlessness and honesty, improving the LLM alignment by up to 13.5% in harmlessness. Comment: NAACL 202

arXiv.org e-Print Archive

Short Text Pre-training with Extended Token Classification for E-commerce Query Understanding

Author: Cao Tianyu
Goutam Rahul
Jiang Haoming
Li Zheng
Luo Chen
Tang Xianfeng
Yin Bing
Yin Qingyu
Zhang Danqing
Publication date: 08/10/2022

E-commerce query understanding is the process of inferring the shopping intent of customers by extracting semantic meaning from their search queries. The recent progress of pre-trained masked language models (MLM) in natural language processing is extremely attractive for developing effective query understanding models. Specifically, MLM learns contextual text embeddings by recovering the masked tokens in sentences. Such a pre-training process relies on sufficient contextual information. It is, however, less effective for search queries, which are usually short texts. When masking is applied to short search queries, most contextual information is lost and the intent of the search queries may be changed. To mitigate these issues for MLM pre-training on search queries, we propose a novel pre-training task specifically designed for short text, called Extended Token Classification (ETC). Instead of masking the input text, our approach extends the input by inserting tokens via a generator network, and trains a discriminator to identify which tokens are inserted in the extended input. We conduct experiments in an E-commerce store to demonstrate the effectiveness of ETC.

arXiv.org e-Print Archive

An immunization scheme for ransomware

Author: Luo Chenke
Meng Qingyu
Naik Nitin
Song Jingping
Xu Jian
Publication venue: Computers, Materials and Continua (Tech Science Press)
Publication date: 10/06/2020

In recent years, as the popularity of anonymous currencies such as Bitcoin has made the tracking of ransomware attackers more difficult, the number of ransomware attacks against personal computers and enterprise production servers has been increasing rapidly. Ransomware has a wide range of influence and has spread all over the world. It affects many industries, including internet, education, medical care, and traditional industry. This paper uses the idea of virus immunity to design an immunization solution for ransomware viruses that addresses the problems of traditional ransomware defense methods (such as anti-virus software and firewalls), which cannot meet the requirements of rapid detection and immediate prevention of newly emerging attacks. Our scheme includes two parts: server and client. The server provides an immunization configuration file and configuration file management functions, and includes a configuration file module, a cryptography algorithm module, and a display module. The client obtains the immunization configuration file from the server in real time and performs the corresponding operations according to the configuration file to give the computer an immune function against a specific ransomware; it includes an update module, a configuration file module, a cryptography algorithm module, a control module, and a log module. This scheme controls mutexes, services, files, and registries, respectively, to destroy the triggering conditions of the virus and finally achieve the purpose of immunizing a computer against a specific ransomware.

Aston Publications Explorer

High Thermoelectric Performance in Supersaturated Solid Solutions and Nanostructured n-Type PbTe-GeTe

Author: Bailey Trevor P.
Dravid Vinayak P.
Hua Xia
Kanatzidis Mercouri G.
Luo Zhong‐zhen
Tan Gangjian
Uher Ctirad
Wolverton Chris
Xu Jianwei
Yan Qingyu
Zhang Xiaomi
Publication venue: Wiley
Publication date: 01/01/2018

Sb-doped and GeTe-alloyed n-type thermoelectric materials that show an excellent figure of merit ZT in the intermediate temperature range (400-800 K) are reported. The synergistic effect of favorable changes to the band structure resulting in high Seebeck coefficient and enhanced phonon scattering by point defects and nanoscale precipitates resulting in reduction of thermal conductivity are demonstrated. The samples can be tuned as single-phase solid solution (SS) or two-phase system with nanoscale precipitates (Nano) based on the annealing processes. The GeTe alloying results in band structure modification by widening the bandgap and increasing the density-of-states effective mass of PbTe, resulting in significantly enhanced Seebeck coefficients. The nanoscale precipitates can improve the power factor in the low temperature range and further reduce the lattice thermal conductivity (κlat). Specifically, the Seebeck coefficient of Pb0.988Sb0.012Te-13%GeTe-Nano approaches −280 µV K⁻¹ at 673 K with a low κlat of 0.56 W m⁻¹ K⁻¹ at 573 K. Consequently, a peak ZT value of 1.38 is achieved at 623 K. Moreover, a high average ZTavg value of ≈1.04 is obtained in the temperature range from 300 to 773 K for n-type Pb0.988Sb0.012Te-13%GeTe-Nano. Both supersaturated solid solutions and nanostructured n-type Pb1−xGexTe systems with excellent thermoelectric performance can be prepared via a nonequilibrium process. The nanostructured sample enhances the figure of merit ZT via reducing the lattice thermal conductivity. A ZTavg of ≈1.04 is obtained, which is among the highest ZTavg values for n-type PbTe materials reported so far. Peer Reviewed https://deepblue.lib.umich.edu/bitstream/2027.42/145314/1/adfm201801617-sup-0001-S1.pdf https://deepblue.lib.umich.edu/bitstream/2027.42/145314/2/adfm201801617.pdf https://deepblue.lib.umich.edu/bitstream/2027.42/145314/3/adfm201801617_am.pd

Crossref

DR-NTU (Digital Repository of NTU)

Deep Blue Documents

Modelling of grinding mechanics: a review

Author: Guo Bing
Jackson Mark J.
Li Hao Nan
Linke Barbara S.
Luo Xichun
Meng Qingyu
Zhao Qingliang
Publication venue: Elsevier BV
Publication date: 27/10/2022

Grinding is one of the most widely used material removal methods at the end of many process chains. Grinding force is related to almost all grinding parameters and has a great influence on material removal rate, dimensional and shape accuracy, surface and subsurface integrity, thermodynamics, dynamics, wheel durability, and machining system deformation. For the same reason, grinding force can be used for grinding wheel wear detection, energy calculation, chatter suppression, force control, and grinding process simulation. Accurate prediction of grinding forces is important for optimizing grinding parameters and the structure of grinding machines and fixtures. Although there is a substantial body of research on grinding mechanics, a comprehensive review of the modeling of grinding mechanics is still absent from the literature. To fill this gap, this work reviews and introduces theoretical methods and applications of mechanics in grinding from the aspects of modeling principles, limitations, and possible future trends.

University of Strathclyde Institutional Repository