177 research outputs found
Enhancing Phrase Representation by Information Bottleneck Guided Text Diffusion Process for Keyphrase Extraction
Keyphrase extraction (KPE) is an important task in Natural Language
Processing for many scenarios, which aims to extract keyphrases that are
present in a given document. Many existing supervised methods treat KPE as
sequential labeling, span-level classification, or generative tasks. However,
these methods lack the ability to utilize keyphrase information, which may
result in biased results. In this study, we propose Diff-KPE, which leverages
the supervised Variational Information Bottleneck (VIB) to guide the text
diffusion process for generating enhanced keyphrase representations. Diff-KPE
first generates the desired keyphrase embeddings conditioned on the entire
document and then injects the generated keyphrase embeddings into each phrase
representation. A ranking network and VIB are then optimized together with rank
loss and classification loss, respectively. This design of Diff-KPE allows us
to rank each candidate phrase by utilizing both the information of keyphrases
and the document. Experiments show that Diff-KPE outperforms existing KPE
methods on a large open domain keyphrase extraction benchmark, OpenKP, and a
scientific domain dataset, KP20K. Comment: 10 pages, 2 figures
Environmental impact of the tourism industry in China: analyses based on multiple environmental factors using novel Quantile Autoregressive Distributed Lag model
This study examines the impact of tourism on China’s environmental quality
under the framework of the Environmental Kuznets Curve. Tourism is measured by
the number of tourist arrivals, and environmental pollution is measured by
three proxies: carbon emissions, atmospheric particulate matter, and greenhouse
gases. The study additionally controls for the effects of trade openness, using
annual data from 1995 to 2018. Given the asymmetric behavior of the
environmental variables, the study applies the Quantile Autoregressive
Distributed Lag model, which captures both dynamic trends and non-linearity.
The findings confirm the validity of the Environmental Kuznets Curve in the
long run and show that tourist arrivals reduce carbon emissions, atmospheric
particulate matter, and greenhouse gases in the long run, whereas in the short
run tourist arrivals reduce only carbon emissions. Similarly, trade openness
increases carbon emissions, atmospheric particulate matter, and greenhouse
gases at initial quantiles in the long run. In contrast, in the short run,
trade openness reduces atmospheric particulate matter and greenhouse gases.
These results imply that the emission-mitigating (or emission-contributing)
effects of tourism and trade vary across lower and higher quantiles. In
conclusion, the findings suggest that the government should take effective
measures to implement appropriate strategies to sustain tourism and trade in
China.
Knowledge Graph Reasoning over Entities and Numerical Values
A complex logic query in a knowledge graph refers to a query expressed in
logic form that conveys a complex meaning, such as "Where did the Canadian
Turing award winner graduate from?" Knowledge graph reasoning-based
applications, such as dialogue systems and interactive search engines, rely on
the ability to answer complex logic queries as a fundamental task. In most
knowledge graphs, edges are typically used to either describe the relationships
between entities or their associated attribute values. An attribute value can
be in categorical or numerical format, such as dates, years, sizes, etc.
However, existing complex query answering (CQA) methods simply treat numerical
values in the same way as they treat entities. This can lead to difficulties in
answering certain queries, such as "Which Australian Pulitzer award winner was
born before 1927?" and "Which drug is a pain reliever and has fewer side effects
than Paracetamol?" In this work, inspired by the recent advances in numerical
encoding and knowledge graph reasoning, we propose numerical complex query
answering. In this task, we introduce new numerical variables and operations to
describe queries involving numerical attribute values. To address the
difference between entities and numerical values, we also propose the framework
of Number Reasoning Network (NRN) for alternatively encoding entities and
numerical values into separate encoding structures. During the numerical
encoding process, NRN employs a parameterized density function to encode the
distribution of numerical values. During the entity encoding process, NRN uses
established query encoding methods for the original CQA problem. Experimental
results show that NRN consistently improves various query encoding methods on
three different knowledge graphs and achieves state-of-the-art results.
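
To make the motivation concrete, the example query from the abstract ("born before 1927") can be sketched symbolically over a tiny hand-made graph. This is only an illustration of why numerical attribute values need comparison semantics that plain entity matching lacks; it does not implement NRN's learned encoders. All entity names, relation names, and birth years below are hypothetical.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical toy graph. Entity edges link two entities;
# attribute edges link an entity to a numerical value.
ENTITY_EDGES: List[Tuple[str, str, str]] = [
    ("author_a", "won", "pulitzer"),
    ("author_b", "won", "pulitzer"),
    ("author_c", "won", "pulitzer"),
    ("author_a", "citizen_of", "australia"),
    ("author_b", "citizen_of", "australia"),
    ("author_c", "citizen_of", "canada"),
]
ATTRIBUTE_EDGES: List[Tuple[str, str, float]] = [
    ("author_a", "birth_year", 1915.0),
    ("author_b", "birth_year", 1940.0),
    ("author_c", "birth_year", 1920.0),
]


def heads(relation: str, tail: str) -> Set[str]:
    """Return all entities h such that (h, relation, tail) is an entity edge."""
    return {h for h, r, t in ENTITY_EDGES if r == relation and t == tail}


def numeric_filter(
    candidates: Set[str], attribute: str, predicate: Callable[[float], bool]
) -> Set[str]:
    """Keep candidates whose numerical attribute value satisfies the predicate."""
    return {
        h
        for h, a, value in ATTRIBUTE_EDGES
        if h in candidates and a == attribute and predicate(value)
    }


def answer_query() -> Set[str]:
    """Which Australian Pulitzer winner was born before 1927?"""
    candidates = heads("won", "pulitzer") & heads("citizen_of", "australia")
    # The comparison below only makes sense because 1927 is treated as a
    # number; if years were opaque entity IDs, "before" would be undefined.
    return numeric_filter(candidates, "birth_year", lambda year: year < 1927)


if __name__ == "__main__":
    print(answer_query())
```

The intersection handles the entity part of the query, while the final filter is the numerical operation that entity-only methods cannot express directly.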
IterAlign: Iterative Constitutional Alignment of Large Language Models
With the rapid development of large language models (LLMs), aligning LLMs
with human values and societal norms to ensure their reliability and safety has
become crucial. Reinforcement learning with human feedback (RLHF) and
Constitutional AI (CAI) have been proposed for LLM alignment. However, these
methods require either heavy human annotations or explicitly pre-defined
constitutions, which are labor-intensive and resource-consuming. To overcome
these drawbacks, we study constitution-based LLM alignment and propose a
data-driven constitution discovery and self-alignment framework called
IterAlign. IterAlign leverages red teaming to unveil the weaknesses of an LLM
and automatically discovers new constitutions using a stronger LLM. These
constitutions are then used to guide self-correction of the base LLM. Such a
constitution discovery pipeline can be run iteratively and automatically to
discover new constitutions that specifically target the alignment gaps in the
current LLM. Empirical results on several safety benchmark datasets and
multiple base LLMs show that IterAlign successfully improves truthfulness,
helpfulness, harmlessness and honesty, improving the LLM alignment by up to
in harmlessness. Comment: NAACL 202
Short Text Pre-training with Extended Token Classification for E-commerce Query Understanding
E-commerce query understanding is the process of inferring the shopping
intent of customers by extracting semantic meaning from their search queries.
The recent progress of pre-trained masked language models (MLM) in natural
language processing is extremely attractive for developing effective query
understanding models. Specifically, MLM learns contextual text embedding via
recovering the masked tokens in the sentences. Such a pre-training process
relies on sufficient contextual information. It is, however, less effective
for search queries, which are usually short texts. When applying masking to
short search queries, most contextual information is lost and the intent of the
search queries may be changed. To mitigate the above issues for MLM
pre-training on search queries, we propose a novel pre-training task
specifically designed for short text, called Extended Token Classification
(ETC). Instead of masking the input text, our approach extends the input by
inserting tokens via a generator network, and trains a discriminator to
identify which tokens are inserted in the extended input. We conduct
experiments in an E-commerce store to demonstrate the effectiveness of ETC.
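
The data construction behind such an insertion-based objective can be sketched as follows. This is a minimal illustration, not the authors' implementation: uniform random sampling from a small vocabulary stands in for the generator network described in the abstract, and the function name `extend_query`, the vocabulary, and the example query are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def extend_query(
    tokens: Sequence[str],
    vocab: Sequence[str],
    num_insert: int,
    rng: random.Random,
) -> Tuple[List[str], List[int]]:
    """Insert tokens into a query and return (extended tokens, labels).

    Label 1 marks an inserted token, label 0 an original token. A
    discriminator would be trained to predict these labels.
    ASSUMPTION: random sampling replaces the generator network here.
    """
    extended = list(tokens)
    labels = [0] * len(extended)
    for _ in range(num_insert):
        position = rng.randint(0, len(extended))  # any gap, including the ends
        extended.insert(position, rng.choice(vocab))
        labels.insert(position, 1)
    return extended, labels


if __name__ == "__main__":
    rng = random.Random(0)
    query = ["red", "running", "shoes"]
    extended, labels = extend_query(query, ["cheap", "new", "mens"], 2, rng)
    print(list(zip(extended, labels)))
```

Unlike masking, every original token stays visible in the extended input, which is the property the abstract highlights for short queries.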
An immunization scheme for ransomware
In recent years, the popularity of anonymous currencies such as Bitcoin has made ransomware attackers harder to track, and the number of ransomware attacks against personal computers and enterprise production servers has increased rapidly. Ransomware has a wide range of influence and has spread all over the world, affecting many industries, including the internet, education, medical care, and traditional industry. Traditional ransomware defenses (such as anti-virus software and firewalls) cannot meet the requirements of rapid detection and immediate prevention of newly emerging attacks. To address this, this paper borrows the idea of virus immunity to design an immunization scheme for ransomware. The scheme has two parts: a server and a client. The server provides an immunization configuration file and configuration file management functions; it comprises a configuration file module, a cryptography algorithm module, and a display module. The client obtains the immunization configuration file from the server in real time and performs the corresponding operations to immunize the computer against a specific ransomware; it comprises an update module, a configuration file module, a cryptography algorithm module, a control module, and a log module. The scheme controls mutexes, services, files, and registry entries to destroy the triggering conditions of the virus and thereby immunize a computer against a specific ransomware
High Thermoelectric Performance in Supersaturated Solid Solutions and Nanostructured n-Type PbTe-GeTe
Sb-doped and GeTe-alloyed n-type thermoelectric materials that show an excellent figure of merit ZT in the intermediate temperature range (400–800 K) are reported. The synergistic effect of favorable changes to the band structure resulting in high Seebeck coefficient and enhanced phonon scattering by point defects and nanoscale precipitates resulting in reduction of thermal conductivity are demonstrated. The samples can be tuned as single-phase solid solution (SS) or two-phase system with nanoscale precipitates (Nano) based on the annealing processes. The GeTe alloying results in band structure modification by widening the bandgap and increasing the density-of-states effective mass of PbTe, resulting in significantly enhanced Seebeck coefficients. The nanoscale precipitates can improve the power factor in the low temperature range and further reduce the lattice thermal conductivity (κlat). Specifically, the Seebeck coefficient of Pb0.988Sb0.012Te-13%GeTe-Nano approaches −280 µV K⁻¹ at 673 K with a low κlat of 0.56 W m⁻¹ K⁻¹ at 573 K. Consequently, a peak ZT value of 1.38 is achieved at 623 K. Moreover, a high average ZTavg value of ≈1.04 is obtained in the temperature range from 300 to 773 K for n-type Pb0.988Sb0.012Te-13%GeTe-Nano.
Both supersaturated solid solutions and nanostructured n-type Pb1−xGexTe systems with excellent thermoelectric performance can be prepared via a nonequilibrium process. The nanostructured sample enhances the figure of merit ZT via reducing the lattice thermal conductivity. A ZTavg of ≈1.04 is obtained, which is among the highest ZTavg values for n-type PbTe materials reported so far.
Peer Reviewed
https://deepblue.lib.umich.edu/bitstream/2027.42/145314/1/adfm201801617-sup-0001-S1.pdf
https://deepblue.lib.umich.edu/bitstream/2027.42/145314/2/adfm201801617.pdf
https://deepblue.lib.umich.edu/bitstream/2027.42/145314/3/adfm201801617_am.pd
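
For reference, the dimensionless figure of merit quoted in the abstract above is conventionally defined as follows. The symbols σ (electrical conductivity), κ_el (electronic thermal conductivity), and the temperature limits T_c and T_h do not appear in the abstract and are introduced here as standard notation. The averaging formula is one common convention and is an assumption; the authors may have used a different one.

```latex
ZT = \frac{S^{2}\sigma T}{\kappa_{\mathrm{el}} + \kappa_{\mathrm{lat}}},
\qquad
ZT_{\mathrm{avg}} = \frac{1}{T_h - T_c}\int_{T_c}^{T_h} ZT \,\mathrm{d}T
```

Here S is the Seebeck coefficient and S²σ is the power factor, so a larger Seebeck coefficient raises the numerator while a lower κlat shrinks the denominator, which are the two routes to higher ZT that the abstract describes.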
Modelling of grinding mechanics: a review
Grinding is one of the most widely used material removal methods at the end of many process chains. Grinding force is related to almost all grinding parameters and has a great influence on material removal rate, dimensional and shape accuracy, surface and subsurface integrity, thermodynamics, dynamics, wheel durability, and machining system deformation. For the same reason, grinding force can be used for grinding wheel wear detection, energy calculation, chatter suppression, force control, and grinding process simulation. Accurate prediction of grinding forces is important for optimizing grinding parameters and the structure of grinding machines and fixtures. Although there is a substantial body of research on grinding mechanics, a comprehensive review of the modeling of grinding mechanics is still absent from the literature. To fill this gap, this work reviews and introduces theoretical methods and applications of mechanics in grinding from the aspects of modeling principles, limitations, and possible future trends