Entity Synonym Discovery via Multipiece Bilateral Context Matching
Being able to automatically discover synonymous entities in an open-world
setting benefits various tasks such as entity disambiguation or knowledge graph
canonicalization. Existing works either utilize only entity features, or rely
on structured annotations from a single piece of context where the entity is
mentioned. To leverage the diverse contexts in which entities are mentioned, in
this paper we generalize the distributional hypothesis to a multi-context
setting and propose a synonym discovery framework that detects entity synonyms
from free-text corpora with consideration of both effectiveness and robustness.
As one of the key components in synonym discovery, we introduce SYNONYMNET, a
neural network model that determines whether two given entities are synonyms of
each other. Instead of using entity features, SYNONYMNET makes use of multiple
pieces of context in which an entity is mentioned, and compares the
context-level similarity via a bilateral matching schema. Experimental results
demonstrate that the proposed model is able to detect synonym sets that are not
observed during training on both generic and domain-specific datasets:
Wiki+Freebase, PubMed+UMLS, and MedBook+MKG, with up to 4.16% improvement in
terms of Area Under the Curve and 3.19% in terms of Mean Average Precision
compared to the best baseline method.
Comment: In IJCAI 2020 as a long paper. Code and data are available at
https://github.com/czhang99/SynonymNet
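
To make the bilateral matching idea concrete, below is a minimal sketch in
PyTorch that scores two entities from their encoded contexts. The dot-product
scorer, the best-match aggregation, and all names are illustrative assumptions
based only on the abstract, not the paper's exact architecture.

    import torch

    def bilateral_match(ctx_a: torch.Tensor, ctx_b: torch.Tensor) -> torch.Tensor:
        # ctx_a: (P, d) encoded contexts of entity A; ctx_b: (Q, d) of entity B.
        sim = ctx_a @ ctx_b.t()          # (P, Q) pairwise context similarities
        best_a = sim.max(dim=1).values   # best match in B for each context of A
        best_b = sim.max(dim=0).values   # best match in A for each context of B
        return 0.5 * (best_a.mean() + best_b.mean())  # symmetric synonym score

    # Toy usage with random stand-ins for encoded contexts.
    score = bilateral_match(torch.randn(5, 64), torch.randn(7, 64))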
TEST: Text Prototype Aligned Embedding to Activate LLM's Ability for Time Series
This work summarizes two strategies for completing time-series (TS) tasks
using today's large language models (LLMs): LLM-for-TS, which designs and
trains a fundamental large model for TS data, and TS-for-LLM, which enables a
pre-trained LLM to handle TS data. Considering insufficient data accumulation,
limited resources, and semantic context requirements, this work focuses on
TS-for-LLM methods, where we aim to activate the LLM's ability to work with TS
data by designing a TS embedding method suitable for LLMs. The proposed method
is named TEST. It first tokenizes the TS, builds an encoder to embed the
tokens via instance-wise, feature-wise, and text-prototype-aligned contrastive
learning, then creates prompts to make the LLM more receptive to these
embeddings, and finally implements TS tasks. Experiments are carried out on TS
classification and forecasting tasks using 8 LLMs with different structures
and sizes. Although the results do not significantly outperform current SOTA
models customized for TS tasks, by treating the LLM as a pattern machine, TEST
endows the LLM with the ability to process TS data without compromising its
language ability. This paper is intended to serve as a foundational work that
inspires further research.
Comment: 10 pages, 6 figures
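
As a rough illustration of the text-prototype-aligned contrastive step, the
sketch below encodes a univariate series and pulls its embedding toward the
nearest of a set of frozen "text prototype" vectors (in TEST these would come
from the LLM's word-embedding table). The 1-D conv encoder, the pseudo-label
loss, and all names are assumptions made for this sketch, not TEST's exact
design.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TSEncoder(nn.Module):
        # Toy encoder: one strided conv turns a raw series into TS tokens,
        # which are mean-pooled into a single normalized embedding.
        def __init__(self, d_model: int = 768):
            super().__init__()
            self.conv = nn.Conv1d(1, d_model, kernel_size=8, stride=4)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            h = self.conv(x.unsqueeze(1))          # (batch, d_model, tokens)
            return F.normalize(h.mean(dim=-1), dim=-1)

    def prototype_alignment_loss(ts_emb, prototypes, temperature=0.1):
        # Treat each embedding's nearest prototype as a pseudo-label and
        # sharpen that assignment, so TS embeddings land in a region the
        # frozen LLM can already "read".
        logits = ts_emb @ prototypes.t() / temperature   # (batch, n_proto)
        targets = logits.detach().argmax(dim=-1)
        return F.cross_entropy(logits, targets)

    # Toy usage: in practice the prototypes would be rows of the frozen
    # LLM's embedding table.
    enc = TSEncoder()
    prototypes = F.normalize(torch.randn(100, 768), dim=-1)
    loss = prototype_alignment_loss(enc(torch.randn(16, 128)), prototypes)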