280 Birds with One Stone: Inducing Multilingual Taxonomies from Wikipedia using Character-level Classification
We propose a simple, yet effective, approach towards inducing multilingual
taxonomies from Wikipedia. Given an English taxonomy, our approach leverages
the interlanguage links of Wikipedia followed by character-level classifiers to
induce high-precision, high-coverage taxonomies in other languages. Through
experiments, we demonstrate that our approach significantly outperforms the
state-of-the-art, heuristics-heavy approaches for six languages. As a
consequence of our work, we release presumably the largest and the most
accurate multilingual taxonomic resource spanning over 280 languages.
Taxonomy Induction using Hypernym Subsequences
We propose a novel, semi-supervised approach towards domain taxonomy
induction from an input vocabulary of seed terms. Unlike all previous
approaches, which typically extract direct hypernym edges for terms, our
approach utilizes a novel probabilistic framework to extract hypernym
subsequences. Taxonomy induction from extracted subsequences is cast as an
instance of the minimum-cost flow problem on a carefully designed directed
graph. Through experiments, we demonstrate that our approach outperforms
state-of-the-art taxonomy induction approaches across four languages.
Importantly, we also show that our approach is robust to the presence of noise
in the input vocabulary. To the best of our knowledge, no previous approaches
have been empirically proven to manifest noise-robustness in the input
vocabulary.
Natural Language Interfaces for Tabular Data Querying and Visualization: A Survey
The emergence of natural language processing has revolutionized the way users
interact with tabular data, enabling a shift from traditional query languages
and manual plotting to more intuitive, language-based interfaces. The rise of
large language models (LLMs) such as ChatGPT and its successors has further
advanced this field, opening new avenues for natural language processing
techniques. This survey presents a comprehensive overview of natural language
interfaces for tabular data querying and visualization, which allow users to
interact with data using natural language queries. We introduce the fundamental
concepts and techniques underlying these interfaces with a particular emphasis
on semantic parsing, the key technology facilitating the translation from
natural language to SQL queries or data visualization commands. We then delve
into the recent advancements in Text-to-SQL and Text-to-Vis problems from the
perspectives of datasets, methodologies, metrics, and system designs. This
includes a deep dive into the influence of LLMs, highlighting their strengths,
limitations, and potential for future improvements. Through this survey, we aim
to provide a roadmap for researchers and practitioners interested in developing
and applying natural language interfaces for data interaction in the era of
large language models.
Comment: 20 pages, 4 figures, 5 tables. Submitted to IEEE TKD