43 research outputs found

Adversarial Learning for Chinese NER from Crowd Annotations

Author: Chen Wenliang
Wang Haofen
Yang YaoSheng
Zhang Meishan
Zhang Min
Zhang Wei
Publication date: 16/01/2018

To quickly obtain new labeled data, we can choose crowdsourcing as an alternative way at lower cost in a short time. But as an exchange, crowd annotations from non-experts may be of lower quality than those from experts. In this paper, we propose an approach to performing crowd annotation learning for Chinese Named Entity Recognition (NER) to make full use of the noisy sequence labels from multiple annotators. Inspired by adversarial learning, our approach uses a common Bi-LSTM and a private Bi-LSTM for representing annotator-generic and -specific information. The annotator-generic information is the common knowledge for entities easily mastered by the crowd. Finally, we build our Chinese NE tagger based on the LSTM-CRF model. In our experiments, we create two data sets for Chinese NER tasks from two domains. The experimental results show that our system achieves better scores than strong baseline systems. Comment: 8 pages, AAAI-201

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

A User-Friendly Framework for Generating Model-Preferred Prompts in Text-to-Image Synthesis

Author: Guo Qianyu
Hei Nailei
Wang Haofen
Wang Yan
Wang Zihao
Zhang Wenqiang
Publication date: 20/02/2024

Well-designed prompts have demonstrated the potential to guide text-to-image models in generating amazing images. Although existing prompt engineering methods can provide high-level guidance, it is challenging for novice users to achieve the desired results by manually entering prompts due to a discrepancy between novice-user-input prompts and the model-preferred prompts. To bridge the distribution gap between user input behavior and model training datasets, we first construct a novel Coarse-Fine Granularity Prompts dataset (CFP) and propose a novel User-Friendly Fine-Grained Text Generation framework (UF-FGTG) for automated prompt optimization. For CFP, we construct a novel dataset for text-to-image tasks that combines coarse and fine-grained prompts to facilitate the development of automated prompt generation methods. For UF-FGTG, we propose a novel framework that automatically translates user-input prompts into model-preferred prompts. Specifically, we propose a prompt refiner that continually rewrites prompts to empower users to select results that align with their unique needs. Meanwhile, we integrate image-related loss functions from the text-to-image model into the training process of text generation to generate model-preferred prompts. Additionally, we propose an adaptive feature extraction module to ensure diversity in the generated results. Experiments demonstrate that our approach is capable of generating more visually appealing and diverse images than previous state-of-the-art methods, achieving an average improvement of 5% across six quality and aesthetic metrics. Comment: Accepted by The 38th Annual AAAI Conference on Artificial Intelligence (AAAI 2024)

arXiv.org e-Print Archive

Towards Text-to-SQL over Aggregate Tables

Author: Haofen Wang
Jun Ma
Kaibin Zhou
Shuqin Li
Zeyang Zhuang
Publication venue: 'MIT Press - Journals'
Publication date: 01/01/2023

Text-to-SQL aims at translating textual questions into the corresponding SQL queries. Aggregate tables are widely created for high-frequent queries. Although text-to-SQL has emerged as an important task, recent studies paid little attention to the task over aggregate tables. The increased aggregate tables bring two challenges: (1) mapping of natural language questions and relational databases will suffer from more ambiguity, (2) modern models usually adopt self-attention mechanism to encode database schema and question. The mechanism is of quadratic time complexity, which will make inferring more time-consuming as input sequence length grows. In this paper, we introduce a novel approach named WAGG for text-to-SQL over aggregate tables. To effectively select among ambiguous items, we propose a relation selection mechanism for relation computing. To deal with high computation costs, we introduce a dynamical pruning strategy to discard unrelated items that are common for aggregate tables. We also construct a new large-scale dataset SpiderwAGG extended from Spider dataset for validation, where extensive experiments show the effectiveness and efficiency of our proposed method with 4% increase of accuracy and 15% decrease of inference time w.r.t. a strong baseline RAT-SQL.

Directory of Open Access Journals

Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-Shaped (RDF) Data

Author: Cimiano Philipp
Ioannidis Yannis E.
Lee Dik Lun
Ng Raymond T.
Rudolph Sebastian
Tran Duc Thanh
Wang Haofen
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009

Tran DT, Wang H, Rudolph S, Cimiano P. Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-Shaped (RDF) Data. In: Ioannidis YE, Lee DL, Ng RT, eds. Proceedings of the 25th International Conference on Data Engineering (ICDE’09). 2009: 405-416

CiteSeerX

Crossref

Publications at Bielefeld University

GeoBERT: Pre-Training Geospatial Representation Learning on Point-of-Interest

Author: Haofen Wang
Siqi Wang
Yun Xiong
Yunfan Gao
Publication venue: 'MDPI AG'
Publication date: 16/12/2022

Thanks to the development of geographic information technology, geospatial representation learning based on POIs (Point-of-Interest) has gained widespread attention in the past few years. POI is an important indicator to reflect urban socioeconomic activities, widely used to extract geospatial information. However, previous studies often focus on a specific area, such as a city or a district, and are designed only for particular tasks, such as land-use classification. On the other hand, large-scale pre-trained models (PTMs) have recently achieved impressive success and become a milestone in artificial intelligence (AI). Against this background, this study proposes the first large-scale pre-training geospatial representation learning model called GeoBERT. First, we collect about 17 million POIs in 30 cities across China to construct pre-training corpora, with 313 POI types as the tokens and the level-7 Geohash grids as the basic units. Second, we pre-train GeoBERT to learn grid embedding in self-supervised learning by masking the POI type and then predicting. Third, under the paradigm of “pre-training + fine-tuning”, we design five practical downstream tasks. Experiments show that, with just one additional output layer fine-tuning, GeoBERT outperforms previous NLP methods (Word2vec, GloVe) used in geospatial representation learning by 9.21% on average in F1-score for classification tasks, such as store site recommendation and working/living area prediction. For regression tasks, such as POI number prediction, house price prediction, and passenger flow prediction, GeoBERT demonstrates greater performance improvements. The experiment results prove that pre-training on large-scale POI data can significantly improve the ability to extract geospatial information. In the discussion section, we provide a detailed analysis of what GeoBERT has learned from the perspective of attention mechanisms.

Multidisciplinary Digital Publishing Institute

A New Operator for ABox Revision in DL-Lite

Author: Gao Sibei
Qi Guilin
Wang Haofen
Publication venue: Association for the Advancement of Artificial Intelligence
Publication date: 20/09/2021

In this paper, we propose a new operator for revising ABoxes in DL-Lite ontologies. We present a graph-based algorithm for ABox revision in DL-Lite, which implements the revision operator, and we show it runs in polynomial time.

Association for the Advancement of Artificial Intelligence: AAAI Publications