Search CORE

134 research outputs found

CerFix: A System for Cleaning Data with Certain Fixes

Authors: Fan Wenfei, Li Jianzhong, Ma Shuai, Tang Nan, Yu Wenyuan
Publication date: 01/01/2011

Towards Certain Fixes with Editing Rules and Master Data

Authors: Fan Wenfei, Li Jianzhong, Ma Shuai, Tang Nan, Yu Wenyuan
Publication date: 01/01/2010

A variety of integrity constraints have been studied for data cleaning. While these constraints can detect the presence of errors, they fall short of guiding us to correct the errors. Indeed, data repairing based on these constraints may not find certain fixes that are absolutely correct, and worse, may introduce new errors when repairing the data. We propose a method for finding certain fixes, based on master data, a notion of certain regions, and a class of editing rules. A certain region is a set of attributes that are assured correct by the users. Given a certain region and master data, editing rules tell us what attributes to fix and how to update them. We show how the method can be used in data monitoring and enrichment. We develop techniques for reasoning about editing rules, to decide whether they lead to a unique fix and whether they are able to fix all the attributes in a tuple, relative to master data and a certain region. We also provide an algorithm to identify minimal certain regions, such that a certain fix is warranted by editing rules and master data as long as one of the regions is correct. We experimentally verify the effectiveness and scalability of the algorithm.

Crossref

Edinburgh Research Explorer
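
The interplay of certain regions, master data, and editing rules described in the abstract above can be illustrated with a small Python sketch. It is not the authors' algorithm: the `EditingRule` shape, the `apply_rules` function, and the sample attributes (`zip`, `city`, `area_code`) are all hypothetical. The sketch assumes a rule fires only when every attribute it matches on is already assured correct, and only when the matching master tuples agree on a single value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditingRule:
    """Hypothetical rule shape: if the input agrees with a master tuple on
    `match`, copy the master value named in `fix` into the input."""
    match: tuple  # pairs (input_attr, master_attr) that must be equal
    fix: tuple    # (input_attr, master_attr): where the fix goes and comes from


def apply_rules(record, certain_region, master, rules):
    """Return (fixed_record, assured_attrs) after applying rules to a fixpoint."""
    fixed = dict(record)
    assured = set(certain_region)  # attributes the user vouches for
    changed = True
    while changed:
        changed = False
        for rule in rules:
            target, source = rule.fix
            if target in assured:
                continue  # already known to be correct
            if not all(attr in assured for attr, _ in rule.match):
                continue  # the rule's premise is not yet trustworthy
            hits = [m for m in master
                    if all(fixed.get(a) == m.get(b) for a, b in rule.match)]
            values = {m[source] for m in hits}
            if len(values) == 1:  # fix only when master data gives one answer
                fixed[target] = values.pop()
                assured.add(target)
                changed = True
    return fixed, assured


# Hypothetical master data and rules, for illustration only.
master = [
    {"zip": "EH8 9AB", "city": "Edinburgh", "area_code": "131"},
    {"zip": "SW1A 1AA", "city": "London", "area_code": "020"},
]
rules = [
    EditingRule(match=(("city", "city"),), fix=("area_code", "area_code")),
    EditingRule(match=(("zip", "zip"),), fix=("city", "city")),
]
record = {"zip": "EH8 9AB", "city": "Ldn", "area_code": "020"}

fixed, assured = apply_rules(record, {"zip"}, master, rules)
```

With only `zip` in the certain region, the second rule first repairs `city`, which then makes the first rule applicable to `area_code`. Each fix enlarges the set of assured attributes, which is why the loop runs until nothing changes.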

A Causal Model for Safety Assessment Purposes in Opening the Low-Altitude Urban Airspace of Chinese Pilot Cities

Authors: Tang Jun, Yang Wenyuan
Publication date: 09/10/2018

China has been gradually relaxing its ban on the use of low-altitude airspace across the country. To guarantee the high reliability of air traffic management (ATM), conflict detection and conflict resolution (CDR) approaches are indispensable to maintain safe separation between neighbouring small fixed-wing aircraft. In this study, we analyse a temporal and spatial integrated strategy for safety assessment purposes in opening the low-altitude urban airspace of Chinese pilot cities. First, we present a detailed mathematical description of the proposed algorithms based on a spatial grid partitioning system (SGPS). For our system, a conflict detection (CD) algorithm is designed to determine if two trajectories pass through the same grid space within overlapping time windows. A conflict resolution (CR) algorithm integrates a proposed time scheduling-based technique (TST) and vertical change-based technique (VCT), which operate under predetermined basic principles. Then, based on our novel CDR algorithms, a causal model is constructed in graphical modelling and analysis software (GMAS) to generate a state space that can provide a global perspective on scenario dynamics and better understanding of induced conflict occurrences. Finally, simulation results demonstrate that the proposed approach is practical and efficient. Document type: Article

Scipedia
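
The conflict detection idea in the abstract above (two trajectories passing through the same grid space within overlapping time windows) can be sketched in Python as follows. This is an illustrative reading, not the paper's SGPS algorithm: the function names, the cubic cell of side `cell_size`, and the assumption that an aircraft holds the cell of one sample until the next sample are all hypothetical.

```python
import math
from collections import defaultdict


def to_cell(x, y, z, cell_size):
    """Map a position to the integer index of the cubic grid cell containing it."""
    return (math.floor(x / cell_size),
            math.floor(y / cell_size),
            math.floor(z / cell_size))


def occupancy(samples, cell_size):
    """Turn time-ordered (t, x, y, z) samples into (cell, t_in, t_out) windows.

    Assumption: the aircraft occupies the cell of each sample until the next
    sample is taken, so the last sample only closes the previous window.
    """
    windows = []
    for current, following in zip(samples, samples[1:]):
        t_in, x, y, z = current
        t_out = following[0]
        windows.append((to_cell(x, y, z, cell_size), t_in, t_out))
    return windows


def detect_conflicts(windows_a, windows_b):
    """Return (cell, start, end) for every shared cell with overlapping windows."""
    by_cell = defaultdict(list)
    for cell, t_in, t_out in windows_a:
        by_cell[cell].append((t_in, t_out))
    conflicts = []
    for cell, b_in, b_out in windows_b:
        for a_in, a_out in by_cell.get(cell, ()):
            start, end = max(a_in, b_in), min(a_out, b_out)
            if start < end:  # windows are treated as half-open intervals
                conflicts.append((cell, start, end))
    return conflicts


# Hypothetical trajectories: (time in s, x, y, z in m), with 100 m cells.
traj_a = [(0, 50, 50, 100), (10, 150, 50, 100), (20, 250, 50, 100)]
traj_b = [(5, 150, 150, 100), (15, 150, 50, 100), (25, 150, 50, 100)]

conflicts = detect_conflicts(occupancy(traj_a, 100), occupancy(traj_b, 100))
# conflicts == [((1, 0, 1), 15, 20)]
```

Indexing one trajectory's windows by cell keeps the comparison limited to cells both aircraft actually visit, instead of testing every pair of windows.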

Learning to Correct Noisy Labels for Fine-Grained Entity Typing via Co-Prediction Prompt Tuning

Author: He Yongquan
Lin Yang
Tang Minghao
Xu Hongbo
Xu Yongxiu
Zhang Wenyuan
Publication date: 23/10/2023

Fine-grained entity typing (FET) is an essential task in natural language processing that aims to assign semantic types to entities in text. However, FET poses a major challenge known as the noise labeling problem, whereby current methods rely on estimating noise distribution to identify noisy labels but are confused by diverse noise distribution deviation. To address this limitation, we introduce Co-Prediction Prompt Tuning for noise correction in FET, which leverages multiple prediction results to identify and correct noisy labels. Specifically, we integrate prediction results to recall labeled labels and utilize a differentiated margin to identify inaccurate labels. Moreover, we design an optimization objective concerning divergent co-predictions during fine-tuning, ensuring that the model captures sufficient information and maintains robustness in noise identification. Experimental results on three widely-used FET datasets demonstrate that our noise correction approach significantly enhances the quality of various types of training samples, including those annotated using distant supervision, ChatGPT, and crowdsourcing. Comment: Accepted by Findings of EMNLP 2023, 11 pages

arXiv.org e-Print Archive

A Boundary Offset Prediction Network for Named Entity Recognition

Author: He Yongquan
Lin Yang
Tang Minghao
Xu Hongbo
Xu Yongxiu
Zhang Wenyuan
Publication date: 23/10/2023

Named entity recognition (NER) is a fundamental task in natural language processing that aims to identify and classify named entities in text. However, span-based methods for NER typically assign entity types to text spans, resulting in an imbalanced sample space and neglecting the connections between non-entity and entity spans. To address these issues, we propose a novel approach for NER, named the Boundary Offset Prediction Network (BOPN), which predicts the boundary offsets between candidate spans and their nearest entity spans. By leveraging the guiding semantics of boundary offsets, BOPN establishes connections between non-entity and entity spans, enabling non-entity spans to function as additional positive samples for entity detection. Furthermore, our method integrates entity type and span representations to generate type-aware boundary offsets instead of using entity types as detection targets. We conduct experiments on eight widely-used NER datasets, and the results demonstrate that our proposed BOPN outperforms previous state-of-the-art methods. Comment: Accepted by Findings of EMNLP 2023, 13 pages

arXiv.org e-Print Archive
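
The notion of a boundary offset between a candidate span and its nearest entity span, as mentioned in the abstract above, can be made concrete with a short Python sketch. It only shows how such training targets might be derived from gold annotations and is not the BOPN model itself. The function name, the inclusive `(start, end)` token indexing, and the choice of "nearest" as the smallest sum of absolute start and end differences are assumptions.

```python
def boundary_offsets(candidate, entity_spans):
    """Return (start_offset, end_offset) from a candidate span to its nearest
    gold entity span, or None when the sentence has no entities.

    Spans are inclusive (start, end) token indices. "Nearest" is assumed to
    mean the smallest |start difference| + |end difference|.
    """
    if not entity_spans:
        return None
    start, end = candidate
    gold_start, gold_end = min(
        entity_spans,
        key=lambda span: abs(span[0] - start) + abs(span[1] - end),
    )
    return (gold_start - start, gold_end - end)


# Hypothetical sentence: "Barack Obama visited New York City"
# Gold entities: "Barack Obama" -> (0, 1), "New York City" -> (3, 5)
entities = [(0, 1), (3, 5)]

exact = boundary_offsets((0, 1), entities)    # (0, 0): the span is an entity
partial = boundary_offsets((3, 4), entities)  # (0, 1): extend the end by one token
outside = boundary_offsets((2, 2), entities)  # (-2, -1): nearest entity is (0, 1)
```

Under this reading, a non-entity span such as "New York" still carries a useful target (move the end boundary one token to the right), which matches the abstract's point that non-entity spans can serve as additional positive samples.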