134 research outputs found
Towards Certain Fixes with Editing Rules and Master Data
A variety of integrity constraints have been studied for data cleaning. While these constraints can detect the presence of errors, they fall short of guiding us to correct the errors. Indeed, data repairing based on these constraints may not find certain fixes that are absolutely correct, and worse, may introduce new errors when repairing the data. We propose a method for finding certain fixes, based on master data, a notion of certain regions, and a class of editing rules. A certain region is a set of attributes that are assured correct by the users. Given a certain region and master data, editing rules tell us what attributes to fix and how to update them. We show how the method can be used in data monitoring and enrichment. We develop techniques for reasoning about editing rules, to decide whether they lead to a unique fix and whether they are able to fix all the attributes in a tuple, relative to master data and a certain region. We also provide an algorithm to identify minimal certain regions, such that a certain fix is warranted by editing rules and master data as long as one of the regions is correct. We experimentally verify the effectiveness and scalability of the algorithm.
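The interplay of a certain region, master data, and editing rules described in this abstract can be illustrated with a minimal sketch. The rule format below (`EditingRule` with `match` and `fix` attribute lists), the function `apply_editing_rules`, and the sample address data are all hypothetical and are not taken from the paper; the sketch only assumes that a rule fires when its matching attributes are already assured correct and agree with exactly one set of values in the master data.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Row = Dict[str, str]


@dataclass(frozen=True)
class EditingRule:
    """Hypothetical rule: if the input tuple agrees with master data on
    `match`, copy the `fix` attributes from the master data."""
    match: Tuple[str, ...]
    fix: Tuple[str, ...]


def apply_editing_rules(
    t: Row, master: List[Row], rules: List[EditingRule], certain_region: Set[str]
) -> Tuple[Row, Set[str]]:
    """Return the repaired tuple and the set of attributes assured correct."""
    fixed = dict(t)
    assured = set(certain_region)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            # A rule may only fire on attributes already assured correct.
            if not set(rule.match) <= assured:
                continue
            pending = [a for a in rule.fix if a not in assured]
            if not pending:
                continue
            # Collect the distinct values that matching master tuples propose.
            candidates = {
                tuple(m[a] for a in pending)
                for m in master
                if all(m[a] == fixed[a] for a in rule.match)
            }
            # No match or an ambiguous match gives no unique fix: skip the rule.
            if len(candidates) != 1:
                continue
            for attr, value in zip(pending, next(iter(candidates))):
                fixed[attr] = value
                assured.add(attr)
            changed = True
    return fixed, assured


# Illustrative data only (assumed, not from the paper).
master = [{"zip": "EH8 9AB", "city": "Edinburgh", "area_code": "131"}]
rules = [EditingRule(match=("zip",), fix=("city", "area_code"))]
t = {"zip": "EH8 9AB", "city": "Londn", "area_code": "020", "phone": "6501234"}
fixed, assured = apply_editing_rules(t, master, rules, certain_region={"zip"})
```

In this sketch the `phone` attribute is never marked as assured, because no rule covers it; deciding whether a region lets the rules fix every attribute of a tuple is the kind of question the abstract says the authors analyse.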
A Causal Model for Safety Assessment Purposes in Opening the Low-Altitude Urban Airspace of Chinese Pilot Cities
China has been gradually relaxing its ban on the use of low-altitude airspace across the country. To guarantee the high reliability of air traffic management (ATM), conflict detection and conflict resolution (CDR) approaches are indispensable to maintain safe separation between neighbouring small fixed-wing aircraft. In this study, we analyse a temporal and spatial integrated strategy for safety assessment purposes in opening the low-altitude urban airspace of Chinese pilot cities. First, we present a detailed mathematical description of the proposed algorithms based on a spatial grid partitioning system (SGPS). For our system, a conflict detection (CD) algorithm is designed to determine if two trajectories pass through the same grid space within overlapping time windows. A conflict resolution (CR) algorithm integrates a proposed time scheduling-based technique (TST) and vertical change-based technique (VCT), which operate under predetermined basic principles. Then, based on our novel CDR algorithms, a causal model is constructed in graphical modelling and analysis software (GMAS) to generate a state space that can provide a global perspective on scenario dynamics and better understanding of induced conflict occurrences. Finally, simulation results demonstrate that the proposed approach is practical and efficient.
Document type: Article
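The conflict detection step in this abstract (two trajectories passing through the same grid space within overlapping time windows) can be sketched as follows. This is a minimal illustration, not the authors' SGPS implementation: the trajectory representation (a list of grid cells with entry and exit times), the function name `detect_conflicts`, and the sample flights are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

Cell = Tuple[int, int, int]            # assumed (x, y, altitude layer) grid index
Occupancy = Tuple[Cell, float, float]  # cell, entry time, exit time


def detect_conflicts(
    trajectories: Dict[str, List[Occupancy]]
) -> List[Tuple[str, str, Cell]]:
    """Report aircraft pairs that occupy the same cell in overlapping windows."""
    # Index every cell visit so that only aircraft sharing a cell are compared.
    by_cell = defaultdict(list)
    for aircraft, occupancies in trajectories.items():
        for cell, t_in, t_out in occupancies:
            by_cell[cell].append((aircraft, t_in, t_out))

    conflicts = []
    for cell, visits in by_cell.items():
        for (a, a_in, a_out), (b, b_in, b_out) in combinations(visits, 2):
            # Two intervals overlap when each starts before the other ends.
            if a != b and a_in < b_out and b_in < a_out:
                conflicts.append((a, b, cell))
    return conflicts


# Illustrative trajectories only (assumed data).
trajectories = {
    "A": [((0, 0, 1), 0.0, 10.0), ((1, 0, 1), 10.0, 20.0)],
    "B": [((1, 0, 1), 15.0, 25.0)],
    "C": [((1, 0, 1), 30.0, 40.0)],
}
conflicts = detect_conflicts(trajectories)
```

Here aircraft A and B share cell `(1, 0, 1)` during overlapping windows, while C passes through the same cell later. A resolution step in the spirit of the abstract could then delay one aircraft or move it to another altitude layer, but the abstract does not give enough detail to sketch that faithfully.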
Learning to Correct Noisy Labels for Fine-Grained Entity Typing via Co-Prediction Prompt Tuning
Fine-grained entity typing (FET) is an essential task in natural language
processing that aims to assign semantic types to entities in text. However, FET
poses a major challenge known as the noise labeling problem, whereby current
methods rely on estimating noise distribution to identify noisy labels but are
confused by diverse noise distribution deviation. To address this limitation,
we introduce Co-Prediction Prompt Tuning for noise correction in FET, which
leverages multiple prediction results to identify and correct noisy labels.
Specifically, we integrate prediction results to recall labeled labels and
utilize a differentiated margin to identify inaccurate labels. Moreover, we
design an optimization objective concerning divergent co-predictions during
fine-tuning, ensuring that the model captures sufficient information and
maintains robustness in noise identification. Experimental results on three
widely-used FET datasets demonstrate that our noise correction approach
significantly enhances the quality of various types of training samples,
including those annotated using distant supervision, ChatGPT, and
crowdsourcing.
Comment: Accepted by Findings of EMNLP 2023, 11 pages
A Boundary Offset Prediction Network for Named Entity Recognition
Named entity recognition (NER) is a fundamental task in natural language
processing that aims to identify and classify named entities in text. However,
span-based methods for NER typically assign entity types to text spans,
resulting in an imbalanced sample space and neglecting the connections between
non-entity and entity spans. To address these issues, we propose a novel
approach for NER, named the Boundary Offset Prediction Network (BOPN), which
predicts the boundary offsets between candidate spans and their nearest entity
spans. By leveraging the guiding semantics of boundary offsets, BOPN
establishes connections between non-entity and entity spans, enabling
non-entity spans to function as additional positive samples for entity
detection. Furthermore, our method integrates entity type and span
representations to generate type-aware boundary offsets instead of using entity
types as detection targets. We conduct experiments on eight widely-used NER
datasets, and the results demonstrate that our proposed BOPN outperforms
previous state-of-the-art methods.
Comment: Accepted by Findings of EMNLP 2023, 13 pages
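The idea of labelling a candidate span with its boundary offsets to the nearest entity span can be illustrated with a short sketch. The details here are assumptions rather than the BOPN definition: spans are inclusive token index pairs, "nearest" is taken to mean the smallest sum of absolute start and end differences, and the function name `boundary_offset` is hypothetical.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # inclusive token indices (start, end)


def boundary_offset(candidate: Span, entities: List[Span]) -> Optional[Tuple[int, int]]:
    """Return (start offset, end offset) from the candidate to its nearest entity.

    Nearness is an assumed metric: the sum of absolute boundary differences.
    Returns None when the sentence contains no entity span.
    """
    if not entities:
        return None

    def distance(entity: Span) -> int:
        return abs(entity[0] - candidate[0]) + abs(entity[1] - candidate[1])

    nearest = min(entities, key=distance)
    return (nearest[0] - candidate[0], nearest[1] - candidate[1])


# Illustrative gold entity spans only (assumed data).
entities = [(2, 3), (7, 7)]
exact = boundary_offset((2, 3), entities)    # candidate coincides with an entity
shifted = boundary_offset((1, 3), entities)  # candidate starts one token early
```

Under this reading, an offset of `(0, 0)` marks an exact entity span, and a non-entity span such as `(1, 3)` still carries a useful training signal because its offset points to the nearby entity.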