243 research outputs found

    Learning to Correct Noisy Labels for Fine-Grained Entity Typing via Co-Prediction Prompt Tuning

    Fine-grained entity typing (FET) is an essential task in natural language processing that aims to assign semantic types to entities in text. However, FET poses a major challenge known as the noise labeling problem, whereby current methods rely on estimating noise distribution to identify noisy labels but are confused by diverse noise distribution deviation. To address this limitation, we introduce Co-Prediction Prompt Tuning for noise correction in FET, which leverages multiple prediction results to identify and correct noisy labels. Specifically, we integrate prediction results to recall labeled labels and utilize a differentiated margin to identify inaccurate labels. Moreover, we design an optimization objective concerning divergent co-predictions during fine-tuning, ensuring that the model captures sufficient information and maintains robustness in noise identification. Experimental results on three widely-used FET datasets demonstrate that our noise correction approach significantly enhances the quality of various types of training samples, including those annotated using distant supervision, ChatGPT, and crowdsourcing. Comment: Accepted by Findings of EMNLP 2023, 11 pages.

    A Boundary Offset Prediction Network for Named Entity Recognition

    Named entity recognition (NER) is a fundamental task in natural language processing that aims to identify and classify named entities in text. However, span-based methods for NER typically assign entity types to text spans, resulting in an imbalanced sample space and neglecting the connections between non-entity and entity spans. To address these issues, we propose a novel approach for NER, named the Boundary Offset Prediction Network (BOPN), which predicts the boundary offsets between candidate spans and their nearest entity spans. By leveraging the guiding semantics of boundary offsets, BOPN establishes connections between non-entity and entity spans, enabling non-entity spans to function as additional positive samples for entity detection. Furthermore, our method integrates entity type and span representations to generate type-aware boundary offsets instead of using entity types as detection targets. We conduct experiments on eight widely-used NER datasets, and the results demonstrate that our proposed BOPN outperforms previous state-of-the-art methods. Comment: Accepted by Findings of EMNLP 2023, 13 pages.
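
The idea of a boundary offset between a candidate span and its nearest entity span can be illustrated with a small sketch. This is not the BOPN model itself: the inclusive `(start, end)` span encoding, the function names, and the rule that "nearest" means the smallest total absolute offset are all assumptions made for illustration, since the abstract does not define them.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices (assumed encoding)


def boundary_offset(candidate: Span, entity: Span) -> Tuple[int, int]:
    """Return how far each boundary of `candidate` must move to reach `entity`."""
    return entity[0] - candidate[0], entity[1] - candidate[1]


def nearest_entity_offset(candidate: Span, entities: List[Span]) -> Tuple[int, int]:
    """Offset to the nearest gold entity span.

    Assumption: "nearest" is the entity with the smallest total absolute
    boundary offset; the abstract does not specify the distance used.
    """
    if not entities:
        raise ValueError("at least one entity span is required")
    offsets = [boundary_offset(candidate, e) for e in entities]
    return min(offsets, key=lambda o: abs(o[0]) + abs(o[1]))
```

Under these assumptions, a candidate that exactly matches an entity has offset `(0, 0)`, while a non-entity candidate still carries a training signal: its offset says which way, and how far, each boundary would need to move.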

    On the Feasibility of Distributed Kernel Regression for Big Data

    In modern scientific research, massive datasets with huge numbers of observations are frequently encountered. To facilitate the computational process, a divide-and-conquer scheme is often used for the analysis of big data. In such a strategy, a full dataset is first split into several manageable segments; the final output is then averaged from the individual outputs of the segments. Despite its popularity in practice, it remains largely unknown whether such a distributive strategy provides valid theoretical inferences to the original data. In this paper, we address this fundamental issue for distributed kernel regression (DKR), where the algorithmic feasibility is measured by the generalization performance of the resulting estimator. To justify DKR, a uniform convergence rate is needed for bounding the generalization error over the individual outputs, which brings new and challenging issues in the big data setup. Under mild conditions, we show that, with a proper number of segments, DKR leads to an estimator that is generalization consistent to the unknown regression function. The obtained results justify the method of DKR and shed light on the feasibility of using other distributed algorithms for processing big data. The promising performance of the method is supported by both simulation and real data examples.
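
The divide-and-conquer scheme described above (split the data into segments, fit each segment separately, average the outputs) can be sketched in a few lines. This is a minimal illustration, not the paper's estimator or its tuning: the Gaussian kernel, the kernel ridge form `(K + ridge * n * I) alpha = y`, the default `bandwidth` and `ridge` values, and all function names are assumptions chosen to keep the example self-contained and one-dimensional.

```python
import math
import random


def gaussian_kernel(a, b, bandwidth):
    """Gaussian (RBF) kernel on scalars."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_local(xs, ys, bandwidth, ridge):
    """Kernel ridge regression on one segment; returns a predictor."""
    n = len(xs)
    K = [[gaussian_kernel(xs[i], xs[j], bandwidth) + (ridge * n if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, ys)

    def predict(x):
        return sum(a * gaussian_kernel(x, xi, bandwidth) for a, xi in zip(alpha, xs))

    return predict


def distributed_kernel_regression(xs, ys, num_segments, bandwidth=0.2, ridge=1e-3):
    """Split the data into segments, fit each one, and average the predictors."""
    idx = list(range(len(xs)))
    random.Random(0).shuffle(idx)  # fixed seed so the split is reproducible
    segments = [idx[k::num_segments] for k in range(num_segments)]
    local = [fit_local([xs[i] for i in seg], [ys[i] for i in seg], bandwidth, ridge)
             for seg in segments]

    def predict(x):
        return sum(f(x) for f in local) / len(local)

    return predict
```

Each local fit solves an n/m by n/m linear system rather than one n by n system, which is where the computational saving comes from. How many segments can be used before the averaged estimator loses accuracy is the question the abstract addresses; this sketch does not attempt to answer it.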

    Progress of Exogenous Enzymes Application in Black Tea Processing

    Black tea is the most important category in world tea production and is very popular in the world tea market. China has abundant tea resources, especially unused summer and autumn tea leaves, and there are a few new ways to develop them, such as processing them into black tea or deep-processed products. Applying exogenous enzymes with good functional characteristics can improve the quality of summer and autumn black teas and their deep-processed products. This article reviews the current status and existing problems of exogenous enzyme application in black tea processing in recent years, as well as the types and mechanisms of action of frequently used exogenous enzymes and the transformation of tea components catalyzed by exogenous enzymes during withering, rolling, and fermentation. The use of exogenous enzymes to promote the synthesis of theaflavins, a unique component of black tea, and the quality changes during deep processing of black tea are also summarized. Exploring enzymatic processing methods could, in the future, promote the efficient utilization of tea resources and regulate the quality of summer and autumn black tea and their deep-processed products.