    XML Schema Clustering with Semantic and Hierarchical Similarity Measures

    With the growing popularity of XML as a data representation language, collections of XML data have exploded in number. Methods are needed to manage these collections and to discover useful information in them for improved document handling. We present a schema clustering process that organises heterogeneous XML schemas into groups. The methodology considers not only the linguistic and contextual similarity of elements but also their hierarchical structural similarity. We support our findings with experiments and analysis.
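    A minimal sketch of how linguistic, contextual and hierarchical similarity can be combined to cluster schemas. Everything here is an illustrative assumption, not the paper's method: schemas are reduced to element paths, name similarity uses `difflib` string ratio as a stand-in for a linguistic measure, context is the Jaccard overlap of ancestor names, hierarchy is depth closeness, the weights (0.5, 0.3, 0.2) and the 0.7 threshold are arbitrary, and grouping uses simple single-linkage agglomerative clustering.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Assumption: each schema is reduced to a list of element paths such as
# "library/book/title"; real XML Schema parsing is omitted from this sketch.

def linguistic_sim(a: str, b: str) -> float:
    """Name similarity of two elements (a string-ratio stand-in for a linguistic measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def context_sim(pa: list[str], pb: list[str]) -> float:
    """Jaccard overlap of the two elements' ancestor names."""
    ca, cb = set(pa[:-1]), set(pb[:-1])
    if not ca and not cb:
        return 1.0  # two root elements share the same (empty) context
    return len(ca & cb) / len(ca | cb)

def hierarchy_sim(pa: list[str], pb: list[str]) -> float:
    """Closeness of the two elements' depths in their schema trees."""
    da, db = len(pa), len(pb)
    return 1.0 - abs(da - db) / max(da, db)

def element_sim(pa: list[str], pb: list[str], w=(0.5, 0.3, 0.2)) -> float:
    """Weighted mix of linguistic, context and hierarchical similarity (weights are arbitrary)."""
    return (w[0] * linguistic_sim(pa[-1], pb[-1])
            + w[1] * context_sim(pa, pb)
            + w[2] * hierarchy_sim(pa, pb))

def schema_sim(sa: list[str], sb: list[str]) -> float:
    """Symmetric average of each element's best match in the other schema."""
    pa = [p.split("/") for p in sa]
    pb = [p.split("/") for p in sb]
    ab = sum(max(element_sim(x, y) for y in pb) for x in pa) / len(pa)
    ba = sum(max(element_sim(y, x) for x in pa) for y in pb) / len(pb)
    return (ab + ba) / 2

def cluster(schemas: dict[str, list[str]], threshold: float = 0.7) -> list[set[str]]:
    """Single-linkage agglomerative clustering: merge two groups whenever any
    pair of their schemas is at least `threshold` similar, until no merge applies."""
    clusters = [{name} for name in schemas]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            if any(schema_sim(schemas[a], schemas[b]) >= threshold
                   for a in clusters[i] for b in clusters[j]):
                other = clusters.pop(j)  # j > i, so index i stays valid
                clusters[i] |= other
                merged = True
                break
    return clusters

groups = cluster({
    "lib1": ["library/book/title", "library/book/author"],
    "lib2": ["library/book/title", "library/book/writer"],
    "shop": ["store/order/customer", "store/order/total"],
})
```

    With these toy inputs the two library schemas share context and most element names, so they end up in one group while the shop schema stays on its own. Single linkage is chosen only for brevity; average or complete linkage would be less prone to chaining.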

    Proceedings of the Workshop Semantic Content Acquisition and Representation (SCAR) 2007

    These are the proceedings of the Workshop on Semantic Content Acquisition and Representation, held in conjunction with NODALIDA 2007 on May 24, 2007, in Tartu, Estonia.

    LasUIE: Unifying Information Extraction with Latent Adaptive Structure-aware Generative Language Model

    Universally modeling all typical information extraction tasks (UIE) with one generative language model (GLM) has shown great potential in a recent study, where various IE predictions are unified into a linearized hierarchical expression under a GLM. Syntactic structure information, an effective feature that has been used extensively in the IE community, should also benefit UIE. In this work, we propose a novel structure-aware GLM that fully unleashes the power of syntactic knowledge for UIE. A heterogeneous structure inductor is explored to induce rich heterogeneous structural representations without supervision by post-training an existing GLM. In particular, a structural broadcaster is devised to compact various latent trees into explicit high-order forests, helping to guide better generation during decoding. Finally, we introduce a task-oriented structure fine-tuning mechanism that further adjusts the learned structures to best match the end task's needs. On 12 IE benchmarks across 7 tasks, our system shows significant improvements over the baseline UIE system. Further in-depth analyses show that our GLM learns rich task-adaptive structural bias that largely resolves the key UIE challenges of long-range dependencies and boundary identification. Source code is available at https://github.com/ChocoWu/LasUIE.

    Comment: NeurIPS 2022 conference paper.
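    To make "unifying IE predictions into a linearized hierarchical expression" concrete, here is a minimal sketch: nested extractions (an entity with a relation attached) are flattened into one bracketed string that a generative model could emit, and parsed back into a tree afterwards. The bracket format, the `Node` class and the `linearize`/`parse` functions are hypothetical illustrations, not LasUIE's or UIE's actual target grammar.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One extracted item: a typed span, optionally with nested items (e.g. a relation)."""
    label: str
    text: str
    children: list["Node"] = field(default_factory=list)

def linearize(nodes: list[Node]) -> str:
    """Flatten a forest of extractions into a single bracketed target string."""
    def one(n: Node) -> str:
        inner = "".join(" " + one(c) for c in n.children)
        return f"({n.label}: {n.text}{inner})"
    return "(" + " ".join(one(n) for n in nodes) + ")"

def parse(s: str) -> list[Node]:
    """Inverse of linearize; assumes labels contain no ':' and no text contains '(' or ')'."""
    pos = 0

    def skip_spaces() -> None:
        nonlocal pos
        while pos < len(s) and s[pos] == " ":
            pos += 1

    def node() -> Node:
        nonlocal pos
        pos += 1                      # consume "("
        colon = s.index(":", pos)
        label = s[pos:colon]
        pos = colon + 2               # skip ": "
        end = pos
        while s[end] not in "()":     # span text runs until a child or the close
            end += 1
        text = s[pos:end].rstrip()
        pos = end
        children = []
        while s[pos] == "(":          # nested items
            children.append(node())
            skip_spaces()
        pos += 1                      # consume ")"
        return Node(label, text, children)

    pos += 1                          # consume outer "("
    nodes = []
    while s[pos] == "(":
        nodes.append(node())
        skip_spaces()
    return nodes

forest = [Node("person", "Steve Jobs", [Node("work for", "Apple")]),
          Node("organization", "Apple")]
target = linearize(forest)
```

    Here `target` is `((person: Steve Jobs (work for: Apple)) (organization: Apple))`, and `parse(target)` recovers the original tree. Because one string format covers entities, relations and their nesting, a single decoder can be trained on every task; errors in long-range nesting or span boundaries show up directly as malformed or misplaced brackets.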

    A recurrent neural network architecture for biomedical event trigger classification

    A “biomedical event” is a broad term describing the roles of, and interactions between, entities (such as proteins, genes and cells) in a biological system. Biomedical event extraction aims to identify and extract these events from unstructured text. An important component in the early stage of this task is biomedical trigger classification, which involves identifying and classifying the words or phrases that indicate an event. In this thesis, we present our work on biomedical trigger classification, developed using the multi-level event extraction dataset. We restrict the scope of our classification to 19 biomedical event types grouped under four broad categories: Anatomical, Molecular, General and Planned. While most existing approaches are based on traditional machine learning algorithms that require extensive feature engineering, our model relies on neural networks to learn important features implicitly, directly from the text. We use natural language processing techniques to transform the text into vectorized inputs for a neural network architecture. To our knowledge, this is the first work to explore neural attention strategies for biomedical trigger classification. Our best results were obtained from an ensemble of 50 models, which produced a micro F-score of 79.82%, an improvement of 1.3% over the previous best score.
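    The reported micro F-score pools true positives, predicted triggers and gold triggers across all 19 event types before computing precision and recall, so frequent types weigh more than rare ones. The sketch below shows that computation together with one simple way of combining an ensemble, per-token majority voting. The abstract does not say how the 50 models were combined, so majority voting, per-token labelling and the `"O"` label for non-trigger tokens are all assumptions.

```python
from collections import Counter

NEG = "O"  # assumed label for tokens that are not event triggers

def majority_vote(predictions: list[list[str]]) -> list[str]:
    """Combine per-model label sequences token by token; ties go to the label seen first."""
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]

def micro_f1(gold: list[str], pred: list[str], negative: str = NEG) -> float:
    """Micro-averaged F1: pool true positives, predictions and gold triggers over all
    event types before computing precision and recall (the negative label is excluded)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p and g != negative)
    n_pred = sum(1 for p in pred if p != negative)
    n_gold = sum(1 for g in gold if g != negative)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)

gold = ["Binding", "O", "Regulation", "Localization"]
models = [
    ["Binding", "O", "Regulation", "O"],
    ["Binding", "Binding", "O", "O"],
    ["Binding", "O", "Regulation", "Localization"],
]
ensemble = majority_vote(models)
```

    In this toy case the vote yields `["Binding", "O", "Regulation", "O"]`: 2 correct triggers out of 2 predicted and 3 gold, so precision is 1, recall is 2/3 and micro F1 is 0.8.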

    Automatic construction of wrappers for semi-structured documents.

    Lin Wai-yip. Thesis (M.Phil.), Chinese University of Hong Kong, 2001. Includes bibliographical references (leaves 114-123). Abstracts in English and Chinese.

    Contents:
    Chapter 1. Introduction
        1.1 Information Extraction
        1.2 IE from Semi-structured Documents
        1.3 Thesis Contributions
        1.4 Thesis Organization
    Chapter 2. Related Work
        2.1 Existing Approaches
        2.2 Limitations of Existing Approaches
        2.3 Our HISER Approach
    Chapter 3. System Overview
        3.1 Hierarchical record Structure and Extraction Rule learning (HISER)
        3.2 Hierarchical Record Structure
        3.3 Extraction Rule
        3.4 Wrapper Adaptation
    Chapter 4. Automatic Hierarchical Record Structure Construction
        4.1 Motivation
        4.2 Hierarchical Record Structure Representation
        4.3 Constructing Hierarchical Record Structure
    Chapter 5. Extraction Rule Induction
        5.1 Rule Representation
        5.2 Extraction Rule Induction Algorithm
    Chapter 6. Experimental Results of Wrapper Learning
        6.1 Experimental Methodology
        6.2 Results on Electronic Appliance Catalogs
        6.3 Results on Book Catalogs
        6.4 Results on Seminar Announcements
    Chapter 7. Adapting Wrappers to Unseen Information Sources
        7.1 Motivation
        7.2 Support Vector Machines
        7.3 Feature Selection
        7.4 Automatic Annotation of Training Examples
            7.4.1 Building SVM Models
            7.4.2 Seeking Potential Training Example Candidates
            7.4.3 Classifying Potential Training Examples
    Chapter 8. Experimental Results of Wrapper Adaptation
        8.1 Experimental Methodology
        8.2 Results on Electronic Appliance Catalogs
        8.3 Results on Book Catalogs
    Chapter 9. Conclusions and Future Work
        9.1 Conclusions
        9.2 Future Work
    Chapter A. Sample Experimental Pages
    Chapter B. Detailed Experimental Results of Wrapper Adaptation of HISER
    Bibliography

    Using Ontologies for Extracting Product Features from Web Pages

    In this paper, we show how to use ontologies to bootstrap a knowledge acquisition process that extracts product information from tabular data on Web pages. Furthermore, we use logical rules to reason about product-specific properties and to derive higher-order knowledge about product features. We also explain the knowledge acquisition process, covering both its ontological and procedural aspects. Finally, we give a qualitative and quantitative evaluation of our results.
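    The abstract does not spell out the ontology or rule language, so the following is only an illustrative sketch of the two steps it names: an ontology (here a hypothetical mapping from product properties to header synonyms) is used to interpret rows of a specification table, and simple logical rules then derive higher-order product features from the extracted facts. The property names, synonyms, thresholds and the `map_header`/`extract`/`derive` functions are all assumptions, and units are assumed to be normalised already.

```python
import re
from typing import Optional

# Hypothetical mini-ontology: product property -> header synonyms seen on Web pages.
# Values are assumed to already be in one unit per property (kg, inches, hours).
ONTOLOGY = {
    "weight": {"weight", "mass"},
    "screen_size": {"display", "screen", "screen size"},
    "battery_life": {"battery", "battery life", "runtime"},
}

NUMBER = re.compile(r"\d+(?:\.\d+)?")

def map_header(header: str) -> Optional[str]:
    """Map a table header cell to an ontology property via synonym lookup."""
    h = header.strip().lower()
    for prop, synonyms in ONTOLOGY.items():
        if h in synonyms:
            return prop
    return None

def extract(rows: list[tuple[str, str]]) -> dict[str, float]:
    """Turn (header, value) rows of a spec table into numeric facts for known properties."""
    facts: dict[str, float] = {}
    for header, value in rows:
        prop = map_header(header)
        match = NUMBER.search(value)
        if prop is not None and match:
            facts[prop] = float(match.group())
    return facts

# Hypothetical rules deriving higher-order product features from extracted facts;
# a missing fact makes the rule fail rather than guess.
RULES = [
    ("ultraportable", lambda f: f.get("weight", float("inf")) < 1.5
                                and f.get("screen_size", float("inf")) <= 13.3),
    ("long_battery", lambda f: f.get("battery_life", 0.0) >= 10),
]

def derive(facts: dict[str, float]) -> list[str]:
    """Fire every rule whose condition holds and return the derived feature names."""
    return [name for name, condition in RULES if condition(facts)]

rows = [("Weight", "1.2 kg"), ("Display", "13.3 in"),
        ("Battery life", "up to 12 h"), ("Colour", "silver")]
facts = extract(rows)
```

    Headers the ontology does not know (here "Colour") are simply ignored, which is where an ontology-driven approach trades recall for precision; extending the synonym sets is the natural way to widen coverage.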