4,480 research outputs found

    Ekstraksi Informasi Halaman Web Menggunakan Pendekatan Bootstrapping pada Ontology-Based Information Extraction

    Get PDF
    AbstrakEkstraksi  informasi  merupakan suatu bidang ilmu untuk pengolahan bahasa alami, dengan cara mengubah teks tidak terstruktur menjadi informasi dalam bentuk terstruktur. Berbagai jenis informasi di Internet ditransmisikan secara tidak terstruktur melalui website, menyebabkan munculnya kebutuhan akan suatu teknologi untuk menganalisa teks dan menemukan pengetahuan yang relevan dalam bentuk informasi terstruktur. Contoh informasi tidak terstruktur adalah informasi utama yang ada pada konten halaman web. Bermacam pendekatan untuk ekstraksi informasi telah dikembangkan oleh berbagai peneliti, baik menggunakan metode manual atau otomatis, namun masih perlu ditingkatkan kinerjanya terkait akurasi dan kecepatan ekstraksi. Pada penelitian ini diusulkan suatu penerapan pendekatan ekstraksi informasi dengan mengkombinasikan pendekatan bootstrapping dengan Ontology-based Information Extraction (OBIE). Pendekatan bootstrapping dengan menggunakan sedikit contoh data berlabel, digunakan untuk memimalkan keterlibatan manusia dalam proses ekstraksi informasi, sedangkan penggunakan panduan ontologi untuk mengekstraksi classes (kelas), properties dan instance digunakan untuk menyediakan konten semantik untuk web semantik. Pengkombinasian kedua pendekatan tersebut diharapkan dapat meningkatan kecepatan proses ekstraksi dan akurasi hasil ekstraksi. Studi kasus untuk penerapan sistem ekstraksi informasi menggunakan dataset “LonelyPlanet”. Kata kunci—Ekstraksi informasi, ontologi, bootstrapping, Ontology-Based Information Extraction, OBIE, kinerja Abstract Information extraction is a field study of natural language processing by converting unstructured text into structured information. Several types of information on the Internet is transmitted through unstructured information via websites, led to emergence of the need a technology to analyze text and found relevant knowledge into structured information. For example of unstructured information is existing main information on the content of web pages. Various approaches  for information extraction have been developed by many researchers, either using manual or automatic method, but still need to be improved performance related accuracy and speed of extraction. This research proposed an approach of information extraction that combines bootstrapping approach with Ontology-Based Information Extraction (OBIE). Bootstrapping approach using small seed of labelled data, is used to minimize human intervention on information extraction process, while the use of guide ontology for extracting classes, properties and instances, using for provide semantic content for semantic web. Combining both approaches expected to increase speed of extraction process and accuracy of extraction results. Case study to apply information extraction system using “LonelyPlanet” datasets. Keywords— Information extraction, ontology, bootstrapping, Ontology-Based Information Extraction, OBIE, performanc

    Towards a relation extraction framework for cyber-security concepts

    Full text link
    In order to assist security analysts in obtaining information pertaining to their network, such as novel vulnerabilities, exploits, or patches, information retrieval methods tailored to the security domain are needed. As labeled text data is scarce and expensive, we follow developments in semi-supervised Natural Language Processing and implement a bootstrapping algorithm for extracting security entities and their relationships from text. The algorithm requires little input data, specifically, a few relations or patterns (heuristics for identifying relations), and incorporates an active learning component which queries the user on the most important decisions to prevent drifting from the desired relations. Preliminary testing on a small corpus shows promising results, obtaining precision of .82.Comment: 4 pages in Cyber & Information Security Research Conference 2015, AC

    Initiating organizational memories using ontology-based network analysis as a bootstrapping tool

    Get PDF
    An important problem for many kinds of knowledge systems is their initial set-up. It is difficult to choose the right information to include in such systems, and the right information is also a prerequisite for maximizing the uptake and relevance. To tackle this problem, most developers adopt heavyweight solutions and rely on a faithful continuous interaction with users to create and improve content. In this paper, we explore the use of an automatic, lightweight ontology-based solution to the bootstrapping problem, in which domain-describing ontologies are analysed to uncover significant yet implicit relationships between instances. We illustrate the approach by using such an analysis to provide content automatically for the initial set-up of an organizational memory

    Using Neural Networks for Relation Extraction from Biomedical Literature

    Full text link
    Using different sources of information to support automated extracting of relations between biomedical concepts contributes to the development of our understanding of biological systems. The primary comprehensive source of these relations is biomedical literature. Several relation extraction approaches have been proposed to identify relations between concepts in biomedical literature, namely, using neural networks algorithms. The use of multichannel architectures composed of multiple data representations, as in deep neural networks, is leading to state-of-the-art results. The right combination of data representations can eventually lead us to even higher evaluation scores in relation extraction tasks. Thus, biomedical ontologies play a fundamental role by providing semantic and ancestry information about an entity. The incorporation of biomedical ontologies has already been proved to enhance previous state-of-the-art results.Comment: Artificial Neural Networks book (Springer) - Chapter 1

    Requirements for Information Extraction for Knowledge Management

    Get PDF
    Knowledge Management (KM) systems inherently suffer from the knowledge acquisition bottleneck - the difficulty of modeling and formalizing knowledge relevant for specific domains. A potential solution to this problem is Information Extraction (IE) technology. However, IE was originally developed for database population and there is a mismatch between what is required to successfully perform KM and what current IE technology provides. In this paper we begin to address this issue by outlining requirements for IE based KM

    Web based knowledge extraction and consolidation for automatic ontology instantiation

    Get PDF
    The Web is probably the largest and richest information repository available today. Search engines are the common access routes to this valuable source. However, the role of these search engines is often limited to the retrieval of lists of potentially relevant documents. The burden of analysing the returned documents and identifying the knowledge of interest is therefore left to the user. The Artequakt system aims to deploy natural language tools to automatically ex-tract and consolidate knowledge from web documents and instantiate a given ontology, which dictates the type and form of knowledge to extract. Artequakt focuses on the domain of artists, and uses the harvested knowledge to gen-erate tailored biographies. This paper describes the latest developments of the system and discusses the problem of knowledge consolidation

    Generating and visualizing a soccer knowledge base

    Get PDF
    This demo abstract describes the SmartWeb Ontology-based Information Extraction System (SOBIE). A key feature of SOBIE is that all information is extracted and stored with respect to the SmartWeb ontology. In this way, other components of the systems, which use the same ontology, can access this information in a straightforward way. We will show how information extracted by SOBIE is visualized within its original context, thus enhancing the browsing experience of the end user
    • …
    corecore