18,976 research outputs found

    KnowText: Auto-generated Knowledge Graphs for custom domain applications

    While industrial Knowledge Graphs enable information extraction from massive data volumes and form the backbone of the Semantic Web, specialised, custom-designed knowledge graphs focused on enterprise-specific information are an emerging trend. We present “KnowText”, an application that automatically generates custom Knowledge Graphs from unstructured text and enables fast information extraction through graph visualisation and free-text query methods designed for non-specialist users. An OWL ontology automatically extracted from the text is linked to the knowledge graph and used as a knowledge base. A basic ontological schema is provided, including 16 classes and datatype properties. The extracted facts and the OWL ontology can be downloaded and further refined. KnowText is designed for business applications (CRM, HR, banking), where a custom KG can serve to locally manage existing data that is often stored as “sensitive” or proprietary records and is not openly accessible on the web. KnowText deploys a custom KG from a collection of text documents and enables fast information extraction through graph-based visualisation and text-based query methods.
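
    The abstract describes linking automatically extracted facts to an OWL ontology that serves as the knowledge base. A minimal sketch of that general idea using rdflib in Python follows; the namespace, class and property names are illustrative assumptions, not KnowText's actual 16-class schema.

        # Sketch: build a tiny OWL-backed knowledge graph from extracted facts.
        # Schema and fact names are hypothetical placeholders.
        from rdflib import Graph, Literal, Namespace, RDF, RDFS
        from rdflib.namespace import OWL, XSD

        EX = Namespace("http://example.org/kg#")   # assumed namespace
        g = Graph()
        g.bind("ex", EX)
        g.bind("owl", OWL)

        # Ontology layer: one class and one datatype property.
        g.add((EX.Organisation, RDF.type, OWL.Class))
        g.add((EX.foundedIn, RDF.type, OWL.DatatypeProperty))
        g.add((EX.foundedIn, RDFS.domain, EX.Organisation))
        g.add((EX.foundedIn, RDFS.range, XSD.gYear))

        # Fact layer: an instance extracted from text, linked to the ontology.
        g.add((EX.AcmeBank, RDF.type, EX.Organisation))
        g.add((EX.AcmeBank, EX.foundedIn, Literal("1998", datatype=XSD.gYear)))

        print(g.serialize(format="turtle"))   # Turtle output that can be downloaded and refined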

    Autonomous Consolidation of Heterogeneous Record-Structured HTML Data in Chameleon

    While progress has been made in querying digital information contained in XML and HTML documents, success in retrieving information from the so-called hidden Web (data behind Web forms) has been modest. There has been a nascent trend of developing autonomous tools for extracting information from the hidden Web. Automatic tools for ontology generation, wrapper generation, Web form querying, response gathering, etc., have been reported in recent research. This thesis presents a system called Chameleon for automatically querying the hidden Web and gathering its responses. The approach to response gathering is based on automatic table structure identification: since most information repositories of the hidden Web are structured databases, the information returned in response to a query exhibits regularities. Information extraction from the identified record structures is performed using domain knowledge corresponding to the domain specified in the query. So-called domain plug-ins are used to make the dynamically generated wrappers domain-specific rather than document-specific, as is conventional.
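
    Since the response-gathering step described above hinges on spotting regular record structures in returned HTML, here is a rough sketch of that idea using BeautifulSoup; the library choice, the assumption that results arrive as a <table>, and the sample data are all assumptions, not Chameleon's actual implementation.

        # Sketch: treat a hidden-Web response as a grid of repeated records
        # and pull each data row into a dict keyed by the header row.
        from bs4 import BeautifulSoup

        def extract_records(html: str) -> list[dict]:
            soup = BeautifulSoup(html, "html.parser")
            table = soup.find("table")                 # assume the result grid is a <table>
            if table is None:
                return []
            rows = table.find_all("tr")
            headers = [th.get_text(strip=True) for th in rows[0].find_all(["th", "td"])]
            records = []
            for row in rows[1:]:                       # every later row is one record
                cells = [td.get_text(strip=True) for td in row.find_all("td")]
                if len(cells) == len(headers):         # exploit the expected regularity
                    records.append(dict(zip(headers, cells)))
            return records

        html = "<table><tr><th>Title</th><th>Year</th></tr><tr><td>Sample record</td><td>2000</td></tr></table>"
        print(extract_records(html))   # [{'Title': 'Sample record', 'Year': '2000'}]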

    Turning Text into Research Networks: Information Retrieval and Computational Ontologies in the Creation of Scientific Databases

    BACKGROUND: Web-based, free-text documents on science and technology have been growing rapidly on the web. However, most of these documents are not immediately processable by computers, which slows down the acquisition of useful information. Computational ontologies might represent a possible solution by enabling semantically machine-readable data sets. However, the process of ontology creation, instantiation and maintenance is still based on manual methodologies and is thus time- and cost-intensive. METHOD: We focused on a large corpus containing information on researchers, research fields, and institutions. We based our strategy on traditional entity recognition, social computing and correlation. We devised a semi-automatic approach for the recognition, correlation and extraction of named entities and relations from textual documents, which are then used to create, instantiate, and maintain an ontology. RESULTS: We present a prototype demonstrating the applicability of the proposed strategy, along with a case study describing how direct and indirect relations can be extracted from academic and professional activities registered in a database of curricula vitae in free-text format. We present evidence that this system can identify entities to assist in the process of knowledge extraction and representation to support ontology maintenance. We also demonstrate the extraction of relationships among ontology classes and their instances. CONCLUSION: We have demonstrated that our system can be used to convert research information in free-text format into a database with a semantic structure. Future studies should test this system on the growing amount of free-text information available at the institutional and national levels.
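
    As a rough illustration of the entity-recognition-plus-correlation step described in the METHOD section, the sketch below tags entities in CV-like sentences and records person-organisation co-occurrences as candidate relations; spaCy, the en_core_web_sm model and the sentence-level co-occurrence heuristic are assumptions, not the paper's actual pipeline.

        # Sketch: recognise named entities in free text and propose
        # person-organisation pairs as candidate relations.
        import spacy
        from itertools import product

        nlp = spacy.load("en_core_web_sm")   # standard small English model

        def candidate_relations(text: str) -> list[tuple[str, str]]:
            doc = nlp(text)
            pairs = []
            for sent in doc.sents:
                people = [e.text for e in sent.ents if e.label_ == "PERSON"]
                orgs = [e.text for e in sent.ents if e.label_ == "ORG"]
                # Co-occurrence in one sentence is a weak "affiliated with" candidate.
                pairs.extend(product(people, orgs))
            return pairs

        text = "Alice Smith joined the National Research Institute in 2010."
        print(candidate_relations(text))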

    Automatic extraction of knowledge from web documents

    A large amount of the digital information available is written as text documents in the form of web pages, reports, papers, emails, etc. Extracting the knowledge of interest from such documents, drawn from multiple sources, in a timely fashion is therefore crucial. This paper provides an update on the Artequakt system, which uses natural language tools to automatically extract knowledge about artists from multiple documents based on a predefined ontology. The ontology represents the type and form of knowledge to extract. This knowledge is then used to generate tailored biographies. The information extraction process of Artequakt is detailed and evaluated in this paper.
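
    A toy illustration of ontology-guided extraction in the spirit described above: an ontology fragment is reduced to slot names, each paired with a crude pattern. The slot names and regexes are invented for the sketch; Artequakt's actual ontology and NLP tools are far richer.

        # Sketch: fill hypothetical ontology slots for an artist from one sentence.
        import re

        ARTIST_ONTOLOGY = {                      # hypothetical slots
            "birth_year": re.compile(r"born (?:in|on) .*?(\d{4})"),
            "birth_place": re.compile(r"born in ([A-Z][a-zA-Z ]+?)(?:,| in \d{4}|\.)"),
        }

        def extract_artist_facts(text: str) -> dict:
            facts = {}
            for slot, pattern in ARTIST_ONTOLOGY.items():
                match = pattern.search(text)
                if match:
                    facts[slot] = match.group(1)
            return facts

        print(extract_artist_facts("Rembrandt was born in Leiden in 1606."))
        # {'birth_year': '1606', 'birth_place': 'Leiden'}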

    Web based knowledge extraction and consolidation for automatic ontology instantiation

    The Web is probably the largest and richest information repository available today. Search engines are the common access routes to this valuable source. However, the role of these search engines is often limited to the retrieval of lists of potentially relevant documents. The burden of analysing the returned documents and identifying the knowledge of interest is therefore left to the user. The Artequakt system aims to deploy natural language tools to automatically extract and consolidate knowledge from web documents and instantiate a given ontology, which dictates the type and form of knowledge to extract. Artequakt focuses on the domain of artists, and uses the harvested knowledge to generate tailored biographies. This paper describes the latest developments of the system and discusses the problem of knowledge consolidation.
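
    One small sketch of the consolidation problem mentioned at the end: merging near-duplicate entity records harvested from different pages using stdlib fuzzy matching. The normalisation, threshold and record fields are arbitrary assumptions, not Artequakt's actual consolidation method.

        # Sketch: collapse records whose names refer to the same entity.
        from difflib import SequenceMatcher

        def same_entity(a: str, b: str, threshold: float = 0.85) -> bool:
            # Crude normalisation: lowercase, drop commas, sort name tokens.
            norm = lambda s: " ".join(sorted(s.lower().replace(",", "").split()))
            return SequenceMatcher(None, norm(a), norm(b)).ratio() >= threshold

        def consolidate(records: list[dict]) -> list[dict]:
            merged: list[dict] = []
            for rec in records:
                for kept in merged:
                    if same_entity(rec["name"], kept["name"]):
                        # Fill in fields the kept record is missing.
                        kept.update({k: v for k, v in rec.items() if k not in kept})
                        break
                else:
                    merged.append(dict(rec))
            return merged

        pages = [{"name": "Pablo Picasso", "born": "1881"},
                 {"name": "Picasso, Pablo"},   # same artist, phrased differently
                 {"name": "Claude Monet"}]
        print(consolidate(pages))              # the two Picasso records collapse into one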

    Requirements for Information Extraction for Knowledge Management

    Knowledge Management (KM) systems inherently suffer from the knowledge acquisition bottleneck: the difficulty of modeling and formalizing knowledge relevant to specific domains. A potential solution to this problem is Information Extraction (IE) technology. However, IE was originally developed for database population, and there is a mismatch between what is required to perform KM successfully and what current IE technology provides. In this paper we begin to address this issue by outlining requirements for IE-based KM.

    Artequakt: Generating tailored biographies from automatically annotated fragments from the web

    The Artequakt project seeks to automatically generate narrative biographies of artists from knowledge that has been extracted from the Web and maintained in a knowledge base. An overview of the system architecture is presented here, and the three key components of that architecture are explained in detail, namely knowledge extraction, information management and biography construction. Conclusions are drawn from the initial experiences of the project and future progress is detailed.
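
    A bare skeleton of the three-stage architecture named above, reduced to a linear pipeline; every signature, field and return type here is an assumption made for illustration, not the project's API.

        # Sketch: the three stages (extraction, information management, biography
        # construction) chained end to end with placeholder logic.
        def extract_knowledge(urls: list[str]) -> list[dict]:
            """Harvest candidate facts about an artist from web pages."""
            return [{"artist": "Example Artist", "fact": "born 1900", "source": u} for u in urls]

        def manage_information(facts: list[dict]) -> dict:
            """Consolidate facts into a single knowledge-base entry."""
            entry: dict = {"artist": None, "facts": []}
            for f in facts:
                entry["artist"] = entry["artist"] or f["artist"]
                entry["facts"].append(f["fact"])
            return entry

        def construct_biography(entry: dict) -> str:
            """Render the consolidated entry as a short narrative."""
            return f"{entry['artist']}: " + "; ".join(entry["facts"]) + "."

        print(construct_biography(manage_information(extract_knowledge(["http://example.org"]))))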