Search CORE

12,120 research outputs found

A topic modeling approach for Web service annotation

Author: Carlos Corrales Juan
Ordonez-Ante Leandro
Verborgh Ruben
Publication venue
Publication date: 01/01/2014
Field of study

Automatic extraction of knowledge from web documents

Author: Alani Harith
Hall Wendy
Kim Sanghee
Lewis Paul H.
Millard David E.
Shadbolt Nigel R.
Weal Mark J.
Publication venue
Publication date: 01/01/2003
Field of study

A large amount of digital information available is written as text documents in the form of web pages, reports, papers, emails, etc. Extracting the knowledge of interest from such documents from multiple sources in a timely fashion is therefore crucial. This paper provides an update on the Artequakt system which uses natural language tools to automatically extract knowledge about artists from multiple documents based on a predefined ontology. The ontology represents the type and form of knowledge to extract. This knowledge is then used to generate tailored biographies. The information extraction process of Artequakt is detailed and evaluated in this paper

CiteSeerX

Southampton (e-Prints Soton)

Open Research Online (The Open University)

A geo-temporal information extraction service for processing descriptive metadata in digital libraries

Author: Borbinha José
Manguinhas H.
Martins Bruno
Siabato Vaca Willington Libardo
Publication venue: E.T.S.I. en Topografía, Geodesia y Cartografía (UPM)
Publication date: 01/01/2009
Field of study

In the context of digital map libraries, resources are usually described according to metadata records that define the relevant subject, location, time-span, format and keywords. On what concerns locations and time-spans, metadata records are often incomplete or they provide information in a way that is not machine-understandable (e.g. textual descriptions). This paper presents techniques for extracting geotemporal information from text, using relatively simple text mining methods that leverage on a Web gazetteer service. The idea is to go from human-made geotemporal referencing (i.e. using place and period names in textual expressions) into geo-spatial coordinates and time-spans. A prototype system, implementing the proposed methods, is described in detail. Experimental results demonstrate the efficiency and accuracy of the proposed approaches

Archivo Digital UPM

The State-of-the-arts in Focused Search

Author: Li Rongmei
Publication venue: University of Twente, Centre for Telematics and Information Technology
Publication date: 01/01/2009
Field of study

The continuous influx of various text data on the Web requires search engines to improve their retrieval abilities for more specific information. The need for relevant results to a user’s topic of interest has gone beyond search for domain or type specific documents to more focused result (e.g. document fragments or answers to a query). The introduction of XML provides a format standard for data representation, storage, and exchange. It helps focused search to be carried out at different granularities of a structured document with XML markups. This report aims at reviewing the state-of-the-arts in focused search, particularly techniques for topic-specific document retrieval, passage retrieval, XML retrieval, and entity ranking. It is concluded with highlight of open problems

University of Twente Research Information

Neogeography: The Challenge of Channelling Large and Ill-Behaved Data Streams

Author: Habib Mena B.
Keulen Maurice van
Publication venue: Centre for Telematics and Information Technology, University of Twente
Publication date: 01/01/2011
Field of study

Neogeography is the combination of user generated data and experiences with mapping technologies. In this article we present a research project to extract valuable structured information with a geographic component from unstructured user generated text in wikis, forums, or SMSes. The extracted information should be integrated together to form a collective knowledge about certain domain. This structured information can be used further to help users from the same domain who want to get information using simple question answering system. The project intends to help workers communities in developing countries to share their knowledge, providing a simple and cheap way to contribute and get benefit using the available communication technology

Maastricht University Research Portal

University of Twente Research Information

Implementing a Portable Clinical NLP System with a Common Data Model - a Lisp Perspective

Author: Luo Yuan
Szolovits Peter
Publication venue
Publication date: 14/11/2018
Field of study

This paper presents a Lisp architecture for a portable NLP system, termed LAPNLP, for processing clinical notes. LAPNLP integrates multiple standard, customized and in-house developed NLP tools. Our system facilitates portability across different institutions and data systems by incorporating an enriched Common Data Model (CDM) to standardize necessary data elements. It utilizes UMLS to perform domain adaptation when integrating generic domain NLP tools. It also features stand-off annotations that are specified by positional reference to the original document. We built an interval tree based search engine to efficiently query and retrieve the stand-off annotations by specifying positional requirements. We also developed a utility to convert an inline annotation format to stand-off annotations to enable the reuse of clinical text datasets with inline annotations. We experimented with our system on several NLP facilitated tasks including computational phenotyping for lymphoma patients and semantic relation extraction for clinical notes. These experiments showcased the broader applicability and utility of LAPNLP.Comment: 6 pages, accepted by IEEE BIBM 2018 as regular pape

arXiv.org e-Print Archive

DSpace@MIT

Crossref