A novel fact extraction task is defined to fill a gap between current information retrieval and information extraction technologies. It is shown that useful, partially structured facts can be extracted about different kinds of entities in a broad domain, namely all kinds of places depicted in tourist images. Importantly, the approach does not rely on existing linguistic resources (gazetteers, taggers, parsers, etc.), and it ported easily and cheaply between two very different languages (English and Latvian). Previous fact extraction from the web has focused on the extraction of structured data, e.g. (Building-LocatedIn-Town). In contrast, we extract richer and more interesting facts, such as a fact explaining why a building was built. Enough structure is maintained to facilitate subsequent processing of the information. For example, this partial structure enables straightforward template-based text generation. We report positive results for the correctness and interest of English and Latvian facts, and for the utility of the extracted facts in enhancing image captions.
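
To illustrate how a partially structured fact could feed template-based text generation, the following is a minimal sketch and not the authors' implementation. The `PartialFact` fields, the `enhance_caption` function, and the lighthouse example are all hypothetical. The sketch assumes a fact consists of an entity, a fixed cue phrase, and an unanalysed free-text remainder.

```python
from dataclasses import dataclass


@dataclass
class PartialFact:
    """Hypothetical representation of a partially structured fact."""
    entity: str     # the place the fact is about
    cue: str        # fixed cue phrase that gives the fact its type
    remainder: str  # unanalysed free text carrying the interesting content


def enhance_caption(caption: str, fact: PartialFact) -> str:
    """Append one fact to an image caption using a fixed sentence template."""
    base = caption.rstrip(". ")
    remainder = fact.remainder.rstrip(". ")
    return f"{base}. {fact.entity} {fact.cue} {remainder}."


# Invented example data, for illustration only.
fact = PartialFact("The lighthouse", "was built to", "guide ships past the reef")
print(enhance_caption("A lighthouse at sunset", fact))
```

Only the entity and the cue phrase need to be recognised for the template to work; the remainder is passed through as plain text, which is one way to keep a fact rich without fully parsing it.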

Jones, Gareth J.F.

Kelly, Liadh

Salway, Andrew

Skadiņa, Inguna

English

Crossref

Portable Extraction of Partially Structured Facts from the Web

Irish Universities

CiteSeerX

DCU Online Research Access Service

http://www.bbrel.co.uk/pdfs/PortableExtraction.pdf
