Open-Domain Attribute-Value Acquisition from Semi-Structured Texts

By Naoki Yoshinaga and Kentaro Torisawa

Abstract

This paper proposes an unsupervised method that acquires a set of attribute-value pairs (avps, e.g., ⟨director, W. Wyler⟩) for a given object (e.g., “Ben-Hur”) from semi-structured HTML documents. Objects' avps are one of the principal components of domain ontologies. We first acquire class attributes that many web authors use to describe objects' avps. We then exploit the acquired class attributes to induce patterns for extracting avps from web pages. Experimental results show that our method acquires at least one set of correct avps for 67.7% of objects among open-domain class-object pairs whose source documents (web pages) include the objects' avps in layouts.

Key words: open-domain attribute-value acquisition, semi-structured texts, question answering, faceted search
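
The two steps named in the abstract (acquire class attributes, then use them to induce extraction patterns) can be illustrated with a toy sketch. This is not the authors' algorithm, which this record does not describe in detail. It is a minimal illustration under stated assumptions: class attributes are taken to be text strings that recur across several pages of the same class, and a "pattern" is simply the run of HTML tags that most often separates a known attribute from the text that follows it. All function names, the `min_pages` threshold, and the regular expression are hypothetical.

```python
import re
from collections import Counter

# A text run, the tag run that follows it, and (via lookahead) the next text run.
# The lookahead keeps the right-hand text available as the next match's left side.
TRIPLE = re.compile(r"([^<>]+)((?:<[^>]+>\s*)+)(?=([^<>]+))")


def norm(text):
    """Normalise an attribute candidate: trim, drop a trailing colon, lower-case."""
    return text.strip().rstrip(":").strip().lower()


def triples(html):
    """Yield (left_text, separator_tags, right_text) for adjacent text runs."""
    for m in TRIPLE.finditer(html):
        left, right = m.group(1).strip(), m.group(3).strip()
        if left and right:
            # Collapse whitespace and case so equivalent tag runs compare equal.
            yield left, re.sub(r"\s+", "", m.group(2).lower()), right


def acquire_class_attributes(pages, min_pages=2):
    """Step 1 (assumed heuristic): keep strings seen on at least min_pages pages."""
    doc_freq = Counter()
    for html in pages:
        doc_freq.update({norm(left) for left, _, _ in triples(html)})
    return {attr for attr, n in doc_freq.items() if n >= min_pages}


def induce_separator(html, class_attributes):
    """Step 2a: the tag run that most often follows a known class attribute."""
    seps = Counter(
        sep for left, sep, _ in triples(html) if norm(left) in class_attributes
    )
    return seps.most_common(1)[0][0] if seps else None


def extract_avps(html, class_attributes):
    """Step 2b: apply the induced separator to the whole page."""
    sep = induce_separator(html, class_attributes)
    if sep is None:
        return []
    return [(norm(left), right) for left, s, right in triples(html) if s == sep]
```

Because the separator is induced per page, the sketch can pick up attributes that were never seen on other pages: given two film pages that share "Director" and "Year" rows, a "Runtime" row laid out the same way on one page is extracted as well. Values such as "W. Wyler" differ across objects, so the document-frequency filter in step 1 drops them as attribute candidates.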

Year: 2008
OAI identifier: oai:CiteSeerX.psu:10.1.1.134.511
Provided by: CiteSeerX