
Intelligent Content Based Title and Author Name Extraction from Formatted Documents

By Eric Berkowitz, Mohamed Reda Elkhadiri, Tim Sahouri and Michel Abraham

Abstract

This paper describes the development of algorithms for extracting the title and the names of the authors from documents available on the World Wide Web. The algorithms are designed not to rely on the specific stylistic dictates of any document formatting standard. Rather, they rely on a combination of overt and subtle cues that form a generalized, common standard for placing this information in a document and for making it easy for readers to extract.

Topics: Language, Archives
Publisher: Omnipress
Year: 2004
OAI identifier: oai:cogprints.org:3663

