Document analysis at DFKI. Pt. 2 Information extraction

By S. Baumann, M. Malburg, H.G. Hein, R. Hoch, T. Kieninger, and N. Kuhn; Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH (DFKI), Kaiserslautern and Saarbruecken (Germany)


Document analysis has enabled essential progress in office automation. This paper is part of an overview of the combined research efforts in document analysis at DFKI. Common to all document analysis projects is the overall goal of providing a high-level electronic representation of documents in terms of iconic, structural, textual, and semantic information. These symbolic document descriptions enable 'intelligent' access to a document database. Currently there are three ongoing document analysis projects at DFKI: INCA, OMEGA, and PASCAL2000/PASCAL+. Although the projects pursue different goals in different application domains, they share the same problems, which have to be resolved with similar techniques. For that reason, the activities in these projects are bundled to avoid redundant work. At DFKI we have divided the problem of document analysis into two main tasks, text recognition and information extraction, each of which is divided into a set of subtasks. The work of the document analysis and office automation department at DFKI is presented in a series of three research reports. The first report discusses the problem of text recognition, the second that of information extraction. In a third report we describe our concept for a specialized knowledge representation language for document analysis. The present report describes the activities dealing with the information extraction task. Information extraction covers the phases of text analysis, message type identification, and file integration. (orig.)

Available from TIB Hannover: RR 1812(95-03) / FIZ - Fachinformationszentrum Karlsruhe / TIB - Technische Informationsbibliothek
SIGLE
Bundesministerium fuer Bildung, Wissenschaft, Forschung und Technologie, Bonn (Germany)
DE
German

Topics: 05B - Information science, librarianship, 09H - Computer software, programming, DOCUMENT ANALYSIS, INFORMATION EXTRACTION, TEXT ANALYSIS, MESSAGE TYPE IDENTIFICATION, FILE INTEGRATION
Year: 1995
Provided by: OpenGrey Repository