2 research outputs found

    Building Discerning Knowledge Bases from Multiple Source Documents, with Novel Fact Filtering

    Information extraction systems that remember only novel information (facts that differ semantically from those previously extracted) can be used to build lean knowledge bases fed from multiple, possibly overlapping sources. In previous research by the authors, natural language processing techniques were used to build a system that extracts financial facts from international corporate reports in the Wall Street Journal. We will enhance that system to extract the same types of financial facts from a second source of corporate financial reports: Reuters. The improved system will be more general, since it will extract from multiple sources rather than just one. In addition, it will filter extracted information for novelty, admitting only novel facts into the database while recording every source in which a redundant fact appeared.
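
The novelty filtering described in the abstract above can be illustrated with a minimal sketch. It is not the authors' system: the `Fact` fields, the `KnowledgeBase` class, and the `fact_key` helper are hypothetical names, and comparing normalized keys is only a crude stand-in for the semantic comparison the abstract refers to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """A hypothetical extracted financial fact (field names are assumptions)."""
    company: str
    metric: str
    period: str
    value: float


def fact_key(fact):
    """Normalize a fact so trivially different spellings compare equal.

    Assumption: exact match on normalized fields approximates semantic equality.
    """
    return (
        fact.company.strip().lower(),
        fact.metric.strip().lower(),
        fact.period.strip().lower(),
        round(fact.value, 2),
    )


class KnowledgeBase:
    """Stores each novel fact once and tracks every source that reported it."""

    def __init__(self):
        self._facts = {}    # key -> first Fact seen with that key
        self._sources = {}  # key -> set of source names

    def add(self, fact, source):
        """Return True if the fact was novel, False if it was redundant."""
        key = fact_key(fact)
        if key in self._facts:
            # Redundant fact: keep the database lean, but remember the source.
            self._sources[key].add(source)
            return False
        self._facts[key] = fact
        self._sources[key] = {source}
        return True

    def sources_of(self, fact):
        """Return a copy of the set of sources that reported this fact."""
        return set(self._sources.get(fact_key(fact), set()))

    def __len__(self):
        return len(self._facts)
```

With this sketch, adding the same fact first from the Wall Street Journal and then from Reuters leaves a single stored fact whose source set contains both names.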

    Automatic Extraction and Generation of XML Documents from Financial Reports

    Web services require XML-formatted data. Manually translating business information from a rapidly expanding volume of documents into XML is labor-intensive and impractical. Computer programs can be built to extract domain-specific facts from web documents and convert them into an XML format. With a continual feed of web articles, such a system could be used to maintain an up-to-date XML knowledge base that could power web services for businesses. In this research, we build a system to automatically extract information from electronic international corporate financial reports and translate this information into XML or XBRL (a well-known XML extension for accounting and financial data).
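
The conversion step described in the abstract above, from extracted facts to XML, might look roughly like the following sketch. It uses Python's standard `xml.etree.ElementTree` module; the dictionary layout and the element and attribute names (`financialFacts`, `fact`, `metric`, `value`, `unit`) are illustrative assumptions and do not follow the XBRL taxonomy or the authors' actual schema.

```python
import xml.etree.ElementTree as ET


def facts_to_xml(facts):
    """Serialize a list of extracted-fact dictionaries into an XML string.

    Assumption: each fact has the keys company, period, metric, value, unit.
    """
    root = ET.Element("financialFacts")  # hypothetical root element name
    for fact in facts:
        node = ET.SubElement(
            root, "fact", company=fact["company"], period=fact["period"]
        )
        ET.SubElement(node, "metric").text = fact["metric"]
        ET.SubElement(node, "value", unit=fact["unit"]).text = str(fact["value"])
    # encoding="unicode" makes tostring return a str rather than bytes.
    return ET.tostring(root, encoding="unicode")


example = facts_to_xml([
    {
        "company": "Acme Corp",
        "period": "1999-Q1",
        "metric": "net income",
        "value": 12.5,
        "unit": "USD millions",
    }
])
```

The company and figures in `example` are made up for illustration. A production system targeting XBRL would need to map each fact onto the concepts of an official taxonomy instead of these ad hoc element names.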