The SystemT IDE: An Integrated Development Environment for Information Extraction Rules

By Laura Chiticariu, Vivian Chu, Sajib Dasgupta, Thilo W. Goetz, Howard Ho, Rajasekar Krishnamurthy Alex, Sriram Raghavan, Frederick R. Reiss, Shivakumar Vaithyanathan and Huaiyu Zhu

Abstract

Information Extraction (IE), the problem of extracting structured information from unstructured text, has become the key enabler for many enterprise applications such as semantic search, business analytics and regulatory compliance. While rule-based IE systems are widely used in practice due to their well-known “explainability,” developing high-quality information extraction rules is known to be a labor-intensive and time-consuming iterative process. Our demonstration showcases SystemT IDE, the integrated development environment for SystemT, a state-of-the-art rule-based IE system from IBM Research that has been successfully embedded in multiple IBM enterprise products. SystemT IDE facilitates the development, testing and analysis of high-quality IE rules by means of sophisticated techniques, ranging from data management to machine learning. We show how to build high-quality IE annotators using a suite of tools provided by SystemT IDE, including computing data provenance, learning basic features such as regular expressions and dictionaries, and automatically refining rules based on labeled examples.

Topics: H.2.4 [Systems]: Textual Databases; I.2.7 [Natural Language Processing]: Text Analysis
General Terms: Algorithms, Design, Human Factors
Keywords: Information Extraction, AQL, SystemT, Provenance, Pattern Discovery, Rule Learning
Year: 2013
OAI identifier: oai:CiteSeerX.psu:10.1.1.352.8366
Provided by: CiteSeerX