CFO: A Framework for Building Production NLP Systems
This paper introduces a novel orchestration framework, called CFO
(COMPUTATION FLOW ORCHESTRATOR), for building, experimenting with, and
deploying interactive NLP (Natural Language Processing) and IR (Information
Retrieval) systems to production environments. We then demonstrate a question
answering system built using this framework which incorporates state-of-the-art
BERT based MRC (Machine Reading Comprehension) with IR components to enable
end-to-end answer retrieval. Results from the demo system are shown to be high
quality in both academic and industry domain specific settings. Finally, we
discuss best practices when (pre-)training BERT based MRC models for production
systems. Comment: http://ibm.biz/cfo_framewor
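The end-to-end answer retrieval the abstract describes follows a retrieve-then-read pattern: an IR component narrows the candidate passages, then an MRC reader extracts the answer. The sketch below is a toy stand-in for that flow, assuming nothing about CFO's actual API; the overlap scoring replaces CFO's IR and BERT-based MRC components, and all function names and data are illustrative.

```python
import re

def tokens(text):
    """Lowercase word tokens with punctuation stripped."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(question, passages, top_k=1):
    """Toy IR component: rank passages by token overlap with the question."""
    q = tokens(question)
    ranked = sorted(passages, key=lambda p: len(q & tokens(p)), reverse=True)
    return ranked[:top_k]

def read(question, passage):
    """Toy reader: return the passage sentence sharing the most tokens
    with the question (a real MRC model would extract an answer span)."""
    q = tokens(question)
    sentences = [s.strip() for s in passage.split(".") if s.strip()]
    return max(sentences, key=lambda s: len(q & tokens(s)))

passages = [
    "BERT is a transformer model. It was released by Google in 2018.",
    "UIMA defines interfaces for text analysis modules.",
]
best = retrieve("When was BERT released?", passages)[0]
answer = read("When was BERT released?", best)
```

In a production system each stage would be a separately deployed service, which is exactly the orchestration problem a framework like CFO addresses.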
TeXTracT: a Web-based Tool for Building NLP-enabled Applications
Over the last few years, the software industry has shown an increasing interest in applications with Natural Language Processing (NLP) capabilities. Several cloud-based solutions have emerged with the purpose of simplifying and streamlining the integration of NLP techniques via Web services. These NLP techniques cover tasks such as language detection, entity recognition, sentiment analysis, and classification, among others. However, the services provided are not always as extensible and configurable as a developer may want, preventing their use in industry-grade developments and limiting their adoption in specialized domains (e.g., for analyzing technical documentation). In this context, we have developed a tool called TeXTracT that is designed to be composable, extensible, configurable and accessible. In our tool, NLP techniques can be accessed independently and orchestrated in a pipeline via RESTful Web services. Moreover, the architecture supports the setup and deployment of NLP techniques on demand. The NLP infrastructure is built upon the UIMA framework, which defines communication protocols and uniform service interfaces for text analysis modules. TeXTracT has been evaluated in two case studies to assess its pros and cons. Sociedad Argentina de Informática e Investigación Operativa (SADIO)
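The composable-pipeline idea the abstract describes can be sketched in miniature: independent techniques are registered by name (as service endpoints would be) and composed on demand into a pipeline. This is a minimal sketch of the pattern only; plain functions stand in for TeXTracT's RESTful services, and the technique names and annotation format are invented for illustration.

```python
REGISTRY = {}

def technique(name):
    """Register a processing step under a name, as a service endpoint would be."""
    def wrap(fn):
        REGISTRY[name] = fn
        return fn
    return wrap

@technique("tokenize")
def tokenize(doc):
    """Split the raw text into tokens."""
    doc["tokens"] = doc["text"].split()
    return doc

@technique("lowercase")
def lowercase(doc):
    """Normalize tokens to lowercase."""
    doc["tokens"] = [t.lower() for t in doc["tokens"]]
    return doc

def run_pipeline(text, steps):
    """Route a document through the named techniques in order."""
    doc = {"text": text}
    for step in steps:
        doc = REGISTRY[step](doc)
    return doc

result = run_pipeline("Hello NLP World", ["tokenize", "lowercase"])
```

Because steps are looked up by name at run time, pipelines can be reconfigured without changing code, which mirrors the on-demand setup the tool advertises.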
Cross-Platform Text Mining and Natural Language Processing Interoperability - Proceedings of the LREC2016 conference
No abstract available
NOBLE - Flexible concept recognition for large-scale biomedical natural language processing
Background: Natural language processing (NLP) applications are increasingly important in biomedical data analysis, knowledge engineering, and decision support. Concept recognition is an important component task for NLP pipelines, and can be either general-purpose or domain-specific. We describe a novel, flexible, and general-purpose concept recognition component for NLP pipelines, and compare its speed and accuracy against five commonly used alternatives on both a biological and clinical corpus. NOBLE Coder implements a general algorithm for matching terms to concepts from an arbitrary vocabulary set. The system's matching options can be configured individually or in combination to yield specific system behavior for a variety of NLP tasks. The software is open source, freely available, and easily integrated into UIMA or GATE. We benchmarked speed and accuracy of the system against the CRAFT and ShARe corpora as reference standards and compared it to MMTx, MGrep, Concept Mapper, cTAKES Dictionary Lookup Annotator, and cTAKES Fast Dictionary Lookup Annotator. Results: We describe key advantages of the NOBLE Coder system and associated tools, including its greedy algorithm, configurable matching strategies, and multiple terminology input formats. These features provide unique functionality when compared with existing alternatives, including state-of-the-art systems. On two benchmarking tasks, NOBLE's performance exceeded that of commonly used alternatives, performing almost as well as the most advanced systems. Error analysis revealed differences in error profiles among systems. Conclusion: NOBLE Coder is comparable to other widely used concept recognition systems in terms of accuracy and speed. Advantages of NOBLE Coder include its interactive terminology builder tool, ease of configuration, and adaptability to various domains and tasks. NOBLE provides a term-to-concept matching system suitable for general concept recognition in biomedical NLP pipelines.
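The greedy term-to-concept matching the abstract mentions can be illustrated with a minimal sketch: scan the text and, at each position, prefer the longest vocabulary term before falling back to shorter ones. This is only an assumption-laden toy, not NOBLE Coder's actual algorithm or its configurable matching options, and the vocabulary entries and concept IDs below are illustrative.

```python
def greedy_match(text, vocabulary):
    """Return (term, concept_id) pairs via greedy longest-first matching."""
    words = text.lower().split()
    # Longest term in the vocabulary bounds the window size to try first.
    max_len = max(len(term.split()) for term in vocabulary)
    matches, i = [], 0
    while i < len(words):
        # Try the longest candidate window first, then shrink.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in vocabulary:
                matches.append((candidate, vocabulary[candidate]))
                i += n  # consume the matched words and continue
                break
        else:
            i += 1  # no term starts here; advance one word
    return matches

vocab = {"heart": "C0018787", "heart attack": "C0027051"}
found = greedy_match("patient had a heart attack yesterday", vocab)
```

Note how the greedy strategy emits the multi-word concept "heart attack" rather than the shorter "heart" it contains, which is the behavior a longest-match coder aims for.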
Jack the Reader - A Machine Reading Framework
Many Machine Reading and Natural Language Understanding tasks require reading supporting text in order to answer questions. For example, in Question Answering, the supporting text can be newswire or Wikipedia articles; in Natural Language Inference, premises can be seen as the supporting text and hypotheses as questions. Providing a set of useful primitives operating in a single framework of related tasks would allow for expressive modelling, and easier model comparison and replication. To that end, we present Jack the Reader (Jack), a framework for Machine Reading that allows quick model prototyping through component reuse, evaluation of new models on existing datasets, and integration of new datasets for application to a growing set of implemented baseline models. Jack currently supports (but is not limited to) three tasks: Question Answering, Natural Language Inference, and Link Prediction. It is developed with the aim of increasing research efficiency and code reuse. Comment: Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2018), System Demonstration
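The component-reuse idea behind a framework like Jack can be sketched as readers assembled from interchangeable pieces, so that a new model or task reuses the shared parts. This is a hedged toy under invented names, not Jack's actual API: the embedder and scorer below stand in for real input and model modules.

```python
def bag_of_words(text):
    """Shared input component: embed text as a token-count dict."""
    counts = {}
    for tok in text.lower().split():
        counts[tok] = counts.get(tok, 0) + 1
    return counts

def overlap_scorer(q_vec, s_vec):
    """One interchangeable model component: score by shared token counts."""
    return sum(min(c, s_vec.get(t, 0)) for t, c in q_vec.items())

def make_reader(embed, score):
    """Assemble a reader from reusable components."""
    def reader(question, candidates):
        q = embed(question)
        return max(candidates, key=lambda c: score(q, embed(c)))
    return reader

qa_reader = make_reader(bag_of_words, overlap_scorer)
best = qa_reader(
    "capital of France",
    ["Paris is the capital of France", "Berlin is in Germany"],
)
```

Swapping in a different embedder or scorer yields a new reader without touching the surrounding evaluation code, which is the kind of reuse the framework is built around.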