CFO: A Framework for Building Production NLP Systems
This paper introduces a novel orchestration framework, called CFO
(COMPUTATION FLOW ORCHESTRATOR), for building, experimenting with, and
deploying interactive NLP (Natural Language Processing) and IR (Information
Retrieval) systems to production environments. We then demonstrate a question
answering system built using this framework which incorporates state-of-the-art
BERT based MRC (Machine Reading Comprehension) with IR components to enable
end-to-end answer retrieval. Results from the demo system are shown to be high
quality in both academic and industry domain specific settings. Finally, we
discuss best practices when (pre-)training BERT based MRC models for production
systems. Comment: http://ibm.biz/cfo_framewor
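The end-to-end answer retrieval the abstract describes follows a retrieve-then-read pattern: an IR component narrows the candidate passages, then an MRC reader extracts the answer. The sketch below is a toy stand-in for that flow, assuming nothing about CFO's actual API; the overlap scoring replaces CFO's IR and BERT-based MRC components, and all function names and data are illustrative.

```python
import re

def tokens(text):
    """Lowercase word tokens with punctuation stripped."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(question, passages, top_k=1):
    """Toy IR component: rank passages by token overlap with the question."""
    q = tokens(question)
    ranked = sorted(passages, key=lambda p: len(q & tokens(p)), reverse=True)
    return ranked[:top_k]

def read(question, passage):
    """Toy reader: return the passage sentence sharing the most tokens
    with the question (a real MRC model would extract an answer span)."""
    q = tokens(question)
    sentences = [s.strip() for s in passage.split(".") if s.strip()]
    return max(sentences, key=lambda s: len(q & tokens(s)))

passages = [
    "BERT is a transformer model. It was released by Google in 2018.",
    "UIMA defines interfaces for text analysis modules.",
]
best = retrieve("When was BERT released?", passages)[0]
answer = read("When was BERT released?", best)
```

In a production system each stage would be a separately deployed service, which is exactly the orchestration problem a framework like CFO addresses.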
TeXTracT: a Web-based Tool for Building NLP-enabled Applications
Over the last few years, the software industry has shown an increasing interest in applications with Natural Language Processing (NLP) capabilities. Several cloud-based solutions have emerged with the purpose of simplifying and streamlining the integration of NLP techniques via Web services. These NLP techniques cover tasks such as language detection, entity recognition, sentiment analysis, and classification, among others. However, the services provided are not always as extensible and configurable as a developer may want, preventing their use in industry-grade developments and limiting their adoption in specialized domains (e.g., for analyzing technical documentation). In this context, we have developed a tool called TeXTracT that is designed to be composable, extensible, configurable and accessible. In our tool, NLP techniques can be accessed independently and orchestrated in a pipeline via RESTful Web services. Moreover, the architecture supports the setup and deployment of NLP techniques on demand. The NLP infrastructure is built upon the UIMA framework, which defines communication protocols and uniform service interfaces for text analysis modules. TeXTracT has been evaluated in two case studies to assess its pros and cons. Sociedad Argentina de Informática e Investigación Operativa (SADIO)
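The composable-pipeline idea the abstract describes can be sketched in miniature: independent techniques are registered by name (as service endpoints would be) and composed on demand into a pipeline. This is a minimal sketch of the pattern only; plain functions stand in for TeXTracT's RESTful services, and the technique names and annotation format are invented for illustration.

```python
REGISTRY = {}

def technique(name):
    """Register a processing step under a name, as a service endpoint would be."""
    def wrap(fn):
        REGISTRY[name] = fn
        return fn
    return wrap

@technique("tokenize")
def tokenize(doc):
    """Split the raw text into tokens."""
    doc["tokens"] = doc["text"].split()
    return doc

@technique("lowercase")
def lowercase(doc):
    """Normalize tokens to lowercase."""
    doc["tokens"] = [t.lower() for t in doc["tokens"]]
    return doc

def run_pipeline(text, steps):
    """Route a document through the named techniques in order."""
    doc = {"text": text}
    for step in steps:
        doc = REGISTRY[step](doc)
    return doc

result = run_pipeline("Hello NLP World", ["tokenize", "lowercase"])
```

Because steps are looked up by name at run time, pipelines can be reconfigured without changing code, which mirrors the on-demand setup the tool advertises.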
Cross-Platform Text Mining and Natural Language Processing Interoperability - Proceedings of the LREC2016 conference
No abstract available
NOBLE - Flexible concept recognition for large-scale biomedical natural language processing
Background: Natural language processing (NLP) applications are increasingly important in biomedical data analysis, knowledge engineering, and decision support. Concept recognition is an important component task for NLP pipelines, and can be either general-purpose or domain-specific. We describe a novel, flexible, and general-purpose concept recognition component for NLP pipelines, and compare its speed and accuracy against five commonly used alternatives on both a biological and clinical corpus. NOBLE Coder implements a general algorithm for matching terms to concepts from an arbitrary vocabulary set. The system's matching options can be configured individually or in combination to yield specific system behavior for a variety of NLP tasks. The software is open source, freely available, and easily integrated into UIMA or GATE. We benchmarked speed and accuracy of the system against the CRAFT and ShARe corpora as reference standards and compared it to MMTx, MGrep, Concept Mapper, cTAKES Dictionary Lookup Annotator, and cTAKES Fast Dictionary Lookup Annotator. Results: We describe key advantages of the NOBLE Coder system and associated tools, including its greedy algorithm, configurable matching strategies, and multiple terminology input formats. These features provide unique functionality when compared with existing alternatives, including state-of-the-art systems. On two benchmarking tasks, NOBLE's performance exceeded that of commonly used alternatives, performing almost as well as the most advanced systems. Error analysis revealed differences in error profiles among systems. Conclusion: NOBLE Coder is comparable to other widely used concept recognition systems in terms of accuracy and speed. Advantages of NOBLE Coder include its interactive terminology builder tool, ease of configuration, and adaptability to various domains and tasks. NOBLE provides a term-to-concept matching system suitable for general concept recognition in biomedical NLP pipelines.
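The greedy term-to-concept matching the abstract mentions can be illustrated with a minimal sketch: scan the text and, at each position, prefer the longest vocabulary term before falling back to shorter ones. This is only an assumption-laden toy, not NOBLE Coder's actual algorithm or its configurable matching options, and the vocabulary entries and concept IDs below are illustrative.

```python
def greedy_match(text, vocabulary):
    """Return (term, concept_id) pairs via greedy longest-first matching."""
    words = text.lower().split()
    # Longest term in the vocabulary bounds the window size to try first.
    max_len = max(len(term.split()) for term in vocabulary)
    matches, i = [], 0
    while i < len(words):
        # Try the longest candidate window first, then shrink.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in vocabulary:
                matches.append((candidate, vocabulary[candidate]))
                i += n  # consume the matched words and continue
                break
        else:
            i += 1  # no term starts here; advance one word
    return matches

vocab = {"heart": "C0018787", "heart attack": "C0027051"}
found = greedy_match("patient had a heart attack yesterday", vocab)
```

Note how the greedy strategy emits the multi-word concept "heart attack" rather than the shorter "heart" it contains, which is the behavior a longest-match coder aims for.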
Jack the Reader - A Machine Reading Framework
Many Machine Reading and Natural Language Understanding tasks require reading supporting text in order to answer questions. For example, in Question Answering, the supporting text can be newswire or Wikipedia articles; in Natural Language Inference, premises can be seen as the supporting text and hypotheses as questions. Providing a set of useful primitives operating in a single framework of related tasks would allow for expressive modelling, and easier model comparison and replication. To that end, we present Jack the Reader (Jack), a framework for Machine Reading that allows quick model prototyping through component reuse, evaluation of new models on existing datasets, and integration of new datasets for application to a growing set of implemented baseline models. Jack currently supports (but is not limited to) three tasks: Question Answering, Natural Language Inference, and Link Prediction. It is developed with the aim of increasing research efficiency and code reuse. Comment: Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2018), System Demonstration
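The component-reuse idea behind a framework like Jack can be sketched as readers assembled from interchangeable pieces, so that a new model or task reuses the shared parts. This is a hedged toy under invented names, not Jack's actual API: the embedder and scorer below stand in for real input and model modules.

```python
def bag_of_words(text):
    """Shared input component: embed text as a token-count dict."""
    counts = {}
    for tok in text.lower().split():
        counts[tok] = counts.get(tok, 0) + 1
    return counts

def overlap_scorer(q_vec, s_vec):
    """One interchangeable model component: score by shared token counts."""
    return sum(min(c, s_vec.get(t, 0)) for t, c in q_vec.items())

def make_reader(embed, score):
    """Assemble a reader from reusable components."""
    def reader(question, candidates):
        q = embed(question)
        return max(candidates, key=lambda c: score(q, embed(c)))
    return reader

qa_reader = make_reader(bag_of_words, overlap_scorer)
best = qa_reader(
    "capital of France",
    ["Paris is the capital of France", "Berlin is in Germany"],
)
```

Swapping in a different embedder or scorer yields a new reader without touching the surrounding evaluation code, which is the kind of reuse the framework is built around.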