Building trainable taggers in a web-based, UIMA-supported NLP workbench

Authors: Ananiadou S, Kolluru B, Rak R
Publication date: 01/01/2012

Argo is a web-based NLP and text mining workbench with a convenient graphical user interface for designing and executing processing workflows of varying complexity. The workbench is intended for specialists and nontechnical audiences alike, and provides an ever-expanding library of analytics compliant with the Unstructured Information Management Architecture (UIMA), a widely adopted interoperability framework. We explore the flexibility of this framework by demonstrating workflows involving three processing components capable of performing self-contained machine learning-based tagging. The three components are responsible for three distinct tasks: 1) generating observations or features, 2) training a statistical model based on the generated features, and 3) tagging unlabelled data with the model. The learning and tagging components are based on an implementation of conditional random fields (CRF), whereas the feature generation component is an analytic capable of extending basic token information to a comprehensive set of features. Users define the features of their choice directly from Argo's graphical interface, without resorting to programming (a commonly used approach to feature engineering). Experiments on two tagging tasks, chunking and named entity recognition, showed that a tagger with a generic set of features built in Argo is capable of competing with task-specific solutions.
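The first of the three tasks, extending basic token information to a richer feature set, can be illustrated with a minimal Python sketch. This is not Argo's feature generator, which is configured through the graphical interface. The function names, the feature names and the `<PAD>` marker are assumptions made up for this illustration.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int, window: int = 1) -> Dict[str, str]:
    """Build a feature dictionary for the token at position i.

    The feature set (lowercased form, 3-character suffix, shape flags,
    neighbouring words) is a hypothetical, generic choice and is not
    taken from the paper.
    """
    tok = tokens[i]
    feats = {
        "word.lower": tok.lower(),
        "suffix3": tok[-3:],
        "is_title": str(tok.istitle()),
        "is_digit": str(tok.isdigit()),
    }
    # Add the surrounding words inside the context window.
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        # Positions outside the sentence get a padding marker.
        feats[f"word[{off:+d}]"] = tokens[j].lower() if 0 <= j < len(tokens) else "<PAD>"
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, str]]:
    """Generate one feature dictionary per token of a sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


sentence = ["Argo", "supports", "UIMA"]
features = sentence_features(sentence)
```

A list of per-token dictionaries like `features` is the kind of input a sequence labeller such as a CRF would typically be trained on. The training and tagging steps are omitted here.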

CiteSeerX

The University of Manchester - Institutional Repository

Collaborative Development and Evaluation of Text-processing Workflows in a UIMA-supported Web-based Workbench

Authors: Ananiadou S, Rak R, Rowley A
Publication date: 01/05/2012

Challenges in creating comprehensive text-processing workflows include a lack of interoperability among individual components coming from different providers and/or a requirement that end users know programming techniques to compose such workflows. In this paper we demonstrate Argo, a web-based system that addresses these issues in several ways. It supports the widely adopted Unstructured Information Management Architecture (UIMA), which handles the problem of interoperability; it provides a web browser-based interface for developing workflows by drawing diagrams composed of a selection of available processing components; and it provides novel user-interactive analytics such as the annotation editor, which constitutes a bridge between automatic processing and manual correction. These features extend the target audience of Argo to users with limited or no technical background. Here, we focus specifically on the construction of advanced workflows, involving multiple branching and merging points, to facilitate various comparative evaluations. Together with the use of user-collaboration capabilities supported in Argo, we demonstrate several use cases including visual inspections, comparisons of multiple processing segments or complete solutions against a reference standard, inter-annotator agreement, and shared task mass evaluations. Ultimately, Argo emerges as a one-stop workbench for defining, processing, editing and evaluating text processing tasks.
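The abstract lists inter-annotator agreement among the use cases but does not say which measure is used. As one common choice, Cohen's kappa for two annotators can be sketched as follows. The function name and the example labels are assumptions for illustration and do not come from Argo.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("annotations must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's label distribution.
    count_a, count_b = Counter(a), Counter(b)
    labels = count_a.keys() | count_b.keys()
    expected = sum(count_a[label] * count_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical token-level labels from two annotators.
annotator_1 = ["B", "I", "O", "O"]
annotator_2 = ["B", "O", "O", "O"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

For these labels the observed agreement is 3/4 and the chance agreement is 7/16, so kappa is 5/9, or about 0.56.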

CiteSeerX

The University of Manchester - Institutional Repository

Managing contextual information in semantically-driven temporal information systems

Authors: Isiaq SO, Osman T, Peytchev E
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2011

Context-aware (CA) systems have proven to be a robust solution for personalized information delivery in the current content-rich and dynamic information age. They allow software agents to autonomously interact with users by modeling the user's environment (e.g. profile, location, relevant public information) as dynamically evolving and interoperable contexts. There is a flurry of research activity across a wide spectrum of context-aware research areas, such as managing the user's profile, context acquisition from external environments, context storage, context representation and interpretation, context service delivery, and matching of context attributes to users' queries. We propose SDCAS, a Semantic-Driven Context Aware System that facilitates public services recommendation to users at temporal location. This paper focuses on information management and service recommendation using semantic technologies, taking into account the challenges of relationship complexity in temporal and contextual information.

Crossref

Nottingham Trent Institutional Repository (IRep)

Solent University Research Portal

Automated recognition and post-coordination of complex clinical terms

Authors: Gooch P., Roudsari A.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2011

One of the key tasks in integrating guideline-based decision support systems with the electronic patient record is the mapping of clinical terms contained in both guidelines and patient notes to a common, controlled terminology. However, a vocabulary of pre-coordinated terms cannot cover every possible variation, because clinical terms are often highly compositional and complex. We present a rule-based approach for automated recognition and post-coordination of clinical terms using minimal, morpheme-based thesauri, neoclassical combining forms and part-of-speech analysis. The process integrates MetaMap with the open-source GATE framework.

City Research Online

The TXM Platform: Building Open-Source Textual Analysis Software Compatible with the TEI Encoding Scheme

Author: Heiden Serge
Publication venue: Institute of Digital Enhancement of Cognitive Processing, Waseda University
Publication date: 01/11/2010

This paper describes the rationale and design of TXM, a text-mining analysis platform compatible with XML-TEI encoded corpora. The design of this platform is based on a synthesis of the best available algorithms in existing textometry software. It also relies on identifying the most relevant open-source technologies for processing textual resources encoded in XML and Unicode, for efficient full-text search on annotated corpora and for statistical data analysis. The architecture is based on a Java toolbox articulating a full-text search engine component with a statistical computing environment and with an original import environment able to process a large variety of data sources, including XML-TEI, and to apply embedded NLP tools to them. The platform is distributed as an open-source Eclipse project for developers and in the form of two demonstrator applications for end users: a standard application to install on a workstation and an online web application framework.

HAL-ENS-LYON

Waseda University Repository

HAL