Search CORE

1,037 research outputs found

Software Infrastructure for Natural Language Processing

Author: Cunningham Hamish
Gaizauskas Robert
Humphreys Kevin
Wilks Yorick
Publication venue
Publication date: 01/01/1997
Field of study

We classify and review current approaches to software infrastructure for research, development and delivery of NLP systems. The task is motivated by a discussion of current trends in the field of NLP and Language Engineering. We describe a system called GATE (a General Architecture for Text Engineering) that provides a software infrastructure on top of which heterogeneous NLP processing modules may be evaluated and refined individually, or may be combined into larger application systems. GATE aims to support both researchers and developers working on component technologies (e.g. parsing, tagging, morphological analysis) and those working on developing end-user applications (e.g. information extraction, text summarisation, document generation, machine translation, and second language learning). GATE promotes reuse of component technology, permits specialisation and collaboration in large-scale projects, and allows for the comparison and evaluation of alternative technologies. The first release of GATE is now available - see http://www.dcs.shef.ac.uk/research/groups/nlp/gate/Comment: LaTeX, uses aclap.sty, 8 page

arXiv.org e-Print Archive

CiteSeerX

Supporting text mining for e-Science: the challenges for Grid-enabled natural language processing

Author: Carroll John
Evans Roger
Klein Ewan
Publication venue
Publication date: 01/01/2005
Field of study

Over the last few years, language technology has moved rapidly from 'applied research' to 'engineering', and from small-scale to large-scale engineering. Applications such as advanced text mining systems are feasible, but very resource-intensive, while research seeking to address the underlying language processing questions faces very real practical and methodological limitations. The e-Science vision, and the creation of the e-Science Grid, promises the level of integrated large-scale technological support required to sustain this important and successful new technology area. In this paper, we discuss the foundations for the deployment of text mining and other language technology on the Grid - the protocols and tools required to build distributed large-scale language technology systems, meeting the needs of users, application builders and researchers

CiteSeerX

University of Brighton Research Portal

Sussex Research Online

Special Libraries, Spring 1995

Author: Special Libraries Association
Publication venue: SJSU ScholarWorks
Publication date: 01/04/1995
Field of study

Volume 86, Issue 2https://scholarworks.sjsu.edu/sla_sl_1995/1001/thumbnail.jp

SJSU ScholarWorks

Compilation and Exploitation of Parallel Corpora

Author: Tomaž Erjavec
Publication venue: 'University of Zagreb - University Computing Centre'
Publication date: 01/01/2003
Field of study

With more and more text being available in electronic form, it is becoming relatively easy to obtain digital texts together with their translations. The paper presents the processing steps necessary to compile such texts into parallel corpora, an extremely useful language resource. Parallel corpora can be used as a translation aid for second-language learners, for translators and lexicographers, or as a data-source for various language technology tools. We present our work in this direction, which is characterised by the use of open standards for text annotation, the use of publicly available third-party tools and wide availability of the produced resources. Explained is the corpus annotation chain involving normalisation, tokenisation, segmentation, alignment, word-class syntactic tagging, and lemmatisation. Two exploitation results over our annotated corpora are also presented, namely aWeb concordancer and the extraction of bi-lingual lexica

Crossref

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

Processing Structured Hypermedia : A Matter of Style

Author: Ossenbruggen J.R. (Jacco) van
Publication venue
Publication date: 10/04/2001
Field of study

With the introduction of the World Wide Web in the early nineties, hypermedia has become the uniform interface to the wide variety of information sources available over the Internet. The full potential of the Web, however, can only be realized by building on the strengths of its underlying research fields. This book describes the areas of hypertext, multimedia, electronic publishing and the World Wide Web and points out fundamental similarities and differences in approaches towards the processing of information. It gives an overview of the dominant models and tools developed in these fields and describes the key interrelationships and mutual incompatibilities. In addition to a formal specification of a selection of these models, the book discusses the impact of the models described on the software architectures that have been developed for processing hypermedia documents. Two example hypermedia architectures are described in more detail: the DejaVu object-oriented hypermedia framework, developed at the VU, and CWI's Berlage environment for time-based hypermedia document transformations

CWI's Institutional Repository

Unstructured documents in EDIFACT:A survey of the inclusion of unstructured documents in the EDIFACT standard, with an introduction to CALS

Author: Marchal Benoît
Publication venue
Publication date: 01/01/1994
Field of study

Repository of the University of Namur