1,037 research outputs found
Software Infrastructure for Natural Language Processing
We classify and review current approaches to software infrastructure for
research, development and delivery of NLP systems. The task is motivated by a
discussion of current trends in the field of NLP and Language Engineering. We
describe a system called GATE (a General Architecture for Text Engineering)
that provides a software infrastructure on top of which heterogeneous NLP
processing modules may be evaluated and refined individually, or may be
combined into larger application systems. GATE aims to support both researchers
and developers working on component technologies (e.g. parsing, tagging,
morphological analysis) and those working on developing end-user applications
(e.g. information extraction, text summarisation, document generation, machine
translation, and second language learning). GATE promotes reuse of component
technology, permits specialisation and collaboration in large-scale projects,
and allows for the comparison and evaluation of alternative technologies. The
first release of GATE is now available - see
http://www.dcs.shef.ac.uk/research/groups/nlp/gate/Comment: LaTeX, uses aclap.sty, 8 page
Supporting text mining for e-Science: the challenges for Grid-enabled natural language processing
Over the last few years, language technology has moved rapidly from 'applied research' to 'engineering', and from small-scale to large-scale engineering. Applications such as advanced text mining systems are feasible, but very resource-intensive, while research seeking to address the underlying language processing questions faces very real practical and methodological limitations. The e-Science vision, and the creation of the e-Science Grid, promises the level of integrated large-scale technological support required to sustain this important and successful new technology area. In this paper, we discuss the foundations for the deployment of text mining and other language technology on the Grid - the protocols and tools required to build distributed large-scale language technology systems, meeting the needs of users, application builders and researchers
Special Libraries, Spring 1995
Volume 86, Issue 2https://scholarworks.sjsu.edu/sla_sl_1995/1001/thumbnail.jp
Compilation and Exploitation of Parallel Corpora
With more and more text being available in electronic form, it is becoming relatively easy to obtain digital texts together with their translations. The paper presents the processing steps necessary to compile such texts into parallel corpora, an extremely useful language resource. Parallel corpora can be used as a translation aid for second-language learners, for translators and lexicographers, or as a data-source for various language technology tools. We present our work in this direction, which is characterised by the use of open standards for text annotation, the use of publicly available third-party tools and wide availability of the produced resources. Explained is the corpus annotation chain involving normalisation, tokenisation, segmentation, alignment, word-class syntactic tagging, and lemmatisation. Two exploitation results over our annotated corpora are also presented, namely aWeb concordancer and the extraction of bi-lingual lexica
Processing Structured Hypermedia : A Matter of Style
With the introduction of the World Wide Web in the early nineties, hypermedia has become the uniform interface to the wide variety of information sources available over the Internet. The full potential of the Web, however, can only be realized by building on the strengths of its underlying research fields. This book describes the areas of hypertext, multimedia, electronic publishing and the World Wide Web and points out fundamental similarities and differences in approaches towards the processing of information. It gives an overview of the dominant models and tools developed in these fields and describes the key interrelationships and mutual incompatibilities. In addition to a formal specification of a selection of these models, the book discusses the impact of the models described on the software architectures that have been developed for processing hypermedia documents. Two example hypermedia architectures are described in more detail: the DejaVu object-oriented hypermedia framework, developed at the VU, and CWI's Berlage environment for time-based hypermedia document transformations
- …