    Software Infrastructure for Natural Language Processing

    We classify and review current approaches to software infrastructure for research, development and delivery of NLP systems. The task is motivated by a discussion of current trends in the field of NLP and Language Engineering. We describe a system called GATE (a General Architecture for Text Engineering) that provides a software infrastructure on top of which heterogeneous NLP processing modules may be evaluated and refined individually, or may be combined into larger application systems. GATE aims to support both researchers and developers working on component technologies (e.g. parsing, tagging, morphological analysis) and those working on developing end-user applications (e.g. information extraction, text summarisation, document generation, machine translation, and second language learning). GATE promotes reuse of component technology, permits specialisation and collaboration in large-scale projects, and allows for the comparison and evaluation of alternative technologies. The first release of GATE is now available - see http://www.dcs.shef.ac.uk/research/groups/nlp/gate/
    Comment: LaTeX, uses aclap.sty, 8 pages
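    The component-pipeline idea described in the abstract can be illustrated with a small hypothetical sketch, assuming a shared document representation passed between modules; this is illustrative only, not GATE's actual (Java) API, and all names are made up.

```python
# Hypothetical sketch of the component-pipeline idea: independent NLP
# modules share one document representation and can be chained, swapped,
# or evaluated in isolation. Illustrative only; GATE's real (Java) API
# and data model differ.

def tokenizer(doc):
    # Split raw text into whitespace-delimited tokens.
    doc["tokens"] = doc["text"].split()
    return doc

def tagger(doc):
    # Toy tagger: capitalised tokens are marked as proper nouns.
    doc["tags"] = ["NNP" if t[:1].isupper() else "NN" for t in doc["tokens"]]
    return doc

def run_pipeline(text, modules):
    doc = {"text": text}
    for module in modules:  # each module refines the shared document
        doc = module(doc)
    return doc

doc = run_pipeline("GATE supports reuse", [tokenizer, tagger])
print(doc["tokens"])  # ['GATE', 'supports', 'reuse']
print(doc["tags"])    # ['NNP', 'NN', 'NN']
```

    Because every module consumes and returns the same document structure, an alternative tagger can be swapped in and compared against the original without touching the rest of the pipeline, which is the reuse-and-evaluation point the abstract makes.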

    GATE -- an Environment to Support Research and Development in Natural Language Engineering

    We describe a software environment to support research and development in natural language (NL) engineering. This environment -- GATE (General Architecture for Text Engineering) -- aims to advance research in the area of machine processing of natural languages by providing a software infrastructure on top of which heterogeneous NL component modules may be evaluated and refined individually or may be combined into larger application systems. Thus, GATE aims to support both researchers and developers working on component technologies (e.g. parsing, tagging, morphological analysis) and those working on developing end-user applications (e.g. information extraction, text summarisation, document generation, machine translation, and second language learning). GATE will promote reuse of component technology, permit specialisation and collaboration in large-scale projects, and allow for the comparison and evaluation of alternative technologies. The first release of GATE is now available

    Ontologies for a Global Language Infrastructure

    With human language technologies maturing considerably and a rapidly growing range of language data resources now available, together with natural language processing (NLP) tools and systems, the need for a global language infrastructure (GLI) is becoming more and more evident if re-usability of the resources is to be ensured. A GLI is essentially an open, web-based software platform on which tailored language services can be efficiently composed, disseminated and consumed. An infrastructure of this sort is also expected to facilitate further development of language data resources and NLP functionalities. The aims of this paper are twofold: (1) to discuss the necessity of ontologies for a GLI, and (2) to draw a high-level configuration of the ontologies, which are integrated into a comprehensive language service ontology. To these ends, the paper first explores the dimensions of a GLI, and then draws a triangular view of a language service, from which the necessary ontologies are derived. It also examines relevant ongoing international standardization efforts such as LAF, MAF, SynAF, DCR and LMF, and discusses how these frameworks are incorporated into our comprehensive language service ontology. The paper concludes by stressing the need for international collaboration on the development of a standardized language service ontology.

    Social Web Communities

    Blogs, Wikis, and Social Bookmark Tools have rapidly emerged on the Web. The reasons for their immediate success are that people are happy to share information, and that these tools provide an infrastructure for doing so without requiring any specific skills. At the moment, there exists no foundational research for these systems, and they provide only very simple structures for organising knowledge. Individual users create their own structures, but these cannot currently be exploited for knowledge sharing. The objective of the seminar was to provide theoretical foundations for upcoming Web 2.0 applications and to investigate further applications that go beyond bookmark- and file-sharing. The main research question can be summarized as follows: How will current and emerging resource sharing systems support users to leverage more knowledge and power from the information they share on Web 2.0 applications? Research areas like Semantic Web, Machine Learning, Information Retrieval, Information Extraction, Social Network Analysis, Natural Language Processing, Library and Information Sciences, and Hypermedia Systems have been working for a while on these questions. In the workshop, researchers from these areas came together to assess the state of the art and to set up a road map describing the next steps towards the next generation of social software.

    The Spanish DELPH-IN grammar

    In this article we present a Spanish grammar implemented in the Linguistic Knowledge Builder system and grounded in the theoretical framework of Head-driven Phrase Structure Grammar. The grammar is being developed in an international multilingual context, the DELPH-IN Initiative, contributing to an open-source repository of software and linguistic resources for various Natural Language Processing applications. We will show how we have refined and extended a core grammar, derived from the LinGO Grammar Matrix, to achieve a broad-coverage grammar. The Spanish DELPH-IN grammar is the most comprehensive grammar for Spanish deep processing, and it is being deployed in the construction of a 60,000-sentence treebank for Spanish based on a technical corpus in the framework of the European project METANET4U (Enhancing the European Linguistic Infrastructure, GA 270893; http://www.meta-net.eu/projects/METANET4U/) and a smaller treebank of about 15,000 sentences based on a corpus from the press.

    Automated Identification of Security-Relevant Configuration Settings Using NLP

    To secure computer infrastructure, all security-relevant settings need to be configured. Identifying which settings are security-relevant requires security experts, but this process is time-consuming and expensive. Our proposed solution uses state-of-the-art natural language processing to classify settings as security-relevant based on their descriptions. Our evaluation shows that our trained classifiers do not yet perform well enough to replace human security experts, but they can help the experts classify the settings. By publishing our labeled data sets and the code of our trained model, we want to help security experts analyze configuration settings and enable further research in this area.
    Comment: Peer-reviewed version accepted for publication in the Industry Showcase track at the 37th IEEE/ACM International Conference on Automated Software Engineering (ASE '22), October 10-14, 2022, Rochester, MI, US
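    The input/output shape of the classification task described above can be sketched with a deliberately naive keyword baseline in plain Python. The paper itself trains state-of-the-art NLP classifiers; everything here, including the setting names, is a made-up illustration, not the authors' model.

```python
# Deliberately naive baseline: flag a configuration setting as
# security-relevant if its description mentions a security-related term.
# (The paper trains state-of-the-art NLP classifiers; this only shows
# the shape of the task: description text in, binary label out.)

SECURITY_TERMS = {"password", "encryption", "firewall", "certificate",
                  "authentication", "tls", "audit"}

def is_security_relevant(description):
    # Lowercase, strip trailing punctuation, and test for any overlap
    # with the keyword list.
    words = {w.strip(".,;:").lower() for w in description.split()}
    return not words.isdisjoint(SECURITY_TERMS)

print(is_security_relevant("Sets the minimum password length for accounts."))  # True
print(is_security_relevant("Specifies the desktop wallpaper image path."))     # False
```

    A real classifier replaces the keyword test with a model trained on labeled descriptions, which is what lets it generalise to settings whose security relevance is not signalled by an obvious term.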

    Extraction of Clinical Information from Clinical Reports: an Application to the Study of Medication Overuse Headaches in Italy.

    An i2b2-Pavia pilot project has recently been activated at the Headache Centre of the C. Mondino Institute of Neurology in Pavia, with the aim of investigating Medication Overuse Headaches. The software infrastructure implemented so far automatically extracts and integrates data coming from different sources into a repository purposely designed for multidimensional inspection. A great effort has been devoted to training a Natural Language Processing system able to extract medical concepts from Italian clinical reports.