Recovering Grammar Relationships for the Java Language Specification
Grammar convergence is a method that helps discover relationships between
different grammars of the same language or different language versions. The key
element of the method is the operational, transformation-based representation
of those relationships. The input grammars are transformed until they are
structurally equal. The transformations are composed
from primitive operators; properties of these operators and the composed chains
provide quantitative and qualitative insight into the relationships between the
grammars at hand. We describe a refined method for grammar convergence, and we
use it in a major study, where we recover the relationships between all the
grammars that occur in the different versions of the Java Language
Specification (JLS). The relationships are represented as grammar
transformation chains that capture all accidental or intended differences
between the JLS grammars. This method is mechanized and driven by nominal and
structural differences between pairs of grammars that are subject to
asymmetric, binary convergence steps. We present the underlying operator suite
for grammar transformation in detail, and we illustrate the suite with many
examples of transformations on the JLS grammars. We also describe the
extraction effort, which was needed to make the JLS grammars amenable to
automated processing. We include substantial metadata about the convergence
process for the JLS so that the effort becomes reproducible and transparent.
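To make the idea of convergence by transformation chains more concrete, here is a rough sketch. It models a grammar as a Python dict that maps nonterminal names to lists of alternatives, then applies a chain of two primitive operators until two grammars become structurally equal. The operator names (`rename_nonterminal`, `inline`), the grammar encoding, and the decision to ignore the order of alternatives are assumptions made for this illustration. They are not the operator suite described in the paper.

```python
def rename_nonterminal(g, old, new):
    """Primitive operator: consistently rename nonterminal `old` to `new`."""
    if old not in g or new in g:
        raise ValueError(f"cannot rename {old!r} to {new!r}")
    def sub(sym):
        return new if sym == old else sym
    return {sub(lhs): [tuple(sub(s) for s in alt) for alt in alts]
            for lhs, alts in g.items()}


def inline(g, name):
    """Primitive operator: replace each use of `name` by its single
    alternative, then drop the production for `name`."""
    (body,) = g[name]  # precondition: exactly one alternative
    result = {}
    for lhs, alts in g.items():
        if lhs == name:
            continue
        new_alts = []
        for alt in alts:
            expanded = []
            for s in alt:
                expanded.extend(body if s == name else (s,))
            new_alts.append(tuple(expanded))
        result[lhs] = new_alts
    return result


def structurally_equal(g1, g2):
    """Compare grammars, ignoring the order of alternatives (an assumption)."""
    def norm(g):
        return {lhs: sorted(alts) for lhs, alts in g.items()}
    return norm(g1) == norm(g2)


def converge(g, chain):
    """Apply a transformation chain: a list of (operator, *arguments)."""
    for op, *args in chain:
        g = op(g, *args)
    return g


# Two hypothetical variants of the same grammar fragment.
g_a = {
    "ClassBody": [("{", "ClassBodyDecls", "}")],
    "ClassBodyDecls": [("Decls",)],
    "Decls": [("Decl",), ("Decls", "Decl")],
}
g_b = {
    "ClassBody": [("{", "Declarations", "}")],
    "Declarations": [("Declarations", "Decl"), ("Decl",)],
}
chain = [(inline, "ClassBodyDecls"), (rename_nonterminal, "Decls", "Declarations")]

print(structurally_equal(converge(g_a, chain), g_b))  # True
```

In this sketch the chain itself records the relationship: one inlining step removes an intermediate nonterminal and one renaming step resolves a nominal difference.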
Towards digital library service integration
Digital Library Service Integration (DLSI) aims to provide a systematic approach to integrating the services and collections of the National Science Digital Library (NSDL). NSDL collections can share services among themselves in a fully integrated environment. Such collections require no change to plug into the DLSI architecture and will keep using the services of NSDL as before. These services will in turn pass a few parameters to the services of DLSI. Using these parameters, wrappers will fetch the details and priority of the users. These wrappers will use the Search and Discovery module, the Metadata Management services, and the Access Management services. Users will see a fully integrated environment: their digital library system looks just as before, but they will also find some extra link anchors in the document. These links provide supplemental information or arrange the information in the way the user prefers. For this purpose, DLSI maintains basic user information and preferences. Other contributions include incorporating collaborative filtering to customize large sets of links, and an advanced lexical analysis tool to identify the objects of interest in a document.
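The abstract does not say which collaborative filtering algorithm is used. As one possible reading, the sketch below shows generic user-based collaborative filtering. It ranks candidate links for one user by a similarity-weighted average of other users' ratings, with cosine similarity computed over the stored preference profiles. The function names, the 1 to 5 rating scale, and the sample profiles are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse preference vectors (dicts)."""
    dot = sum(u[k] * v[k] for k in set(u) & set(v))
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def rank_links(profile, others, candidates):
    """Rank candidate links for one user by a similarity-weighted average
    of the ratings that other users gave to each link."""
    scores = {}
    for link in candidates:
        num = den = 0.0
        for prefs in others.values():
            if link in prefs:
                sim = cosine(profile, prefs)
                num += sim * prefs[link]
                den += abs(sim)
        scores[link] = num / den if den else 0.0
    return sorted(candidates, key=lambda link: scores[link], reverse=True)


# Hypothetical 1-5 ratings that users gave to supplemental links.
others = {
    "bob": {"glossary": 4, "dataset": 1, "tutorial": 5, "lecture": 1},
    "carol": {"glossary": 1, "dataset": 5, "tutorial": 1, "lecture": 5},
}
alice = {"glossary": 5, "dataset": 1}
print(rank_links(alice, others, ["lecture", "tutorial"]))  # ['tutorial', 'lecture']
```

Alice's profile is closer to Bob's than to Carol's, so Bob's preferences carry more weight and the tutorial link is promoted.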
Polyglot: Distributed Word Representations for Multilingual NLP
Distributed word representations (word embeddings) have recently contributed
to competitive performance in language modeling and several NLP tasks. In this
work, we train word embeddings for more than 100 languages using their
corresponding Wikipedias. We quantitatively demonstrate the utility of our word
embeddings by using them as the sole features for training a part of speech
tagger for a subset of these languages. We find their performance to be
competitive with near-state-of-the-art methods in English, Danish and Swedish.
Moreover, we investigate the semantic features captured by these embeddings
through the proximity of word groupings. We will release these embeddings
publicly to help researchers in the development and enhancement of multilingual
applications. Comment: 10 pages, 2 figures, Proceedings of the Conference on Computational
Natural Language Learning CoNLL'201
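One common way to inspect word groupings in an embedding space is to list each word's nearest neighbours by cosine similarity. The sketch below does this with tiny three-dimensional vectors that were invented for illustration. They are not Polyglot embeddings, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest(word, emb, k=2):
    """Return the k words whose vectors are most similar to `word`'s."""
    query = emb[word]
    scored = [(w, cosine(query, v)) for w, v in emb.items() if w != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [w for w, _ in scored[:k]]


# Toy embeddings (made up): weekdays cluster together, fruits cluster together.
emb = {
    "monday": [0.9, 0.1, 0.0],
    "tuesday": [0.85, 0.15, 0.05],
    "friday": [0.8, 0.2, 0.0],
    "apple": [0.0, 0.1, 0.95],
    "banana": [0.05, 0.0, 0.9],
}
print(nearest("monday", emb))  # ['tuesday', 'friday']
```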
GATE -- an Environment to Support Research and Development in Natural Language Engineering
We describe a software environment to support research and development in natural language (NL) engineering. This environment, GATE (General Architecture for Text Engineering), aims to advance research in the area of machine processing of natural languages by providing a software infrastructure on top of which heterogeneous NL component modules may be evaluated and refined individually or combined into larger application systems. Thus, GATE aims to support both researchers and developers working on component technologies (e.g. parsing, tagging, morphological analysis) and those developing end-user applications (e.g. information extraction, text summarisation, document generation, machine translation, and second language learning). GATE will promote reuse of component technology, permit specialisation and collaboration in large-scale projects, and allow for the comparison and evaluation of alternative technologies. The first release of GATE is now available.
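The idea of evaluating NL components individually or chaining them into larger systems can be pictured as a pipeline. In this pipeline, each component reads a shared document and adds stand-off annotations to it. The Python sketch below is only an illustration of that architecture. It is not GATE's actual (Java) API, and every class and function in it (`Document`, `Annotation`, `tokenizer`, `make_lexicon_tagger`, `run_pipeline`) is hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    type: str
    start: int  # character offsets into the document text
    end: int
    features: dict = field(default_factory=dict)


@dataclass
class Document:
    text: str
    annotations: list = field(default_factory=list)


def tokenizer(doc):
    """Component: add a Token annotation for each word or punctuation mark."""
    for m in re.finditer(r"\w+|[^\w\s]", doc.text):
        doc.annotations.append(Annotation("Token", m.start(), m.end()))
    return doc


def make_lexicon_tagger(lexicon, default="NN"):
    """Build a component that adds a part-of-speech feature to each Token."""
    def tagger(doc):
        for a in doc.annotations:
            if a.type == "Token":
                word = doc.text[a.start:a.end].lower()
                a.features["category"] = lexicon.get(word, default)
        return doc
    return tagger


def run_pipeline(doc, components):
    """Run independently replaceable components in order."""
    for component in components:
        doc = component(doc)
    return doc


doc = run_pipeline(Document("The parser works."),
                   [tokenizer, make_lexicon_tagger({"the": "DT", "works": "VBZ", ".": "."})])
print([a.features["category"] for a in doc.annotations])  # ['DT', 'NN', 'VBZ', '.']
```

Because components communicate only through the shared annotations, the toy tagger could be swapped for a different implementation and the two compared on the same input.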
Constraint Logic Programming for Natural Language Processing
This paper proposes an evaluation of the adequacy of the constraint logic
programming paradigm for natural language processing. Theoretical aspects of
this question have been discussed in several works. We adopt here a pragmatic
point of view and our argumentation relies on concrete solutions. Using actual
constraints (in the CLP sense) is neither easy nor direct. However, CLP can
improve parsing techniques in several aspects such as concision, control,
efficiency, or direct representation of linguistic formalisms. This discussion is
illustrated by several examples and the presentation of an HPSG parser. Comment: 15 pages, uuencoded and compressed PostScript, to appear in
Proceedings of the 5th Int. Workshop on Natural Language Understanding and
Logic Programming. Lisbon, Portugal. 199
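As a small illustration of why finite-domain constraints can make a grammar more concise, the sketch below represents ambiguous agreement features (such as the number of "sheep") as sets of possible values. Checking agreement then becomes domain intersection, and an empty domain means failure, so the ambiguity never has to be expanded into separate lexical entries. This is plain Python standing in for a CLP(FD)-style solver. The lexicon and the function `agree` are hypothetical and are not taken from the paper's HPSG parser.

```python
def agree(*entries):
    """Intersect the finite domains of shared features across entries.
    Returns the narrowed feature domains, or None if any domain becomes empty."""
    result = {}
    for entry in entries:
        for feat, dom in entry.items():
            result[feat] = result.get(feat, dom) & dom
            if not result[feat]:
                return None  # constraint failure: no consistent value left
    return result


# Hypothetical lexicon: each feature maps to its set of possible values.
LEX = {
    "this": {"num": frozenset({"sg"})},
    "these": {"num": frozenset({"pl"})},
    "sheep": {"num": frozenset({"sg", "pl"}), "pers": frozenset({3})},
    "runs": {"num": frozenset({"sg"}), "pers": frozenset({3})},
}

print(agree(LEX["this"], LEX["sheep"], LEX["runs"]))   # num narrowed to {'sg'}
print(agree(LEX["these"], LEX["sheep"], LEX["runs"]))  # None
```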