5 research outputs found
A prototype for projecting HPSG syntactic lexica towards LMF
The comparative evaluation of Arabic HPSG grammar lexica requires a deep
study of their linguistic coverage. The complexity of this task results mainly
from the heterogeneity of the descriptive components within those lexica
(underlying linguistic resources and different data categories, for example).
It is therefore essential to define more homogeneous representations, which in
turn will enable us to compare them and eventually merge them. In this context,
we present a method for comparing HPSG lexica based on a rule system. This
method is implemented within a prototype for the projection from Arabic HPSG to
a normalised pivot language compliant with LMF (ISO 24613 - Lexical Markup
Framework) and serialised using a TEI (Text Encoding Initiative) based
representation. The design of this system is based on an initial study of the
HPSG formalism looking at its adequacy for the representation of Arabic, and
from this, we identify the appropriate feature structures corresponding to each
Arabic lexical category and their possible LMF counterparts.
A new method for interoperability between lexical resources using MDA approach
Lexical resources are increasingly multiplatform due to the diverse needs of linguists. Merging, comparing, finding correspondences and deducing differences between these lexical resources remain difficult tasks. Thus, interoperability between these resources is hard, even impossible, to achieve. In this context, we establish a new method based on the MDA approach to resolve interoperability between lexical resources. The proposed method consists of building a common structure (an OWL-DL ontology) for the resources involved. This common structure allows the resources involved to communicate with one another. Hence, we may create a complex grid between these resources, allowing transformation from one format to another. We test our new method on an LMF lexicon.
Towards modeling Arabic lexicons compliant LMF in OWL-DL
Elaborating reusable lexical databases, and especially making interoperability operational, are crucial tasks affecting both Natural Language Processing (NLP) and the Semantic Web. In this respect, we consider that modeling the Lexical Markup Framework (LMF) in Web Ontology Language Description Logics (OWL-DL) can be a beneficial attempt to reach these aims. This proposal should have wide reach, since it concerns LMF, the reference standard for modeling lexical structures. In this paper, we study the requirements for this suggestion. We first give a quick presentation of the LMF framework. Next, we define the three ontology definition sublanguages available to specific users: OWL Lite, OWL-DL and OWL Full. After comparing the three, we chose to work with OWL-DL. We then define the ontology language OWL and describe the steps needed to model LMF in OWL. Finally, we apply this model to develop an instance for an Arabic lexicon.
TEI and LMF crosswalks
The present paper explores various arguments in favour of making the Text
Encoding Initiative (TEI) guidelines an appropriate serialisation for ISO
standard 24613:2008 (LMF, Lexical Markup Framework). It also identifies the
issues that would have to be resolved in order to reach an appropriate
implementation of these ideas, in particular in terms of informational
coverage. We show how the customisation facilities offered by the TEI
guidelines can provide an adequate background, not only to cover missing
components within the current Dictionary chapter of the TEI guidelines, but
also to allow specific lexical projects to deal with local constraints. We
expect this proposal to be a basis for a future ISO project in the context of
the ongoing revision of LMF.
Tune your brown clustering, please
Brown clustering, an unsupervised hierarchical clustering technique based on n-gram mutual information, has proven useful in many NLP applications. However, most uses of Brown clustering employ the same default configuration; the appropriateness of this configuration has gone predominantly unexplored. Accordingly, we present information for practitioners on the behaviour of Brown clustering in order to assist hyperparameter tuning, in the form of a theoretical model of Brown clustering utility. This model is then evaluated empirically in two sequence labelling tasks over two text types. We explore the dynamic between the input corpus size, chosen number of classes, and quality of the resulting clusters, which has an impact on any approach using Brown clustering. In every scenario that we examine, our results reveal that the values most commonly used for the clustering are sub-optimal.
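
As an illustration of the mutual-information quantity that Brown clustering is built on, the following is a minimal sketch that computes the average mutual information between adjacent word classes for a fixed class assignment. It is not the procedure or the utility model from the paper above: the function name `average_mutual_information`, the toy corpus and the class assignments are all assumptions made for this example, and a real Brown clustering implementation would additionally search over class merges rather than score a single assignment.

```python
import math
from collections import Counter


def average_mutual_information(tokens, word_to_class):
    """Average mutual information (in bits) between adjacent classes.

    tokens        -- list of words in corpus order (hypothetical toy input)
    word_to_class -- dict mapping every word to a class id (assumed given)
    """
    classes = [word_to_class[w] for w in tokens]
    # Count adjacent class pairs (class bigrams).
    bigrams = Counter(zip(classes, classes[1:]))
    total = sum(bigrams.values())
    if total == 0:
        return 0.0

    # Marginal counts for the left and right positions of each bigram.
    left, right = Counter(), Counter()
    for (c1, c2), n in bigrams.items():
        left[c1] += n
        right[c2] += n

    ami = 0.0
    for (c1, c2), n in bigrams.items():
        p_joint = n / total
        p_left = left[c1] / total
        p_right = right[c2] / total
        ami += p_joint * math.log2(p_joint / (p_left * p_right))
    return ami


# Toy corpus: two words that strictly alternate.
tokens = ["a", "b", "a", "b", "a", "b"]
ami_two_classes = average_mutual_information(tokens, {"a": 0, "b": 1})
ami_one_class = average_mutual_information(tokens, {"a": 0, "b": 0})
```

With a single class, the adjacent-class distribution carries no information, so the score is zero; separating the two alternating words into distinct classes gives a strictly positive score. The number of classes is the kind of setting whose effect the abstract argues should be tuned rather than left at a default.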