5 research outputs found

    A prototype for projecting HPSG syntactic lexica towards LMF

    The comparative evaluation of Arabic HPSG grammar lexica requires a deep study of their linguistic coverage. The complexity of this task results mainly from the heterogeneity of the descriptive components within those lexica (underlying linguistic resources and different data categories, for example). It is therefore essential to define more homogeneous representations, which in turn will enable us to compare them and eventually merge them. In this context, we present a method for comparing HPSG lexica based on a rule system. This method is implemented within a prototype for the projection from Arabic HPSG to a normalised pivot language compliant with LMF (ISO 24613 - Lexical Markup Framework) and serialised using a TEI (Text Encoding Initiative) based representation. The design of this system is based on an initial study of the HPSG formalism and its adequacy for the representation of Arabic; from this study, we identify the appropriate feature structures corresponding to each Arabic lexical category and their possible LMF counterparts.

    A new method for interoperability between lexical resources using MDA approach

    Lexical resources are increasingly multiplatform due to the diverse needs of linguists. Merging, comparing, finding correspondences and deducing differences between these lexical resources remain difficult tasks. Thus, interoperability between these resources is hard, even impossible, to achieve. In this context, we establish a new method based on the MDA approach to resolve interoperability between lexical resources. The proposed method consists of building a common structure (an OWL-DL ontology) for the resources involved. This common structure serves as a means of communication between those resources. Hence, we may create a complex grid between the resources involved, allowing transformation from one format to another. We test the proposed method on an LMF lexicon.

    Towards modeling Arabic lexicons compliant LMF in OWL-DL

    Elaborating reusable lexical databases, and especially making interoperability operational, are crucial tasks affecting both Natural Language Processing (NLP) and the Semantic Web. In this respect, we consider that modeling the Lexical Markup Framework (LMF) in Web Ontology Language Description Logics (OWL-DL) can be a beneficial attempt to reach these aims. This proposal could have a broad impact, since it concerns LMF, the reference standard for modeling lexical structures. In this paper, we study the requirements of this proposal. We first give a quick presentation of the LMF framework. Next, we define the three ontology definition sublanguages that may be easily used by specific users: OWL Lite, OWL-DL and OWL Full. After comparing the three, we have chosen to work with OWL-DL. We then define the ontology language OWL and describe the steps needed to model LMF in OWL. Finally, we apply this model to develop an instance for an Arabic lexicon.

    TEI and LMF crosswalks

    The present paper explores various arguments in favour of making the Text Encoding Initiative (TEI) guidelines an appropriate serialisation for ISO standard 24613:2008 (LMF, Lexical Markup Framework). It also identifies the issues that would have to be resolved in order to reach an appropriate implementation of these ideas, in particular in terms of informational coverage. We show how the customisation facilities offered by the TEI guidelines can provide an adequate background, not only to cover missing components within the current Dictionary chapter of the TEI guidelines, but also to allow specific lexical projects to deal with local constraints. We expect this proposal to be a basis for a future ISO project in the context of the ongoing revision of LMF.

    Tune your brown clustering, please

    Brown clustering, an unsupervised hierarchical clustering technique based on n-gram mutual information, has proven useful in many NLP applications. However, most uses of Brown clustering employ the same default configuration; the appropriateness of this configuration has gone predominantly unexplored. Accordingly, we present information for practitioners on the behaviour of Brown clustering in order to assist hyperparameter tuning, in the form of a theoretical model of Brown clustering utility. This model is then evaluated empirically in two sequence labelling tasks over two text types. We explore the dynamic between the input corpus size, chosen number of classes, and quality of the resulting clusters, which has an impact on any approach using Brown clustering. In every scenario that we examine, our results reveal that the values most commonly used for the clustering are sub-optimal.