
    Lexical Semantic Recognition

    In lexical semantics, full-sentence segmentation and segment labeling of various phenomena are generally treated separately, despite their interdependence. We hypothesize that a unified lexical semantic recognition task is an effective way to encapsulate previously disparate styles of annotation, including multiword expression identification/classification and supersense tagging. Using the STREUSLE corpus, we train a neural CRF sequence tagger and evaluate its performance along various axes of annotation. As the label set generalizes that of previous tasks (PARSEME, DiMSUM), we additionally evaluate how well the model generalizes to those test sets, finding that it approaches or surpasses existing models despite training only on STREUSLE. Our work also establishes baseline models and evaluation metrics for integrated and accurate modeling of lexical semantics, facilitating future work in this area. Comment: 11 pages, 3 figures; to appear at MWE 202
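    The neural CRF tagger described above assigns one lexical-semantic label per token and decodes the best label sequence jointly. Below is a minimal sketch of the CRF decoding step (Viterbi) over toy scores; the tag names, score matrices, and example sentence are illustrative assumptions, not the STREUSLE label set or the paper's model.

        import numpy as np

        # Illustrative tag inventory; the real STREUSLE/LSR label set is far larger.
        TAGS = ["O", "B-V.MOTION", "I-V.MOTION", "B-N.ARTIFACT"]

        def viterbi_decode(emissions, transitions):
            """emissions: (seq_len, n_tags) per-token scores from some encoder;
            transitions: (n_tags, n_tags) CRF transition scores."""
            seq_len, n_tags = emissions.shape
            score = emissions[0].copy()                  # best score ending in each tag
            backptr = np.zeros((seq_len, n_tags), dtype=int)
            for t in range(1, seq_len):
                # total[i, j] = best path ending in tag i at t-1, then tag j at t
                total = score[:, None] + transitions + emissions[t][None, :]
                backptr[t] = total.argmax(axis=0)
                score = total.max(axis=0)
            best = [int(score.argmax())]                 # best final tag
            for t in range(seq_len - 1, 0, -1):          # follow back-pointers
                best.append(int(backptr[t, best[-1]]))
            return [TAGS[i] for i in reversed(best)]

        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            sentence = ["She", "took", "off", "her", "coat"]
            emissions = rng.normal(size=(len(sentence), len(TAGS)))
            transitions = rng.normal(size=(len(TAGS), len(TAGS)))
            print(list(zip(sentence, viterbi_decode(emissions, transitions))))

    In the actual task the emission scores come from a trained neural encoder and the transition matrix is learned, so the decoded sequence respects both per-token evidence and label-to-label constraints (e.g. an I- tag only after a matching B- tag).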

    Regularization of word embeddings for multi-word expression identification

    In this paper we compare the effects of applying various state-of-the-art word representation strategies to the task of multi-word expression (MWE) identification. In particular, we analyze the strengths and weaknesses of using ℓ1-regularized sparse word embeddings for identifying MWEs. Our earlier study demonstrated the effectiveness of regularized word embeddings in other sequence labeling tasks, i.e. part-of-speech tagging and named entity recognition, but they had not yet been rigorously evaluated for the identification of MWEs.
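    As a rough illustration of the underlying technique, an ℓ1 penalty on an embedding matrix can be enforced with a proximal (soft-thresholding) update, which sets many embedding coordinates exactly to zero. The sketch below uses toy data and a deliberately large penalty so the sparsity is visible; it is a generic illustration of ℓ1 regularization, not the paper's training pipeline.

        import numpy as np

        def l1_proximal_step(emb, grad, lr=0.1, lam=5.0):
            """One SGD step on the task loss followed by soft-thresholding;
            repeated over training, this zeroes out many embedding coordinates."""
            updated = emb - lr * grad                    # ordinary gradient step
            # shrink toward zero by lr*lam and clip at zero (proximal operator of l1)
            return np.sign(updated) * np.maximum(np.abs(updated) - lr * lam, 0.0)

        if __name__ == "__main__":
            rng = np.random.default_rng(1)
            emb = rng.normal(size=(5, 8))       # toy vocabulary of 5 words, dim 8
            grad = rng.normal(size=emb.shape)   # stand-in for a task-loss gradient
            sparse_emb = l1_proximal_step(emb, grad)
            print("zero entries:", int((sparse_emb == 0).sum()), "of", sparse_emb.size)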

    Investigating different levels of joining entity and relation classification

    Named entities, such as persons or locations, are crucial bearers of information within unstructured text. Recognition and classification of these (named) entities is an essential part of information extraction. Relation classification, the process of categorizing semantic relations between two entities within a text, is another task closely linked to named entities. These two tasks -- entity and relation classification -- have commonly been treated as a pipeline of two separate models. While this separation simplifies the problem, it also disregards underlying dependencies and connections between the two subtasks. Merging both subtasks into one joint model for entity and relation classification is therefore the next logical step. A thorough investigation and comparison of different levels of joining the two tasks is the goal of this thesis, which defines the levels, develops (implements and evaluates) a machine learning model for each, and analyzes the results. The levels investigated are: (L1) a pipeline of independent models for entity classification and relation classification; (L2) using the entity class predictions as features for relation classification; (L3) global features for both entity and relation classification; and (L4) explicit use of a single joint model for entity and relation classification. The best results are achieved with the model for level L3, with an F1 score of 0.830 for entity classification and an F1 score of 0.52 for relation classification.
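    As an illustration of level L2 (entity class predictions as features for relation classification), the sketch below trains a toy entity classifier, one-hot encodes its predictions, and appends them to the feature vectors of a relation classifier. The features, label sets, and scikit-learn models are illustrative assumptions, not the thesis's actual models or data.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(2)
        n_pairs, n_feats, n_ent, n_rel = 200, 10, 3, 4
        X_text = rng.normal(size=(n_pairs, n_feats))    # stand-in textual features
        y_ent_head = rng.integers(0, n_ent, n_pairs)    # entity class of head span
        y_rel = rng.integers(0, n_rel, n_pairs)         # relation label per pair

        # (L1-style) independent entity classifier, here reused for both spans.
        ent_clf = LogisticRegression(max_iter=1000).fit(X_text, y_ent_head)
        pred_head = ent_clf.predict(X_text)
        pred_tail = ent_clf.predict(X_text)

        # (L2) one-hot encode the predicted entity classes and append them to the
        # textual features before training the relation classifier.
        ent_feats = np.hstack([np.eye(n_ent)[pred_head], np.eye(n_ent)[pred_tail]])
        X_rel = np.hstack([X_text, ent_feats])
        rel_clf = LogisticRegression(max_iter=1000).fit(X_rel, y_rel)
        print("toy relation accuracy:", rel_clf.score(X_rel, y_rel))

    Levels L3 and L4 go further by sharing global features or a single model across both tasks, so errors and evidence can flow in both directions rather than only from entities to relations.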

    The 9th Conference of PhD Students in Computer Science


    The 8th Conference of PhD Students in Computer Science


    A Computational Lexicon and Representational Model for Arabic Multiword Expressions

    The phenomenon of multiword expressions (MWEs) is increasingly recognised as a serious and challenging issue that has attracted the attention of researchers in various language-related disciplines. Research in these areas has emphasised the primary role of MWEs in analysing and understanding language, particularly in the computational treatment of natural languages. Ignoring MWE knowledge in any NLP system reduces the possibility of achieving high-precision output. However, despite the enormous wealth of MWE research and language resources available for English and some other languages, research on Arabic MWEs (AMWEs) still faces multiple challenges, particularly in key computational tasks such as extraction, identification, evaluation, language resource building, and lexical representation. This research aims to remedy this deficiency by extending knowledge of AMWEs and making contributions to the existing literature in three related research areas on the way towards building a computational lexicon of AMWEs. First, the study develops a general understanding of AMWEs by establishing a detailed conceptual framework that includes a description of the adopted AMWE concept and its distinctive properties at multiple linguistic levels. Second, for AMWE extraction and discovery, the study employs a hybrid approach that combines knowledge-based and data-driven computational methods to discover multiple types of AMWEs. Third, the thesis presents a representational system for AMWEs consisting of multilayer encoding of extensive linguistic descriptions. The project also paves the way for further in-depth AMWE-aware studies in NLP and linguistics to gain new insights into this complicated phenomenon in Standard Arabic. The implications of this research relate to the vital role of the AMWE lexicon, as a new lexical resource, in improving various ANLP tasks and the opportunities it provides for linguists to analyse and explore AMWE phenomena.
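    To make the idea of a multilayer lexical representation concrete, the sketch below encodes one AMWE entry as a small data structure with orthographic, morphosyntactic, and semantic layers. The field names, layer inventory, and example entry are hypothetical illustrations, not the schema or content of the lexicon built in this thesis.

        from dataclasses import dataclass, field, asdict
        import json

        @dataclass
        class AMWEEntry:
            # Hypothetical layers of a multilayer lexicon entry.
            lemma: str                       # citation form of the expression
            tokens: list[str]                # component words
            pos_pattern: str                 # e.g. "VERB+PREP+NOUN"
            mwe_type: str                    # e.g. "idiom", "light-verb construction"
            gloss_en: str                    # English gloss
            variants: list[str] = field(default_factory=list)  # attested surface forms

        if __name__ == "__main__":
            entry = AMWEEntry(
                lemma="أخذ بعين الاعتبار",
                tokens=["أخذ", "بعين", "الاعتبار"],
                pos_pattern="VERB+PREP+NOUN",
                mwe_type="light-verb construction",
                gloss_en="take into consideration",
                variants=["يأخذ بعين الاعتبار"],
            )
            print(json.dumps(asdict(entry), ensure_ascii=False, indent=2))

    A real lexicon entry would add further layers (morphological inflection patterns, syntactic flexibility, corpus frequencies, usage examples), but the principle is the same: each expression is stored once with all of its linguistic descriptions attached.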