    SzegedKoref: A Hungarian Coreference Corpus

    In this paper we introduce SzegedKoref, a Hungarian corpus in which coreference relations are manually annotated. For annotation, we selected texts from the Szeged Treebank, the largest Hungarian treebank, which is manually annotated at several linguistic layers. The corpus contains approximately 55,000 tokens and 4,000 sentences. Its size makes it suitable for training and testing machine-learning-based coreference resolution systems, which we plan to implement in the near future. We present the annotated texts, describe the annotated categories of anaphoric relations, report on the annotation process, and give several examples of each annotated category. We also discuss two linguistic phenomena that are characteristic of Hungarian coreference relations: phonologically empty pronouns and pronouns referring to subordinate clauses.

    Magyar nyelvű orvosi szakcikkek hivatkozásainak automatikus feldolgozása (Automatic Processing of References in Hungarian-Language Medical Articles)

    In this paper we present a language technology system developed for processing references to the scientific literature. The system's two most important modules are a retrieval module based on approximate matching and a sequence labeling module that identifies the individual elements of each reference. For the latter, we also describe a novel two-step reranking technique.

    Relevance segmentation of long documents

    In this paper, we present our methods for identifying the most salient topics of a selected domain using topic modeling. We propose a topic relevance score and a segmentation procedure that splits a document into parts referring to different topics. We also offer a way to visualize the textual spans related to a given topic. This makes it easy to determine which segments of a long document (such as a blog post or news article) are the most relevant and which are the least relevant.

    Structure preserving adversarial generation of labeled training samples for single-cell segmentation

    We introduce a generative data augmentation strategy to improve the accuracy of instance segmentation of microscopy data for complex tissue structures. Our pipeline uses regular and conditional generative adversarial networks (GANs) for image-to-image translation to construct synthetic microscopy images along with their corresponding masks to simulate the distribution and shape of the objects and their appearance. The synthetic samples are then used for training an instance segmentation network (for example, StarDist or Cellpose). We show on two single-cell-resolution tissue datasets that our method improves the accuracy of downstream instance segmentation tasks compared with traditional training strategies using either the raw data or basic augmentations. We also compare the quality of the object masks with those generated by a traditional cell population simulation method, finding that our synthesized masks are closer to the ground truth considering Fréchet inception distances.
