6 research outputs found
SzegedKoref: A Hungarian Coreference Corpus
In this paper we introduce SzegedKoref, a Hungarian corpus in which coreference relations are manually annotated. For annotation, we selected some texts of Szeged Treebank, the biggest treebank of Hungarian with manual annotation at several linguistic layers. The corpus contains approximately 55,000 tokens and 4000 sentences. Due to its size, the corpus can be exploited in training and testing machine learning based coreference resolution systems, which we would like to implement in the near future. We present the annotated texts, we describe the annotated categories of anaphoric relations, we report on the annotation process and we offer several examples of each annotated category. Two linguistic phenomena – phonologically empty pronouns and pronouns referring to subordinate clauses – are important characteristics of Hungarian coreference relations. In our paper, we also discuss both of them
Magyar nyelvű orvosi szakcikkek hivatkozásainak automatikus feldolgozása
Cikkünkben bemutatunk egy szakirodalmi hivatkozások feldolgozására kidolgozott nyelvtechnológiai rendszert. A rendszer két legfontosabb modulja egy közelítő illesztésen alapuló visszakereső modul és egy szekvenciajelölő modul, ami a hivatkozások egyes elemeit azonosítja. Ez utóbbi megoldásnál egy újszerű kétlépcsős újrarangsoroló technikát is ismertetünk
Relevance segmentation of long documents
In this paper, we present our methods to identify the most salient topics for a selected domain based on topic modeling. We propose a topic relevance score and segmentation procedure which can split the document into parts referring to various topics. We also offer a solution for visualizing textual spans that are related to a given topic. In this way, it can be easily determined which are the most relevant and most irrelevant segments of a long document (like blog posts or news articles)
Relevance Segmentation of Long Documents
In this paper, we present our methods to identify the most salient topics for a selected domain based on topic modeling. We propose a topic relevance score and segmentation procedure which can split the document into
parts referring to various topics. We also offer a solution for visualizing textual spans that are related to a given topic. In this way, it can be easily determined
which are the most relevant and most irrelevant segments of a long document (like blog posts or news articles)
Structure preserving adversarial generation of labeled training samples for single-cell segmentation
We introduce a generative data augmentation strategy to improve the accuracy of instance segmentation of microscopy data for complex tissue structures. Our pipeline uses regular and conditional generative adver-sarial networks (GANs) for image-to-image translation to construct synthetic microscopy images along with their corresponding masks to simulate the distribution and shape of the objects and their appearance. The synthetic samples are then used for training an instance segmentation network (for example, StarDist or Cell -pose). We show on two single-cell-resolution tissue datasets that our method improves the accuracy of downstream instance segmentation tasks compared with traditional training strategies using either the raw data or basic augmentations. We also compare the quality of the object masks with those generated by a traditional cell population simulation method, finding that our synthesized masks are closer to the ground truth considering Fre ' chet inception distances.Peer reviewe
Structure preserving adversarial generation of labeled training samples for single-cell segmentation
We introduce a generative data augmentation strategy to improve the accuracy of instance segmentation of microscopy data for complex tissue structures. Our pipeline uses regular and conditional generative adver-sarial networks (GANs) for image-to-image translation to construct synthetic microscopy images along with their corresponding masks to simulate the distribution and shape of the objects and their appearance. The synthetic samples are then used for training an instance segmentation network (for example, StarDist or Cell -pose). We show on two single-cell-resolution tissue datasets that our method improves the accuracy of downstream instance segmentation tasks compared with traditional training strategies using either the raw data or basic augmentations. We also compare the quality of the object masks with those generated by a traditional cell population simulation method, finding that our synthesized masks are closer to the ground truth considering Fre ' chet inception distances