23 research outputs found

    HuSpaCy : an industrial-strength Hungarian natural language processing toolkit

    Although a few open-source language processing pipelines are available for Hungarian, none of them satisfies the requirements of today’s NLP applications. A language processing pipeline should provide close to state-of-the-art lemmatization, morphosyntactic analysis, entity recognition and word embeddings. Industrial text processing applications must also satisfy non-functional software quality requirements, and frameworks supporting multiple languages are increasingly favored. This paper introduces HuSpaCy, an industry-ready Hungarian language processing toolkit. The presented tool provides components for the most important basic linguistic analysis tasks. It is open-source and available under a permissive license. Our system is built upon spaCy’s NLP components, resulting in an easily usable, fast yet accurate application. Experiments confirm that HuSpaCy has high accuracy while maintaining resource-efficient prediction capabilities.

    Hybrid lemmatization in HuSpaCy

    Lemmatization is still not a trivial task for morphologically rich languages. Previous studies showed that hybrid architectures usually work better for these languages and can yield great results. This paper presents a hybrid lemmatizer that combines a neural model, dictionaries and hand-crafted rules. We introduce the hybrid architecture along with empirical results on a widely used Hungarian dataset. The presented methods are published as three HuSpaCy models. Comment: published at the XIX. Magyar Számítógépes Nyelvészeti Konferencia (XIX. Hungarian Computational Linguistics Conference).

    Advancing Hungarian Text Processing with HuSpaCy: Efficient and Accurate NLP Pipelines

    This paper presents a set of industrial-grade text processing models for Hungarian that achieve near state-of-the-art performance while balancing resource efficiency and accuracy. The models are implemented in the spaCy framework and extend the HuSpaCy toolkit with several improvements to its architecture. Compared to existing NLP tools for Hungarian, all of our pipelines cover every basic text processing step, including tokenization, sentence-boundary detection, part-of-speech tagging, morphological feature tagging, lemmatization, dependency parsing and named entity recognition, with high accuracy and throughput. We thoroughly evaluated the proposed enhancements, compared the pipelines with state-of-the-art tools and demonstrated the competitive performance of the new models in all text preprocessing steps. All experiments are reproducible and the pipelines are freely available under a permissive license. Comment: Submitted to TSD 2023 Conference.

    Transformer-based HuSpaCy preprocessing pipelines (Transformer-alapú HuSpaCy előelemző láncok)

    Increasingly accurate open-source transformer-based language models for natural language processing appear every year. However, a preprocessing pipeline built on them and developed specifically for Hungarian, covering tokenization, sentence splitting, lemmatization, part-of-speech disambiguation, morphological tagging, dependency parsing and named entity recognition, has not existed until now. In this paper we present a transformer-based preprocessing pipeline that implements the tasks above, which we tested with several language models available for Hungarian. The resulting system achieves state-of-the-art results on most of the tasks. We also paid due attention to examining options with different resource requirements and to presenting methods that reduce the memory usage of the systems that proved most effective.

    Preparation and characterization of cationic Pluronic for surface modification and functionalization of polymeric drug delivery nanoparticles

    Biodegradable poly(lactic-co-glycolic acid) copolymer (PLGA) nanoparticles (NPs) with a surface layer of poly(ethylene oxide)-poly(propylene oxide)-poly(ethylene oxide) triblock copolymers (Pluronics) are promising drug carrier systems. To increase their potential for targeted drug delivery, end-group derivatives of Pluronics were synthesized in a straightforward way to obtain Pluronic-amines. The formation of functional amine groups was confirmed by the fluorescamine method and by NMR analysis of their Boc-Phe-OH and Fmoc-Phe-OH conjugates. Pluronic- and Pluronic-amine-stabilized PLGA NPs prepared by nanoprecipitation were characterized by dynamic light scattering and zeta potential measurements. All of the systems showed high colloidal stability, as checked by electrolyte-induced aggregation, although the presence of Pluronic-amine on the surface decreased the zeta potential to some extent. Introducing reactive primary amine groups into the surface layer of PLGA NPs while preserving aggregation stability makes it possible to couple various ligands for targeted delivery and also contributes to the improved membrane affinity of the NPs.

    Exosomal small RNA profiling in first-trimester maternal blood explores early molecular pathways of preterm preeclampsia

    Introduction: Preeclampsia (PE) is a severe obstetrical syndrome characterized by new-onset hypertension and proteinuria and it is often associated with fetal intrauterine growth restriction (IUGR). PE leads to long-term health complications, so early diagnosis would be crucial for timely prevention. There are multiple etiologies and subtypes of PE, and this heterogeneity has hindered accurate identification in the presymptomatic phase. Recent investigations have pointed to the potential role of small regulatory RNAs in PE, and these species, which travel in extracellular vesicles (EVs) in the circulation, have raised the possibility of non-invasive diagnostics. The aim of this study was to investigate the behavior of exosomal regulatory small RNAs in the most severe subtype of PE with IUGR. Methods: We isolated exosomal EVs from first-trimester peripheral blood plasma samples of women who later developed preterm PE with IUGR (n=6) and gestational age-matched healthy controls (n=14). The small RNA content of EVs and their differential expression were determined by next-generation sequencing and further validated by quantitative real-time PCR. We also applied the rigorous exceRpt bioinformatics pipeline for small RNA identification, followed by target verification and Gene Ontology analysis. Results: Overall, >2700 small RNAs were identified in all samples and, of interest, the majority belonged to the RNA interference (RNAi) pathways. Among the RNAi species, 16 differentially expressed microRNAs were up-regulated in PE, whereas up-regulated and down-regulated members were equally found among the six identified Piwi-associated RNAs. Gene Ontology analysis of the predicted small RNA targets showed enrichment of genes in pathways related to immune processes involved in decidualization, placentation and embryonic development, indicating that dysregulation of the induced small RNAs is connected to the impairment of immune pathways in preeclampsia development. Finally, the subsequent validation experiments revealed that the hsa_piR_016658 piRNA is a promising biomarker candidate for preterm PE associated with IUGR. Discussion: Our rigorously designed study in a homogeneous group of patients unraveled small RNAs in circulating maternal exosomes that act on physiological pathways dysregulated in preterm PE with IUGR. Therefore, our small RNA hits are not only suitable biomarker candidates, but the revealed biological pathways may further inform us about the complex pathology of this severe PE subtype.

    Screening for preeclampsia in the first trimester of pregnancy in routine clinical practice in Hungary

    We aimed to evaluate the contribution of different factors in the Fetal Medicine Foundation algorithms for preeclampsia (PE) risk calculation during first-trimester screening in Hungary. We selected subjects for the nested case-control study from a prospective cohort of 2545 low-risk pregnancies. Eighty-two patients with PE and 82 gestational age-matched controls were included. Individual PE risk was calculated using two risk-assessment software versions. Using Astraia 2.3.1 with maternal characteristics and biophysical parameters only, detection rates (DRs) were 63.6% for early-PE and 67.6% for late-PE. When we added pregnancy-associated plasma protein A (PAPP-A) to the risk calculation, DRs decreased to 54.5% and 64.8%, respectively. Using Astraia 2.8.2 with maternal characteristics and biophysical parameters resulted in DRs of 63.6% (early-PE) and 56.3% (late-PE). When we added PAPP-A to the risk calculation, the DR improved to 72.7% for early-PE but fell slightly to 54.9% for late-PE. The addition of placental growth factor (PlGF) did not increase detection rates in either calculation. In conclusion, using maternal characteristics, biophysical parameters, and PAPP-A, an acceptable screening efficacy could be achieved for early-PE during first-trimester screening. Since PlGF did not improve efficacy in our study, we suggest setting new standard curves for PlGF in Eastern European pregnant women and evaluating novel biochemical markers.