88 research outputs found

    Knowledge acquisition for coreference resolution

    This thesis addresses the problem of statistical coreference resolution. Theoretical studies describe coreference as a complex linguistic phenomenon affected by many different factors. State-of-the-art statistical approaches, by contrast, rely on rather simple, knowledge-poor modeling. This thesis aims to bridge the gap between theory and practice. We use insights from linguistic theory to identify relevant linguistic parameters of co-referring descriptions. We consider different types of information, from the shallowest name-matching measures to deeper syntactic, semantic, and discourse knowledge. We empirically assess the validity of the investigated theoretical predictions on corpus data. Our data-driven evaluation experiments confirm that various linguistic parameters suggested by theoretical studies interact with coreference and may therefore provide valuable information for resolution systems. At the same time, our study raises several issues concerning the coverage of theoretical claims and thus provides feedback to linguistic theory. We use the investigated knowledge sources to build a linguistically informed statistical coreference resolution engine. This framework allows us to combine the flexibility and robustness of a machine learning-based approach with a wide variety of data from different levels of linguistic description.
Our evaluation experiments with different machine learners show that our linguistically informed model, on the one hand, outperforms algorithms based on a single knowledge source and, on the other hand, yields the best result on the MUC-7 data reported in the literature (F-score of 65.4% with the SVM-light learning algorithm). The learning curves for our classifiers show no signs of convergence. This suggests that our approach makes a good basis for further experimentation: one can obtain even better results by annotating more material or by using the existing data more intelligently. Our study proves that statistical approaches to the coreference resolution task can and should benefit from linguistic theories: even imperfect knowledge, extracted from raw text data with off-the-shelf, error-prone NLP modules, helps achieve significant improvements.

    Anaphora resolution for Bengali: An experiment with domain adaptation

    In this paper we present our first attempt at anaphora resolution for a resource-poor language, namely Bengali. We address the issue of adapting a state-of-the-art system, BART, which was originally developed for English. The overall performance of co-reference resolution depends greatly on highly accurate mention detectors. We develop a number of models that differ in the heuristics used and in the particular machine learning method employed. Thereafter we perform a series of experiments for adapting BART to Bengali. Our evaluation shows that a language-dependent system (designed primarily for English) can achieve a good performance level when re-trained and tested on a new language with proper subsets of features. The system achieves recall, precision and F-measure values of 56.00%, 46.50% and 50.80%, respectively. The contribution of this work is two-fold: (i) an attempt to build a machine learning-based anaphora resolution system for a resource-poor Indian language; and (ii) domain adaptation of a state-of-the-art English co-reference resolution system for Bengali, which has a completely different orthography and characteristics.

    Multi-lingual Opinion Mining on YouTube

    To successfully apply opinion mining (OM) to the large amounts of user-generated content produced every day, we need robust models that handle noisy input well yet can easily be adapted to a new domain or language. Here we focus on opinion mining for YouTube by (i) modeling classifiers that predict the type of a comment and its polarity, while distinguishing whether the polarity is directed towards the product or the video; (ii) proposing a robust shallow syntactic structure (STRUCT) that adapts well when tested across domains; and (iii) evaluating the effectiveness of the proposed structure on two languages, English and Italian. We rely on tree kernels to automatically extract and learn features with better generalization power than the traditionally used bag-of-words models. Our extensive empirical evaluation shows that (i) STRUCT outperforms the bag-of-words model within the same domain (up to 2.6% and 3% absolute improvement for Italian and English, respectively); (ii) it is particularly useful when tested across domains (more than 4% absolute improvement for both languages), especially when little training data is available (up to 10% absolute improvement); and (iii) the proposed structure is also effective in a lower-resource language scenario, where only less accurate linguistic processing tools are available.

    Translation of Modal Verbs in Media Texts: Corpus-Based Approach

    This study examines how the main English modal verbs (can, could, may, must, should, need, will, would) in media texts are translated into Russian. Using continuous sampling, 50 concordances were selected and analyzed for each of the 8 modal verbs. The study material consisted of journalistic examples from the National Corpus of the Russian Language, specifically its parallel subcorpus of original texts and their translations. Additionally, the modal verbs were ranked by frequency in the parallel corpus NKRYA (language pair Russian-English) and the English web corpus WebCorp. Absolute frequencies were used for the comparison, since the first corpus is static and the second is dynamic (replenished daily). Because the two rankings were not identical, the parallel corpus was found to need supplementing. The analysis of translation transformations showed that literal translation, grammatical substitution and omission were used most often in translating modal verbs in media texts. Impersonal constructions were also used frequently, with modality conveyed by linguistic means of another level.

    OVARIAN INFERTILITY FACTOR IN PATIENTS OF LATE REPRODUCTIVE AGE

    Objective: in women over 35 years of age, there is a progressive, age-related decline in fertility, which is due to multiple factors, including a decrease in follicular reserve. Cytokines play a role, mediating the interaction between oocytes and other cells. In addition, the mRNA expression of a number of genes changes, reducing the ability to bear children. The goal is to highlight the problem of declining fertility in women of late reproductive age. Materials and methods: MedLine, PubMed, RISC, etc. Results: analysis of the literature shows that fertility decline is determined by a combination of physiological, molecular and genetic factors that play an increasing role with age. Conclusion: modern technology can solve the problem of infertility in the vast majority of cases. However, the insufficient effectiveness of assisted reproductive technologies (ART) in women over 35 years of age calls for an optimized care strategy for these women.

    Endometrial Infertility in Patients of Late Reproductive Age (a review)

    Background. Endometrial infertility is a frequent cause of failure in assisted reproduction. Its causes are manifold and require comprehensive assessment for a successful choice of treatment strategy. Objectives. A review of infertility concepts accounting for endometrial infertility in women of late reproductive age. Methods. Bibliographic analysis: sources for the review were retrieved from the PubMed, MedLine, eLibrary and Cyberleninka databases, covering the past 10 years. Keyword queries were: endometrial factors of infertility, uterine infertility [маточные факторы бесплодия], causes of infertility. Articles were selected if they related to female infertility and, in particular, to endometrial factors of infertility. Low-informative articles were not considered. Results. A total of 51 sources were analysed, with 36 selected for the review. The reviewed evidence suggests that endometrial infertility in women of late reproductive age is associated with cumulative gynaecological pathology and age-related change that adversely affect endometrial receptivity and its synchrony with embryo maturation in assisted reproductive protocols. Conclusion. Determining the functional status of the endometrium is a prerequisite for outcome prognosis in assisted reproduction, because conception may fail with a viable embryo when endometrial receptivity is reduced. This observation warrants timely diagnosis and treatment of endometrial disorders before assisted reproductive interventions. The woman's age is the main predictor of successful pregnancy in IVF/ICSI protocols. Endometrial thickness is among the main markers of successful implantation. Uterine infertility may relate to impaired local immunity and autoimmune responses in the uterine cavity. The most common mechanisms of uterine infertility are associated with uterine myoma, endometriosis and endometritis.
Women with uterine infertility attempting IVF/ICSI procedures often exhibit asynchronous endometrial development relative to the embryo's maturity for implantation.