
    A Review of Approaches to Solving Paraphrase Identification Problems

    The article reviews approaches to solving the problem of paraphrase identification. It describes the relevance of this problem and its use in tasks such as plagiarism detection, text simplification, and information retrieval. Several classes of solutions are considered. The first approach is based on manual rules: it uses manually selected features grounded in the fundamental properties of paraphrases. The second approach is based on lexical similarity and on various databases and ontologies. Machine learning-based approaches are also presented, together with different architectures that can be used to identify paraphrases. The last approach considered is based on deep learning and modern transformer models. Pages of the article in the issue: 71-78. Language of the article: Ukrainian.
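    To make the lexical-similarity class of solutions concrete, here is a minimal sketch of a word-overlap baseline. It is not the method of the reviewed article: the regex tokenizer, the choice of Jaccard similarity, and the 0.5 decision threshold are all illustrative assumptions.

```python
import re


def tokens(sentence: str) -> set[str]:
    """Lowercase the sentence and split it into a set of word tokens."""
    return set(re.findall(r"\w+", sentence.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity |A & B| / |A | B| between the token sets of two sentences."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty sentences are treated as identical
    return len(ta & tb) / len(ta | tb)


def is_paraphrase(a: str, b: str, threshold: float = 0.5) -> bool:
    """Label a pair as paraphrases when lexical overlap reaches the (hypothetical) threshold."""
    return jaccard(a, b) >= threshold


print(jaccard("The cat sat on the mat", "On the mat the cat sat"))  # 1.0
print(is_paraphrase("The cat sat", "A dog ran"))  # False
```

    Because the score compares bags of words, it ignores word order entirely: the two "cat" sentences above get a similarity of 1.0 even though their syntax differs.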

    Adapting a general parser to a sublanguage

    In this paper, we propose a method to adapt a general parser (Link Parser) to sublanguages, focusing on the parsing of texts in biology. Our main proposal is the use of terminology (identification and analysis of terms) to reduce the complexity of the text to be parsed. Several other strategies are explored and finally combined, among them text normalization, extensions of the lexicon and of the morpho-guessing module, and adaptation of the grammar rules. We compare the parsing results before and after these adaptations.
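    One way terminology can reduce parsing complexity is by collapsing each recognized multiword term into a single token before the sentence reaches the parser. The sketch below assumes that strategy and a hypothetical term list; the abstract does not say how the paper's term analysis works, and this is not the Link Parser API.

```python
def collapse_terms(tokens: list[str], terms: set[tuple[str, ...]]) -> list[str]:
    """Greedily replace known multiword terms with a single underscore-joined token.

    tokens: the words of one sentence.
    terms: a hypothetical terminology list; each term is a tuple of lowercase words.
    Longer matches are tried first, so the parser sees one unit per term.
    """
    max_len = max((len(t) for t in terms), default=1)
    out, i = [], 0
    while i < len(tokens):
        # Try the longest possible term starting at position i, down to two words.
        for length in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = tuple(w.lower() for w in tokens[i:i + length])
            if candidate in terms:
                out.append("_".join(tokens[i:i + length]))
                i += length
                break
        else:
            # No multiword term starts here: keep the single word as-is.
            out.append(tokens[i])
            i += 1
    return out


terms = {("heat", "shock", "protein"), ("heat", "shock"), ("promoter", "region")}
sentence = "The heat shock protein binds the promoter region".split()
print(collapse_terms(sentence, terms))
# ['The', 'heat_shock_protein', 'binds', 'the', 'promoter_region']
```

    Preferring the longest match keeps "heat shock protein" as one unit instead of splitting it into "heat_shock" plus a stray "protein".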

    Disambiguation of Super Parts of Speech (or Supertags): Almost Parsing

    In a lexicalized grammar formalism such as Lexicalized Tree-Adjoining Grammar (LTAG), each lexical item is associated with at least one elementary structure (supertag) that localizes syntactic and semantic dependencies. Thus a parser for a lexicalized grammar must search a large set of supertags to choose the right ones to combine for the parse of the sentence. We present techniques for disambiguating supertags using local information such as lexical preference and local lexical dependencies. The similarity between LTAG and Dependency grammars is exploited in the dependency model of supertag disambiguation. The performance results for various models of supertag disambiguation such as unigram, trigram and dependency-based models are presented. Comment: ps file. 8 pages.
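    The unigram model mentioned above can be pictured as "give each word the supertag it carried most often in training, ignoring context". The sketch below assumes that reading; the supertag labels (A_Det, A_NXN, B_Vvx) and the UNKNOWN fallback are hypothetical, and the trigram and dependency models from the abstract are not shown.

```python
from collections import Counter, defaultdict


def train_unigram(tagged_corpus):
    """Map each word to the supertag it was assigned most often in training.

    tagged_corpus: iterable of sentences, each a list of (word, supertag) pairs.
    """
    counts = defaultdict(Counter)
    for sentence in tagged_corpus:
        for word, supertag in sentence:
            counts[word.lower()][supertag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def disambiguate(words, model, default="UNKNOWN"):
    """Choose one supertag per word from unigram statistics alone (no context)."""
    return [model.get(w.lower(), default) for w in words]


# Hypothetical training data with made-up supertag names.
corpus = [
    [("the", "A_Det"), ("price", "A_NXN")],
    [("price", "A_NXN")],
    [("price", "B_Vvx")],
]
model = train_unigram(corpus)
print(disambiguate(["The", "price", "rose"], model))
# ['A_Det', 'A_NXN', 'UNKNOWN']
```

    Since the choice never looks at neighboring words, "price" always receives A_NXN here, which is the kind of limitation that context-sensitive models such as trigram or dependency-based ones aim to address.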

    Three New Probabilistic Models for Dependency Parsing: An Exploration

    After presenting a novel O(n^3) parsing algorithm for dependency grammar, we develop three contrasting ways to stochasticize it. We propose (a) a lexical affinity model where words struggle to modify each other, (b) a sense tagging model where words fluctuate randomly in their selectional preferences, and (c) a generative model where the speaker fleshes out each word's syntactic and conceptual structure without regard to the implications for the hearer. We also give preliminary empirical results from evaluating the three models' parsing performance on annotated Wall Street Journal training text (derived from the Penn Treebank). In these results, the generative (i.e., top-down) model performs significantly better than the others, and does about equally well at assigning part-of-speech tags. Comment: 6 pages, LaTeX 2.09 packaged with 4 .eps files, also uses colap.sty and acl.bs
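    The O(n^3) bound comes from a span-based dynamic program over "complete" and "incomplete" half-constituents, widely known as Eisner's algorithm. The sketch below uses the arc-factored form common in later work, where a tree's score is the sum of its arc scores. That scoring is an assumption: the paper's three stochastic models score structures in their own ways. The `score` matrix and the artificial ROOT at index 0 are illustrative choices.

```python
from typing import List, Tuple

NEG_INF = float("-inf")


def eisner(score: List[List[float]]) -> Tuple[float, List[int]]:
    """Best projective dependency tree under arc-factored scores, in O(n^3) time.

    score[h][m] is a (hypothetical) score for an arc from head h to modifier m;
    index 0 is an artificial ROOT. Returns (best_score, heads), with heads[0] == -1.
    """
    n = len(score)
    # Spans s..t; direction d = 1 means the head is s, d = 0 means the head is t.
    complete = [[[0.0, 0.0] for _ in range(n)] for _ in range(n)]
    incomplete = [[[NEG_INF, NEG_INF] for _ in range(n)] for _ in range(n)]
    complete_bp = [[[0, 0] for _ in range(n)] for _ in range(n)]
    incomplete_bp = [[[0, 0] for _ in range(n)] for _ in range(n)]

    for k in range(1, n):  # span length
        for s in range(n - k):
            t = s + k
            # Incomplete span: join two complete halves and add an arc between s and t.
            best, best_r = max(
                (complete[s][r][1] + complete[r + 1][t][0], r) for r in range(s, t))
            incomplete[s][t][0] = best + score[t][s]  # arc t -> s
            incomplete[s][t][1] = best + score[s][t]  # arc s -> t
            incomplete_bp[s][t][0] = incomplete_bp[s][t][1] = best_r
            # Complete span headed at t: complete s..r plus incomplete r..t.
            complete[s][t][0], complete_bp[s][t][0] = max(
                (complete[s][r][0] + incomplete[r][t][0], r) for r in range(s, t))
            # Complete span headed at s: incomplete s..r plus complete r..t.
            complete[s][t][1], complete_bp[s][t][1] = max(
                (incomplete[s][r][1] + complete[r][t][1], r) for r in range(s + 1, t + 1))

    heads = [-1] * n

    def backtrack(s: int, t: int, d: int, is_complete: bool) -> None:
        """Follow back-pointers and record the head of every modifier."""
        if s == t:
            return
        if is_complete:
            r = complete_bp[s][t][d]
            if d == 0:
                backtrack(s, r, 0, True)
                backtrack(r, t, 0, False)
            else:
                backtrack(s, r, 1, False)
                backtrack(r, t, 1, True)
        else:
            r = incomplete_bp[s][t][d]
            if d == 0:
                heads[s] = t
            else:
                heads[t] = s
            backtrack(s, r, 1, True)
            backtrack(r + 1, t, 0, True)

    backtrack(0, n - 1, 1, True)
    return complete[0][n - 1][1], heads


# ROOT plus two words; the best tree is ROOT -> 2 and 2 -> 1 (10 + 8 = 18).
score = [[0, 1, 10], [0, 0, 1], [0, 8, 0]]
print(eisner(score))  # (18.0, [-1, 2, 0])
```

    Each of the O(n^2) spans tries O(n) split points, which gives the cubic bound. This first-order version lets ROOT take several children; variants that enforce a single root word need an extra constraint.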