205 research outputs found
Огляд підходів до розв’язання задач ідентифікації парафраз
The article is devoted to a review of approaches to solving the problem of identifying paraphrases. This problem's relevance and use in tasks such as plagiarism detection, text simplification, and information search are described. Several classes of solutions were considered. The first approach is based on manual rules - it uses manually selected features based on the fundamental properties of paraphrases. The second approach is based on lexical similarity and various databases and ontologies. Machine learning-based approaches are also presented in this paper and describe different architectures that can be used to identify paraphrases. The last approach considered is based on deep learning and modern models of transformers.
Pages of the article in the issue: 71 - 78
Language of the article: UkrainianСтаття присвячена огляду підходів до розв’язання задачі ідентифікації парафраз. Описується актуальність та використання даної задачі у таких задачах як виявлення плагіату, спрощення тексту та пошук інформації. Було розглянуто декілька класів вирішення даної задачі. Перший підхід заснований на ручних правилах - використовує вручну підібрані особливості базуючись на базових властивостях парафраз. Другий підхід заснований на лексичній подібності та різноманітних базах даних і онтології. Підходи, засновані на машинному навчанні також представлені у даній статті та описує архітектури, які можуть бути використані для ідентифікації парафраз. Останній підхід який розглянуто базується на глибокому навчанні та сучасних моделях трансформерів
Recommended from our members
Event-based hyperspace analogue to language for query expansion
Bag-of-words approaches to information retrieval (IR) are effective but assume independence between words. The Hyperspace Analogue to Language (HAL) is a cognitively motivated and validated semantic space model that captures statistical dependencies between words by considering their co-occurrences in a surrounding window of text. HAL has been successfully applied to query expansion in IR, but has several limitations, including high processing cost and use of distributional statistics that do not exploit syntax. In this paper, we pursue two methods for incorporating syntactic-semantic information from textual ‘events’ into HAL. We build the HAL space directly from events to investigate whether processing costs can be reduced through more careful definition of word co-occurrence, and improve the quality of the pseudo-relevance feedback by applying event information as a constraint during HAL construction. Both methods significantly improve performance results in comparison with original HAL, and interpolation of HAL and relevance model expansion outperforms either method alone
Adapting a general parser to a sublanguage
In this paper, we propose a method to adapt a general parser (Link Parser) to
sublanguages, focusing on the parsing of texts in biology. Our main proposal is
the use of terminology (identication and analysis of terms) in order to reduce
the complexity of the text to be parsed. Several other strategies are explored
and finally combined among which text normalization, lexicon and
morpho-guessing module extensions and grammar rules adaptation. We compare the
parsing results before and after these adaptations
Disambiguation of Super Parts of Speech (or Supertags): Almost Parsing
In a lexicalized grammar formalism such as Lexicalized Tree-Adjoining Grammar
(LTAG), each lexical item is associated with at least one elementary structure
(supertag) that localizes syntactic and semantic dependencies. Thus a parser
for a lexicalized grammar must search a large set of supertags to choose the
right ones to combine for the parse of the sentence. We present techniques for
disambiguating supertags using local information such as lexical preference and
local lexical dependencies. The similarity between LTAG and Dependency grammars
is exploited in the dependency model of supertag disambiguation. The
performance results for various models of supertag disambiguation such as
unigram, trigram and dependency-based models are presented.Comment: ps file. 8 page
Three New Probabilistic Models for Dependency Parsing: An Exploration
After presenting a novel O(n^3) parsing algorithm for dependency grammar, we
develop three contrasting ways to stochasticize it. We propose (a) a lexical
affinity model where words struggle to modify each other, (b) a sense tagging
model where words fluctuate randomly in their selectional preferences, and (c)
a generative model where the speaker fleshes out each word's syntactic and
conceptual structure without regard to the implications for the hearer. We also
give preliminary empirical results from evaluating the three models' parsing
performance on annotated Wall Street Journal training text (derived from the
Penn Treebank). In these results, the generative (i.e., top-down) model
performs significantly better than the others, and does about equally well at
assigning part-of-speech tags.Comment: 6 pages, LaTeX 2.09 packaged with 4 .eps files, also uses colap.sty
and acl.bs
- …