22 research outputs found

    A method for analyzing customer reviews in natural-language texts

    No full text
    The article presents a method for analyzing natural-language texts containing customer reviews. The method differs from existing ones in its combination of different vectorizer types and the introduction of a component hierarchy. Applying the vectorizers in sequence makes it possible to build a hierarchy of features and markers. Using support vector machines and island clustering, with subsequent training of a model for sentiment prediction, is among the better approaches to sentiment analysis for both binary and non-binary aspects. Based on an open dataset, a software product for analyzing customer preferences and visualizing the analysis results was built with Python and Tableau.
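    As a rough illustration of the described pipeline, the sketch below combines two vectorizer types and feeds them to a linear SVM using scikit-learn. The paper's specific vectorizer hierarchy, its island-clustering step, and its data are not reproduced here; the FeatureUnion layout, the toy reviews, and the bigram range are illustrative assumptions.

    ```python
    # Minimal sketch, assuming scikit-learn; not the paper's exact pipeline.
    from sklearn.pipeline import Pipeline, FeatureUnion
    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
    from sklearn.svm import LinearSVC

    reviews = ["great service, will come back", "terrible support, very slow"]
    labels = ["positive", "negative"]  # toy data for illustration

    model = Pipeline([
        # Concatenate features from two vectorizer types: raw term counts
        # and TF-IDF-weighted unigrams/bigrams.
        ("features", FeatureUnion([
            ("counts", CountVectorizer()),
            ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
        ])),
        # Linear support vector machine for sentiment prediction.
        ("svm", LinearSVC()),
    ])

    model.fit(reviews, labels)
    print(model.predict(["the service was great"]))
    ```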

    Cross-language high similarity search using a conceptual thesaurus

    Full text link
    This work addresses the issue of cross-language high-similarity and near-duplicate search, where, for a given document, a highly similar one is to be identified in a large cross-language collection of documents. We propose a concept-based similarity model for the problem which is very light in computation and memory. We evaluate the model on three corpora of different nature and two language pairs, English-German and English-Spanish, using the Eurovoc conceptual thesaurus. Our model is compared with two state-of-the-art models, and we find that, although the proposed model is very generic, it produces competitive results and is notably stable and consistent across the corpora. This work was done in the framework of the VLC/CAMPUS Microcluster on Multimodal Interaction in Intelligent Systems and was partially funded by the European Commission as part of the WIQ-EI IRSES project (grant no. 269180) within the FP7 Marie Curie People Framework, and by the Text-Enterprise 2.0 research project (TIN2009-13391-C04-03). The research work of the second author is supported by CONACyT grant 192021/302009. Gupta, P.; Barrón Cedeño, L. A.; Rosso, P. (2012). Cross-language high similarity search using a conceptual thesaurus. In: Information Access Evaluation. Multilinguality, Multimodality, and Visual Analytics. Springer Verlag (Germany). 7488:67-75. https://doi.org/10.1007/978-3-642-33247-0_8
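    The general idea of comparing documents in a shared concept space can be sketched as follows. The tiny per-language lexicons below stand in for a real Eurovoc term-to-concept mapping and are purely hypothetical, as is the bag-of-concepts weighting.

    ```python
    # Sketch: map terms in each language to shared thesaurus concept IDs
    # and compare documents in concept space. LEXICON is a hypothetical
    # stand-in for a real Eurovoc mapping.
    from collections import Counter
    import math

    LEXICON = {
        "en": {"unemployment": "C1", "labour": "C2", "market": "C3"},
        "de": {"arbeitslosigkeit": "C1", "arbeitsmarkt": "C3"},
    }

    def concept_vector(text: str, lang: str) -> Counter:
        """Bag of thesaurus concepts for the terms found in the text."""
        lex = LEXICON[lang]
        return Counter(lex[w] for w in text.lower().split() if w in lex)

    def cosine(a: Counter, b: Counter) -> float:
        dot = sum(a[c] * b[c] for c in a)
        norm = (math.sqrt(sum(v * v for v in a.values()))
                * math.sqrt(sum(v * v for v in b.values())))
        return dot / norm if norm else 0.0

    en_doc = "unemployment and the labour market"
    de_doc = "arbeitslosigkeit am arbeitsmarkt"
    print(cosine(concept_vector(en_doc, "en"), concept_vector(de_doc, "de")))
    ```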

    Generating Clusters of Duplicate Documents: An Approach Based on Frequent Closed Itemsets

    Full text link
    A vast number of documents on the Web have duplicates, which necessitates efficient methods for computing clusters of duplicate documents [1-5, 8-10, 13-14]. In this paper, Data Mining algorithms are applied to constructing clusters of duplicate documents, with documents represented by both syntactic and lexical methods. A series of experiments suggests some conclusions about how to choose the parameters of the methods.
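    A brute-force sketch of the itemset view of duplicate clustering is given below: documents are sets of word shingles, and a candidate cluster is the support set of a frequent closed itemset. The shingle size, the support threshold, and the exhaustive enumeration are illustrative assumptions; the paper's actual mining algorithms are not reproduced.

    ```python
    # Naive illustration only; real systems use dedicated FCI miners.
    from itertools import combinations

    def shingles(text: str, k: int = 3) -> frozenset:
        words = text.lower().split()
        return frozenset(" ".join(words[i:i + k]) for i in range(len(words) - k + 1))

    docs = {
        "d1": shingles("the quick brown fox jumps over the lazy dog"),
        "d2": shingles("the quick brown fox jumps over a lazy dog"),
        "d3": shingles("completely unrelated text about something else entirely"),
    }

    min_support = 2  # an itemset is frequent if >= 2 documents contain it

    def support(itemset):
        # Support set of an itemset = documents containing all its items.
        return frozenset(d for d, feats in docs.items() if itemset <= feats)

    # Closed itemset for a group of documents: the intersection of their
    # features, kept only when no other document also contains it.
    clusters = {}
    for r in range(min_support, len(docs) + 1):
        for group in combinations(docs, r):
            common = frozenset.intersection(*(docs[d] for d in group))
            if common and support(common) == frozenset(group):
                clusters[frozenset(group)] = common

    for group, itemset in clusters.items():
        print(sorted(group), "share", len(itemset), "shingles")
    ```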

    The method for detecting plagiarism in a collection of documents

    Get PDF
    The development of an intelligent plagiarism-detection system that combines two fuzzy-duplicate search algorithms is considered in this article. The combination yields high computational efficiency. Another advantage of the algorithm is its effectiveness when small documents are compared. In practice, the algorithm improves the quality of plagiarism detection, and it can also be used in various text search systems.
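    The abstract does not name the two combined algorithms, so the sketch below pairs two common fuzzy-duplicate measures purely as an assumed example: word-shingle Jaccard overlap, which is robust on longer texts, and difflib's sequence ratio, which stays informative on the small documents the abstract mentions.

    ```python
    # Assumed combination for illustration; not the paper's algorithm pair.
    from difflib import SequenceMatcher

    def jaccard_shingles(a: str, b: str, k: int = 3) -> float:
        def sh(t):
            w = t.lower().split()
            return {" ".join(w[i:i + k]) for i in range(max(len(w) - k + 1, 1))}
        sa, sb = sh(a), sh(b)
        return len(sa & sb) / len(sa | sb)

    def is_plagiarised(a: str, b: str, threshold: float = 0.6) -> bool:
        # Take the stronger of the two signals: short documents lean on
        # the sequence ratio, where shingle sets are too small to help.
        score = max(jaccard_shingles(a, b), SequenceMatcher(None, a, b).ratio())
        return score >= threshold

    print(is_plagiarised("the cat sat on the mat", "a cat sat on the mat"))
    ```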

    The problem of fuzzy duplicate detection of large texts

    Get PDF
    In the paper, we consider the problem of fuzzy duplicate detection. The basic approaches to detecting text duplicates are given: distance between strings, fuzzy search algorithms without data indexing, and fuzzy search algorithms with data indexing. A review of existing methods for fuzzy duplicate detection is provided, and an algorithm for fuzzy duplicate detection is presented. The algorithm was implemented in the AVTOR.NET system. Text filtering, stemming, and character replacement allow the algorithm to find duplicates even in slightly modified texts.
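    The normalization steps the abstract mentions can be sketched as below. The homoglyph replacement table, the toy suffix-stripping stemmer, and the set-overlap similarity are illustrative stand-ins, not the AVTOR.NET implementation.

    ```python
    # Sketch of filtering, character replacement, and stemming before
    # comparison; the specific rules here are illustrative assumptions.
    import re

    HOMOGLYPHS = str.maketrans("аеорсух", "aeopcyx")  # Cyrillic -> Latin lookalikes

    def normalize(text: str) -> list:
        text = text.lower().translate(HOMOGLYPHS)
        text = re.sub(r"[^a-z0-9\s]", " ", text)  # filter punctuation/noise
        def stem(w):
            return re.sub(r"(ing|ed|es|s)$", "", w)  # toy stemmer
        return [stem(w) for w in text.split() if w]

    def similarity(a: str, b: str) -> float:
        sa, sb = set(normalize(a)), set(normalize(b))
        return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

    # A minor edit plus a Cyrillic lookalike letter still scores as a
    # near-duplicate after normalization.
    print(similarity("The tested systems worked", "the tested sуstem works"))
    ```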

    Efficient partial-duplicate detection based on sequence matching

    Full text link