Search CORE

149 research outputs found

The problem of fuzzy duplicate detection of large texts

Author: E.V. Sharapova
R.V. Sharapov
Publication venue: Новая техника
Publication date: 01/01/2018
Field of study

This paper considers the problem of fuzzy duplicate detection. It outlines the basic approaches to detecting text duplicates: distance between strings, fuzzy search algorithms without data indexing, and fuzzy search algorithms with data indexing. A review of existing methods for fuzzy duplicate detection is given, and an algorithm for fuzzy duplicate detection is presented. The algorithm was implemented in the AVTOR.NET system. The use of text filtering, stemming, and character replacement allows the algorithm to find duplicates even in slightly modified texts.
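
A minimal sketch of two ideas the abstract names: a distance between strings, and normalization (filtering plus character replacement) before comparison. It is not the AVTOR.NET algorithm, which the abstract does not describe. The look-alike table, the function names, and the choice of Levenshtein distance and word-shingle Jaccard similarity are all assumptions made for illustration; stemming is omitted.

```python
import re

# Hypothetical look-alike table: Cyrillic a, e, o, s, r, kh mapped to the
# Latin letters they resemble. A real system would need a much fuller table.
LOOKALIKES = str.maketrans("\u0430\u0435\u043e\u0441\u0440\u0445", "aeocpx")


def normalize(text):
    """Lowercase, undo look-alike character swaps, and keep only word tokens."""
    text = text.lower().translate(LOOKALIKES)
    return re.findall(r"\w+", text)


def shingles(tokens, k=3):
    """Return the set of overlapping k-word sequences in a token list."""
    if len(tokens) < k:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def levenshtein(s, t):
    """Edit distance between two strings, computed one row at a time."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cs != ct)))  # substitution
        prev = cur
    return prev[-1]


def similarity(text_a, text_b, k=3):
    """Shingle-based similarity of two texts after normalization."""
    return jaccard(shingles(normalize(text_a), k), shingles(normalize(text_b), k))
```

Under these assumptions, two texts that differ only in case, punctuation, or swapped look-alike letters normalize to the same tokens, so `similarity` treats them as duplicates, while `levenshtein` gives a character-level distance for short strings.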

Efficient partial-duplicate detection based on sequence matching

Author
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2010
Field of study

A text uniqueness checking system for Armenian language

Author: Tomeyan Gohar
Publication venue
Publication date: 01/01/2017
Field of study

The goal of this dissertation is to develop a tool to analyze the similarity of Armenian texts. The idea is to compare two texts, or to compare a text with a set of texts, and detect the possibility of plagiarism. This system will be used in academic contexts but can also be useful in other situations. In the academic context it is very important to evaluate the uniqueness of reports, scientific papers and other documents that are disseminated on the web every day. There are already several tools with this purpose, but not for Armenian texts.

Along with the development of information technologies, cases of plagiarism have also increased. Given that there are a number of plagiarism-checking systems but none is designed for analyzing the uniqueness of Armenian texts, the task was set to develop a system that provides uniqueness analysis of texts in information systems and also makes it possible to compare texts and detect the presence of plagiarism. The aim of the work is the application of uniqueness-checking systems in the educational process, since it is very important to assess the degree of uniqueness of dissertations, essays, course papers and other texts. This project will make it possible to develop and substantiate computer-based analysis of the uniqueness of Armenian texts and to prevent plagiarism in Armenian.

Biblioteca Digital do IPB

Contour and texture for visual recognition of object categories

Author: Shotton Jamie Daniel Joseph
Publication venue: University of Cambridge
Publication date: 22/05/2007
Field of study

The recognition of categories of objects in images has become a central topic in computer vision. Automatic visual recognition systems are rapidly becoming central to applications such as image search, robotics, vehicle safety systems, and image editing. This work addresses three sub-problems of recognition: image classification, object detection, and semantic segmentation. The task of classification is to determine whether an object of a particular category is present or not. Object detection aims to localize any objects of the category. Semantic segmentation is a more complete image understanding, whereby an image is partitioned into coherent regions that are assigned meaningful class labels. This thesis proposes novel discriminative learning approaches to these problems. Our primary contributions are threefold. Firstly, we demonstrate that the contours (the outline and interior edges) of an object are, alone, sufficient for accurate visual recognition. Secondly, we propose two powerful new feature types: (i) a learned codebook of contour fragments matched with an improved oriented chamfer distance, and (ii) a set of texture-based features that simultaneously exploit local appearance, approximate shape, and appearance context. The efficacy of these new feature types is evaluated on a wide variety of datasets. Thirdly, we show how, in combination, these two largely orthogonal feature types can substantially improve recognition performance above that achieved by either alone.