16 research outputs found

    THE METHOD FOR DETECTING PLAGIARISM IN A COLLECTION OF DOCUMENTS

    This article considers the development of an intelligent plagiarism-detection system that combines two algorithms for finding fuzzy duplicates. The combination yields high computational efficiency, and the algorithm is also effective when comparing small documents. In practice, the algorithm improves the quality of plagiarism detection, and it can be applied in various text-search systems.
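The abstract does not name the two fuzzy-duplicate algorithms, but a common building block for this kind of detection is word shingling with Jaccard similarity, sketched below as an illustration (the function names and window size are my own, not the paper's):

```python
def shingles(text, w=3):
    """Split text into a set of overlapping word w-shingles."""
    words = text.lower().split()
    return {tuple(words[i:i + w]) for i in range(max(1, len(words) - w + 1))}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Two near-duplicate sentences differing in a single word.
doc1 = "the quick brown fox jumps over the lazy dog"
doc2 = "the quick brown fox leaps over the lazy dog"
sim = jaccard(shingles(doc1), shingles(doc2))
```

A similarity above some threshold (e.g. 0.3 for short texts) would flag the pair as a fuzzy duplicate.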

    A method for constructing a text template for information extraction from semi-structured data

    No full text
    About 80% of the world's data is unstructured or semi-structured. This makes the extraction of information, and its storage in a form suitable for processing, a pressing problem. To simplify data extraction, the authors propose using text templates based on a dictionary of keywords. The main goal is to develop a method for selecting the component elements from which a text template is constructed, as well as a method for clustering text templates. The developed methods are analyzed using a library system as an example.
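The abstract does not give the paper's actual keyword dictionary or template grammar; as a rough sketch of the idea, a dictionary of field keywords can drive extraction of `keyword: value` fragments from a semi-structured library record (all names here are hypothetical):

```python
import re

# Hypothetical keyword dictionary for a library record; the paper's
# real dictionary and template structure are not given in the abstract.
KEYWORDS = {"author": "author", "title": "title", "year": "year"}

def extract(text, keywords):
    """Build a field -> value mapping by matching 'keyword: value' fragments,
    where values run up to the next semicolon."""
    record = {}
    for field, kw in keywords.items():
        m = re.search(rf"{kw}\s*:\s*([^;]+)", text, re.IGNORECASE)
        if m:
            record[field] = m.group(1).strip()
    return record

rec = extract("Author: I. Franko; Title: Zakhar Berkut; Year: 1883", KEYWORDS)
```

Clustering, in this setting, would group records whose templates match the same subset of dictionary keywords.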

    Solving the broken link problem in Walden's Paths

    With the extent of the web expanding at an increasing rate, the problems caused by broken links are reaching epidemic proportions. Studies have indicated that a substantial number of links on the Internet are broken, and user surveys rank broken links as the third biggest problem faced on the Internet. Currently, the Walden's Paths Path Manager tool can detect the degree and type of change within a page in a path. Although it can also highlight missing pages or broken links, it has no method of correcting them, leaving the broken-link problem unsolved. This thesis proposes a solution to this problem in Walden's Paths. The solution centers on the idea that "significant" keyphrases extracted from the original page can be used to accurately locate the document using a search engine. The thesis proposes an algorithm to extract representative keyphrases that locate exact copies of the original page. In the absence of an exact copy, a similar but separate algorithm extracts keyphrases that help locate similar pages that can be substituted for the missing page. Both sets of keyphrases are stored as additions to the page signature in the Path Manager tool and can be used when the original page is removed from its current location on the Web.
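The thesis's actual keyphrase-extraction algorithms are not described in the abstract; a minimal illustration of the general idea is to rank a page's terms by frequency after stopword removal and use the top terms as a search query for relocating the page (the stopword list and cutoffs below are assumptions):

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "on", "for"}

def signature_keyphrases(page_text, k=5):
    """Rank terms by frequency (stopwords and very short tokens removed)
    and return the top k as candidate signature keyphrases."""
    terms = [t for t in re.findall(r"[a-z]+", page_text.lower())
             if t not in STOPWORDS and len(t) > 2]
    return [t for t, _ in Counter(terms).most_common(k)]

# Joining the keyphrases yields a query string for a search engine.
query = " ".join(signature_keyphrases(
    "Walden's Paths supports paths of web pages; each path page has a "
    "signature used to detect page change and locate moved pages."))
```

Submitting the resulting query to a search engine and comparing the hits against the stored page signature would then identify an exact copy or a suitable substitute page.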