Search and Abstracting in an Electronic Document Management System
This work deals with the problem of searching document arrays by attributes and by full-text search. A modified rubrication method (classification of documents into subject rubrics) is presented, and a rubrication-based abstracting method is developed. The advantages of this approach are demonstrated on the electronic document management system SmartBase.SEDO.
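The abstract does not describe how the rubricator or the abstracting step works. As a rough illustration only, the sketch below assumes a keyword-based rubricator and builds an extractive abstract from the sentences that contain the most keywords of the document's rubric. The `RUBRICS` table, `rubricate`, and `abstract` are hypothetical names, and this is not the SmartBase.SEDO implementation.

```python
import re

# Hypothetical rubric table: each rubric is described by a keyword set.
RUBRICS = {
    "finance": {"budget", "payment", "invoice", "cost"},
    "personnel": {"employee", "vacation", "salary", "hiring"},
}


def words(text):
    """Lowercase word tokens; punctuation is ignored."""
    return re.findall(r"[a-z]+", text.lower())


def rubricate(text, rubrics=RUBRICS):
    """Assign the rubric whose keywords occur most often in the text."""
    tokens = words(text)
    scores = {name: sum(tok in kws for tok in tokens) for name, kws in rubrics.items()}
    return max(scores, key=scores.get)


def abstract(text, rubrics=RUBRICS, k=2):
    """Extractive abstract: keep the k sentences richest in the document's
    rubric keywords, restored to their original order."""
    rubric = rubricate(text, rubrics)
    keywords = rubrics[rubric]
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Stable sort: ties keep their original sentence order.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: sum(tok in keywords for tok in words(sentences[i])),
                    reverse=True)
    keep = sorted(ranked[:k])
    return rubric, " ".join(sentences[i] for i in keep)


doc = ("The invoice for March is attached. Please approve the payment before Friday. "
       "The office party is next week. Budget cost overruns must be reported.")
print(abstract(doc))
```

Tying the summary to the rubric means sentences off the document's main subject (here, the office party) are the first to be dropped.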
The Method for Detecting Plagiarism in a Collection of Documents
This article considers the development of an intelligent plagiarism-detection system that combines two fuzzy-duplicate search algorithms. The combination yields high computational efficiency. Another advantage of the algorithm is its high effectiveness when comparing small documents. In practice, the algorithm improves the quality of plagiarism detection, and it can also be used in various text search systems.
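The abstract does not name the two fuzzy-duplicate algorithms it combines. For orientation, the sketch below shows one widely used fuzzy-duplicate measure, word shingling with Jaccard resemblance; it is a swapped-in technique, not the authors' method, and the shingle size `k=3` and `threshold=0.3` are arbitrary illustrative values.

```python
import re


def shingles(text, k=3):
    """Set of k-word shingles (contiguous word sequences) of a text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts yield a single, shorter shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard resemblance |A & B| / |A | B| of two shingle sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(docs, threshold=0.3, k=3):
    """Index pairs of documents whose shingle resemblance reaches the threshold."""
    sets = [shingles(d, k) for d in docs]
    pairs = []
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            if jaccard(sets[i], sets[j]) >= threshold:
                pairs.append((i, j))
    return pairs


docs = [
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox jumped over the lazy dog",
    "an entirely different sentence about plagiarism detection",
]
print(near_duplicates(docs))  # → [(0, 1)]
```

A single changed word breaks every shingle that covers it, which is why small documents are sensitive to the choice of `k`.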
Facilitating Reading through a Theme-Driven Approach
Readers often need to explore a document only for a specific point of interest. We call this phenomenon, approaching a narrative not in its entirety but for the thread of a particular topic, thematic reading. Present reading tools and information retrieval techniques provide only limited assistance to readers in this situation. Our research centers on this phenomenon. We investigated both human behavior and machine automation, with the goal of better meeting the requirements of thematic reading.
To observe readers' behavior and understand their expectations, we implemented a reader's interface designed around the predicted needs of thematic readers. We conducted user studies with both this system and Microsoft Word. The studies showed that thematic reading can achieve the goal of understanding a specific topic, at least to a degree sufficient for topic-specific tasks. We also derived guidelines for designing future reading platforms in major aspects such as view, navigation, and contextual awareness.

As for machine automation, we investigated the potential to automatically locate thematically relevant excerpts, an investigation inspired by the editorial compilation of a textbook index. To improve search performance, we proposed a two-step methodology that first expands the query and then filters the intermediate results by checking term-occurrence proximity. For query expansion, we compared expansion with WordNet, expansion with morphological inflections, and both together. In the context of our study, WordNet contributed almost nothing to recall, whereas expansion with inflectional variants proved to be a successful and essential scheme. For the refinement step, the results show that a proximity check on the alternative phrases formed by inflectional expansion can effectively increase the precision of the previously retrieved results.
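The two-step methodology (expand for recall, then filter by proximity for precision) can be sketched as follows. This is a minimal illustration under stated assumptions: the fixed `SUFFIXES` list is a hypothetical stand-in for a real morphological generator, and "proximity" is taken to mean that every other query term occurs within `max_gap` tokens of some occurrence of the first term; the study's exact proximity rule is not given.

```python
import re

# Hypothetical suffix list standing in for a real inflection generator.
SUFFIXES = ("", "s", "es", "ed", "ing")


def tokenize(text):
    """Lowercase word tokens; punctuation is discarded."""
    return re.findall(r"[a-z]+", text.lower())


def expand_query(terms):
    """Step 1 (expansion): map each query term to naive inflectional variants."""
    return [{t.lower() + s for s in SUFFIXES} for t in terms]


def matches_all(tokens, expanded):
    """Recall stage: every query term occurs in some inflected form."""
    vocab = set(tokens)
    return all(vocab & variants for variants in expanded)


def within_proximity(tokens, expanded, max_gap):
    """Precision stage: some occurrence of the first term has every other
    term occurring no more than max_gap tokens away."""
    positions = [[i for i, tok in enumerate(tokens) if tok in variants]
                 for variants in expanded]
    for anchor in positions[0]:
        if all(any(abs(p - anchor) <= max_gap for p in plist)
               for plist in positions[1:]):
            return True
    return False


def retrieve(passages, query, max_gap=3):
    """Expand the query, collect candidates, then filter them by proximity."""
    expanded = expand_query(query)
    candidates = [p for p in passages if matches_all(tokenize(p), expanded)]
    return [p for p in candidates if within_proximity(tokenize(p), expanded, max_gap)]


passages = [
    "The index lists terms that readers searched for.",
    "Searching an index of terms is faster.",
    "Indexes help; later, far away, readers met the search term.",
]
print(retrieve(passages, ["search", "index"]))
```

All three passages survive the expansion step (which matches "searched", "Searching", and "Indexes"), but only the second keeps the two terms close enough to pass the proximity filter.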
We further tested an alternative scheme, sliding windows, for defining the target and verification units in the methodology. Our findings show that structural delimitations (sentences and chapters) outperformed sliding windows: the structural scheme consistently achieved desirable results, while the results from the sliding-window scheme were inconclusive.
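To make the contrast between the two unit schemes concrete, the sketch below cuts the same text into sentence units and into overlapping sliding windows, then keeps the units that contain every query term. It is an illustration only: the sentence splitter is a simplification, and the window `size=6` and `step=3` are hypothetical values, not parameters from the study.

```python
import re


def words(text):
    """Lowercase word set of a unit; punctuation is ignored."""
    return set(re.findall(r"[a-z]+", text.lower()))


def sentence_units(text):
    """Structural units: split after ., ! or ? followed by whitespace (simplified)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def window_units(text, size=6, step=3):
    """Sliding-window units: overlapping runs of `size` tokens, advancing by
    `step`; a final window is added so the tail of the text is covered."""
    tokens = text.split()
    if len(tokens) <= size:
        return [" ".join(tokens)]
    starts = list(range(0, len(tokens) - size + 1, step))
    if starts[-1] != len(tokens) - size:
        starts.append(len(tokens) - size)
    return [" ".join(tokens[i:i + size]) for i in starts]


def matching_units(units, terms):
    """Units in which every query term occurs."""
    wanted = {t.lower() for t in terms}
    return [u for u in units if wanted <= words(u)]


text = ("Thematic reading targets one topic. Readers skim other parts. "
        "The topic index guides readers to relevant excerpts.")
query = ["topic", "readers"]
print(matching_units(sentence_units(text), query))
print(matching_units(window_units(text), query))
```

Two of the three matching windows straddle a sentence boundary, pairing terms that the sentence scheme keeps apart.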