    Analytical Assessment of the Results of Checking Students' Graduation Theses with Text-Borrowing Detection Systems

    This paper presents an analysis of text borrowing in the bachelor's and master's theses of students who graduated from the Department of Higher Mathematics of the Russian Technological University in the summer of 2018. The comparative analysis explores how the percentage of borrowed text in a thesis depends on characteristics of the student and on statistical properties of the thesis text. The research is relevant because of the ongoing development of information technologies in education and the growing popularity of text-borrowing detection systems. Automatic plagiarism detection systems are intended to improve the educational process, simplify the search for borrowed text, and support copyright law and academic integrity. The percentages were obtained from two major Russian plagiarism detection systems, Antiplagiat and Rucontext. The paper reports a pedagogical experiment that statistically analyzes how the borrowing percentage reported by Antiplagiat and Rucontext for bachelor's and master's theses depends on the author's characteristics and on statistical values describing the thesis text. The statistical results for the two systems are compared, and conclusions about the advantages of each are presented. The comparison relies on methods of mathematical statistics, and the numerical experiment was carried out with packages of the R statistical language. The difference between the borrowing percentages reported by Antiplagiat and Rucontext was analyzed and shown to grow as the text becomes longer. The dependencies of the borrowing percentage on the available characteristics of the thesis author and on text parameters are presented. The type of dependency is the same for both systems. 
The scale of the coefficients in the statistical dependencies is also the same. The difference lies in the specific set of parameters: the Rucontext percentage is better described statistically by the sex of the author, whereas the Antiplagiat percentage is better described by the level of higher education (bachelor's or master's thesis). The dependency of the borrowing percentage on thesis length also differs: the Antiplagiat percentage is better described by the number of words, and the Rucontext percentage by the number of characters. These differences can probably be explained by the different text search and analysis algorithms. The dependency between the Rucontext and Antiplagiat borrowing percentages is also presented. Purpose of the study. The purpose of this article is an analytical comparison of the results of processing the graduation theses of bachelor's and master's students of the Department of Higher Mathematics, Institute of Cybernetics, Russian Technological University (MIREA), in the summer of 2018 with two text-borrowing detection systems: Antiplagiat and Rucontext. The study is relevant given the development of information technologies in education and the growing popularity of automated checking of texts for borrowing. Systems developed to automate the detection of text borrowing in various kinds of work are meant to improve the educational process, simplify instructors' checking of student work, and protect copyright, and they are oriented toward fostering academic integrity. Materials and methods. The results were analyzed using methods of mathematical statistics; the computational experiment itself used a statistical data processing package of the R language. Results. 
In this study, a pedagogical experiment was conducted to statistically analyze the relationships among characteristics of the graduation theses of bachelor's and master's students of the Department of Higher Mathematics, Institute of Cybernetics, Russian Technological University (RTU MIREA), in the summer of 2018. Dependencies were identified between parameters characterizing the individual student, statistical parameters describing the thesis, and the originality percentage reported by the Antiplagiat and Rucontext text-borrowing detection systems. The results obtained when analyzing the theses in the two systems were compared, and conclusions are drawn about the advantages of each system. Examination of the difference between the originality percentages reported by Antiplagiat and Rucontext showed that the difference between the two systems' results grows with the length of the thesis text (number of words). Conclusion. In the search for a relationship between the originality percentage and the statistical parameters describing the thesis, as well as the available parameters characterizing the author, the type of dependency turned out to be the same for the two systems, and the scale of the coefficients is the same. Differences appear in the specific set of parameters: the dependency of originality on student characteristics is better described by sex for Rucontext and by level of education for Antiplagiat. This may be explained by the differing contents of the databases: the Antiplagiat databases contain more student works. Different parameters also better describe the dependency of the originality percentage on text length: for Antiplagiat the best result was obtained with the number of characters, and for Rucontext with the number of words. These dependencies appear to be explained by the different technical algorithms used to search for borrowings in the text. 
The study also examines the statistical dependency between the originality values obtained in each of the systems.
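
The reported finding that the gap between the two systems' percentages grows with text length can be illustrated with a simple least-squares line. The sketch below is only an illustration in Python, not the authors' R analysis: the function name `fit_line` and the sample numbers are hypothetical and are not data from the study.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns the pair (intercept, slope).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical illustration only (NOT data from the study):
# thesis word counts and the absolute gap, in percentage points,
# between the percentages reported by two checking systems.
word_counts = [4000, 6500, 9000, 12000, 15500]
gaps = [1.0, 2.5, 3.0, 5.5, 6.0]

intercept, slope = fit_line(word_counts, gaps)
print(f"gap ~ {intercept:.2f} + {slope:.5f} * words")
```

A positive fitted slope on real data would correspond to the growth described in the abstract; the abstract does not state which model form or R package the authors actually used.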

    A Similarity Detection Method Based on Distance Matrix Model with Row-Column Order Penalty Factor

    Paper similarity detection spans multiple disciplines, and making a comprehensive and correct evaluation of academic misconduct is a complex and sensitive issue. The main existing detection models have several problems, such as incomplete specification of segmentation preprocessing, the impact of semantic order on detection, the evaluation of near-synonyms, and slow backtracking through papers. This paper presents a sentence-level paper similarity comparison model with segmentation preprocessing based on a special identifier. The model integrates the characteristics of vector detection, Hamming distance, and the longest common substring, and it targets near-synonyms, word deletion, and changes in word order by redefining the distance matrix and adding ordinal measures, which makes sentence similarity detection more effective in terms of semantics and backbone word segmentation. Compared with traditional paper similarity retrieval, the method uses modulo-2 arithmetic, which keeps computation low. A reliable and efficient paper detection method is of great academic significance for word segmentation, similarity detection, and document summarization.
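
Two of the building blocks named in the abstract, Hamming distance and the longest common substring, can be sketched at the sentence level as follows. This is a minimal Python illustration under stated assumptions, not the paper's distance-matrix model: the function names are hypothetical, sentences are assumed to be already segmented into word tokens, and the row-column order penalty is not reproduced.

```python
def longest_common_run(a, b):
    """Length of the longest contiguous run of tokens shared by a and b
    (the longest common substring, computed over word tokens)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0] * (len(b) + 1)
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                # Extend the run that ended at the previous token pair.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def hamming_distance(a, b):
    """Hamming distance between binary term-presence vectors of a and b.

    Each vocabulary term contributes (x + y) mod 2, i.e. x XOR y,
    so only cheap modulo-2 arithmetic is needed.
    """
    set_a, set_b = set(a), set(b)
    vocab = sorted(set_a | set_b)
    return sum(int(t in set_a) ^ int(t in set_b) for t in vocab)


s1 = "the cat sat on the mat".split()
s2 = "the cat lay on the mat".split()
print(longest_common_run(s1, s2), hamming_distance(s1, s2))
```

The Hamming distance here ignores word order, which is why an order-sensitive measure such as the common run (or the ordinal measures the abstract mentions) is needed alongside it.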

    The effectiveness of feature selection methods for imbalanced text classification

    The distribution of text data across classes is often imbalanced, which degrades classifier performance in text classification. Many studies have addressed imbalanced text classification. Feature selection, one of the important stages of text classification, is also critical in the imbalanced case. This study thoroughly investigates the effect of feature selection methods on the classification of imbalanced texts. To this end, many experiments were carried out with three classifiers and nine feature selection methods on two data sets, and the performance of the feature selection methods was also observed for different numbers of features. The nine feature selection methods evaluated are NDM, DFSS, PFS, POISSON, CHI2, IG, GINI, DFS, and MDFS. Experimental results were obtained with Support Vector Machine (SVM), Decision Tree (DTREE), and Naïve Bayes (MNB) classifiers. On the Reuters-21578 dataset, the DFS and CHI2 feature selection methods reached a highest Macro-F1 score of approximately 80. On the SPAM SMS dataset, DFS reached a highest Macro-F1 score of 95 and CHI2 reached 94. DFS and CHI2 thus appear more successful than the other feature selection methods for imbalanced text classification.
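
For readers unfamiliar with the quantities named in this abstract, the sketch below shows the standard chi-square term score (CHI2) and the Macro-F1 metric in plain Python. It is an illustration only: the function names and toy labels are hypothetical, and the study's own implementation and data are not reproduced here.

```python
def chi2_score(a, b, c, d):
    """Chi-square statistic of a term for a class from a 2x2 table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class without the term
    d: documents outside the class without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores.

    Every class counts equally, so a rare class affects the score
    as much as a frequent one, which suits imbalanced data.
    """
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels, hypothetical and not taken from the study's data sets.
truth = ["spam", "ham", "ham", "ham"]
guess = ["spam", "ham", "ham", "spam"]
print(macro_f1(truth, guess))
```

In a feature selection step, terms would typically be ranked by a score such as `chi2_score` and only the top-ranked ones kept before training the classifier.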

    Extracting sensory experiences and cultural ecosystem services from actively crowdsourced descriptions of everyday lived landscapes

    Acknowledgements: We would like to thank everyone who took part in Window Expeditions; without you this research would not have been possible! We would also like to extend our gratitude to the anonymous reviewers whose helpful comments improved the quality of this paper. Funding: University Research Priority Program (URPP) – Language and Space & Swiss National Science Foundation Grant [P500PT_214436]. Peer reviewed.

    The connection between the living environment and the performance of activities by the elderly

    Introduction: The greatest possible independence in carrying out activities is important for quality of life in old age. The environment, both physical and social, influences how older people maintain this independence. To preserve independence, the challenges of the environment must not exceed the individual's abilities. Purpose: The purpose of this thesis is to investigate the relationship between the living environment of older people and their performance of activities according to age, level of independence, and form of residence. Methods: In this quantitative study, data were collected with a survey questionnaire containing four parts. The first part collected demographic data, and the second consisted of the Barthel Index. The third part contained 11 statements about the social environment, for which respondents marked their agreement on a three-point Likert scale. The fourth part contained 17 statements about the physical environment, which respondents answered with YES or NO. Sampling was purposive and convenience-based. Results: Respondents in early old age rated their agreement with the statements about the social environment higher on average, for all statements, than respondents in late old age. Respondents in late old age agreed with most of the statements about the physical environment in a larger proportion than respondents in early old age. Respondents living in an institution agreed with the statements about the physical environment in a larger proportion. Discussion and conclusion: The performance of a given activity is influenced by the older person's abilities, the activity itself, and the environment in which it is carried out. All these factors are closely interrelated, so the occupational therapist must take them into account when treating older people and must view and treat them holistically. The institutional physical environment is better adapted to older people's living needs than the home physical environment. Through treatment in the client's home, the occupational therapist can also advise on suitably adapting the physical environment to the particular older person, which can support more independent performance of activities.