    Shape-Based Plagiarism Detection for Flowchart Figures in Texts

    Plagiarism is a well-known phenomenon in the academic arena. Copying other people's work is considered a serious offence that needs to be checked. Many plagiarism detection systems, such as Turnitin, have been developed to provide these checks. Most, if not all, discard figures and charts before checking for plagiarism. Discarding the figures and charts results in loopholes that people can take advantage of: figures and charts can be plagiarized easily without current plagiarism detection systems noticing. Very few papers discuss flowchart plagiarism detection. Therefore, there is a need to develop a system that will detect plagiarism in figures and charts. This paper presents a method for detecting flowchart figure plagiarism based on shape-based image processing and multimedia retrieval. The method managed to retrieve flowcharts with ranked similarity according to different matching sets. Comment: 12 pages

    TopRoBERTa: Topology-Aware Authorship Attribution of Deepfake Texts

    Recent advances in Large Language Models (LLMs) have enabled the generation of open-ended, high-quality texts that are non-trivial to distinguish from human-written texts. We refer to such LLM-generated texts as deepfake texts. There are currently over 11K text generation models in the Hugging Face model repository. As such, users with malicious intent can easily use these open-source LLMs to generate harmful texts and misinformation at scale. To mitigate this problem, a computational method to determine whether a given text is a deepfake text is desired, i.e., a Turing Test (TT). In particular, in this work, we investigate the more general version of the problem, known as Authorship Attribution (AA), in a multi-class setting, i.e., not only determining whether a given text is a deepfake text but also pinpointing which LLM is the author. We propose TopRoBERTa to improve existing AA solutions by capturing more linguistic patterns in deepfake texts through a Topological Data Analysis (TDA) layer added to the RoBERTa model. We show the benefits of having a TDA layer when dealing with noisy, imbalanced, and heterogeneous datasets, by extracting TDA features from the reshaped pooled_output of RoBERTa as input. We use RoBERTa to capture contextual representations (i.e., semantic and syntactic linguistic features), while using TDA to capture the shape and structure of data (i.e., linguistic structures). Finally, TopRoBERTa outperforms the vanilla RoBERTa on 2 of 3 datasets, achieving up to a 7% increase in Macro F1 score.

    Using Flowchart Technique to Improve Students' Understanding on Indefinite and Definite Articles

    The purpose of the study was to investigate the effects of using the flowchart technique on improving students’ understanding of English articles. The researcher used experimental teaching at SMPN 1 Baktiya Barat, Aceh Utara. Two classes of students participated in this research: one as the experimental class and the other as the control class. Data were collected through tests and a treatment. The pre-test was given to the participants at the first meeting. The experimental class received the treatment using the flowchart technique, while the control class did not. After the treatment, the post-test was administered to both classes to see the effects of using the flowchart technique. The data were analyzed using Sudjana’s (2002) statistical formula and a frequency distribution table. The result showed that the calculated t-test value was higher than the t-table value (16.35 > 2.019). This means that there was a significant difference in understanding of English articles between students taught using the flowchart technique and those taught without it. Accordingly, the alternative hypothesis (Ha) was accepted and the null hypothesis (H0) was rejected. The results of the statistical analysis show that the use of flowcharts in the classroom can improve students’ learning of the material taught, so the flowchart technique is considered successful. Additionally, the findings imply that a more appropriate technique for teaching grammar is needed, especially for English articles.
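The decision rule reported in this abstract (reject H0 when the calculated t exceeds the table value) can be illustrated with a minimal sketch. It assumes an independent two-sample t-test with pooled variance; the abstract does not state which variant of Sudjana's formula was used, and the function names and sample scores below are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(sample_a, sample_b):
    """Student's t for two independent samples (assumes equal variances)."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    return (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))


def rejects_null(t_statistic, t_critical):
    """Two-tailed rule: reject H0 when |t| is larger than the table value."""
    return abs(t_statistic) > t_critical


# Values reported in the abstract: calculated t = 16.35, table t = 2.019.
reported_decision = rejects_null(16.35, 2.019)

# Hypothetical post-test scores, only to show the statistic being computed.
example_t = pooled_t_statistic([80, 85, 90], [60, 65, 70])
```

With the reported values the rule rejects H0, which matches the abstract's conclusion; the raw scores behind the reported 16.35 are not given in the source.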

    Continuous expressive speaking styles synthesis based on CVSM and MR-HMM

    This paper introduces a continuous system capable of automatically producing the most adequate speaking style to synthesize a desired target text. This is done through joint modeling of the acoustic and lexical parameters of the speaker models, adapting the CVSM projection of the training texts using MR-HMM techniques. As such, we consider that as long as sufficient variety in the training data is available, we should be able to model a continuous lexical space into a continuous acoustic space. The proposed continuous automatic text-to-speech system was evaluated by means of a perceptual evaluation in order to compare it with traditional approaches to the task. The system proved to be capable of conveying the correct expressiveness (average adequacy of 3.6) with an expressive strength comparable to oracle traditional expressive speech synthesis (average of 3.6), although with a drop in speech quality mainly due to the semi-continuous nature of the data (average quality of 2.9). This means that the proposed system is capable of improving traditional neutral systems without requiring any additional user interaction.

    DARIAH and the Benelux


    Software Plagiarism Detection Using Abstract Syntax Tree and Graph-based Data Mining

    This study uses a graph-based data mining technique to discover cases of software plagiarism. We hypothesize that repetitive patterns found in the abstract syntax tree (AST) representation of source code will only match such patterns in other source code if the author of both is the same. A graph-based data mining technique was used for analyzing the AST and extracting the patterns. The results from the data miner were compared using a graph matching algorithm, which provided the measure of similarity. We used artificial test sets and actual student assignments for evaluation. The experiments identified plagiarism behaviors in both artificial and real-world data. These findings showed the system to be feasible. The system can be applied to any programming language that uses abstract syntax trees for compilation, and these ASTs can easily be extracted using the compiler. An advantage of this system over other plagiarism detectors is that it can deal with partial source code plagiarism, which others do not currently do. Disadvantages of our approach include slow speed because of the graph-based data mining system used, and dependence on compilers to provide the AST. Also, if source code cannot be compiled, the compiler will not provide a full AST, and the results will be inaccurate. Computer Science Department
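The idea of comparing programs by AST structure rather than by text can be sketched as follows. This is not the graph-based data mining or graph matching method described in the abstract; it swaps in a much simpler fingerprint (counts of parent/child node-type patterns, compared with a multiset Jaccard score) using Python's standard `ast` module. The function names and sample programs are hypothetical.

```python
import ast
from collections import Counter


def subtree_shapes(source):
    """Count (parent type, child types) patterns in the AST of `source`."""
    shapes = Counter()
    for node in ast.walk(ast.parse(source)):
        children = tuple(type(child).__name__ for child in ast.iter_child_nodes(node))
        if children:
            shapes[(type(node).__name__, children)] += 1
    return shapes


def structural_similarity(source_a, source_b):
    """Multiset Jaccard similarity of AST shape counts, in [0, 1]."""
    a, b = subtree_shapes(source_a), subtree_shapes(source_b)
    union = sum((a | b).values())
    return sum((a & b).values()) / union if union else 1.0


original = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
# Same program with every identifier renamed: the text differs, the AST shape does not.
renamed = "def add_all(items):\n    acc = 0\n    for item in items:\n        acc += item\n    return acc\n"
different = "def greet(name):\n    print('hello', name)\n"

same_shape_score = structural_similarity(original, renamed)
other_score = structural_similarity(original, different)
```

Because identifiers are attributes rather than node types, renaming variables leaves the fingerprint unchanged. As the abstract notes for its own system, this sketch also depends on the parser: `ast.parse` raises `SyntaxError` for code that does not parse.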

    Semi-Automated Methods for Measuring Practice Conformance for Capital Projects

    The goal of this thesis is to explore semi-automated methods for measuring practice conformance for capital projects. Thorough measurement of practice conformance for capital projects typically requires manual audits. Surveys that may assist can often be subjective, non-repeatable and unverifiable, since they are self-reported. However, some of the tasks assigned to auditors are also non-repeatable, and they may be costly, time-consuming, tedious, and error-prone. Tools for assisting practice conformance measurements are in high demand in the construction domain. In response, various information technology-based and web deployed Benchmarking and Metrics (BM&M) programs have been introduced to reduce time and costs, to assist in providing repeatable and accurate results, and to increase efficiency and productivity of reporters and auditors. Moreover, moves toward automated practice conformance measurement are expected to reduce time and cost. Past studies have also resulted in significant advances in data mining, natural language processing, machine learning, computer vision and other artificial intelligence-based approaches toward complete automation, but technical limitations exist that constrain complete automation or make it impractical. An approach is needed to support practical, net beneficial, incremental steps toward automation of practice conformance measurement for capital projects that would assist capital project participants to improve project performance over time. To address this need, a new approach is proposed in this thesis. Additionally, a framework to beneficially increase automation is presented. Toolsets are explored that may make practice conformance measurement cheaper, faster, easier, repeatable, and more accurate for capital project participants. This framework and the toolsets are validated through the development of a practice conformance model, case studies on real project data, and application experiments. 
It is concluded that the proposed semi-automated framework for measuring practice conformance for capital projects is practical to implement in the near term. These results provide a basis on which capital project participants can implement efficacious practice conformance measurement to support capital project performance improvement programs.

    Modified EDA and Backtranslation Augmentation in Deep Learning Models for Indonesian Aspect-Based Sentiment Analysis

    In the process of developing a business, aspect-based sentiment analysis (ABSA) can help extract customers' opinions on different aspects of the business from online reviews. Researchers have found great promise in deep learning approaches to solving ABSA tasks. Furthermore, studies have also explored text augmentation, such as Easy Data Augmentation (EDA), to improve deep learning models’ performance using only simple operations. However, when EDA is applied to ABSA, there is a high chance that the augmented sentences will lose important aspect or sentiment-related words (target words) critical for training. In response, another study adjusted EDA for English aspect-based sentiment data provided with target word tags. However, that solution still needs additional modifications for non-tagged data. Hence, in this work, we focus on a modified EDA that integrates POS tagging and word similarity, not only to understand the context of the words but also to extract the target words directly from non-tagged sentences. Additionally, the modified EDA is combined with the backtranslation method, as the latter has also shown a significant contribution to model performance in several research studies. The proposed method is then evaluated on a small Indonesian ABSA dataset using baseline deep learning models. Results show that the augmentation method can increase the model’s performance on a limited dataset problem. In general, the best performance for aspect classification is achieved by implementing the proposed method, which increases the macro-accuracy and F1, respectively, on Long Short-Term Memory (LSTM) and Bidirectional LSTM models compared to the original EDA. The proposed method also obtained the best performance for sentiment classification using a convolutional neural network, increasing the overall accuracy by 2.2% and F1 by 3.2%.
DOI: 10.28991/ESJ-2023-07-01-018
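The risk this abstract describes, an augmentation step dropping aspect or sentiment words, can be illustrated with a minimal sketch of one EDA operation, random deletion, made aware of a set of target words. The set is simply supplied by hand here; the paper's own extraction of target words through POS tagging and word similarity is not reproduced. The function name, the example sentence, and the target set are hypothetical.

```python
import random


def random_deletion(tokens, protected, p, rng):
    """Drop each token with probability p, but never drop a protected token."""
    kept = [tok for tok in tokens if tok in protected or rng.random() >= p]
    # Fall back to the original sentence if every token was deleted.
    return kept if kept else list(tokens)


# Hypothetical review: "the food is tasty but the service is very slow".
tokens = "makanan enak tapi pelayanan sangat lambat".split()
# Assumed target words (aspects and sentiment words), supplied manually here.
targets = {"makanan", "enak", "pelayanan", "lambat"}

augmented = random_deletion(tokens, targets, p=0.5, rng=random.Random(0))
```

Whatever the random draws are, every word in `targets` survives and keeps its relative order; only the remaining tokens can be removed.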

    Network theory and CAD collections

    Graph and network theory have become commonplace in modern life. Networks are so widespread, in fact, that most people not only understand the basics of what a network is, but are adept at using them and do so daily. This has not long been the case, however, and the relatively quick growth and uptake of network technology has sparked the interest of many scientists and researchers. The Science of Networks has sprung up, showing how networks are useful in connecting molecules and particles, computers and web pages, as well as people. Despite being shown to be effective in many areas, network theory has yet to be applied to mechanical engineering design. This work makes use of network science advances and explores how they can impact Computer Aided Design (CAD) data. CAD data is considered the most valuable design data within mechanical engineering, and large collections are found in two places: educational institutions and industry. This work begins by exploring five novel networks of different-sized CAD collections, where metrics and network developments are assessed. From there, collections from educational and industrial settings are explored in depth, with novel methods and visualisations being presented. The results of this investigation show that network science provides interesting analysis of CAD collections, and two key discoveries are presented. First, network metrics and visualisations are shown to be effective at highlighting plagiarism in collections of students' CAD submissions. Second, when used to assess collections of real-world company data, network theory is shown to provide unique metrics for analysing and characterising collections of CAD and associated data.

    Understanding the Authorial Writer: a mixed methods approach to the psychology of authorial identity in relation to plagiarism.

    Academic writing is an important part of undergraduate study that tutors recognise as central to success in higher education. Across the academy, writing is used to assess, develop and facilitate student learning. However, there are growing concerns that students appropriate written work from other sources and present it as their own, committing the academic offence of plagiarism. Conceptualising plagiarism as literary theft, current institutional practices concentrate on deterring and detecting behaviours that contravene the rules of the academy. Plagiarism is a topic that often elicits an emotional response in academic tutors, who are horrified that students commit these ‘crimes’. Recently, educators have suggested that deterring and detecting plagiarism is ineffective and described moralistic conceptualisations of plagiarism as unhelpful. These commentaries highlight the need for credible alternative approaches to plagiarism that include pedagogic aspects of academic writing. The authorial identity approach to reducing plagiarism concentrates on developing understanding of authorship in students using pedagogy. This thesis presents three studies that contribute to the authorial identity approach to student plagiarism. Building on the findings of previous research, the current studies used a sequential mixed-methods approach to expand psychological knowledge concerning authorial identity in higher education contexts. The first, qualitative, study used thematic analysis of interviews with 27 professional academics teaching at institutions in the United Kingdom. The findings from this multidisciplinary sample identified that academics understood authorial identity as composed of five themes: an individual with authorial identity had confidence; valued writing; felt attachment and ownership of their writing; thought independently and critically; and had rhetorical goals.
In addition, the analysis identified two integrative themes representing aspects of authorial identity that underlie all of the other themes: authorial identity as ‘tacit knowledge’ and authorial identity as ‘negotiation of identities’. The themes identified in the first study informed important aspects of the two following quantitative studies. The second study used findings from the first study to generate a pool of questionnaire items, assess their content validity and administer them to a multidisciplinary sample of 439 students in higher education. Psychometric analyses were used to identify a latent variable model of student authorial identity with three factors: ‘authorial confidence’, ‘valuing writing’ and ‘identification with author’. This model formed the basis of a new psychometric tool for measuring authorial identity. The resultant Student Attitudes and Beliefs about Authorship Scale (SABAS) had greater reliability and validity when compared with alternative measures. The third study used confirmatory factor analysis to validate the SABAS model with a sample of 306 students. In addition, this study identified aspects of convergent validity and test-retest reliability that allow the SABAS to be used with confidence in research and pedagogy. The overall findings of the combined studies present a psycho-social model of student authorial identity. This model represents an important contribution to the theoretical underpinnings of the authorial identity approach to student plagiarism. Differing from previous models by including social aspects of authorial identity, the psycho-social model informs future pedagogy development and research by outlining a robust, empirically supported theoretical framework. University of Derby Teaching Informed Research studentship