980 research outputs found

    Evaluating the Robustness of Learning Analytics Results Against Fake Learners

    Get PDF
    Massive Open Online Courses (MOOCs) collect large amounts of rich data. A primary objective of Learning Analytics (LA) research is studying these data in order to improve the pedagogy of interactive learning environments. Most studies make the underlying assumption that the data represent truthful and honest learning activity. However, previous studies showed that MOOCs can have large cohorts of users who break this assumption and achieve high performance through behaviors such as cheating using multiple accounts or unauthorized collaboration; we therefore denote them fake learners. Because of their aberrant behavior, fake learners can bias the results of LA models. The goal of this study is to evaluate the robustness of LA results when the data contain a considerable number of fake learners. Our methodology follows the rationale of 'replication research': we challenge the results reported in a well-known paper, one of the first LA/Pedagogic Efficacy MOOC studies, by replicating its results with and without the fake learners (identified using machine learning algorithms). The results show that fake learners exhibit very different behavior compared to true learners. However, even though they are a significant portion of the student population (~15%), their effect on the results is not dramatic (it does not change trends). We conclude that the LA study that we challenged was robust against fake learners. While these results carry an optimistic message about the trustworthiness of LA research, they rely on data from one MOOC. We believe that this issue should receive more attention within the LA research community, and that it can explain some 'surprising' research results in MOOCs. Keywords: Learning Analytics, Educational Data Mining, MOOCs, Fake Learners, Reliability, IR
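    The 'with and without fake learners' replication rationale is easy to picture in code. Below is a minimal sketch, assuming a per-learner table with hypothetical columns (time_on_task, final_grade, is_fake) and a simple correlation as a stand-in LA result; none of these names come from the study itself.

```python
import pandas as pd

def correlation_report(df: pd.DataFrame) -> float:
    # Stand-in LA result: correlation between time on task and final grade.
    return df["time_on_task"].corr(df["final_grade"])

# One row per learner; the "is_fake" flag is assumed to come from a
# separate fake-learner classifier, as in the study's methodology.
df = pd.read_csv("mooc_learners.csv")  # hypothetical file

r_all = correlation_report(df)                   # full cohort
r_true = correlation_report(df[~df["is_fake"]])  # fake learners removed

print(f"all learners:  r = {r_all:.3f}")
print(f"true learners: r = {r_true:.3f}")
# The robustness question: does removing the flagged ~15% change the
# sign or overall trend of the result?
```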

    Detecting AI generated text using neural networks

    Get PDF
    For humans, distinguishing machine-generated text from human-written text is mentally taxing and slow. NLP models have been created to do this more effectively and faster. But what if adversarial changes have been added to the machine-generated text? This thesis discusses this issue and text detectors in general. Its primary goal is to describe the current state of text detectors in research and to discuss a key adversarial issue in modern NLP transformers. To describe the current state of text detectors, a Systematic Literature Review of 50 papers relevant to machine-centric detection was conducted in Chapter 2. As for the key adversarial issue, Chapter 3 describes an experiment in which RoBERTa was used to test transformers against simple mutations that cause mislabelling. The review in Chapter 2 shows how viable text detection has become as a research subject. Finally, RoBERTa was shown to be vulnerable to mutation attacks. The proposed remedy is fine-tuning against such heuristics: as long as the mutations can be predicted, the model can be fine-tuned to detect them.
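    A minimal sketch of the kind of 'simple mutation' attack such an experiment probes: perturb text with visually identical homoglyph characters and check whether a RoBERTa-based detector's verdict flips. The checkpoint and the specific mutation here are assumptions for illustration, not the thesis's exact setup.

```python
from transformers import pipeline

# A publicly available RoBERTa-based detector checkpoint (assumed choice).
detector = pipeline(
    "text-classification",
    model="openai-community/roberta-base-openai-detector",
)

def homoglyph_mutate(text: str) -> str:
    # Replace Latin 'e'/'o' with visually identical Cyrillic letters,
    # changing the token stream without changing what a human reads.
    return text.translate(str.maketrans({"e": "\u0435", "o": "\u043e"}))

generated = "The model produced this passage of fluent, generic prose."
for variant in (generated, homoglyph_mutate(generated)):
    print(detector(variant))  # compare label and score before vs. after
```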

    Detecting ChatGPT: A Survey of the State of Detecting ChatGPT-Generated Text

    Full text link
    While recent advancements in the capabilities and widespread accessibility of generative language models, such as ChatGPT (OpenAI, 2022), have brought about various benefits by generating fluent human-like text, the task of distinguishing between human- and large language model (LLM) generated text has emerged as a crucial problem. These models can potentially deceive by generating artificial text that appears to be human-generated. This issue is particularly significant in domains such as law, education, and science, where ensuring the integrity of text is of the utmost importance. This survey provides an overview of the current approaches employed to differentiate between texts generated by humans and by ChatGPT. We present an account of the different datasets constructed for detecting ChatGPT-generated text, the various methods utilized, the qualitative analyses of the characteristics of human- versus ChatGPT-generated text that have been performed, and finally summarize our findings into general insights. Comment: Published in the Proceedings of the Student Research Workshop associated with RANLP-202
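    One family of approaches such surveys cover is statistical scoring: LLM-generated text tends to look less 'surprising' to a language model than human text. Below is a generic perplexity scorer, sketched with GPT-2 as an assumed scoring model; it illustrates the method class, not code from the survey.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

@torch.no_grad()
def perplexity(text: str) -> float:
    # Score the text by its cross-entropy under the language model.
    enc = tokenizer(text, return_tensors="pt")
    out = model(**enc, labels=enc["input_ids"])
    return torch.exp(out.loss).item()

print(perplexity("Some passage whose origin we want to score."))
# A low score relative to a threshold tuned on known samples suggests
# machine-generated text; surveyed methods build on and refine this idea.
```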

    Artificial Intelligence and Machine Learning in Cybersecurity: Applications, Challenges, and Opportunities for MIS Academics

    Get PDF
    The availability of massive amounts of data, fast computers, and superior machine learning (ML) algorithms has spurred interest in artificial intelligence (AI). It is no surprise, then, that we observe an increase in the application of AI in cybersecurity. Our survey of AI applications in cybersecurity shows that most present applications fall in the areas of malware identification and classification, intrusion detection, and cybercrime prevention. We should, however, be aware that AI-enabled cybersecurity is not without its drawbacks. Challenges to AI solutions include a shortage of good-quality data to train machine learning models, the potential for exploits via adversarial AI/ML, and limited human expertise in AI. However, the rewards in terms of increased accuracy of cyberattack predictions, faster response to cyberattacks, and improved cybersecurity make it worthwhile to overcome these challenges. We present a summary of the current research on the application of AI and ML to improve cybersecurity, the challenges that need to be overcome, and research opportunities for academics in management information systems.
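    To make the intrusion-detection use case concrete, here is a toy supervised-learning sketch. The flow features and labels are invented for illustration; real systems train on labelled traffic such as benchmark intrusion datasets.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Hypothetical flow features: duration, bytes_sent, bytes_received, failed_logins.
X = rng.random((1000, 4))
y = (X[:, 3] > 0.8).astype(int)  # stand-in label: 1 = attack, 0 = benign

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```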

    Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning

    Get PDF
    Learning-based pattern classifiers, including deep networks, have shown impressive performance in several application domains, ranging from computer vision to cybersecurity. However, it has also been shown that adversarial input perturbations carefully crafted either at training or at test time can easily subvert their predictions. The vulnerability of machine learning to such wild patterns (also referred to as adversarial examples), along with the design of suitable countermeasures, has been investigated in the research field of adversarial machine learning. In this work, we provide a thorough overview of the evolution of this research area over the last ten years and beyond, starting from pioneering, earlier work on the security of non-deep learning algorithms up to more recent work aimed at understanding the security properties of deep learning algorithms, in the context of computer vision and cybersecurity tasks. We report interesting connections between these apparently different lines of work, highlighting common misconceptions related to the security evaluation of machine-learning algorithms. We review the main threat models and attacks defined to this end, and discuss the main limitations of current work, along with the corresponding future challenges towards the design of more secure learning algorithms. Comment: Accepted for publication in Pattern Recognition, 201
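    The canonical test-time attack in this literature is the fast gradient sign method (FGSM) of Goodfellow et al.; below is a minimal PyTorch sketch. FGSM is a standard community example, not this paper's own contribution.

```python
import torch
import torch.nn.functional as F

def fgsm(model: torch.nn.Module, x: torch.Tensor, y: torch.Tensor,
         eps: float = 0.03) -> torch.Tensor:
    # One signed-gradient step in the direction that most increases the loss.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Clip the perturbed input to the valid pixel range [0, 1].
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

# Usage with any differentiable classifier:
#   x_adv = fgsm(model, images, labels)
```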

    Survey of review spam detection using machine learning techniques

    Get PDF

    Review Paper on Enhancing COVID-19 Fake News Detection With Transformer Model

    Get PDF
    The growing propagation of disinformation about the COVID-19 epidemic calls for powerful fake news detection technologies. This review provides an in-depth examination of existing techniques, including traditional machine learning methods such as Random Forest and Naive Bayes; deep learning models such as Bi-GRU, CNN, LSTM, and RNN; and transformer-based architectures such as BERT and XLM-RoBERTa. One noticeable development is the merging of traditional algorithms with sophisticated transformers, reflecting the pursuit of improved accuracy and flexibility. However, important research gaps have been identified. There has been little work on cross-lingual detection algorithms, revealing a substantial gap in multilingual fake news detection, which is critical in the global context of COVID-19 information spread. The research also emphasizes the need for flexible methodologies and for preprocessing strategies appropriate to various content types. Furthermore, the lack of common assessment measures is a barrier, underlining the need for unified frameworks for benchmarking and comparing models. This analysis sheds light on the changing landscape of COVID-19 fake news detection, emphasizing the need for novel, adaptive, and internationally relevant approaches to successfully address the ubiquitous dissemination of disinformation during the current pandemic.
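    The transformer-based detectors the review covers share a common fine-tuning recipe; here is a skeletal Hugging Face version, where the checkpoint, file name, and column names (text, label) are placeholders rather than any reviewed paper's exact setup.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # 0 = real, 1 = fake

# Hypothetical CSV with "text" and "label" columns.
ds = load_dataset("csv", data_files="covid_claims.csv")
ds = ds.map(lambda b: tok(b["text"], truncation=True, padding="max_length"),
            batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=ds["train"],
)
trainer.train()
```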

    A framework for strategic planning of data analytics in the educational sector

    Get PDF
    The field of big data and data analysis is not a new one. Big data systems have been investigated with respect to the volume of the data and how it is stored, data velocity and how the data are subject to change, the variety of data to be analysed, and data veracity, referring to integrity and quality. Higher Education Institutions (HEIs) have a significant range of data sources across their operations and increasingly invest in collecting, analysing and reporting on their data in order to improve their efficiency. Data analytics and Business Intelligence (BI) are two terms that have become increasingly popular in the relevant literature over the past few years, with emphasis on their impact in the education sector. There is a significant volume of literature discussing the benefits of data analytics in higher education, and even more papers discussing specific case studies of institutions adopting BI by deploying various data analytics practices. Nevertheless, there is a lack of an integrated framework that supports HEIs in using learning analytics at both strategic and operational levels. This research study was driven by the need to offer a point of reference for universities wishing to make good use of the plethora of data they can access. Increasingly, institutions need to become 'smart universities' by supporting their decisions with findings from the analysis of their operations. The Business Intelligence strategies of many universities seem to focus mostly on identifying how to collect data but fail to address the most important issues: how to analyse the data, what to do with the findings, and how to create the means for a scalable use of learning analytics at institutional level. The scope of this research is to investigate the different factors that affect the successful deployment of data analytics in educational contexts, focusing on both strategic and operational aspects of academia. The research study attempts to identify those elements necessary for introducing data analytics practices across an institution. The main contribution of the research is a framework that models data collection, analysis and visualisation in higher education. The specific contribution to the field comes in the form of generic guidelines for strategic planning of HEI data analytics projects, combined with specific guidelines for staff involved in the deployment of data analytics to support certain institutional operations. The research is based on a mixed-method approach that combines grounded theory in the form of an extensive literature review, state-of-the-art investigation and case study analysis, as well as a combination of qualitative and quantitative data collection. The study commences with an extensive literature review that identifies the key factors affecting the use of learning analytics. The research then collected further information from an analysis of a wide range of case studies showing how learning analytics are used across HEIs. The primary data collection concluded with a series of focus groups and interviews assessing the role of learning analytics in universities. Next, the research focused on a synthesis of guidelines for using learning analytics at both strategic and operational levels, leading to the production of generic and specific guidelines intended for different university stakeholders.
The proposed framework was revised twice to create an integrated point of reference for HEIs that offers support across institutions in a scalable and applicable way that can accommodate the varying needs of different HEIs. The framework was evaluated by the same participants as the earlier focus groups and interviews, providing a qualitative approach to evaluating the contributions made during this research study. The research resulted in the creation of an integrated framework that offers HEIs a reference for setting up a learning analytics strategy, adapting institutional policies and revising operations across faculties and departments. The proposed C.A.V. framework consists of three phases: Collect, Analysis and Visualisation. The framework determines the key features of data sources and resulting dashboards, as well as a list of functions for the data collection, analysis and visualisation stages. At strategic level, the C.A.V. framework enables institutions to assess their learning analytics maturity, determine the learning analytics stages they are involved in, identify the different learning analytics themes and use a checklist as a reference point for their learning analytics deployment. Finally, the framework ensures that institutional operations can become more effective by determining how learning analytics provide added value across different operations, while assessing the impact of learning analytics on stakeholders. The framework also supports the adoption of learning analytics processes, the planning of dashboard contents and the identification of factors affecting the implementation of learning analytics.
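    Read as a data pipeline, the three C.A.V. phases map naturally onto a collect-analyse-visualise chain. The skeleton below is purely illustrative: the data source, column names, and output are hypothetical, since the framework itself is organisational rather than code.

```python
import matplotlib.pyplot as plt
import pandas as pd

def collect() -> pd.DataFrame:
    # Collect: pull records from an institutional data source (e.g. the VLE).
    return pd.read_csv("vle_activity.csv")  # hypothetical export

def analyse(df: pd.DataFrame) -> pd.DataFrame:
    # Analysis: aggregate raw activity into per-student engagement indicators.
    return df.groupby("student_id", as_index=False)["minutes_online"].sum()

def visualise(summary: pd.DataFrame) -> None:
    # Visualisation: render a dashboard-style view for stakeholders.
    summary["minutes_online"].plot.hist()
    plt.savefig("engagement_dashboard.png")

visualise(analyse(collect()))
```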