
    Impact of Biases in Big Data

    The underlying paradigm of big data-driven machine learning reflects the desire to derive better conclusions simply by analyzing more data, without needing to look at theory and models. Is simply having more data always helpful? In 1936, The Literary Digest collected 2.3M filled-in questionnaires to predict the outcome of that year's US presidential election. The outcome of this big data prediction proved to be entirely wrong, whereas George Gallup needed only 3K handpicked people to make an accurate prediction. Generally, biases occur in machine learning whenever the distributions of the training set and the test set differ. In this work, we provide a review of different sorts of biases in (big) data sets in machine learning. We provide definitions and discussions of the most commonly appearing biases in machine learning: class imbalance and covariate shift. We also show how these biases can be quantified and corrected. This work is an introductory text for both researchers and practitioners to become more aware of this topic and thus to derive more reliable models for their learning problems.
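
    The contrast between a large biased sample and a small representative one can be illustrated with a toy simulation. The sketch below is not taken from the paper: the population size, the 30% "reachable" subgroup, and the support rates are all hypothetical values chosen only to make the effect visible, and they are not the 1936 figures.

```python
import random


def simulate_polls(seed=0, population_size=200_000):
    """Compare a large biased poll with a small random poll (toy numbers)."""
    rng = random.Random(seed)

    # Hypothetical population: 30% of voters are "reachable" by the big poll,
    # and that subgroup supports candidate A far less than everyone else.
    population = []
    for _ in range(population_size):
        reachable = rng.random() < 0.30
        p_support = 0.35 if reachable else 0.65
        votes_a = rng.random() < p_support
        population.append((reachable, votes_a))

    true_share = sum(v for _, v in population) / population_size

    # Large poll: 50,000 responses, but drawn only from the reachable subgroup.
    reachable_votes = [v for r, v in population if r]
    big_biased = rng.sample(reachable_votes, 50_000)
    big_share = sum(big_biased) / len(big_biased)

    # Small poll: 3,000 responses drawn uniformly from the whole population.
    small_random = rng.sample(population, 3_000)
    small_share = sum(v for _, v in small_random) / len(small_random)

    return {"true": true_share, "big_biased": big_share, "small_random": small_share}


if __name__ == "__main__":
    for name, share in simulate_polls().items():
        print(f"{name}: {share:.3f}")
```

    Under these assumptions the large poll misses the true share by a wide margin however many responses it collects, because its sampling distribution differs from that of the population; the small random poll is noisier but centered on the right value.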

    Stability and sensitivity of Learning Analytics based prediction models

    Learning analytics seeks to enhance learning processes through systematic measurements of learning-related data and to provide informative feedback to learners and educators. Track data from Learning Management Systems (LMS) constitute a main data source for learning analytics. This empirical contribution provides an application of Buckingham Shum and Deakin Crick’s theoretical framework of dispositional learning analytics: an infrastructure that combines learning dispositions data with data extracted from computer-assisted, formative assessments and LMSs. Across two cohorts, 2049 students were enrolled in a large introductory quantitative methods module based on principles of blended learning, combining face-to-face Problem-Based Learning sessions with e-tutorials. We investigated the predictive power of learning dispositions, outcomes of continuous formative assessments and other system-generated data in modelling student performance, and their potential to generate informative feedback. Using a dynamic, longitudinal perspective, computer-assisted formative assessments seem to be the best predictor for detecting underperforming students and academic performance, while basic LMS data did not substantially predict learning. If timely feedback is crucial, both use-intensity-related track data from e-tutorial systems and learning dispositions are valuable sources for feedback generation.

    A Data Science Course for Undergraduates: Thinking with Data

    Data science is an emerging interdisciplinary field that combines elements of mathematics, statistics, computer science, and knowledge in a particular application domain for the purpose of extracting meaningful information from the increasingly sophisticated array of data available in many settings. These data tend to be non-traditional, in the sense that they are often live, large, complex, and/or messy. A first course in statistics at the undergraduate level typically introduces students to a variety of techniques to analyze small, neat, and clean data sets. However, whether they pursue more formal training in statistics or not, many of these students will end up working with data that is considerably more complex, and will need facility with statistical computing techniques. More importantly, these students require a framework for thinking structurally about data. We describe an undergraduate course in a liberal arts environment that provides students with the tools necessary to apply data science. The course emphasizes modern, practical, and useful skills that cover the full data analysis spectrum, from asking an interesting question to acquiring, managing, manipulating, processing, querying, analyzing, and visualizing data, as well as communicating findings in written, graphical, and oral forms. Comment: 21 pages total including supplementary material.

    What learning analytics based prediction models tell us about feedback preferences of students

    Learning analytics (LA) seeks to enhance learning processes through systematic measurements of learning-related data and to provide informative feedback to learners and educators (Siemens & Long, 2011). This study examined students' preferred feedback modes using a dispositional learning analytics framework, combining learning disposition data with data extracted from digital systems. We analyzed the use of feedback of 1062 students taking an introductory mathematics and statistics course, enhanced with digital tools. Our findings indicated that, compared with hints, fully worked-out solutions demonstrated a stronger effect on academic performance and acted as a better mediator between learning dispositions and academic performance. This study demonstrated how e-learners and their data can be effectively re-deployed to provide meaningful insights to both educators and learners.

    Student profiling in a dispositional learning analytics application using formative assessment

    How learning disposition data can help us translate learning feedback from a learning analytics application into actionable learning interventions is the main focus of this empirical study. It extends previous work where the focus was on deriving timely prediction models in a data-rich context, encompassing trace data from learning management systems, formative assessment data, e-tutorial trace data as well as learning dispositions. In this same educational context, the current study investigates how the application of cluster analysis based on e-tutorial trace data allows student profiling into different at-risk groups, and how these at-risk groups can be characterized with the help of learning disposition data. It is our conjecture that establishing a chain of antecedent-consequence relationships starting from learning disposition, through student activity in e-tutorials and formative assessment performance, to course performance, adds a crucial dimension to current learning analytics studies: that of profiling students with descriptors that easily lend themselves to the design of educational interventions.
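
    As a rough illustration of profiling students by clustering trace data, the sketch below runs a plain k-means on two made-up features per student. The abstract does not say which clustering method or which features the study used, so k-means, the feature names (hours in the e-tutorial, mastery score) and the sample values are all assumptions made for this example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda i: squared_distance(p, centers[i]))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialise from k distinct data points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                # Mean of each coordinate over the cluster members.
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[i])  # keep an empty cluster's center
        if new_centers == centers:  # assignments have stopped changing
            break
        centers = new_centers
    labels = [nearest(p, centers) for p in points]
    return centers, labels


# Hypothetical trace data: (hours spent in e-tutorial, mastery score in [0, 1]).
trace = [(1.0, 0.20), (2.0, 0.30), (1.5, 0.25),
         (10.0, 0.80), (11.0, 0.90), (12.0, 0.85)]

if __name__ == "__main__":
    centers, labels = kmeans(trace, k=2)
    print(centers, labels)
```

    With real data the features would normally be standardized first so that no single scale dominates the distance, and the resulting groups would then be described using the learning disposition measures, as the abstract outlines.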