Curricular Analytics in Higher Education
This dissertation addresses several aspects of student success in higher education. Numerous factors may affect a student's ability to succeed and ultimately graduate, including pre-university preparation and the student support services provided by a university. However, even the best efforts to improve in these areas may fail if other institutional factors overwhelm their ability to facilitate student progress. This dissertation addresses this issue from the perspective of curriculum structure. The structural properties of individual curricula are studied, and the extent to which this structure impacts student progress is explored. The structure of curricula is studied using actual university data and analyzed by applying data mining techniques, machine learning methods, and graph theory. These techniques and methods provide a mathematical tool to quantify the complexity of a curriculum structure. The results presented in this work show an inverse correlation between the complexity of a curriculum and the graduation rate of students attempting that curriculum. To make this finding more practical, the study was extended to implement a number of predictive models that give colleges and universities the ability to track the progress of their students in order to improve retention and graduation rates. These models accurately predict the performance of students in subsequent terms and could therefore be used to provide early intervention alerts. The dissertation also addresses another important aspect of curricula: how course enrollment sequences in a curriculum impact student progress. Graduation rates could thus be improved by directing students to follow better course sequences. The novelty of the models presented in this dissertation lies in relating graduation rate, for the first time in the literature, to curricular complexity. This gives faculty and staff the ability to better advise students earlier in their academic careers
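To illustrate how graph theory can quantify curricular structure, the following minimal sketch scores a small prerequisite graph. It assumes one simple formulation, which is not necessarily the one used in the dissertation: each course's score is its blocking factor (the number of courses reachable from it) plus its delay factor (the number of courses on the longest prerequisite chain passing through it), and the curriculum's complexity is the sum of these scores. The course names and the `structural_complexity` function are hypothetical.

```python
from functools import lru_cache

# Hypothetical curriculum: course -> courses that list it as a prerequisite.
CURRICULUM = {
    "Calc1": ["Calc2", "Physics1"],
    "Calc2": ["DiffEq"],
    "Physics1": ["Circuits"],
    "DiffEq": ["Circuits"],
    "Circuits": [],
    "Writing": [],
}


def structural_complexity(successors):
    """Return (per-course scores, total) for an acyclic prerequisite graph."""
    # Build the reverse edges so chains can be followed in both directions.
    predecessors = {course: [] for course in successors}
    for course, nexts in successors.items():
        for nxt in nexts:
            predecessors[nxt].append(course)

    def reachable(course):
        # All courses that cannot be taken before `course` is passed.
        seen, stack = set(), list(successors[course])
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(successors[current])
        return seen

    @lru_cache(maxsize=None)
    def longest_down(course):
        # Courses on the longest chain starting at `course`.
        return 1 + max((longest_down(n) for n in successors[course]), default=0)

    @lru_cache(maxsize=None)
    def longest_up(course):
        # Courses on the longest chain ending at `course`.
        return 1 + max((longest_up(p) for p in predecessors[course]), default=0)

    scores = {}
    for course in successors:
        blocking = len(reachable(course))
        delay = longest_up(course) + longest_down(course) - 1
        scores[course] = blocking + delay
    return scores, sum(scores.values())


scores, total = structural_complexity(CURRICULUM)
print(scores, total)
```

Under this assumed definition, a gateway course such as the hypothetical `Calc1` scores highest because it both blocks many later courses and sits on the longest chain.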
Exploiting Cognitive Structure for Adaptive Learning
Adaptive learning, also known as adaptive teaching, relies on learning path
recommendation, which sequentially recommends personalized learning items
(e.g., lectures, exercises) to satisfy the unique needs of each learner.
Although it is well known that modeling the cognitive structure including
knowledge level of learners and knowledge structure (e.g., the prerequisite
relations) of learning items is important for learning path recommendation,
existing methods for adaptive learning often separately focus on either
knowledge levels of learners or knowledge structure of learning items. To fully
exploit the multifaceted cognitive structure for learning path recommendation,
we propose a Cognitive Structure Enhanced framework for Adaptive Learning,
named CSEAL. By viewing path recommendation as a Markov Decision Process and
applying an actor-critic algorithm, CSEAL can sequentially identify the right
learning items for different learners. Specifically, we first utilize a
recurrent neural network to trace the evolving knowledge levels of learners at
each learning step. Then, we design a navigation algorithm on the knowledge
structure to ensure the logicality of learning paths, which reduces the search
space in the decision process. Finally, the actor-critic algorithm determines
what to learn next, with its parameters dynamically updated along
the learning path. Extensive experiments on real-world data demonstrate the
effectiveness and robustness of CSEAL.
Comment: Accepted by KDD 2019 Research Track. In Proceedings of the 25th ACM
SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD'19)
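The idea of using the knowledge structure to shrink the search space can be sketched as follows. This is only an illustration under stated assumptions, not CSEAL's actual navigation algorithm: it assumes each learner has an estimated mastery score per item (which the abstract obtains from a recurrent neural network) and keeps only unmastered items whose prerequisites are all mastered. The item names, the `candidate_items` function, and the 0.6 threshold are hypothetical.

```python
# Hypothetical knowledge structure: item -> its prerequisite items.
PREREQS = {
    "counting": [],
    "addition": ["counting"],
    "multiplication": ["addition"],
    "fractions": ["multiplication"],
}


def candidate_items(prereqs, mastery, threshold=0.6):
    """Return unmastered items whose prerequisites are all mastered.

    `mastery` maps item -> estimated knowledge level in [0, 1];
    items missing from it are treated as not yet learned.
    """
    def mastered(item):
        return mastery.get(item, 0.0) >= threshold

    return [
        item
        for item, reqs in prereqs.items()
        if not mastered(item) and all(mastered(r) for r in reqs)
    ]


learner = {"counting": 0.9, "addition": 0.7, "multiplication": 0.2}
print(candidate_items(PREREQS, learner))
```

A recommendation policy would then choose only among these candidates rather than among all items.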
Promoting Generalization for Exact Solvers via Adversarial Instance Augmentation
Machine learning has been successfully applied to improve the efficiency of
Mixed-Integer Linear Programming (MILP) solvers. However, the learning-based
solvers often suffer from severe performance degradation on unseen MILP
instances -- especially on large-scale instances from a perturbed environment
-- due to the limited diversity of training distributions. To tackle this
problem, we propose AdaSolver, a novel approach based on Adversarial Instance
Augmentation that does not need to know the problem type to generate new
instances, to promote data diversity for learning-based branching modules in
branch-and-bound (B&B) solvers. We use the bipartite graph
representations for MILP instances and obtain various perturbed instances to
regularize the solver by augmenting the graph structures with a learned
augmentation policy. The major technical contribution of AdaSolver is that we
formulate the non-differentiable instance augmentation as a contextual bandit
problem and adversarially train the learning-based solver and augmentation
policy, enabling efficient gradient-based training of the augmentation policy.
To the best of our knowledge, AdaSolver is the first general and effective
framework for understanding and improving the generalization of both
imitation-learning-based (IL-based) and reinforcement-learning-based (RL-based)
B&B solvers. Extensive experiments demonstrate that by producing various
augmented instances, AdaSolver leads to a remarkable efficiency improvement
across various distributions
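A bipartite graph representation of a MILP instance can be sketched as below. The sketch assumes the instance has the form "minimize c·x subject to A·x ≤ b" and uses a deliberately minimal feature set (right-hand side per constraint node, objective coefficient per variable node); the features used by AdaSolver are not given in the abstract, and the function name `milp_to_bipartite` is hypothetical.

```python
def milp_to_bipartite(A, b, c):
    """Encode a MILP (min c.x s.t. A.x <= b) as a bipartite graph.

    One node per constraint, one node per variable, and an edge
    (constraint i, variable j, coefficient) wherever A[i][j] is nonzero.
    """
    constraint_nodes = [{"rhs": rhs} for rhs in b]
    variable_nodes = [{"obj": coef} for coef in c]
    edges = [
        (i, j, coef)
        for i, row in enumerate(A)
        for j, coef in enumerate(row)
        if coef != 0
    ]
    return constraint_nodes, variable_nodes, edges


# Tiny made-up instance with two constraints and three variables.
A = [[1, 0, 2],
     [0, 3, 0]]
b = [4, 5]
c = [1, 1, 1]
cons, variables, edges = milp_to_bipartite(A, b, c)
print(edges)
```

In this representation, augmenting an instance amounts to editing the graph, for example by changing edge coefficients or adding and removing edges.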
Taken Together: Conceptualizing Students’ Concurrent Course Enrollment across the Post-Secondary Curriculum using temporal analytics
In this study, we develop and test four measures for conceptualizing the potential impact of co-enrollment in different courses on students’ changing risk for academic difficulty and recovery from academic difficulty in a focal course. We offer four predictors, two related to instructional complexity and two related to structural complexity (the organization of the curriculum) that highlight different trends in student experience of the focal course. Course difficulty, discipline of major, time in semester, and simultaneous difficulty across courses were all significantly related to entering a moderate and high-risk classification in the early warning system (EWS). Course difficulty, discipline of major, and time in semester were related to exiting academic difficulty classifications. We observe a snowball effect, whereby students who are experiencing difficulty in the focal course are at increased risk of experiencing difficulty in their other courses. Our findings suggest that different metrics may be needed to identify risk for academic difficulty and recovery from academic difficulty. Our results demonstrate what a more holistic assessment of academic functioning might look like in early warning systems and course recommender systems, and suggest that academic planners consider the relationship between course co-enrollment and student academic success
Computing Competencies for Undergraduate Data Science Curricula: ACM Data Science Task Force
At the August 2017 ACM Education Council meeting, a task force was formed to explore a process to add to the broad, interdisciplinary conversation on data science, with an articulation of the role of computing discipline-specific contributions to this emerging field. Specifically, the task force would seek to define what the computing/computational contributions are to this new field, and provide guidance on computing-specific competencies in data science for departments offering such programs of study at the undergraduate level.
There are many stakeholders in the discussion of data science – these include colleges and universities that (hope to) offer data science programs, employers who hope to hire a workforce with knowledge and experience in data science, as well as individuals and professional societies representing the fields of computing, statistics, machine learning, computational biology, computational social sciences, digital humanities, and others. There is a shared desire to form a broad interdisciplinary definition of data science and to develop curriculum guidance for degree programs in data science.
This volume builds upon the important work of other groups who have published guidelines for data science education. There is a need to acknowledge the definition and description of the individual contributions to this interdisciplinary field. For instance, those interested in the business context for these concepts generally use the term “analytics”; in some cases, the abbreviation DSA appears, meaning Data Science and Analytics.
This volume is the third draft articulation of computing-focused competencies for data science. It recognizes the inherent interdisciplinarity of data science and situates computing-specific competencies within the broader interdisciplinary space
Probabilistic models for mining imbalanced relational data
Most data mining and pattern recognition techniques are designed for learning from flat data files under the assumption of equal populations per class. However, most real-world data are stored as rich relational databases that generally have imbalanced class distributions. For such domains, a rich relational technique is required to accurately model the different objects and relationships in the domain, which cannot easily be represented as a set of simple attributes, while also handling the imbalanced class problem. Motivated by the significance of mining imbalanced relational databases, which represent the majority of real-world data, learning techniques for mining imbalanced relational domains are investigated. In this thesis, the use of probabilistic models in mining relational databases is explored, in particular Probabilistic Relational Models (PRMs), which were proposed as an extension of attribute-based Bayesian Networks. The effectiveness of PRMs in mining real-world databases was explored by learning PRMs from a real-world university relational database. A visual data mining tool is also proposed to aid the interpretation of the learned PRM models. Despite the effectiveness of PRMs in relational learning, their performance as predictive models is significantly hindered by the imbalanced class problem. This is because PRMs share the assumption, common to other learning techniques, of relatively balanced class distributions in the training data. Therefore, this thesis proposes a number of models that build on the effectiveness of PRMs in relational learning and extend it to mining imbalanced relational domains. The first model introduced in this thesis examines the problem of mining imbalanced relational domains for a single two-class attribute. The model is proposed by enriching PRM learning with the ensemble learning technique. The premise behind this model is that an ensemble of models would attain better performance than a single model, as an instance misclassified by one of the models can often be correctly classified by the others. Based on this approach, another model is introduced to address the problem of mining multiple imbalanced attributes, in which it is important to predict several attributes rather than a single one. In this model, the ensemble bagging sampling approach is exploited to attain a single model for mining several attributes. Finally, the thesis outlines the problem of imbalanced multi-class classification and introduces a generalized framework to handle this problem for both relational and non-relational domains
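The ensemble premise can be illustrated with a generic bagging sketch for imbalanced data. It is an assumption-laden illustration, not the thesis's PRM-based method: each ensemble member is assumed to be trained on a class-balanced bootstrap sample, and the members' predictions are combined by majority vote. The function names `balanced_bootstrap` and `majority_vote` are hypothetical, and no PRM learning is shown.

```python
import random
from collections import Counter


def balanced_bootstrap(samples, labels, rng):
    """Draw a bootstrap sample with the same number of items per class.

    Each class is sampled with replacement down to the size of the
    smallest class, so the minority class is not swamped.
    """
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    n = min(len(xs) for xs in by_class.values())
    bag = []
    for y, xs in by_class.items():
        bag.extend((rng.choice(xs), y) for _ in range(n))
    return bag


def majority_vote(predictions):
    """Combine the ensemble members' predictions for one instance."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)
samples = list(range(10))
labels = [0] * 8 + [1] * 2          # 8 majority vs. 2 minority examples
bag = balanced_bootstrap(samples, labels, rng)
print(Counter(y for _, y in bag))   # class counts in the bag
print(majority_vote([1, 0, 1]))     # two of three members predict class 1
```

Training one model per bag and voting lets an error made by one member be outvoted by the others.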
Active Learning for Reducing Labeling Effort in Text Classification Tasks
Labeling data can be an expensive task as it is usually performed manually by
domain experts. This is cumbersome for deep learning, as it is dependent on
large labeled datasets. Active learning (AL) is a paradigm that aims to reduce
labeling effort by using only the data that the model in use deems most
informative. Little research has been done on AL in a text classification
setting and next to none has involved the more recent, state-of-the-art Natural
Language Processing (NLP) models. Here, we present an empirical study that
compares different uncertainty-based algorithms with BERT as the
classifier. We evaluate the algorithms on two NLP classification datasets:
Stanford Sentiment Treebank and KvK-Frontpages. Additionally, we explore
heuristics that aim to solve presupposed problems of uncertainty-based AL;
namely, that it is unscalable and that it is prone to selecting outliers.
Furthermore, we explore the influence of the query-pool size on the performance
of AL. Although the proposed heuristics for AL did not improve
the performance of AL, our results show that using uncertainty-based AL with
BERT outperforms random sampling of data. This difference in
performance can decrease as the query-pool size gets larger.
Comment: Accepted as a conference paper at the joint 33rd Benelux Conference
on Artificial Intelligence and the 30th Belgian Dutch Conference on Machine
Learning (BNAIC/BENELEARN 2021). This camera-ready version submitted to
BNAIC/BENELEARN, adds several improvements including a more thorough
discussion of related work plus an extended discussion section. 28 pages
including references and appendices
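Uncertainty-based query selection can be sketched as follows. The sketch assumes predictive entropy as the uncertainty measure, which is one common choice and not necessarily among the algorithms compared in the study, and it takes class probabilities as plain lists instead of calling BERT. The function names `entropy` and `select_queries` are hypothetical.

```python
import math


def entropy(probs):
    """Predictive entropy of one class-probability vector (in nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_queries(pool_probs, k):
    """Return indices of the k most uncertain unlabeled examples.

    `pool_probs[i]` holds the classifier's class probabilities for
    the i-th example in the unlabeled pool.
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]


# Made-up probabilities for three unlabeled examples, two classes.
pool = [[0.9, 0.1], [0.5, 0.5], [0.6, 0.4]]
print(select_queries(pool, 2))  # indices to send to the annotator
```

Here `k` plays the role of the query-pool size: the selected examples are labeled, added to the training set, and the classifier is retrained before the next round.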