    Semi-supervised techniques for mining learning outcomes and prerequisites

    Educational content today no longer resides only in textbooks and classrooms; more and more learning material is found in a free, accessible form on the Internet. Our long-standing vision is to transform this web of educational content into an adaptive, web-scale "textbook" that can guide its readers to the most relevant "pages" according to their learning goal and current knowledge. In this paper, we address one core, long-standing problem towards this goal: identifying outcome and prerequisite concepts within a piece of educational content (e.g., a tutorial). Specifically, we propose a novel approach that leverages textbooks as a source of distant supervision but learns a model that can generalize to arbitrary documents (such as those on the web). As such, our model can take advantage of any existing textbook without requiring expert annotation. At the task of predicting outcome and prerequisite concepts, we demonstrate improvements over a number of baselines on six textbooks, especially when little to no ground-truth labels are available. Finally, we demonstrate the utility of a model learned using our approach at the task of identifying prerequisite documents for adaptive content recommendation, an important step towards our vision of the "web as a textbook".
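    A minimal sketch of the distant-supervision idea, under the assumption that a textbook's section order and index terms serve as weak labels (the paper's actual features and labelling rules may differ):

        # Hypothetical distant supervision from a textbook's structure: terms first
        # indexed in a section are weakly labelled as outcomes of that section, while
        # earlier-indexed terms that reappear are labelled as prerequisites. A generic
        # text-feature classifier is then trained so it can score arbitrary documents.
        from sklearn.feature_extraction import DictVectorizer
        from sklearn.linear_model import LogisticRegression

        def weak_labels(sections):
            """sections: ordered list of dicts {"text": str, "index_terms": set[str]}."""
            seen, examples = set(), []
            for sec in sections:
                for term in sec["index_terms"]:
                    label = "prerequisite" if term in seen else "outcome"
                    examples.append((term, sec["text"], label))
                seen |= sec["index_terms"]
            return examples

        def features(term, text):
            words = text.lower().split()
            t = term.lower()
            return {
                "frequency": words.count(t),
                "relative_first_position": words.index(t) / len(words) if t in words else 1.0,
                "in_first_sentence": t in text.lower().split(".")[0],
            }

        def train(sections):
            examples = weak_labels(sections)
            vectorizer = DictVectorizer()
            X = vectorizer.fit_transform([features(term, text) for term, text, _ in examples])
            y = [label for _, _, label in examples]
            return vectorizer, LogisticRegression(max_iter=1000).fit(X, y)

    Because the classifier only sees document-internal features, it can be applied to web tutorials that have no index at all, which is the generalization the abstract describes.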

    BlogForever D2.6: Data Extraction Methodology

    This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities for extracting the semantics available in blogs and demonstrates the benefits of exploiting standards such as microformats and microdata. The report then proposes a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.
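    As an illustration of pairing a blog's RSS feed with its HTML, the sketch below locates the element whose text best matches a feed item, so the same element can later be extracted from pages the feed does not cover; the libraries (feedparser, BeautifulSoup) are real, but the matching heuristic is an assumption, not the BlogForever algorithm:

        # Align one RSS entry with the rendered post page to learn which HTML element
        # holds the post body for this particular blog.
        from difflib import SequenceMatcher
        import feedparser
        from bs4 import BeautifulSoup

        def learn_content_selector(feed_url, post_html):
            """Return (tag name, CSS classes) of the element best matching the feed summary."""
            feed = feedparser.parse(feed_url)
            summary = BeautifulSoup(feed.entries[0].summary, "html.parser").get_text()
            soup = BeautifulSoup(post_html, "html.parser")
            best, best_score = None, 0.0
            for element in soup.find_all(True):
                score = SequenceMatcher(None, summary, element.get_text(" ", strip=True)).ratio()
                if score > best_score:
                    best, best_score = element, score
            return best.name, tuple(best.get("class") or ())

    The learned (tag, class) pair acts as a blog-specific wrapper that can then be applied to archive pages of the same blog without hand-written extraction rules.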

    Intelligent Recommendation System for Higher Education

    The education domain is vast and its data grows every day. Extracting information from this data requires various data mining techniques. Educational data mining combines methods from data mining, machine learning, and statistics that suit the unique data coming from the educational sector. Most available education recommendation systems help students choose a particular stream for graduate education after schooling, or particular career options after graduation. Counseling students during their graduate education helps them comprehend subjects better and deepens their understanding. This is possible by knowing a student's ability in subjects from past semesters and by mining similar learning patterns from past databases. Most educational systems let students plan their subjects (particularly electives) at the beginning of a semester or course, but students are often not fully aware of which subjects are good for their career, which fields interest them, or how they would perform. Recommending electives based on a student's learning ability, area of interest, extra-curricular activities, and performance in prerequisites would help students perform better, reduce their risk of failure, and allow them to specialize in their domain of interest. Such early prediction lets students take steps in advance to avoid poor performance and improve their academic scores. Developing this system requires applying various algorithms and recommendation techniques. This paper reviews the data mining and machine learning approaches used in the educational field and how such a system can be implemented.
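    As a hypothetical illustration of mining "similar learning patterns", the sketch below ranks electives by how students with similar past grades performed in them; the data layout and the cosine-similarity neighbourhood are assumptions for illustration, not the specific systems reviewed in the paper:

        import numpy as np

        def recommend_electives(past_grades, elective_grades, student, k=5, top_n=3):
            """
            past_grades: (n_students, n_core_subjects) grades on a common scale.
            elective_grades: dict elective -> (n_students,) grades, NaN if not taken.
            student: (n_core_subjects,) grades of the student being advised.
            """
            # Cosine similarity between the advised student and every past student.
            sims = past_grades @ student / (
                np.linalg.norm(past_grades, axis=1) * np.linalg.norm(student) + 1e-9
            )
            neighbours = np.argsort(sims)[-k:]
            scores = {}
            for elective, grades in elective_grades.items():
                taken = neighbours[~np.isnan(grades[neighbours])]
                if len(taken):
                    scores[elective] = float(np.mean(grades[taken]))
            # Electives in which similar students scored highest come first.
            return sorted(scores, key=scores.get, reverse=True)[:top_n]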

    A Survey on Metric Learning for Feature Vectors and Structured Data

    The need for appropriate ways to measure the distance or similarity between data is ubiquitous in machine learning, pattern recognition and data mining, but handcrafting good metrics for specific problems is generally difficult. This has led to the emergence of metric learning, which aims at automatically learning a metric from data and has attracted a lot of interest in machine learning and related fields over the past ten years. This survey paper proposes a systematic review of the metric learning literature, highlighting the pros and cons of each approach. We pay particular attention to Mahalanobis distance metric learning, a well-studied and successful framework, but additionally present a wide range of methods that have recently emerged as powerful alternatives, including nonlinear metric learning, similarity learning and local metric learning. Recent trends and extensions, such as semi-supervised metric learning, metric learning for histogram data and the derivation of generalization guarantees, are also covered. Finally, this survey addresses metric learning for structured data, in particular edit distance learning, and attempts to give an overview of the remaining challenges in metric learning for the years to come.
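    For reference, the Mahalanobis framework highlighted above learns a distance of the form d_M(x, x') = sqrt((x - x')^T M (x - x')) with M positive semidefinite; writing M = L^T L, this is simply the Euclidean distance after the linear map x -> Lx, and metric learning fits M (or L) from pairwise or triplet constraints. A minimal illustration of the definition (a textbook formulation, not code from the survey):

        import numpy as np

        def mahalanobis(x, x_prime, L):
            """Distance under M = L.T @ L, i.e. Euclidean distance in the projected space."""
            diff = L @ (x - x_prime)
            return float(np.sqrt(diff @ diff))

        x, y = np.array([1.0, 2.0]), np.array([3.0, 1.0])
        print(mahalanobis(x, y, np.eye(2)))            # plain Euclidean distance, ~2.236
        print(mahalanobis(x, y, np.diag([2.0, 0.5])))  # axes re-weighted by a learned L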

    Annotation Protocol for Textbook Enrichment with Prerequisite Knowledge Graph

    Extracting and formally representing the knowledge embedded in textbooks, such as the concepts explained and the relations between them, can support the provision of advanced knowledge-based services for learning environments and digital libraries. In this paper, we consider a specific type of relation in textbooks referred to as prerequisite relations (PRs). PRs are precedence relations between concepts, intended to provide the reader with the knowledge needed to understand further concepts. Annotating them in educational texts produces datasets that can be represented as a graph of concepts connected by PRs. However, building good-quality, reliable datasets of PRs from a textbook is still an open issue, not just for automated annotation methods but even for manual annotation. In turn, the lack of good-quality datasets and of well-defined criteria to identify PRs affects the development and validation of automated methods for prerequisite identification. As a contribution to this issue, we propose PREAP, a protocol for the annotation of prerequisite relations in textbooks aimed at obtaining reliable annotated data that can be shared, compared, and reused in the research community. PREAP defines a novel textbook-driven annotation method designed to capture the structure of prerequisites underlying the text. The protocol has been evaluated against baseline methods for manual and automatic annotation. The findings show that PREAP enables the creation of prerequisite knowledge graphs with higher inter-annotator agreement, accuracy, and alignment with the text than the baseline methods, suggesting that the protocol accurately captures the PRs expressed in the text. Furthermore, the time required to complete the annotation with PREAP is significantly shorter than with the other manual baseline methods. The paper also includes guidelines for using PREAP in three experimentally tested annotation scenarios, along with example datasets and a user interface that we developed to support prerequisite annotation.
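    To make the resulting artefact concrete, a small hypothetical sketch: each annotator's output is a set of directed (prerequisite, target) concept pairs, i.e. the edges of a prerequisite graph, and inter-annotator agreement can be computed with Cohen's kappa over all ordered concept pairs (PREAP's own agreement measure may differ):

        from itertools import permutations

        def cohen_kappa(ann_a, ann_b, concepts):
            """ann_a, ann_b: sets of (prerequisite, target) pairs over `concepts`."""
            pairs = list(permutations(concepts, 2))
            a = [p in ann_a for p in pairs]
            b = [p in ann_b for p in pairs]
            observed = sum(x == y for x, y in zip(a, b)) / len(pairs)
            pa, pb = sum(a) / len(pairs), sum(b) / len(pairs)
            chance = pa * pb + (1 - pa) * (1 - pb)
            return (observed - chance) / (1 - chance) if chance < 1 else 1.0

        annotator_1 = {("variable", "loop"), ("loop", "recursion")}
        annotator_2 = {("variable", "loop")}
        print(cohen_kappa(annotator_1, annotator_2, {"variable", "loop", "recursion"}))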

    Explainable AI Using Knowledge Graphs

    During the last decade, traditional data-driven deep learning (DL) has shown remarkable success in essential natural language processing tasks such as relation extraction. Yet challenges remain in developing artificial intelligence (AI) methods for real-world cases that require explainability through human-interpretable and traceable outcomes. The scarcity of labeled data for downstream supervised tasks, and the entangled embeddings produced by self-supervised pre-training objectives, also hinder interpretability and explainability. Additionally, data labeling in many unstructured domains, particularly healthcare and education, is expensive because it requires a pool of human expertise. Consider education technology, where AI systems fall along a "capability spectrum" depending on how extensively they exploit resources such as academic content, granularity in student engagement, academic domain experts, and knowledge bases to identify the concepts needed for knowledge mastery toward student goals. Likewise, assessing human health from online conversations challenges current statistical DL methods because discussions evolve and are culture- and context-specific. Hence, strategies are needed that merge AI with stratified knowledge to identify concepts that delineate healthcare conversation patterns and help healthcare professionals make decisions. Such technological innovations are imperative because they provide consistency and explainability in outcomes. This tutorial discusses the notion of explainability and interpretability through the use of knowledge graphs in (1) healthcare on the web and (2) education technology. It details knowledge-infused learning algorithms and their contribution to explainability in these two applications, in ways that can be applied to any other domain using knowledge graphs.
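    A deliberately simplified, hypothetical illustration of knowledge-infused grounding (not the tutorial's own algorithms): phrases in a post are linked to concepts in a small curated knowledge graph, so a downstream prediction can be traced back to explicit, human-readable evidence:

        # Toy stand-in for a curated knowledge graph: concept -> related surface terms.
        KG = {
            "insomnia": {"sleep", "fatigue"},
            "anxiety": {"worry", "panic", "insomnia"},
        }

        def ground(text):
            """Return (concept, evidence terms) pairs found in the text."""
            tokens = set(text.lower().split())
            hits = []
            for concept, related in KG.items():
                evidence = ({concept} | related) & tokens
                if evidence:
                    hits.append((concept, sorted(evidence)))
            return hits

        print(ground("I feel constant worry and panic and the fatigue never stops"))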

    Design and Implementation of a Data Visualization Course with a Real-World Project Component in an Undergraduate Information Systems Curriculum

    This paper describes a new data visualization class and its real-world project component in the information systems undergraduate program at Loyola University Chicago Quinlan School of Business. The motivation for and the evolution of the data visualization class are outlined. The fit and the position of the data visualization class in the information systems curriculum are discussed. The content of the class, including the choice of Tableau as the data visualization tool used for instruction, is discussed as well. The paper also describes the details of the project component of the class, undertaken in conjunction with GE Transportation, and discusses the validity and feasibility of using real-world data and scenarios. The outcomes of the project (which included the analysis of sensor data generated while testing locomotive engines) and the outcomes of the course are also discussed.