
    Applying Data Mining Methods to Understand User Interactions within Learning Management Systems: Approaches and Lessons Learned

    This article describes our processes for analyzing and mining the vast records of instructor and student usage data collected by a learning management system (LMS) widely used in higher education, called Canvas. Our data were drawn from over 33,000 courses taught over three years at a mid-sized public Western U.S. university. Our processes were guided by an established data mining framework, called Knowledge Discovery and Data Mining (KDD). In particular, we use the KDD framework in guiding our application of several educational data mining (EDM) methods (prediction, clustering, and data visualization) to model student and instructor Canvas usage data, and to examine the relationship between these models and student learning outcomes. We also describe challenges and lessons learned along the way.

    Mining subjectively interesting attributed subgraphs

    Community detection in graphs, data clustering, and local pattern mining are three mature fields of data mining and machine learning. In recent years, attributed subgraph mining has emerged as a powerful new data mining task at the intersection of these areas. Given a graph and a set of attributes for each vertex, attributed subgraph mining aims to find cohesive subgraphs for which (a subset of) the attributes have exceptional values in some sense. While research on this task can borrow from the three abovementioned fields, the principled integration of graph and attribute data poses two challenges: the definition of a pattern language that is intuitive and lends itself to efficient search strategies, and the formalization of the interestingness of such patterns. We propose an integrated solution to both of these challenges. The proposed pattern language improves upon prior work in being both highly flexible and intuitive. We show how an effective and principled algorithm can enumerate patterns of this language. The proposed approach for quantifying interestingness of patterns of this language is rooted in information theory, and is able to account for prior knowledge on the data. Prior work typically quantifies interestingness based on the cohesion of the subgraph and the exceptionality of its attributes separately, combining these in a parameterized trade-off. Instead, in our proposal this trade-off is implicitly handled in a principled, parameter-free manner. Extensive empirical results confirm the proposed pattern syntax is intuitive, and the interestingness measure aligns well with actual subjective interestingness.

    Process Mining in The Rail Industry: A Qualitative Analysis of Success Factors and Remaining Challenges

    This paper aims to identify success factors and remaining challenges relevant to the practice of process mining in the rail industry. Process mining is a method for analyzing processes based on event logs. In a case study, we examine three process mining projects performed at the largest rail organization in The Netherlands. Experiences gained in these projects are compared to success factors specified in literature. The projects were analyzed using observations, secondary data collection and semi-structured interviews. We were able to identify all success factors specified in literature in the case study. In addition, several new success factors are identified. These concern challenges regarding the implementation of process mining software, intra-organizational knowledge sharing and continuous availability of event logs. For the additional success factors identified, it was not yet possible to determine if they are industry specific or generic in nature.

    Dynamic Data Mining: Methodology and Algorithms

    Supervised data stream mining has become an important and challenging data mining task in modern organizations. The key challenges are threefold: (1) a possibly infinite number of streaming examples and time-critical analysis constraints; (2) concept drift; and (3) skewed data distributions. To address these three challenges, this thesis proposes the novel dynamic data mining (DDM) methodology by effectively applying supervised ensemble models to data stream mining. DDM can be loosely defined as categorization-organization-selection of supervised ensemble models. It is inspired by the idea that although the underlying concepts in a data stream are time-varying, their distinctions can be identified. Therefore, the models trained on the distinct concepts can be dynamically selected in order to classify incoming examples of similar concepts. First, following the general paradigm of DDM, we examine the different concept-drifting stream mining scenarios and propose corresponding effective and efficient data mining algorithms.
    • To address concept drift caused merely by changes of variable distributions, which we term pseudo concept drift, base models built on categorized streaming data are organized and selected in line with their corresponding variable distribution characteristics.
    • To address concept drift caused by changes of variable and class joint distributions, which we term true concept drift, an effective data categorization scheme is introduced. A group of working models is dynamically organized and selected for reacting to the drifting concept.
    Secondly, we introduce an integration stream mining framework, enabling the paradigm advocated by DDM to be widely applicable for other stream mining problems. This allows us to readily introduce six effective algorithms for mining data streams with skewed class distributions. In addition, we also introduce a new ensemble model approach for batch learning, following the same methodology. Both theoretical and empirical studies demonstrate its effectiveness. Future work will target improving the effectiveness and efficiency of the proposed algorithms. We will also explore the possibilities of using the integration framework to solve other open stream mining research problems.

    Corporate Social Responsibility: Understanding the Mining Stakeholder with a Case Study

    The social responsibility of corporate mining has been challenged by a significant socio-political risk from local communities. These issues reduce shareholder value by increasing costs and decreasing the market perception of corporate social responsibility. Community engagement is the process of understanding the behavior and interests of a group of targeted mining communities through surveys and data analysis, with the purpose of incorporating mining community acceptance into mining sustainability. While mining organizations have discussed community engagement to varying degrees, there are three main shortcomings in current studies, as concluded in the authors' previous research. This paper presents a framework to apply discrete choice theory to improve mining community engagement and corporate mining social responsibility. In addition, this paper establishes the main technical challenges to implement the developed framework, and presents methods to overcome the challenges for future research with a case study. The contribution of this research will transform mine sustainability in a fundamental way by facilitating the incorporation of effective community engagement. This will lead to more sustainable mines that local communities support.

    Conformance Checking Based on Multi-Perspective Declarative Process Models

    Process mining is a family of techniques that aim at analyzing business process execution data recorded in event logs. Conformance checking is a branch of this discipline embracing approaches for verifying whether the behavior of a process, as recorded in a log, is in line with some expected behaviors provided in the form of a process model. The majority of these approaches require the input process model to be procedural (e.g., a Petri net). However, in turbulent environments, characterized by high variability, the process behavior is less stable and predictable. In these environments, procedural process models are less suitable to describe a business process. Declarative specifications, working under an open-world assumption, allow the modeler to express several possible execution paths as a compact set of constraints. Any process execution that does not contradict these constraints is allowed. One of the open challenges in the context of conformance checking with declarative models is the capability of supporting multi-perspective specifications. In this paper, we close this gap by providing a framework for conformance checking based on MP-Declare, a multi-perspective version of the declarative process modeling language Declare. The approach has been implemented in the process mining tool ProM and has been evaluated in three real-life case studies.
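    The idea that a trace conforms as long as it contradicts no constraint can be illustrated with a minimal sketch. It is not the MP-Declare semantics or the ProM implementation described in the paper: the event shape, the `response` and `conforms` helpers, and the activity names are all hypothetical, and only a simplified "response" constraint with an optional data condition on the activating event is shown.

```python
from typing import Callable, Dict, List

# Hypothetical event shape: a dict with an "activity" name plus arbitrary data attributes.
Event = Dict[str, object]
Trace = List[Event]
Constraint = Callable[[Trace], bool]


def response(activation: str, target: str,
             condition: Callable[[Event], bool] = lambda e: True) -> Constraint:
    """Simplified 'response' constraint: every `activation` event that satisfies
    `condition` must be followed, later in the trace, by a `target` event."""
    def check(trace: Trace) -> bool:
        for i, event in enumerate(trace):
            if event["activity"] == activation and condition(event):
                if not any(later["activity"] == target for later in trace[i + 1:]):
                    return False  # an activated constraint was never fulfilled
        return True
    return check


def conforms(trace: Trace, constraints: List[Constraint]) -> bool:
    """Open-world reading: a trace is allowed unless it violates some constraint."""
    return all(constraint(trace) for constraint in constraints)


# Assumed example rule: orders above 100 must eventually be approved.
model = [response("order", "approve", condition=lambda e: e["amount"] > 100)]

approved = [{"activity": "order", "amount": 500}, {"activity": "approve"}]
unapproved = [{"activity": "order", "amount": 500}, {"activity": "ship"}]
small_order = [{"activity": "order", "amount": 50}, {"activity": "ship"}]

print(conforms(approved, model), conforms(unapproved, model), conforms(small_order, model))
```

    The small order passes even though it is never approved, because the data condition keeps the constraint from being activated; anything the model does not forbid is allowed.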