Applying Data Mining Methods to Understand User Interactions within Learning Management Systems: Approaches and Lessons Learned
This article describes our processes for analyzing and mining the vast records of instructor and student usage data collected by a learning management system (LMS) widely used in higher education, called Canvas. Our data were drawn from over 33,000 courses taught over three years at a mid-sized public Western U.S. university. Our processes were guided by an established data mining framework, called Knowledge Discovery and Data Mining (KDD). In particular, we use the KDD framework to guide our application of several educational data mining (EDM) methods (prediction, clustering, and data visualization) to model student and instructor Canvas usage data, and to examine the relationship between these models and student learning outcomes. We also describe challenges and lessons learned along the way.
Mining subjectively interesting attributed subgraphs
Community detection in graphs, data clustering, and local pattern mining
are three mature fields of data mining and machine learning.
In recent years, attributed subgraph mining has been emerging as a
powerful new data mining task at the intersection of these areas.
Given a graph and a set of attributes for each vertex,
attributed subgraph mining aims to find cohesive subgraphs
for which (a subset of) the attributes have exceptional values in some sense.
While research on this task can borrow from the three abovementioned fields,
the principled integration of graph and attribute data poses two challenges:
the definition of a pattern language that is intuitive and lends itself to efficient search strategies,
and the formalization of the interestingness of such patterns.
We propose an integrated solution to both of these challenges.
The proposed pattern language improves upon prior work in being both highly flexible and intuitive.
We show how an effective and principled algorithm can enumerate patterns of this language.
The proposed approach for quantifying interestingness of patterns of this language
is rooted in information theory, and is able to account for prior knowledge on the data.
Prior work typically quantifies interestingness based on the cohesion of the subgraph
and on the exceptionality of its attributes separately,
combining these in a parameterized trade-off.
Instead, in our proposal this trade-off is implicitly handled in a principled, parameter-free manner.
Extensive empirical results confirm the proposed pattern syntax is intuitive,
and the interestingness measure aligns well with actual subjective interestingness.
Process Mining in The Rail Industry: A Qualitative Analysis of Success Factors and Remaining Challenges
This paper aims to identify success factors and remaining challenges relevant to the practice of process mining in the rail industry. Process mining is a method for analyzing processes based on event logs. In a case study, we examine three process mining projects performed at the largest rail organization in the Netherlands. Experiences gained in these projects are compared to success factors specified in literature. The projects were analyzed using observations, secondary data collection and semi-structured interviews. We were able to identify all success factors specified in literature in the case study. In addition, several new success factors are identified. These concern challenges regarding the implementation of process mining software, intra-organizational knowledge sharing and continuous availability of event logs. For the additional success factors identified, it was not yet possible to determine if they are industry specific or generic in nature.
Dynamic Data Mining: Methodology and Algorithms
Supervised data stream mining has become an important and challenging data mining task in modern
organizations. The key challenges are threefold: (1) a possibly infinite number of streaming examples
and time-critical analysis constraints; (2) concept drift; and (3) skewed data distributions.
To address these three challenges, this thesis proposes the novel dynamic data mining (DDM)
methodology by effectively applying supervised ensemble models to data stream mining. DDM can be
loosely defined as categorization-organization-selection of supervised ensemble models. It is inspired
by the idea that although the underlying concepts in a data stream are time-varying, their distinctions
can be identified. Therefore, the models trained on the distinct concepts can be dynamically selected in
order to classify incoming examples of similar concepts.
First, following the general paradigm of DDM, we examine the different concept-drifting stream
mining scenarios and propose corresponding effective and efficient data mining algorithms.
• To address concept drift caused merely by changes of variable distributions, which we term
pseudo concept drift, base models built on categorized streaming data are organized and
selected in line with their corresponding variable distribution characteristics.
• To address concept drift caused by changes of variable and class joint distributions, which we
term true concept drift, an effective data categorization scheme is introduced. A group of
working models is dynamically organized and selected for reacting to the drifting concept.
Secondly, we introduce an integration stream mining framework, enabling the paradigm advocated by
DDM to be widely applicable to other stream mining problems. As a result, we are able to easily
introduce six effective algorithms for mining data streams with skewed class distributions.
In addition, we also introduce a new ensemble model approach for batch learning, following the same
methodology. Both theoretical and empirical studies demonstrate its effectiveness.
Future work will target improving the effectiveness and efficiency of the proposed
algorithms. Meanwhile, we will explore the possibilities of using the integration framework to solve
other open stream mining research problems.
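The selection idea described in this abstract (train models on distinct concepts, then pick the one whose training data best resembles the incoming examples) can be illustrated with a minimal sketch. This is not the thesis's algorithm: the names `ConceptModel`, `train`, and `select_model` are hypothetical, the base learner is a simple nearest-centroid classifier chosen for brevity, and the "variable distribution characteristics" are reduced to the per-feature mean of each training chunk.

```python
from dataclasses import dataclass
from math import dist
from statistics import fmean


def column_means(rows):
    """Per-feature mean of a list of equal-length feature tuples."""
    return tuple(fmean(col) for col in zip(*rows))


@dataclass
class ConceptModel:
    # Assumption: the chunk's feature mean stands in for its variable distribution.
    feature_mean: tuple
    # Class label -> centroid of that class within the training chunk.
    centroids: dict

    def predict(self, x):
        # Nearest-centroid classification (an illustrative base learner).
        return min(self.centroids, key=lambda label: dist(x, self.centroids[label]))


def train(chunk):
    """Train one base model on a chunk of (features, label) pairs."""
    by_label = {}
    for x, y in chunk:
        by_label.setdefault(y, []).append(x)
    centroids = {y: column_means(xs) for y, xs in by_label.items()}
    return ConceptModel(column_means([x for x, _ in chunk]), centroids)


def select_model(models, batch):
    """Pick the model whose training distribution is closest to the incoming batch."""
    batch_mean = column_means(batch)
    return min(models, key=lambda m: dist(batch_mean, m.feature_mean))


# Two chunks drawn from different regions of the feature space.
chunk_a = [((0.0, 0.0), "neg"), ((1.0, 1.0), "pos")]
chunk_b = [((10.0, 10.0), "neg"), ((11.0, 11.0), "pos")]
models = [train(chunk_a), train(chunk_b)]

# An unlabeled incoming batch that resembles chunk_b.
batch = [(10.2, 10.1), (10.9, 11.0)]
chosen = select_model(models, batch)
predictions = [chosen.predict(x) for x in batch]
print(predictions)  # ['neg', 'pos']
```

Because selection here uses only the unlabeled features of the incoming batch, the sketch corresponds most closely to what the abstract calls pseudo concept drift; handling true concept drift would need label information as well.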
Corporate Social Responsibility: Understanding the Mining Stakeholder with a Case Study
The social responsibility of corporate mining has been challenged by a significant socio-political risk from local communities. These issues reduce shareholder value by increasing costs and decreasing the market perception of corporate social responsibility. Community engagement is the process of understanding the behavior and interests of a group of targeted mining communities through surveys and data analysis, with the purpose of incorporating mining community acceptance into the mining sustainability. While mining organizations have discussed community engagement to varying degrees, there are three main shortcomings in current studies, as concluded in the authors' previous research. This paper presents a framework to apply discrete choice theory to improve mining community engagement and corporate mining social responsibility. In addition, this paper establishes the main technical challenges to implement the developed framework, and presents methods to overcome the challenges for future research with a case study. The contribution of this research will transform mine sustainability in a fundamental way by facilitating the incorporation of effective community engagement. This will lead to more sustainable mines that local communities support.
Conformance Checking Based on Multi-Perspective Declarative Process Models
Process mining is a family of techniques that aim at analyzing business
process execution data recorded in event logs. Conformance checking is a branch
of this discipline embracing approaches for verifying whether the behavior of a
process, as recorded in a log, is in line with some expected behaviors provided
in the form of a process model. The majority of these approaches require the
input process model to be procedural (e.g., a Petri net). However, in turbulent
environments, characterized by high variability, the process behavior is less
stable and predictable. In these environments, procedural process models are
less suitable to describe a business process. Declarative specifications,
working in an open world assumption, allow the modeler to express several
possible execution paths as a compact set of constraints. Any process execution
that does not contradict these constraints is allowed. One of the open
challenges in the context of conformance checking with declarative models is
the capability of supporting multi-perspective specifications. In this paper,
we close this gap by providing a framework for conformance checking based on
MP-Declare, a multi-perspective version of the declarative process modeling
language Declare. The approach has been implemented in the process mining tool
ProM and has been evaluated in three real-life case studies.
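To make the idea of a multi-perspective declarative constraint concrete, here is a minimal sketch of checking one Declare-style "response" constraint on a single trace, extended with a data condition and a time window. It is an illustration only, not the ProM implementation or the MP-Declare semantics in full: the function `check_response`, the event dictionary layout, and the activity names are all hypothetical, and the trace is assumed to be ordered by timestamp.

```python
def check_response(trace, activation, target, activation_cond, correlation_cond, max_delay):
    """Return the indices of activation events that violate the constraint.

    The constraint reads: every `activation` event whose payload satisfies
    `activation_cond` must be followed by a `target` event that occurs within
    `max_delay` time units and satisfies `correlation_cond` against it.
    Events that do not activate the constraint are unconstrained.
    """
    violations = []
    for i, a in enumerate(trace):
        if a["activity"] != activation or not activation_cond(a):
            continue  # constraint not activated by this event
        fulfilled = any(
            b["activity"] == target
            and 0 <= b["time"] - a["time"] <= max_delay
            and correlation_cond(a, b)
            for b in trace[i + 1:]
        )
        if not fulfilled:
            violations.append(i)
    return violations


# Hypothetical trace: each event has a control-flow, time, and data perspective.
trace = [
    {"activity": "submit", "time": 0, "amount": 500},
    {"activity": "approve", "time": 3, "amount": 500},
    {"activity": "submit", "time": 5, "amount": 2000},
    {"activity": "approve", "time": 20, "amount": 2000},
]

# Large submissions must be approved, for the same amount, within 10 time units.
violations = check_response(
    trace,
    activation="submit",
    target="approve",
    activation_cond=lambda a: a["amount"] > 1000,
    correlation_cond=lambda a, b: a["amount"] == b["amount"],
    max_delay=10,
)
print(violations)  # [2]
```

The first submission never activates the constraint, so it cannot violate it; this reflects the open-world reading in the abstract, where any execution that does not contradict the constraints is allowed. The second submission is approved only after 15 time units, so it is reported.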