On the role of pre and post-processing in environmental data mining
The quality of discovered knowledge depends heavily on data quality. Unfortunately, real data tend to contain noise, uncertainty, errors, redundancies, or even irrelevant information. The more complex the reality to be analyzed, the higher the risk of obtaining low-quality data. Knowledge Discovery from Databases (KDD) offers a global framework for preparing data in the right form to perform correct analyses. On the other hand, the quality of decisions based on KDD results depends not only on the quality of the results themselves, but also on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex, and environmental users particularly require clarity in their results. This paper provides some details about how this can be achieved and discusses the role of pre- and post-processing in the whole process of knowledge discovery in environmental systems.
CASP-DM: Context Aware Standard Process for Data Mining
We propose an extension of the Cross Industry Standard Process for Data
Mining (CRISP-DM) which addresses specific challenges of machine learning and
data mining for context and model reuse handling. This new general
context-aware process model is mapped to the CRISP-DM reference model, proposing
some new or enhanced outputs.
A survey of temporal knowledge discovery paradigms and methods
With the increase in the size of data sets, data mining has recently become an important research topic and is receiving substantial interest from both academia and industry. At the same time, interest in temporal databases has been increasing, and a growing number of both prototype and implemented systems are using an enhanced temporal understanding to explain aspects of behavior associated with the implicit time-varying nature of the universe. This paper investigates the confluence of these two areas, surveys the work to date, and explores the issues involved and the outstanding problems in temporal data mining.
The Minimum Description Length Principle for Pattern Mining: A Survey
This is about the Minimum Description Length (MDL) principle applied to
pattern mining. The length of this description is kept to the minimum.
Mining patterns is a core task in data analysis and, beyond issues of
efficient enumeration, the selection of patterns constitutes a major challenge.
The MDL principle, a model selection method grounded in information theory, has
been applied to pattern mining with the aim to obtain compact high-quality sets
of patterns. After giving an outline of relevant concepts from information
theory and coding, as well as of work on the theory behind the MDL and similar
principles, we review MDL-based methods for mining various types of data and
patterns. Finally, we open a discussion on some issues regarding these methods,
and highlight currently active related data analysis problems.
Graph Learning and Its Applications: A Holistic Survey
Graph learning is a prevalent domain that endeavors to learn the intricate
relationships among nodes and the topological structure of graphs. These
relationships endow graphs with uniqueness compared to conventional tabular
data, as nodes rely on non-Euclidean space and encompass rich information to
exploit. Over the years, graph learning has transcended from graph theory to
graph data mining. With the advent of representation learning, it has attained
remarkable performance in diverse scenarios, including text, image, chemistry,
and biology. Owing to its extensive application prospects, graph learning
attracts copious attention from the academic community. Despite numerous works
proposed to tackle different problems in graph learning, there is a demand to
survey previous valuable works. While some researchers have perceived this
phenomenon and accomplished impressive surveys on graph learning, they failed
to connect related objectives, methods, and applications in a more coherent
way. As a result, they did not encompass current ample scenarios and
challenging problems due to the rapid expansion of graph learning. Different
from previous surveys on graph learning, we provide a holistic review that
analyzes current works from the perspective of graph structure, and discusses
the latest applications, trends, and challenges in graph learning.
Specifically, we commence by proposing a taxonomy from the perspective of the
composition of graph data and then summarize the methods employed in graph
learning. We then provide a detailed elucidation of mainstream applications.
Finally, based on the current trend of techniques, we propose future
directions. Comment: 20 pages, 7 figures, 3 tables