Computer-supported analysis of scientific measurements
In the past decade, large-scale databases and knowledge bases have become available to researchers working in a range of scientific disciplines. In many cases these databases and knowledge bases contain measurements of properties of physical objects which have been obtained in experiments or at observation sites. As examples, one can think of crystallographic databases with molecular structures and property databases in materials science. These large collections of measurements, which will be called measurement bases, form interesting resources for scientific research. By analyzing the contents of a measurement base, one may be able to find patterns that are of practical and theoretical importance. As measurement bases are used as a resource for scientific inquiry, questions arise about the quality of the data being analyzed. In particular, the occurrence of conflicts and systematic errors raises doubts about the reliability of a measurement base and compromises any patterns found in it. On the other hand, conflicts and systematic errors may be interesting patterns in themselves and warrant further investigation. These considerations motivate the topic that will be addressed in this thesis: the development of systematic methods for detecting and resolving conflicts and identifying systematic errors in measurement bases. These measurement analysis (MA) methods are implemented in a computer system supporting the user of the measurement base.
Mining Frequent Neighborhood Patterns in Large Labeled Graphs
Over the years, frequent subgraphs have been an important type of target
pattern in the pattern mining literature, where most works deal with
databases holding a number of graph transactions, e.g., chemical structures of
compounds. These methods rely heavily on the downward-closure property (DCP) of
the support measure to ensure an efficient pruning of the candidate patterns.
When switching to the emerging scenario of single-graph databases such as
Google Knowledge Graph and Facebook social graph, the traditional support
measure turns out to be trivial (either 0 or 1). However, to the best of our
knowledge, all attempts to redefine a single-graph support resulted in measures
that either lose DCP, or are no longer semantically intuitive.
This paper targets mining patterns in the single-graph setting. We resolve
the "DCP-intuitiveness" dilemma by shifting the mining target from frequent
subgraphs to frequent neighborhoods. A neighborhood is a specific topological
pattern where a vertex is embedded, and the pattern is frequent if it is shared
by a large portion (above a given threshold) of vertices. We show that the new
patterns not only maintain DCP, but also have semantics as significant as those of
subgraph patterns. Experiments on real-life datasets demonstrate the feasibility of
our algorithms on relatively large graphs, as well as the capability of mining
interesting knowledge that was not discovered in prior works.
Comment: 9 pages
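The key idea above, that a neighborhood pattern is frequent when a large share of vertices embed it, can be illustrated with a deliberately simplified sketch. It assumes (hypothetically, not the paper's definition) that a neighborhood pattern is only a multiset of neighbor labels rather than a full topological pattern; the function name `neighborhood_support` and the toy graph are illustrative. It shows why a vertex-based support keeps the downward-closure property: adding a label to a pattern can only shrink the set of vertices that match it.

```python
from collections import Counter


def neighborhood_support(adj, labels, pattern):
    """Fraction of vertices whose neighbor-label multiset contains `pattern`.

    Simplifying assumption (not the paper's definition): a neighborhood
    pattern is a multiset of neighbor labels, not a full topological pattern.
    """
    if not adj:
        return 0.0
    need = Counter(pattern)
    hits = 0
    for nbrs in adj.values():
        have = Counter(labels[u] for u in nbrs)
        # The vertex embeds the pattern if its neighbors cover every required label count.
        if all(have[lab] >= k for lab, k in need.items()):
            hits += 1
    return hits / len(adj)


# Hypothetical undirected toy graph (symmetric adjacency lists) and vertex labels.
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
labels = {1: "A", 2: "B", 3: "A", 4: "B"}

# Growing the pattern can only lose matching vertices, so support never increases.
for p in (["A"], ["A", "B"], ["A", "B", "B"]):
    print(p, neighborhood_support(adj, labels, p))
```

Because this support is anti-monotone, an Apriori-style miner could discard every extension of a pattern that already falls below the frequency threshold without counting it.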
Characterisation of FAD-family folds using a machine learning approach
Flavin adenine dinucleotide (FAD) and its derivatives play a crucial role in
biological processes. They are major organic cofactors and electron carriers
in both enzymatic activities and biochemical pathways. We have analysed
the relationships between sequence and structure of FAD-containing proteins
using a machine learning approach. Decision trees were generated using the
C4.5 algorithm as a means of automatically generating rules from biological
databases (TOPS, CATH and PDB). These rules were then used as
background knowledge for an ILP system to characterise the four different
classes of FAD-family folds classified in Dym and Eisenberg (2001). These
FAD-family folds are: glutathione reductase (GR), ferredoxin reductase (FR),
p-cresol methylhydroxylase (PCMH) and pyruvate oxidase (PO). Each FAD family
was characterised by a set of rules. The “knowledge patterns”
generated from this approach are a set of rules containing conserved sequence
motifs, secondary structure sequence elements and folding information.
Each rule was then verified by a statistical evaluation of its measured
significance. We show that this machine learning approach is
capable of learning and discovering interesting patterns from large biological
databases and can generate “knowledge patterns” that characterise the FAD-containing
proteins and, at the same time, classify these proteins into four
different families.
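C4.5, the decision-tree algorithm named above, chooses each split by the information gain ratio of the candidate attribute. The sketch below computes that criterion; the toy rows, the attribute names `motif` and `helix_count`, and the family labels are hypothetical and are not drawn from TOPS, CATH or PDB.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(rows, attr, target):
    """C4.5 split criterion: information gain divided by split information."""
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[target])
    n = len(rows)
    base = entropy([r[target] for r in rows])
    # Expected entropy after splitting on `attr`.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    # Split information penalises attributes with many small partitions.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    if split_info == 0:
        return 0.0
    return (base - remainder) / split_info


# Hypothetical toy data: one informative and one uninformative attribute.
rows = [
    {"motif": "yes", "helix_count": "high", "family": "GR"},
    {"motif": "yes", "helix_count": "low", "family": "GR"},
    {"motif": "no", "helix_count": "high", "family": "PCMH"},
    {"motif": "no", "helix_count": "low", "family": "PCMH"},
]
print(gain_ratio(rows, "motif", "family"))
print(gain_ratio(rows, "helix_count", "family"))
```

Here `motif` separates the two families perfectly and scores 1.0, while `helix_count` carries no information and scores 0.0, so C4.5 would split on `motif` first.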
A framework for knowledge-driven CRM
In this paper we propose a framework to combine
KDD (Knowledge Discovery in Databases) and
CRM (Customer Relationship Management), with
an emphasis on customer retention. The key
aspect of the proposed framework is to enable
adaptive use of knowledge discovered to predict
customer buying patterns and capture interesting
knowledge about customers.
Robust and cost-effective approach for discovering action rules
The main goal of Knowledge Discovery in
Databases is to find interesting and usable patterns, meaningful
in their domain. Actionable Knowledge Discovery came into
existence as a direct response to the need for more usable
patterns, called actionable patterns. Traditional data mining
algorithms are often confined to delivering frequent patterns
and fall short of suggesting how to make these patterns
actionable. In this scenario, users are expected to act;
however, they are not advised about what to do with the
delivered patterns in order to make them usable. In this paper,
we present an automated approach that focuses not only on creating
rules but also on making the discovered rules actionable.
To date, few works have been reported in this field, and they
suffer from poor comprehensibility to the user, overlook cost,
and do not provide rule generality. Here we attempt to present a
method for resolving these issues. In this paper, the CEARDM
method is proposed to discover cost-effective action rules from
data. These rules suggest cost-effective changes for
transferring low-profit instances to higher-profit ones.
We also propose an idea for improving the CEARDM method.
A case study in knowledge acquisition for logistic cargo distribution data mining framework
Knowledge acquisition is one of the important aspects of Knowledge Discovery in Databases, ensuring that correct and interesting knowledge is extracted and represented to stakeholders and decision makers. The process can
be undertaken using several techniques; in this study, data mining is used to extract the knowledge patterns, and the knowledge is described using an ontology-based representation. In this paper, a Logistic Cargo Distribution data set is selected for the experiment. The dataset describes the shipment of logistic items for the Malaysian Army.
Product design and manufacturing process improvement using association rules
Modern manufacturing systems equipped with computerized data logging systems collect large volumes of data in real time. The data may contain valuable information for operation and control strategies as well as providing knowledge of normal and abnormal operational patterns. Knowledge discovery in databases can be applied to these data to unearth hidden, unknown, representable, and ultimately useful knowledge. Data mining offers tools for discovery of patterns, associations, changes, anomalies, rules, and statistically significant structures and events in data. Extraction of previously unknown, meaningful information from manufacturing databases provides knowledge that may benefit many application areas within the enterprise, for example improving design or fine tuning production processes. This paper examines the application of association rules to manufacturing databases to extract useful information about a manufacturing system's capabilities and its constraints. The quality of each identified rule is tested and, from numerous rules, only those that are statistically very strong and contain substantial design information are selected. The final set of extracted rules contains very interesting information relating to the geometry of the product and also indicates where limitations exist for improvement of the manufacturing processes involved in the production of complex geometric shapes
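The abstract does not say which statistics were used to judge rule strength. As an assumption, the sketch below uses the conventional support, confidence and lift measures and keeps only single-item rules that pass support and confidence thresholds; the helper names and the transaction items are hypothetical manufacturing attributes, not data from the paper.

```python
from itertools import permutations


def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)
    n_c = sum(1 for t in transactions if c <= t)
    n_ac = sum(1 for t in transactions if (a | c) <= t)
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    # Lift > 1 means the antecedent raises the chance of the consequent.
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


def strong_rules(transactions, min_support, min_conf):
    """Single-item rules x -> y that pass both thresholds."""
    items = sorted(set().union(*transactions))
    out = []
    for x, y in permutations(items, 2):
        s, conf, lift = rule_metrics(transactions, [x], [y])
        if s >= min_support and conf >= min_conf:
            out.append((x, y, s, conf, lift))
    return out


# Hypothetical production records: part features and whether rework was needed.
transactions = [
    {"thin_wall", "deep_pocket", "rework"},
    {"thin_wall", "rework"},
    {"thin_wall", "deep_pocket"},
    {"deep_pocket"},
]
for rule in strong_rules(transactions, min_support=0.5, min_conf=0.9):
    print(rule)
```

With these thresholds only `rework -> thin_wall` survives (support 0.5, confidence 1.0); a real study would add a significance test on top of such thresholds.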
A BELIEF-DRIVEN DISCOVERY FRAMEWORK BASED ON DATA MONITORING AND TRIGGERING
A new knowledge-discovery framework, called Data Monitoring and Discovery Triggering (DMDT),
is defined, where the user specifies monitors that “watch” for significant changes to the data
and changes to the user-defined system of beliefs. Once these changes are detected, knowledge
discovery processes, in the form of data mining queries, are triggered. The proposed framework
is the result of an observation, made in the authors' previous work, that changes to the
user-defined beliefs indicate that there are interesting patterns in the data. In this
paper, we present an approach for finding these interesting patterns using data monitoring and
belief-driven discovery techniques. Our approach is especially useful in those applications where
data changes rapidly with time, as in some of the On-Line Transaction Processing (OLTP) systems. The proposed approach integrates active databases, data mining queries and subjective
measures of interestingness based on user-defined systems of beliefs in a novel and synergetic
way to yield a new type of data mining system.
Information Systems Working Papers Series
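The monitor-and-trigger loop of DMDT can be pictured with a small sketch. Everything in it is hypothetical: the `Monitor` class, representing a belief as a number computed from the data, and the fixed change threshold are illustrative assumptions, and the callback merely stands in for the data mining query the framework would fire.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Monitor:
    """Hypothetical monitor: watches one belief and fires a mining query on large changes."""
    name: str
    belief: Callable[[List[dict]], float]       # degree of belief computed from the data
    threshold: float                            # minimum change that counts as significant
    on_trigger: Callable[[List[dict]], None]    # stand-in for a data mining query
    last: Optional[float] = None

    def observe(self, data: List[dict]) -> bool:
        current = self.belief(data)
        fired = self.last is not None and abs(current - self.last) >= self.threshold
        if fired:
            self.on_trigger(data)
        self.last = current
        return fired


def return_rate(rows: List[dict]) -> float:
    """Hypothetical belief: share of orders that were returned."""
    return sum(r["returned"] for r in rows) / len(rows)


triggered: List[int] = []
m = Monitor("return-rate", return_rate, 0.2, lambda rows: triggered.append(len(rows)))

batch1 = [{"returned": False}] * 4 + [{"returned": True}]   # belief degree 0.2
batch2 = batch1 + [{"returned": True}] * 3                  # belief degree 0.5
fired_first = m.observe(batch1)    # first observation only records a baseline
fired_second = m.observe(batch2)   # a change of 0.3 exceeds the threshold
print(fired_first, fired_second, triggered)
```

Tying the trigger to a change in a belief, rather than to every insert, keeps mining queries rare in rapidly changing OLTP-style data.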
Ontology mining for personalized search
Knowledge discovery for user information needs in user local information repositories is a challenging task. Traditional data mining techniques cannot provide a satisfactory solution to this challenge because there are many uncertainties in the local information repositories. In this chapter, we introduce ontology mining,
a new methodology for solving this challenging issue, which aims to discover interesting and useful knowledge in databases in order to meet the specified constraints on an ontology. In this way, users can efficiently specify their information needs on the ontology rather than dig useful knowledge out of the huge number of discovered patterns or rules. The proposed ontology mining model is evaluated by applying it to an information gathering system, and the results are promising.