Search CORE

232,070 research outputs found

Computer-supported analysis of scientific measurements

Author: Jong Hidde de
Publication venue: Universiteit Twente
Publication date: 01/01/1998
Field of study

In the past decade, large-scale databases and knowledge bases have become available to researchers working in a range of scientific disciplines. In many cases these databases and knowledge bases contain measurements of properties of physical objects which have been obtained in experiments or at observation sites. As examples, one can think of crystallographic databases with molecular structures and property databases in materials science. These large collections of measurements, which will be called measurement bases, form interesting resources for scientific research. By analyzing the contents of a measurement base, one may be able to find patterns that are of practical and theoretical importance. With the use of measurement bases as a resource for scientific inquiry questions arise about the quality of the data being analyzed. In particular, the occurrence of conflicts and systematic errors raises doubts about the reliability of a measurement base and compromises any patterns found in it. On the other hand, conflicts and systematic errors may be interesting patterns in themselves and warrant further investigation. These considerations motivate the topic that will be addressed in this thesis: the development of systematic methods for detecting and resolving con icts and identifying\ud systematic errors in measurement bases. These measurement analysis (MA) methods are implemented in a computer system supporting the user of the measurement base

University of Twente Research Information

Mining Frequent Neighborhood Patterns in Large Labeled Graphs

Author: Han Jialong
Wen Ji-Rong
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2013
Field of study

Over the years, frequent subgraphs have been an important sort of targeted patterns in the pattern mining literatures, where most works deal with databases holding a number of graph transactions, e.g., chemical structures of compounds. These methods rely heavily on the downward-closure property (DCP) of the support measure to ensure an efficient pruning of the candidate patterns. When switching to the emerging scenario of single-graph databases such as Google Knowledge Graph and Facebook social graph, the traditional support measure turns out to be trivial (either 0 or 1). However, to the best of our knowledge, all attempts to redefine a single-graph support resulted in measures that either lose DCP, or are no longer semantically intuitive. This paper targets mining patterns in the single-graph setting. We resolve the "DCP-intuitiveness" dilemma by shifting the mining target from frequent subgraphs to frequent neighborhoods. A neighborhood is a specific topological pattern where a vertex is embedded, and the pattern is frequent if it is shared by a large portion (above a given threshold) of vertices. We show that the new patterns not only maintain DCP, but also have equally significant semantics as subgraph patterns. Experiments on real-life datasets display the feasibility of our algorithms on relatively large graphs, as well as the capability of mining interesting knowledge that is not discovered in prior works.Comment: 9 page

arXiv.org e-Print Archive

CiteSeerX

Crossref

Recommended from our members

Characterisation of FAD-family folds using a machine learning approach

Author: Gilbert D
Tan A C
Tuson A
Publication venue: INCOB
Publication date: 01/01/2002
Field of study

Flavin adenine dinucleotide (FAD) and its derivatives play a crucial role in biological processes. They are major organic cofactors and electron carriers in both enzymatic activities and biochemical pathways. We have analysed the relationships between sequence and structure of FAD-containing proteins using a machine learning approach. Decision trees were generated using the C4.5 algorithm as a means of automatically generating rules from biological databases (TOPS, CATH and PDB). These rules were then used as background knowledge for an ILP system to characterise the four different classes of FAD-family folds classified in Dym and Eisenberg (2001). These FAD-family folds are: glutathione reductase (GR), ferredoxin reductase (FR), p-cresol methylhydroxylase (PCMH) and pyruvate oxidase (PO). Each FADfamily was characterised by a set of rules. The “knowledge patterns” generated from this approach are a set of rules containing conserved sequence motifs, secondary structure sequence elements and folding information. Every rule was then verified using statistical evaluation on the measured significance of each rule. We show that this machine learning approach is capable of learning and discovering interesting patterns from large biological databases and can generate “knowledge patterns” that characterise the FADcontaining proteins, and at the same time classify these proteins into four different families

Brunel University Research Archive

A framework for knowledge – Driven CRM

Author: Goon Tuck Choy
Ian Chai
Publication venue
Publication date: 14/02/2004
Field of study

In this paper we propose a framework to combine KDD (Knowledge Discovered in Databases) and CRM (Customer Relationship Management), with an emphasis on customer retention. The key aspect of the proposed framework is to enable adaptive use of knowledge discovered to predict customer buying patterns and capture interesting knowledge about customers

UUM Repository

Robust and cost-effective approach for discovering action rules

Author: Kalanat N
Saraee MH
Shamsinejad P
Publication venue: 'IACSIT Press'
Publication date: 01/01/2011
Field of study

The main goal of Knowledge Discovery in Databases is to find interesting and usable patterns, meaningful in their domain. Actionable Knowledge Discovery came to existence as a direct respond to the need of finding more usable patterns called actionable patterns. Traditional data mining and algorithms are often confined to deliver frequent patterns and come short for suggesting how to make these patterns actionable. In this scenario the users are expected to act. However, the users are not advised about what to do with delivered patterns in order to make them usable. In this paper, we present an automated approach to focus on not only creating rules but also making the discovered rules actionable. Up to now few works have been reported in this field which lacking incomprehensibility to the user, overlooking the cost and not providing rule generality. Here we attempt to present a method to resolving these issues. In this paper CEARDM method is proposed to discover cost-effective action rules from data. These rules offer some cost-effective changes to transferring low profitable instances to higher profitable ones. We also propose an idea for improving in CEARDM method

University of Salford Institutional Repository

Crossref

A case study in knowledge acquisition for logistic cargo distribution data mining framework

Author: Imran Nordin
Lee Angela Siew Hoong *
Puteri Nohuddin
Zaharin Yusoff *
Zuraini Zainol
Publication venue: 'International Journal of Advanced and Applied Sciences'
Publication date: 01/01/2018
Field of study

Knowledge acquisition is one of important aspect of Knowledge Discovery in Databases to ensure the correct and interesting knowledge is extracted and represented to the stakeholders and decision makers. The process can undertake using several techniques as such in this study, it is using data mining to extract the knowledge patterns and representing the knowledge described using ontology based representation. In this paper, a data set of Logistic Cargo Distribution is selected for the experiment. The dataset describes the shipment of logistic items for the Malaysian Army

Sunway Institutional Repository

Product design and manufacturing process improvement using association rules

Author: Jennifer Harding (1258389)
M. Turner (2002924)
Muhammad Shahbaz (772838)
Srinivas (7201385)
Publication venue
Publication date: 01/01/2006
Field of study

Modern manufacturing systems equipped with computerized data logging systems collect large volumes of data in real time. The data may contain valuable information for operation and control strategies as well as providing knowledge of normal and abnormal operational patterns. Knowledge discovery in databases can be applied to these data to unearth hidden, unknown, representable, and ultimately useful knowledge. Data mining offers tools for discovery of patterns, associations, changes, anomalies, rules, and statistically significant structures and events in data. Extraction of previously unknown, meaningful information from manufacturing databases provides knowledge that may benefit many application areas within the enterprise, for example improving design or fine tuning production processes. This paper examines the application of association rules to manufacturing databases to extract useful information about a manufacturing system's capabilities and its constraints. The quality of each identified rule is tested and, from numerous rules, only those that are statistically very strong and contain substantial design information are selected. The final set of extracted rules contains very interesting information relating to the geometry of the product and also indicates where limitations exist for improvement of the manufacturing processes involved in the production of complex geometric shapes

Loughborough University Institutional Repository

A BELIEF-DRIVEN DISCOVERY FRAMEWORK BASED ON DATA MONITORING AND TRIGGERING

Author: Silberschatz Avi
Tuzhilin Alexander
Publication venue: Stern School of Business, New York University
Publication date: 01/12/1996
Field of study

A new knowledge-discovery framework, called Data Monitoring and Discovery Triggering (DMDT), is defined, where the user specifies monitors that âwatch" for significant changes to the data and changes to the user-defined system of beliefs. Once these changes are detected, knowledge discovery processes, in the form of data mining queries, are triggered. The proposed framework is the result of an observation, made in the previous work of the authors, that when changes to the user-defined beliefs occur, this means that, there are interesting patterns in the data. In this paper, we present an approach for finding these interesting patterns using data monitoring and belief-driven discovery techniques. Our approach is especially useful in those applications where data changes rapidly with time, as in some of the On-Line Transaction Processing (OLTP) systems. The proposed approach integrates active databases, data mining queries and subjective measures of interestingness based on user-defined systems of beliefs in a novel and synergetic way to yield a new type of data mining systems.Information Systems Working Papers Serie

New York University Faculty Digital Archive

Ontology mining for personalized search

Author: Li Yuefeng
Tao Xiaohui
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009
Field of study

Knowledge discovery for user information needs in user local information repositories is a challenging task. Traditional data mining techniques cannot provide a satisfactory solution for this challenge, because there exists a lot of uncertainties in the local information repositories. In this chapter, we introduce ontology mining, a new methodology, for solving this challenging issue, which aims to discover interesting and useful knowledge in databases in order to meet the specified constraints on an ontology. In this way, users can efficiently specify their information needs on the ontology rather than dig useful knowledge from the huge amount of discorded patterns or rules. The proposed ontology mining model is evaluated by applying to an information gathering system, and the results are promising

University of Southern Queensland ePrints