215,638 research outputs found

    A Communication Model that Bridges Knowledge Delivery between Data Miners and Domain Users

    Findings generated from data mining sometimes are not interesting to the domain users. The problem is that data miners and the domain users do not speak the same language, so human subjectivity towards the domain users' own fields of knowledge affects the understanding of knowledge generated from data mining. This paper proposes a communication model based on the reference services model in the field of library science in order to bridge the communications between data miners and domain users. The creation of a data liaison specialist role in the data mining team aims at understanding the subjectivity as well as the thinking process of both parties in order to translate knowledge between the two fields and deliver findings to domain users. Through five steps (data interview, pre-mid evaluation, post-mid evaluation, knowledge delivery, and follow up), the data liaison specialist can achieve effective knowledge synthesis and delivery to the domain users.

    A framework for capturing domain knowledge via the web

    Domain knowledge can be formalized and represented by ontologies, which play an important role in the realization of the Semantic Web. However, since the acquisition of knowledge from certain domains usually requires deep involvement of qualified domain experts, construction of such ontologies is difficult and costly, even with the availability of dedicated languages and ontology editing tools. Some effort has been made to reduce this involvement by introducing a general paradigm of automatic domain knowledge learning from various sources. To make this paradigm more specific and practical, this paper proposes a framework for capturing domain knowledge through raw domain data available over the Web. This framework consists of three dedicated parts: data collection, pre-processing, and mining, where the mining part performs the core task of the framework. Each part can be designed with specific optimized methods. The preliminary implementation of certain parts has shown that it is able to capture the knowledge of electronic product taxonomy via the Web. © 2005. Chao Wang, Jie Lu & Guangquan Zhang

    Towards Role Based Hypothesis Evaluation for Health Data Mining

    Data mining researchers have long been concerned with the application of tools to facilitate and improve data analysis on large, complex data sets. The current challenge is to make data mining and knowledge discovery systems applicable to a wider range of domains, among them health. Early work was performed over transactional, retail based data sets, but the attraction of finding previously unknown knowledge from the ever increasing amounts of data collected from the health domain is an emerging area of interest and specialisation. The problem is finding a solution that is suitably flexible to allow for generalised application whilst being specific enough to provide functionality that caters for the nuances of each role within the domain. The need for a more granular approach to problem solving in other areas of information technology has resulted in the use of role based solutions. This paper discusses the progress to date in developing a role oriented solution to the problem of providing for the diverse requirements of health domain data miners and defining the foundation for determining what constitutes an interesting discovery in an area as complex as health

    Survey On Moving Towards Frequent Pattern Growth for Infrequent Weighted Itemset Mining

    In data mining and knowledge discovery, frequent pattern mining plays an important role, but it does not consider the different weight values of the items. Association rule mining finds correlations between data items. Frequent itemsets are patterns, such as itemsets, substructures, or subsequences, that appear frequently in a data set. In this paper we present a survey of various frequent pattern mining and weighted itemset mining approaches. Different articles related to frequent and weighted infrequent itemset mining have been published. This paper focuses on a survey of various existing algorithms related to frequent and infrequent itemset mining, which creates a path for future research in the field of association rule mining.
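
The distinction between frequent and infrequent itemsets, and the effect of item weights, can be illustrated with a minimal brute-force sketch. It is not one of the surveyed algorithms (in particular, it is not FP-growth). The function names, the toy transactions, the threshold `min_support`, and the choice to weight an itemset by the minimum weight of its items within each transaction are all assumptions made for illustration.

```python
from itertools import combinations


def support_counts(transactions, max_size=2):
    """Count how many transactions contain each itemset of up to max_size items."""
    counts = {}
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] = counts.get(itemset, 0) + 1
    return counts


def weighted_support(itemset, weighted_transactions):
    """Sum the minimum item weight over every transaction containing the itemset.

    Each weighted transaction is a dict mapping item -> weight.
    Using the minimum weight is an illustrative assumption.
    """
    total = 0.0
    for weights in weighted_transactions:
        if all(item in weights for item in itemset):
            total += min(weights[item] for item in itemset)
    return total


# Hypothetical toy data: four unweighted transactions.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b"]]
counts = support_counts(transactions)

# Split itemsets by an assumed minimum support threshold.
min_support = 2
frequent = {s for s, c in counts.items() if c >= min_support}
infrequent = {s for s, c in counts.items() if c < min_support}

# Hypothetical weighted transactions for the weighted variant.
weighted_transactions = [
    {"a": 0.5, "b": 1.0},
    {"a": 2.0, "b": 0.25, "c": 1.0},
]
ab_weight = weighted_support(("a", "b"), weighted_transactions)
```

Enumerating every itemset is exponential in the transaction length, which is why pattern-growth methods such as the ones the survey covers avoid explicit candidate generation.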

    Introducing DASC-PM: A Data Science Process Model

    Data-driven disciplines like data mining and knowledge management already provide process-based frameworks for data analysis projects, such as the well-known cross-industry standard process for data mining (CRISP-DM) or knowledge discovery in databases (KDD). Although the domain of data science addresses a much broader problem space, i.e., also considers economic, social, and ecological impacts of data-driven projects, a corresponding domain-specific process model is still missing. Consequently, based on a total of four identified meta requirements and 17 corresponding requirements that were collected from experts of theory and practice, this contribution proposes the empirically grounded data science process model (DASC-PM), a framework that maps a data science project as a four-step process model and contextualizes it among scientific procedures, various areas of application, IT infrastructures, and impacts. To illustrate the phase-oriented specification capabilities of the DASC-PM, we present, as an example, competence and role profiles for the analysis phase of a data science project.

    Image Mining for Flower Classification by Genetic Association Rule Mining Using GLCM features

    Image mining is concerned with knowledge discovery in image databases. It is the extension of data mining algorithms to the image processing domain. Image mining plays a vital role in extracting useful information from images. In a computer-aided plant identification and classification system, image mining plays a crucial role in flower classification. Low-level image content features such as color and texture are used for flower image classification. A flower image is segmented using a histogram threshold based method. The data set contains different flower species with similar appearance across classes (small inter-class variations) and varying appearance within a class (large intra-class variations). The flower images also show different poses, with cluttered backgrounds, under varying lighting and climatic conditions. The flower images were collected from the World Wide Web in addition to photographs taken in natural scenes. The proposed method is based on textural features such as the gray level co-occurrence matrix (GLCM). This paper introduces multi-dimensional genetic association rule mining for effective classification of flowers. The image data mining approach has four major steps: preprocessing, feature extraction, preparation of the transactional database, and multi-dimensional genetic association rule mining and classification. The purpose of our experiments is to explore the feasibility of the data mining approach. Results will show that there is promise in image mining based on multi-dimensional genetic association rule mining. It is well known that data mining techniques are more suitable to larger databases than the one used for these preliminary tests. A computer-aided method using association rules could assist people and improve the accuracy of flower identification. In particular, a computer-aided method based on association rules becomes more accurate with a larger dataset. Experimental results show that this new method can quickly and effectively mine potential association rules.
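
For readers unfamiliar with the GLCM, the following minimal sketch shows how a co-occurrence matrix and one texture statistic derived from it can be computed. The abstract does not state which pixel offsets or which GLCM statistics the authors use, so the horizontal offset, the contrast statistic, the function names, and the 4x4 toy image are assumptions for illustration only.

```python
def glcm(image, levels, d_row=0, d_col=1):
    """Build a non-symmetric gray level co-occurrence matrix.

    Entry [i][j] counts pixel pairs where a pixel of gray level i has a
    neighbor of gray level j at offset (d_row, d_col).
    """
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def contrast(matrix):
    """Contrast: mean squared gray-level difference over all counted pairs."""
    total = sum(sum(row) for row in matrix)
    weighted = sum(
        count * (i - j) ** 2
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
    )
    return weighted / total


# Hypothetical 4x4 image with gray levels 0..3.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
matrix = glcm(image, levels=4)      # horizontal neighbor, distance 1
texture_contrast = contrast(matrix)
```

Statistics such as this one, computed for several offsets, would form the numeric texture features that a later step discretizes into a transactional database for rule mining.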

    Web-Page Recommendation Based on Web Usage and Domain Knowledge

    © 1989-2012 IEEE. Web-page recommendation plays an important role in intelligent Web systems. Useful knowledge discovery from Web usage data and satisfactory knowledge representation for effective Web-page recommendations are crucial and challenging. This paper proposes a novel method to efficiently provide better Web-page recommendation through semantic enhancement by integrating the domain and Web usage knowledge of a website. Two new models are proposed to represent the domain knowledge. The first model uses an ontology to represent the domain knowledge. The second model uses one automatically generated semantic network to represent domain terms, Web-pages, and the relations between them. Another new model, the conceptual prediction model, is proposed to automatically generate a semantic network of the semantic Web usage knowledge, which is the integration of domain knowledge and Web usage knowledge. A number of effective queries have been developed to query these knowledge bases. Based on these queries, a set of recommendation strategies has been proposed to generate Web-page candidates. The recommendation results have been compared with the results obtained from an advanced existing Web Usage Mining (WUM) method. The experimental results demonstrate that the proposed method produces significantly higher performance than the WUM method.