5,530 research outputs found

    Data Mining Decision Trees in Economy

    Data Mining is the extraction of previously unknown and potentially useful information from data. Using Data Mining Decision Tree techniques, our investigation illustrates how to extract meaningful socio-economic knowledge from large data sets. Our tests find 5 attribute selection measures that perform more accurately than the best of the 17 algorithms presented in the literature. Keywords: Data Mining, Decision Trees, classification error rate
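The abstract above does not name its attribute selection measures. As a minimal sketch of what such a measure does, the following computes information gain, one widely used measure for choosing a decision tree split. The rows, the attribute names (`sector`, `size`), and the target (`growth`) are hypothetical illustration data, not taken from the paper.

```python
import math
from collections import Counter
from typing import Dict, List


def entropy(labels: List[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows: List[Dict[str, str]], attribute: str, target: str) -> float:
    """Reduction in target entropy obtained by splitting rows on attribute."""
    base = entropy([r[target] for r in rows])
    # Group the target labels by the value of the candidate attribute.
    groups: Dict[str, List[str]] = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    # Weighted average entropy of the groups after the split.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical socio-economic records, for illustration only.
rows = [
    {"sector": "retail", "size": "small", "growth": "low"},
    {"sector": "retail", "size": "large", "growth": "low"},
    {"sector": "tech", "size": "small", "growth": "high"},
    {"sector": "tech", "size": "large", "growth": "high"},
]

gain_sector = information_gain(rows, "sector", "growth")
gain_size = information_gain(rows, "size", "growth")
```

In this toy data `sector` separates the classes perfectly while `size` carries no information, so a tree builder using this measure would split on `sector` first.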

    Region Based Data Mining on Agriculture Data

    Spatial Data Mining is the process of discovering interesting and previously unknown, but potentially useful, patterns from large spatial databases. Most relationships in spatial datasets are regional, so there is a great need for regional regression methods that derive regional models reflecting the different spatial characteristics of different regions. A central challenge in spatial data mining is the efficiency of the algorithms, given the often huge amount of spatial data and the complexity of spatial data types and spatial access methods. This paper proposes a regional regression technique for regions that are defined by a categorical attribute, in particular soil type. The result is a series of regions grouped hierarchically according to their similarity.

    Intelligent data analysis - support for development of SMEs sector

    The paper studies how intelligent data analysis can be applied to discover knowledge hidden in the data of small and medium-sized enterprises (SMEs) in the province of Vojvodina. The knowledge revealed by intelligent analysis, which is not accessible by any other means, could be a valuable starting point for developing proactive and preventive actions to support the SME sector. Keywords: Intelligent data analysis, CRISP-DM, clustering, small and medium enterprises, Research and Development/Tech Change/Emerging Technologies, C8, L2

    Data Mining Applications in Big Data

    Data mining is the process of extracting hidden, unknown, but potentially useful information from massive data. Big Data has a great impact on scientific discovery and value creation. This paper introduces data mining methods and Big Data technologies. It discusses the challenges of data mining in general and of data mining with big data in particular, and presents some technological advances in both areas.

    Data mining in Cloud Computing

    This paper describes how data mining is used in cloud computing. Data mining is used for extracting potentially useful information from raw data. The integration of data mining techniques into normal day-to-day activities has become commonplace. Every day people are confronted with targeted advertising, and data mining techniques help businesses become more efficient by reducing costs. Data mining techniques and applications are much needed in the cloud computing paradigm. Implementing data mining techniques through cloud computing will allow users to retrieve meaningful information from a virtually integrated data warehouse, which reduces the costs of infrastructure and storage.

    New probabilistic interest measures for association rules

    Mining association rules is an important technique for discovering meaningful patterns in transaction databases. Many different measures of interestingness have been proposed for association rules. However, these measures fail to take the probabilistic properties of the mined data into account. In this paper, we start by presenting a simple probabilistic framework for transaction data which can be used to simulate transaction data when no associations are present. We use such data and a real-world database from a grocery outlet to explore the behavior of confidence and lift, two popular interest measures used for rule mining. The results show that confidence is systematically influenced by the frequency of the items in the left-hand side of rules and that lift performs poorly at filtering random noise in transaction data. Based on the probabilistic framework, we develop two new interest measures, hyper-lift and hyper-confidence, which can be used to filter or order mined association rules. The new measures show significantly better performance than lift for applications where spurious rules are problematic.
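For readers unfamiliar with the two measures the abstract examines, here is a minimal sketch of the standard definitions of confidence and lift for a rule `lhs -> rhs`. The toy grocery transactions are hypothetical, and the paper's own measures, hyper-lift and hyper-confidence, are not reproduced here.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in itemset."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions: List[FrozenSet[str]],
               lhs: Iterable[str], rhs: Iterable[str]) -> float:
    """Estimate of P(rhs | lhs); assumes lhs occurs in at least one transaction."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    return support(transactions, lhs | rhs) / support(transactions, lhs)


def lift(transactions: List[FrozenSet[str]],
         lhs: Iterable[str], rhs: Iterable[str]) -> float:
    """Confidence divided by the support of rhs; 1.0 indicates independence."""
    return confidence(transactions, lhs, rhs) / support(transactions, rhs)


# Hypothetical grocery baskets, for illustration only.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"milk", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"bread"}),
]

conf_bread_milk = confidence(transactions, {"bread"}, {"milk"})
lift_bread_milk = lift(transactions, {"bread"}, {"milk"})
```

Here bread appears in 4 of 5 baskets and bread with milk in 2 of 5, so the confidence of `bread -> milk` is 0.5. Dividing by the support of milk (0.6) gives a lift below 1, meaning the two items co-occur slightly less often than independence would predict.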

    Discovery of Frequent Itemsets: Frequent Item Tree-Based Approach

    Mining frequent patterns in large transactional databases is a highly researched area in the field of data mining. Existing frequent-pattern discovery algorithms suffer from high memory dependency when mining large amounts of data, as well as high computational and I/O costs. Additionally, the recursive process used to mine these structures consumes too much memory. In this paper, we describe a more efficient algorithm for mining the complete set of frequent itemsets from transactional databases. The suggested algorithm is partially based on the FP-tree hypothesis and extracts the frequent itemsets directly from the tree. Another benefit of the new method is that its memory requirement is independent of the number of processed transactions. We present performance comparisons of our algorithm against the Apriori algorithm and FP-growth.
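To make the task concrete, the sketch below mines frequent itemsets with a level-wise Apriori search, one of the two baselines the abstract compares against. It is not the tree-based algorithm the paper proposes, and the sample transactions and the `min_count` threshold are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def apriori(transactions: List[FrozenSet[str]], min_count: int) -> Dict[FrozenSet[str], int]:
    """Return every itemset contained in at least min_count transactions."""
    # Level 1: count single items.
    counts: Dict[FrozenSet[str], int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count step: one pass over the transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical transactions, for illustration only.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"milk", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"bread"}),
]

result = apriori(transactions, min_count=2)
```

The count step rescans all transactions at every level, which is the kind of repeated I/O cost that tree-based methods such as FP-growth are designed to avoid.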
