564,642 research outputs found

    Bagging ensemble selection

    Get PDF
    Ensemble selection has recently appeared as a popular ensemble learning method, not only because its implementation is fairly straightforward, but also due to its excellent predictive performance on practical problems. The method has been highlighted in winning solutions of many data mining competitions, such as the Netix competition, the KDD Cup 2009 and 2010, the UCSD FICO contest 2010, and a number of data mining competitions on the Kaggle platform. In this paper we present a novel variant: bagging ensemble selection. Three variations of the proposed algorithm are compared to the original ensemble selection algorithm and other ensemble algorithms. Experiments with ten real world problems from diverse domains demonstrate the benefit of the bagging ensemble selection algorithm

    Data-mining chess databases

    Get PDF
    This is a report on the data-mining of two chess databases, the objective being to compare their sub-7-man content with perfect play as documented in Nalimov endgame tables. Van der Heijdenā€™s ENDGAME STUDY DATABASE IV is a definitive collection of 76,132 studies in which White should have an essentially unique route to the stipulated goal. Chessbaseā€™s BIG DATABASE 2010 holds some 4.5 million games. Insight gained into both database content and data-mining has led to some delightful surprises and created a further agenda

    Rural demographic change in the new century: slower growth, increased diversity

    Get PDF
    This brief examines rural demographic trends in the first decade of the twenty-first century using newly available data from the 2010 Census. The rural population grew by just 2.2 million between 2000 and 2010ā€”a gain barely half as great as that during the 1990s. Population growth was particularly slow in farming and mining counties and sharply reduced in rural manufacturing counties. Rural population gains were largest in high-amenity counties and just beyond the metropolitan fringe. Diversity accelerated in rural America, with racial and ethnic minorities accounting for 83 percent of rural population growth between 2000 and 2010

    Semi-Trusted Mixer Based Privacy Preserving Distributed Data Mining for Resource Constrained Devices

    Get PDF
    In this paper a homomorphic privacy preserving association rule mining algorithm is proposed which can be deployed in resource constrained devices (RCD). Privacy preserved exchange of counts of itemsets among distributed mining sites is a vital part in association rule mining process. Existing cryptography based privacy preserving solutions consume lot of computation due to complex mathematical equations involved. Therefore less computation involved privacy solutions are extremely necessary to deploy mining applications in RCD. In this algorithm, a semi-trusted mixer is used to unify the counts of itemsets encrypted by all mining sites without revealing individual values. The proposed algorithm is built on with a well known communication efficient association rule mining algorithm named count distribution (CD). Security proofs along with performance analysis and comparison show the well acceptability and effectiveness of the proposed algorithm. Efficient and straightforward privacy model and satisfactory performance of the protocol promote itself among one of the initiatives in deploying data mining application in RCD.Comment: IEEE Publication format, International Journal of Computer Science and Information Security, IJCSIS, Vol. 8 No. 1, April 2010, USA. ISSN 1947 5500, http://sites.google.com/site/ijcsis

    FP-tree and COFI Based Approach for Mining of Multiple Level Association Rules in Large Databases

    Full text link
    In recent years, discovery of association rules among itemsets in a large database has been described as an important database-mining problem. The problem of discovering association rules has received considerable research attention and several algorithms for mining frequent itemsets have been developed. Many algorithms have been proposed to discover rules at single concept level. However, mining association rules at multiple concept levels may lead to the discovery of more specific and concrete knowledge from data. The discovery of multiple level association rules is very much useful in many applications. In most of the studies for multiple level association rule mining, the database is scanned repeatedly which affects the efficiency of mining process. In this research paper, a new method for discovering multilevel association rules is proposed. It is based on FP-tree structure and uses cooccurrence frequent item tree to find frequent items in multilevel concept hierarchy.Comment: Pages IEEE format, International Journal of Computer Science and Information Security, IJCSIS, Vol. 7 No. 2, February 2010, USA. ISSN 1947 5500, http://sites.google.com/site/ijcsis
    • ā€¦
    corecore