564,642 research outputs found
Bagging ensemble selection
Ensemble selection has recently appeared as a popular ensemble learning method, not only because its implementation is fairly straightforward, but also due to its excellent predictive performance on practical problems. The method has been highlighted in winning solutions of many data mining competitions, such as the Netix competition, the KDD Cup 2009 and 2010, the UCSD FICO contest 2010, and a number of data mining competitions on the Kaggle platform. In this paper we present a novel variant: bagging ensemble selection. Three variations of the proposed algorithm are compared to the original ensemble selection algorithm and other ensemble algorithms. Experiments with ten real world problems from diverse domains demonstrate the benefit of the bagging ensemble selection algorithm
Data-mining chess databases
This is a report on the data-mining of two chess databases, the objective being to compare their sub-7-man content with perfect play as documented in Nalimov endgame tables. Van der Heijdenās ENDGAME STUDY DATABASE IV is a definitive collection of 76,132 studies in which White should have an essentially unique route to the stipulated goal. Chessbaseās BIG DATABASE 2010 holds some 4.5 million games. Insight gained into both database content and data-mining has led to some delightful surprises and created a further agenda
Rural demographic change in the new century: slower growth, increased diversity
This brief examines rural demographic trends in the first decade of the twenty-first century using newly available data from the 2010 Census. The rural population grew by just 2.2 million between 2000 and 2010āa gain barely half as great as that during the 1990s. Population growth was particularly slow in farming and mining counties and sharply reduced in rural manufacturing counties. Rural population gains were largest in high-amenity counties and just beyond the metropolitan fringe. Diversity accelerated in rural America, with racial and ethnic minorities accounting for 83 percent of rural population growth between 2000 and 2010
Semi-Trusted Mixer Based Privacy Preserving Distributed Data Mining for Resource Constrained Devices
In this paper a homomorphic privacy preserving association rule mining
algorithm is proposed which can be deployed in resource constrained devices
(RCD). Privacy preserved exchange of counts of itemsets among distributed
mining sites is a vital part in association rule mining process. Existing
cryptography based privacy preserving solutions consume lot of computation due
to complex mathematical equations involved. Therefore less computation involved
privacy solutions are extremely necessary to deploy mining applications in RCD.
In this algorithm, a semi-trusted mixer is used to unify the counts of itemsets
encrypted by all mining sites without revealing individual values. The proposed
algorithm is built on with a well known communication efficient association
rule mining algorithm named count distribution (CD). Security proofs along with
performance analysis and comparison show the well acceptability and
effectiveness of the proposed algorithm. Efficient and straightforward privacy
model and satisfactory performance of the protocol promote itself among one of
the initiatives in deploying data mining application in RCD.Comment: IEEE Publication format, International Journal of Computer Science
and Information Security, IJCSIS, Vol. 8 No. 1, April 2010, USA. ISSN 1947
5500, http://sites.google.com/site/ijcsis
FP-tree and COFI Based Approach for Mining of Multiple Level Association Rules in Large Databases
In recent years, discovery of association rules among itemsets in a large
database has been described as an important database-mining problem. The
problem of discovering association rules has received considerable research
attention and several algorithms for mining frequent itemsets have been
developed. Many algorithms have been proposed to discover rules at single
concept level. However, mining association rules at multiple concept levels may
lead to the discovery of more specific and concrete knowledge from data. The
discovery of multiple level association rules is very much useful in many
applications. In most of the studies for multiple level association rule
mining, the database is scanned repeatedly which affects the efficiency of
mining process. In this research paper, a new method for discovering multilevel
association rules is proposed. It is based on FP-tree structure and uses
cooccurrence frequent item tree to find frequent items in multilevel concept
hierarchy.Comment: Pages IEEE format, International Journal of Computer Science and
Information Security, IJCSIS, Vol. 7 No. 2, February 2010, USA. ISSN 1947
5500, http://sites.google.com/site/ijcsis
- ā¦