    Empirical analysis of classifiers and feature selection techniques on mobile phone data activities

    Mobile phones have become ubiquitous devices that, with additional hardware and software features, are no longer only a means of communication. Many activities can be captured using a mobile phone, along with many features. However, not all of these features benefit processing and analysis. In some cases, a large number of features lowers accuracy and influences the result. At the same time, a large feature set requires a longer time to build a model. This paper aims to analyze the impact on accuracy of selected feature selection techniques and classifiers applied to mobile phone activity data, and to evaluate the method. Furthermore, by applying feature selection and emphasizing its accuracy impact on the data classified by each classifier, the usage of features can be determined. Finding a suitable combination of classifier and feature selection technique is sometimes crucial. A series of tests conducted in Weka on the accuracy of feature selection shows consistent results, although with a different order of features. The results show that the combination of the K* algorithm and correlation feature selection is the best, with a high accuracy rate while at the same time producing a smaller feature subset.

    Consistent subset sampling

    Consistent sampling is a technique for specifying, in small space, a subset $S$ of a potentially large universe $U$ such that the elements in $S$ satisfy a suitably chosen sampling condition. Given a subset $\mathcal{I}\subseteq U$ it should be possible to quickly compute $\mathcal{I}\cap S$, i.e., the elements in $\mathcal{I}$ satisfying the sampling condition. Consistent sampling has important applications in similarity estimation, and estimation of the number of distinct items in a data stream. In this paper we generalize consistent sampling to the setting where we are interested in sampling size-$k$ subsets occurring in some set in a collection of sets of bounded size $b$, where $k$ is a small integer. This can be done by applying standard consistent sampling to the $k$-subsets of each set, but that approach requires time $\Theta(b^k)$. Using a carefully designed hash function, for a given sampling probability $p \in (0,1]$, we show how to improve the time complexity to $\Theta(b^{\lceil k/2\rceil}\log \log b + pb^k)$ in expectation, while maintaining strong concentration bounds for the sample. The space usage of our method is $\Theta(b^{\lceil k/4\rceil})$. We demonstrate the utility of our technique by applying it to several well-studied data mining problems. We show how to efficiently estimate the number of frequent $k$-itemsets in a stream of transactions and the number of bipartite cliques in a graph given as incidence stream. Further, building upon a recent work by Campagna et al., we show that our approach can be applied to frequent itemset mining in a parallel or distributed setting. We also present applications in graph stream mining. Comment: To appear in SWAT 201
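
    As a rough illustration of the baseline the abstract mentions (standard consistent sampling applied to every size-k subset of a set), the following Python sketch may help. It is not the paper's improved hash construction. The function names, the seed parameter, and the use of SHA-256 as the hash are assumptions made only for this example.

```python
import hashlib
from itertools import combinations


def sample_hash(item, seed=0):
    """Map an item to a value in [0, 1) that depends only on (seed, item).

    SHA-256 is an assumed stand-in for the hash function; the abstract
    does not say which hash family the authors use.
    """
    digest = hashlib.sha256(f"{seed}:{item!r}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def in_sample(item, p, seed=0):
    """Sampling condition: the item belongs to S iff its hash is below p."""
    return sample_hash(item, seed) < p


def consistent_sample(items, p, seed=0):
    """Return the elements of `items` that lie in the implicit sample S."""
    return {x for x in items if in_sample(x, p, seed)}


def naive_k_subset_sample(s, k, p, seed=0):
    """Baseline: apply consistent sampling to every k-subset of `s`.

    All C(b, k) subsets of a size-b set are enumerated, so the running
    time is Theta(b^k) for constant k, regardless of how small p is.
    Subsets are emitted as sorted tuples so that each subset has one
    canonical representation.
    """
    return {c for c in combinations(sorted(s), k) if in_sample(c, p, seed)}
```

    Because membership in the sample depends only on the item and the seed, sampling two overlapping sets independently gives the same answer on their intersection, which is the property that makes the technique useful for similarity and distinct-count estimation.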