Empirical analysis of classifiers and feature selection techniques on mobile phone data activities
Mobile phones have become ubiquitous devices that, with additional hardware and software features, serve as more than a means of communication. Many activities can be captured using a mobile phone, yielding many features. However, not all of these features benefit processing and analysis. A large number of features can, in some cases, reduce the accuracy of the result, and it also lengthens the time required to build a model. This paper analyzes the accuracy impact of selected feature selection techniques and classifiers applied to mobile phone activity data and evaluates the method. Furthermore, by applying feature selection and examining its impact on the accuracy of each classifier, the usefulness of individual features can be determined. Finding a suitable combination of classifier and feature selection technique is sometimes crucial. A series of tests conducted in Weka shows that the accuracy results are consistent across feature selection techniques, although the features are ranked in different orders. The results show that the combination of the K* algorithm and correlation feature selection performs best, achieving a high accuracy rate while producing a smaller feature subset.
Consistent subset sampling
Consistent sampling is a technique for specifying, in small space, a subset $S$
of a potentially large universe $U$ such that the elements in $S$ satisfy a
suitably chosen sampling condition. Given a subset $\mathcal{I} \subseteq U$ it
should be possible to quickly compute $\mathcal{I} \cap S$, i.e., the elements
in $\mathcal{I}$ satisfying the sampling condition. Consistent sampling has
important applications in similarity estimation, and estimation of the number
of distinct items in a data stream.
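
As an illustration of the idea described above, the following is a minimal sketch of one common way to obtain a consistent sample: hash every element to a value in [0, 1) and keep it when the value falls below a sampling probability. The names `unit_hash`, `consistent_sample`, the parameter `p`, and the seed string are hypothetical choices for this sketch and are not taken from the paper; SHA-256 merely stands in for a suitable hash function.

```python
import hashlib


def unit_hash(item: str, seed: str = "demo-seed") -> float:
    """Map an item deterministically to a value in [0, 1)."""
    digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
    # Keep the top 53 bits so the quotient is exactly representable and < 1.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def consistent_sample(items, p: float, seed: str = "demo-seed") -> set:
    """Return the items whose hash value is below p (the sampling condition)."""
    return {x for x in items if unit_hash(x, seed) < p}
```

Because the decision for each element depends only on the element and the seed, two sets that share an element always agree on whether it is sampled, and only the seed and `p` need to be stored to describe the sample.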
In this paper we generalize consistent sampling to the setting where we are
interested in sampling size-$k$ subsets occurring in some set in a collection
of sets of bounded size $b$, where $k$ is a small integer. This can be done by
applying standard consistent sampling to the $k$-subsets of each set, but that
approach requires time $\Theta(b^k)$. Using a carefully designed hash function,
for a given sampling probability $p \in (0,1]$, we show how to improve the time
complexity to $\Theta(b^{\lceil k/2\rceil}\log \log b + pb^k)$ in expectation,
while maintaining strong concentration bounds for the sample. The space usage
of our method is $\Theta(b^{\lceil k/4\rceil})$.
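
For contrast, the baseline mentioned above (applying standard consistent sampling to every subset of a fixed size in each set) can be sketched as follows. This is only the straightforward enumeration, not the improved hash function of the paper; `subset_hash`, `sample_k_subsets_naive`, and their parameters are hypothetical names, and items are assumed to be strings.

```python
import hashlib
from itertools import combinations


def subset_hash(subset, seed: str = "demo-seed") -> float:
    """Hash a subset to a value in [0, 1), independent of item order."""
    key = f"{seed}:{tuple(sorted(subset))!r}"
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def sample_k_subsets_naive(transaction, k: int, p: float, seed: str = "demo-seed") -> set:
    """Enumerate every k-subset of one set and keep those hashing below p."""
    items = sorted(set(transaction))  # canonical order gives canonical tuples
    return {c for c in combinations(items, k) if subset_hash(c, seed) < p}
```

The loop visits all subsets of the chosen size, which is why the cost grows with the number of such subsets even when only a small fraction ends up in the sample.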
We demonstrate the utility of our technique by applying it to several
well-studied data mining problems. We show how to efficiently estimate the
number of frequent $k$-itemsets in a stream of transactions and the number of
bipartite cliques in a graph given as an incidence stream. Further, building upon
a recent work by Campagna et al., we show that our approach can be applied to
frequent itemset mining in a parallel or distributed setting. We also present
applications in graph stream mining.
Comment: To appear in SWAT 201