Data Sketches for Disaggregated Subset Sum and Frequent Item Estimation
We introduce and study a new data sketch for processing massive datasets. It
addresses two common problems: 1) computing a sum given arbitrary filter
conditions and 2) identifying the frequent items, or heavy hitters, in a data
set. For the former, the sketch provides unbiased estimates with
state-of-the-art accuracy. It handles the challenging scenario in which the data
is disaggregated, so that computing the per-unit metric of interest requires an
expensive aggregation. For example, the metric of interest may be total clicks
per user while the raw data is a click stream with multiple rows per user. The
sketch is therefore suitable for use in a wide range of applications, including
computing historical click-through rates for ad prediction, reporting user
metrics from event streams, and measuring network traffic for IP flows.
We prove, and empirically show, that the sketch has good properties for both the
disaggregated subset sum estimation and frequent item problems. On i.i.d. data,
it not only picks out the frequent items but gives strongly consistent
estimates of the proportion of each frequent item. The resulting sketch
asymptotically draws a probability-proportional-to-size sample that is optimal
for estimating sums over the data. For non-i.i.d. data, we show that it
typically does much better than random sampling for the frequent item problem
and never does worse. For subset sum estimation, we show that even for
pathological sequences the variance is close to that of an optimal sampling
design. Empirically, despite the disadvantage of operating on disaggregated
data, our method matches or bests priority sampling, a state-of-the-art method
for pre-aggregated data, and performs orders of magnitude better than uniform
sampling on skewed data. We propose extensions that allow the sketch to be used
for combining multiple data sets, in distributed systems, and for time-decayed
aggregation.
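The frequent-item side of the problem can be illustrated with a minimal Space Saving-style counter, a related deterministic sketch that processes one disaggregated row at a time. This is not the paper's sketch (which is randomized and gives unbiased subset sum estimates); the stream and capacity below are invented for illustration:

```python
class SpaceSaving:
    """Minimal Space Saving sketch: tracks at most k items; a tracked
    item's count overestimates its true count by at most the count it
    inherited from the item it evicted."""

    def __init__(self, k):
        self.k = k
        self.counts = {}  # item -> estimated count

    def update(self, item, weight=1):
        if item in self.counts:
            self.counts[item] += weight
        elif len(self.counts) < self.k:
            self.counts[item] = weight
        else:
            # Evict the current minimum and inherit its count, which
            # bounds the overestimate for the newly tracked item.
            victim = min(self.counts, key=self.counts.get)
            inherited = self.counts.pop(victim)
            self.counts[item] = inherited + weight

    def estimate(self, item):
        return self.counts.get(item, 0)

# Disaggregated stream: many rows per user, one click each.
stream = ["u1"] * 50 + ["u2"] * 30 + ["u3"] * 5 + ["u4"] * 5
sketch = SpaceSaving(k=3)
for user in stream:
    sketch.update(user)
```

After the pass, the heavy hitters u1 and u2 are retained with exact counts; the tail items compete for the remaining slot.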
Precipitation and latent heating distributions from satellite passive microwave radiometry. Part I: improved method and uncertainties
A revised Bayesian algorithm for estimating surface rain rate, convective rain proportion, and latent heating profiles from satellite-borne passive microwave radiometer observations over ocean backgrounds is described. The algorithm searches a large database of cloud-radiative model simulations to find cloud profiles that are radiatively consistent with a given set of microwave radiance measurements. The properties of these radiatively consistent profiles are then composited to obtain best estimates of the observed properties. The revised algorithm is supported by an expanded and more physically consistent database of cloud-radiative model simulations. The algorithm also features a better quantification of the convective and nonconvective contributions to total rainfall, a new geographic database, and an improved representation of background radiances in rain-free regions. Bias and random error estimates are derived from applications of the algorithm to synthetic radiance data, based upon a subset of cloud-resolving model simulations, and from the Bayesian formulation itself. Synthetic rain-rate and latent heating estimates exhibit a trend of high (low) bias for low (high) retrieved values. The Bayesian estimates of random error are propagated to represent errors at coarser time and space resolutions, based upon applications of the algorithm to TRMM Microwave Imager (TMI) data. Errors in TMI instantaneous rain-rate estimates at 0.5° resolution range from approximately 50% at 1 mm h⁻¹ to 20% at 14 mm h⁻¹. Errors in collocated spaceborne radar rain-rate estimates are roughly 50%–80% of the TMI errors at this resolution. The estimated algorithm random error in TMI rain rates at monthly, 2.5° resolution is relatively small (less than 6% at 5 mm day⁻¹) in comparison with the random error resulting from infrequent satellite temporal sampling (8%–35% at the same rain rate). Percentage errors resulting from sampling decrease with increasing rain rate, and sampling errors in latent heating rates follow the same trend. Averaging over 3 months reduces sampling errors in rain rates to 6%–15% at 5 mm day⁻¹, with proportionate reductions in latent heating sampling errors.
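The search-and-composite step of such Bayesian retrieval schemes can be sketched in a few lines: weight each database profile by its radiative consistency with the observation, then form the weighted mean of the associated physical quantity. The database values, channel count, and noise level below are invented toy numbers; the real algorithm uses a far larger database and full radiance covariances:

```python
import numpy as np

# Hypothetical toy database: simulated brightness temperatures (K) for two
# channels, and the surface rain rate (mm/h) of each simulated cloud profile.
db_tb = np.array([[250.0, 230.0],
                  [260.0, 245.0],
                  [270.0, 255.0]])
db_rain = np.array([8.0, 3.0, 0.5])
sigma = 3.0  # assumed per-channel radiance noise std dev (K)

def bayesian_composite(obs_tb):
    """Weight each database profile by a Gaussian radiance-consistency
    term and return the weighted-mean rain rate (the Bayesian estimate)."""
    d2 = np.sum((db_tb - obs_tb) ** 2, axis=1) / sigma**2
    w = np.exp(-0.5 * d2)
    return np.sum(w * db_rain) / np.sum(w)

est = bayesian_composite(np.array([252.0, 232.0]))
```

An observation close to the first simulated profile yields an estimate near that profile's rain rate; ambiguous observations blend several profiles, which is the source of the retrieval's random error.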
Endogenous Sampling and Matching Method in Duration Models
Endogenous sampling with matching (also called "mixed sampling") occurs when the statistician samples from the non-right-censored subset at a predetermined proportion and matches on one or more exogenous variables when sampling from the right-censored subset. This is widely applied in the duration analysis of firm failures, loan defaults, insurer insolvencies, and so on, due to the low frequency of observing non-right-censored samples (bankrupt, defaulted, and insolvent observations in the respective examples). However, the common practice of using estimation procedures intended for random sampling or for the qualitative response model yields either an inconsistent or an inefficient estimator. This paper proposes a consistent and efficient estimator and investigates its asymptotic properties. In addition, this paper evaluates the magnitude of the asymptotic bias when the model is estimated as if the data were a random sample or an endogenous sample without matching. This paper also compares the relative efficiency of other commonly used estimators and provides a general guideline for optimally choosing among sample designs. A Monte Carlo study with a simple example shows that random sampling yields an estimator with poor finite-sample properties when the population is extremely unbalanced in terms of default and non-default cases, while endogenous sampling and mixed sampling are robust in this situation.
Keywords: Duration models; Endogenous sampling with matching; Maximum likelihood estimator; Manski-Lerman estimator; Asymptotic distribution
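One standard corrective device in this setting, and one named in the keywords, is Manski-Lerman-style weighting: each observation's likelihood contribution is weighted by the ratio of its stratum's population share to its sampling share. The toy below applies such weights to the closed-form MLE of an exponential duration model; all durations and shares are invented, and the paper's proposed estimator is more general than this sketch:

```python
import numpy as np

# Toy endogenously sampled data: failures deliberately oversampled.
# d = 1 means failure observed (uncensored); d = 0 means right-censored.
t = np.array([1.0, 0.5, 2.0, 3.0, 4.0, 5.0])  # durations
d = np.array([1,   1,   1,   0,   0,   0])

# Hypothetical design: failures are 5% of the population but 50% of
# this sample, so their likelihood contributions are down-weighted.
Q = {1: 0.05, 0: 0.95}  # population shares per stratum
H = {1: 0.50, 0: 0.50}  # sampling shares per stratum
w = np.array([Q[int(k)] / H[int(k)] for k in d])

# For an exponential hazard, the weighted log-likelihood
# sum_i w_i * (d_i * log(lam) - lam * t_i) has the closed-form maximizer:
lam_weighted = np.sum(w * d) / np.sum(w * t)

# The naive MLE ignores the sample design and overstates the hazard.
lam_naive = np.sum(d) / np.sum(t)
```

With failures oversampled tenfold, the naive estimate is badly inflated, while the weighted estimate recovers the much lower population hazard implied by the design weights.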
Subspace Evolution and Transfer (SET) for Low-Rank Matrix Completion
We describe a new algorithm, termed subspace evolution and transfer (SET),
for solving low-rank matrix completion problems. The algorithm takes as its
input a subset of entries of a low-rank matrix, and outputs one low-rank matrix
consistent with the given observations. The completion task is accomplished by
searching for a column space on the Grassmann manifold that matches the
incomplete observations. The SET algorithm consists of two parts -- subspace
evolution and subspace transfer. In the evolution part, we use a gradient
descent method on the Grassmann manifold to refine our estimate of the column
space. Since the gradient descent algorithm is not guaranteed to converge, due
to the existence of barriers along the search path, we design a new mechanism
for detecting barriers and transferring the estimated column space across the
barriers. This mechanism constitutes the core of the transfer step of the
algorithm. The SET algorithm exhibits excellent empirical performance in both
the high and the low sampling-rate regimes.
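The evolution step (gradient descent on the column space, with a retraction back to the manifold) can be sketched as follows. This is not the SET algorithm itself: the barrier detection and transfer mechanism is omitted, leaving a bare-bones Grassmann descent with backtracking on an invented toy rank-1 problem:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy completion problem: rank-1 matrix, ~60% of entries observed.
n, m, r = 20, 20, 1
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))
mask = rng.random((n, m)) < 0.6

def fit_coeffs(U):
    """Per-column least-squares coefficients using observed rows only."""
    W = np.zeros((r, m))
    for j in range(m):
        obs = mask[:, j]
        W[:, j], *_ = np.linalg.lstsq(U[obs], M[obs, j], rcond=None)
    return W

def objective(U):
    W = fit_coeffs(U)
    return 0.5 * np.sum((mask * (U @ W - M)) ** 2), W

U, _ = np.linalg.qr(rng.standard_normal((n, r)))  # random column space
f, W = objective(U)
f0 = f
for _ in range(100):
    G = (mask * (U @ W - M)) @ W.T   # Euclidean gradient w.r.t. U
    G -= U @ (U.T @ G)               # project onto the tangent space
    step = 1.0
    while step > 1e-12:              # backtracking: accept any decrease
        U_try, _ = np.linalg.qr(U - step * G)  # QR retraction
        f_try, W_try = objective(U_try)
        if f_try < f:
            U, f, W = U_try, f_try, W_try
            break
        step /= 2
```

The QR factorization after each step plays the role of the retraction onto the Grassmann manifold; it is at barriers of this descent, where such steps stall, that SET's transfer mechanism takes over.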
Gaussian processes with linear operator inequality constraints
This paper presents an approach for constrained Gaussian Process (GP)
regression where we assume that a set of linear transformations of the process
are bounded. It is motivated by machine learning applications for
high-consequence engineering systems, where this kind of information is often
made available from phenomenological knowledge. We consider a GP f over
functions on X ⊆ ℝⁿ taking values in ℝ, where the process Lf is still Gaussian
when L is a linear operator. Our goal is to model f under the
constraint that realizations of Lf are confined to a convex set of
functions. In particular, we require that a ≤ Lf ≤ b, given
two functions a and b where a ≤ b pointwise. This formulation provides a
consistent way of encoding multiple linear constraints, such as
shape-constraints based on e.g. boundedness, monotonicity or convexity. We
adopt the approach of using a sufficiently dense set of virtual observation
locations where the constraint is required to hold, and derive the exact
posterior for a conjugate likelihood. The results needed for stable numerical
implementation are derived, together with an efficient sampling scheme for
estimating the posterior process.
Comment: Published in JMLR: http://jmlr.org/papers/volume20/19-065/19-065.pd
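A minimal illustration of the virtual-observation idea, taking L to be the identity operator (boundedness in [0, 1]): condition an ordinary GP on the data, then keep only posterior draws that satisfy the bounds at a dense grid of virtual locations. The paper derives the exact truncated posterior and an efficient sampler; the naive rejection step, kernel, and data below are illustrative assumptions only:

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf(a, b, ls=0.3):
    """Squared-exponential kernel on 1-D inputs."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

# Noisy observations of a function assumed to lie in [0, 1].
x = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
y = np.array([0.3, 0.5, 0.7, 0.6, 0.4, 0.3])
noise = 1e-2

# Virtual observation locations where 0 <= f <= 1 is required to hold.
xv = np.linspace(0.0, 1.0, 25)

# Unconstrained GP posterior at the virtual locations.
K = rbf(x, x) + noise * np.eye(len(x))
A = rbf(xv, x) @ np.linalg.inv(K)
mu = A @ y
cov = rbf(xv, xv) - A @ rbf(xv, x).T
Lc = np.linalg.cholesky(cov + 1e-8 * np.eye(len(xv)))

# Naive rejection sampler: keep draws satisfying the bounds everywhere.
samples = []
while len(samples) < 50:
    f = mu + Lc @ rng.standard_normal(len(xv))
    if f.min() >= 0.0 and f.max() <= 1.0:
        samples.append(f)
constrained_mean = np.mean(samples, axis=0)
```

Because every retained draw respects the bounds, so does their mean; the paper's sampler achieves the same effect far more efficiently by working with the truncated Gaussian directly.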
