Cross-Device Tracking: Matching Devices and Cookies

Author: Díaz-Morales Roberto
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 05/10/2015

The number of computers, tablets and smartphones is increasing rapidly, which entails the ownership and use of multiple devices to perform online tasks. As people move across devices to complete these tasks, their identities become fragmented. Understanding the usage of and transitions between those devices is essential for developing efficient applications in a multi-device world. In this paper we present a solution to the cross-device identification of users, based on semi-supervised machine learning methods, that identifies which cookies belong to an individual using a device. The proposed method placed third in the ICDM 2015 Drawbridge Cross-Device Connections challenge, demonstrating its good performance.

arXiv.org e-Print Archive

Crossref

COMET: A Recipe for Learning and Using Large Ensembles on Massive Data

Authors: Basilico Justin D.; Dixon Kevin R.; Kegelmeyer W. Philip; Kolda Tamara G.; Munson M. Arthur
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2011

COMET is a single-pass MapReduce algorithm for learning on large-scale data. It builds multiple random forest ensembles on distributed blocks of data and merges them into a mega-ensemble. This approach is appropriate when the data is too large to fit on a single machine. For the best accuracy, IVoting should be used instead of bagging to generate the training subset for each decision tree in the random forest. Experiments with two large datasets (5GB and 50GB compressed) show that COMET compares favorably, in both accuracy and training time, to learning on a subsample of the data using a serial algorithm. Finally, we propose a new Gaussian approach for lazy ensemble evaluation, which dynamically decides how many ensemble members to evaluate per data point; this can reduce evaluation cost by 100X or more.
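The block-wise training, merging, and lazy evaluation described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names (`train_stump`, `train_block_ensemble`, `merge_ensembles`, `lazy_predict`) are hypothetical, one-feature threshold stumps stand in for decision trees, plain bagging is used for brevity even though the abstract recommends IVoting, and the stopping rule is a simple normal-approximation interval around the vote fraction rather than the authors' exact Gaussian criterion.

```python
import math
import random


def train_stump(sample):
    """Toy stand-in for a decision tree: threshold halfway between class means."""
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == 0]
    if not pos or not neg:
        # Degenerate bootstrap sample: always predict the only class seen.
        label = 1 if pos else 0
        return lambda x: label
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)
    threshold = (mean_pos + mean_neg) / 2
    if mean_pos >= mean_neg:
        return lambda x: 1 if x > threshold else 0
    return lambda x: 1 if x < threshold else 0


def train_block_ensemble(block, n_members, rng):
    """Map step (assumed): build a small bagged ensemble from one data block."""
    members = []
    for _ in range(n_members):
        sample = [rng.choice(block) for _ in block]  # bootstrap sample
        members.append(train_stump(sample))
    return members


def merge_ensembles(ensembles):
    """Reduce step (assumed): concatenate block ensembles into one mega-ensemble."""
    return [member for ensemble in ensembles for member in ensemble]


def lazy_predict(members, x, z=2.58, min_votes=10):
    """Evaluate members one at a time and stop early once a normal-approximation
    interval around the positive-vote fraction excludes 0.5.
    Returns (predicted label, number of members evaluated)."""
    positives = 0
    for n, member in enumerate(members, start=1):
        positives += member(x)
        if n >= min_votes:
            p = positives / n
            stderr = math.sqrt(max(p * (1 - p), 1e-9) / n)
            if abs(p - 0.5) > z * stderr:
                return (1 if p > 0.5 else 0), n
    # No early stop: fall back to a full majority vote.
    return (1 if positives * 2 > len(members) else 0), len(members)


rng = random.Random(0)
# Synthetic two-class data: class 0 centred at 0, class 1 centred at 3.
data = [(rng.gauss(0, 1), 0) for _ in range(300)]
data += [(rng.gauss(3, 1), 1) for _ in range(300)]
rng.shuffle(data)

blocks = [data[i::3] for i in range(3)]  # three simulated distributed blocks
mega = merge_ensembles([train_block_ensemble(b, 20, rng) for b in blocks])

label, used = lazy_predict(mega, 5.0)
print(f"members in mega-ensemble: {len(mega)}, label: {label}, evaluated: {used}")
```

For a point far from the decision boundary the members agree, so the loop stops as soon as `min_votes` members have voted; points near the boundary consume more of the ensemble. In practice the members would likely need to be visited in random order for the normal approximation to be reasonable.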

arXiv.org e-Print Archive

CiteSeerX

Machine Learning Playground

Author: Khan Adil
Publication venue: SJSU ScholarWorks
Publication date: 01/04/2018

Machine learning is a science that “learns” about the data by finding unique patterns and relations in the data. Many libraries and tools are available for processing machine learning datasets. You can upload your dataset in seconds and start getting prediction results from these tools within a few minutes. However, generating an optimal model is a time-consuming and tedious task. The tunable parameters (hyper-parameters) of any machine learning model may greatly affect the accuracy metrics. While most tools ship models with default parameter settings intended to give good results, they often fail to provide optimal results for real-life datasets. The goal of this project is to develop a GUI application where a user can upload a dataset and dynamically visualize accuracy results based on the selected algorithm and its hyperparameters.
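The claim that hyper-parameters can change accuracy can be made concrete with a small sweep. The sketch below is only an illustration under stated assumptions: it uses a hand-written 1-D k-nearest-neighbours classifier on synthetic data, with `k` as the hyper-parameter, and the names `knn_predict` and `accuracy` are hypothetical. It does not reproduce the project's GUI or its choice of algorithms.

```python
import random


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (1-D features)."""
    neighbours = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in neighbours)
    return 1 if votes * 2 > k else 0


def accuracy(train, test, k):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


rng = random.Random(42)
# Synthetic, overlapping classes so that the choice of k matters.
points = [(rng.gauss(0, 1), 0) for _ in range(200)]
points += [(rng.gauss(1.5, 1), 1) for _ in range(200)]
rng.shuffle(points)
train, test = points[:300], points[300:]

# Sweep the hyper-parameter and report accuracy for each setting.
results = {k: accuracy(train, test, k) for k in (1, 5, 15, 45)}
for k, acc in results.items():
    print(f"k={k:>2}  accuracy={acc:.3f}")
```

A tool like the one described would presumably run a sweep of this kind behind the scenes and plot the resulting accuracies instead of printing them.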

SJSU ScholarWorks