Cross-Device Tracking: Matching Devices and Cookies
The number of computers, tablets and smartphones is increasing rapidly, which
entails the ownership and use of multiple devices to perform online tasks. As
people move across devices to complete these tasks, their identities become
fragmented. Understanding the usage and transition between those devices is
essential to develop efficient applications in a multi-device world. In this
paper we present a solution to deal with the cross-device identification of
users based on semi-supervised machine learning methods to identify which
cookies belong to an individual using a device. The method proposed in this
paper scored third in the ICDM 2015 Drawbridge Cross-Device Connections
challenge, which demonstrates its good performance.
COMET: A Recipe for Learning and Using Large Ensembles on Massive Data
COMET is a single-pass MapReduce algorithm for learning on large-scale data.
It builds multiple random forest ensembles on distributed blocks of data and
merges them into a mega-ensemble. This approach is appropriate when learning
from massive-scale data that is too large to fit on a single machine. To get
the best accuracy, IVoting should be used instead of bagging to generate the
training subset for each decision tree in the random forest. Experiments with
two large datasets (5GB and 50GB compressed) show that COMET compares favorably
(in both accuracy and training time) to learning on a subsample of data using a
serial algorithm. Finally, we propose a new Gaussian approach for lazy ensemble
evaluation which dynamically decides how many ensemble members to evaluate per
data point; this can reduce evaluation cost by 100X or more.
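The idea of lazy ensemble evaluation can be sketched as follows. This is a minimal illustration, not the stopping rule from the COMET paper: it assumes binary classifiers that each return 0 or 1, and it uses a simple normal approximation to the vote fraction to decide when the majority is already clear. The names `lazy_vote`, `z`, and `min_members` are hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple


def lazy_vote(
    members: Sequence[Callable[[object], int]],
    x: object,
    z: float = 2.576,       # assumed threshold: roughly a 99% two-sided normal interval
    min_members: int = 10,  # assumed warm-up before the stopping test is applied
) -> Tuple[int, int]:
    """Evaluate ensemble members one at a time and stop early when the
    majority vote is unlikely to change.

    Returns (predicted_label, number_of_members_evaluated).
    """
    if not members:
        raise ValueError("ensemble must contain at least one member")

    votes = 0  # number of members that voted for class 1
    used = 0
    for used, member in enumerate(members, start=1):
        votes += member(x)
        if used < min_members:
            continue
        p = votes / used
        # Smoothed fraction keeps the standard error above zero
        # when every member so far has agreed.
        p_smooth = (votes + 1) / (used + 2)
        std_err = math.sqrt(p_smooth * (1 - p_smooth) / used)
        # Stop once 0.5 lies outside the approximate confidence interval.
        if abs(p - 0.5) > z * std_err:
            break

    label = 1 if 2 * votes > used else 0
    return label, used


if __name__ == "__main__":
    unanimous = [lambda _x: 1] * 1000
    print(lazy_vote(unanimous, None))
```

Points on which the members agree are resolved after a handful of evaluations, while points near the decision boundary consume the whole ensemble. That asymmetry is where the savings come from.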
Machine Learning Playground
Machine learning is a science that “learns” from data by finding patterns and relations in it. Many libraries and tools are available for processing machine learning datasets. You can upload a dataset in seconds and start getting prediction results from these tools within a few minutes. However, generating an optimal model is a time-consuming and tedious task. The tunable parameters (hyperparameters) of a machine learning model can greatly affect its accuracy metrics. While most tools ship models with default parameter settings that give good results, these defaults often fail to give optimal results on real-life datasets. The goal of this project is to develop a GUI application where a user can upload a dataset and dynamically visualize accuracy results based on the selected algorithm and its hyperparameters.
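To illustrate how a single hyperparameter can change accuracy, here is a small sketch that sweeps the neighbor count `k` of a hand-written k-nearest-neighbors classifier on synthetic data. Everything in it is an assumption made for illustration (the dataset, the choice of k-NN, and the helper names `make_data`, `knn_predict`, and `accuracy_by_k`); it is not the project's implementation.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Sample = Tuple[Point, int]


def make_data(n: int, seed: int) -> List[Sample]:
    """Generate two overlapping 2-D Gaussian clusters labelled 0 and 1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.5 * label
        data.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return data


def knn_predict(train: Sequence[Sample], point: Point, k: int) -> int:
    """Predict the majority label among the k training points nearest to `point`."""
    nearest = sorted(
        train,
        key=lambda s: (s[0][0] - point[0]) ** 2 + (s[0][1] - point[1]) ** 2,
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy_by_k(
    train: Sequence[Sample], test: Sequence[Sample], ks: Sequence[int]
) -> Dict[int, float]:
    """Return test accuracy for each candidate value of the hyperparameter k."""
    return {
        k: sum(knn_predict(train, p, k) == y for p, y in test) / len(test)
        for k in ks
    }


if __name__ == "__main__":
    train = make_data(200, seed=0)
    test = make_data(100, seed=1)
    for k, acc in accuracy_by_k(train, test, ks=[1, 5, 15, 45]).items():
        print(f"k={k:>2}  accuracy={acc:.2f}")
```

A GUI such as the one described could run a sweep like this in the background and plot accuracy against the selected hyperparameter value.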