Causally Regularized Learning with Agnostic Data Selection Bias
Most previous machine learning algorithms are built on the i.i.d.
hypothesis. However, this ideal assumption is often violated in real
applications, where selection bias may arise between the training and testing
processes. Moreover, in many scenarios, the testing data are not even available
during training, which makes traditional methods such as transfer learning
infeasible because they require prior knowledge of the test distribution. Therefore,
how to address the agnostic selection bias for robust model learning is of
paramount importance for both academic research and real applications. In this
paper, under the assumption that causal relationships among variables are
robust across domains, we incorporate causal technique into predictive modeling
and propose a novel Causally Regularized Logistic Regression (CRLR) algorithm
by jointly optimizing global confounder balancing and weighted logistic
regression. Global confounder balancing helps to identify causal features,
whose causal effects on the outcome are stable across domains; performing
logistic regression on those causal features then yields a predictive
model that is robust to the agnostic bias. To validate the effectiveness of our CRLR
algorithm, we conduct comprehensive experiments on both synthetic and real
world datasets. Experimental results clearly demonstrate that our CRLR
algorithm outperforms the state-of-the-art methods, and the interpretability of
our method can be fully depicted by the feature visualization.
Comment: Oral paper of 2018 ACM Multimedia Conference (MM'18)
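The joint objective described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes binary features, treats each feature in turn as a "treatment", and penalizes the gap between the weighted means of the remaining features in the treated and control groups. The function names, the penalty form, and the trade-off parameter `lam` are illustrative assumptions, and the sketch only evaluates the objective rather than optimizing it.

```python
import numpy as np

def balancing_loss(X, w):
    """Assumed confounder-balancing penalty for a binary feature matrix X.

    For each feature j, compare the weighted means of the other features
    between samples with X[:, j] == 1 and samples with X[:, j] == 0.
    """
    n, p = X.shape
    total = 0.0
    for j in range(p):
        treated = X[:, j] == 1
        others = np.delete(X, j, axis=1)
        wt, wc = w[treated], w[~treated]
        if wt.sum() == 0 or wc.sum() == 0:
            continue  # one group carries no weight; skip this feature
        mean_t = others[treated].T @ wt / wt.sum()
        mean_c = others[~treated].T @ wc / wc.sum()
        total += float(np.sum((mean_t - mean_c) ** 2))
    return total

def weighted_logistic_loss(X, y, w, beta):
    """Sample-weighted logistic loss: sum_i w_i * (log(1 + exp(z_i)) - y_i * z_i)."""
    z = X @ beta
    return float(np.sum(w * (np.logaddexp(0.0, z) - y * z)))

def joint_objective(X, y, w, beta, lam=1.0):
    """Weighted logistic loss plus lam times the balancing penalty (lam is hypothetical)."""
    return weighted_logistic_loss(X, y, w, beta) + lam * balancing_loss(X, w)
```

In a full method, the sample weights `w` and coefficients `beta` would be optimized together; here they are passed in so that each term can be inspected separately.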
The Dark Side(-Channel) of Mobile Devices: A Survey on Network Traffic Analysis
In recent years, mobile devices (e.g., smartphones and tablets) have enjoyed
increasing commercial success and have become a fundamental element of
everyday life for billions of people all around the world. Mobile devices are
used not only for traditional communication activities (e.g., voice calls and
messages) but also for more advanced tasks made possible by an enormous amount
of multi-purpose applications (e.g., finance, gaming, and shopping). As a
result, those devices generate significant network traffic (a substantial part
of the overall Internet traffic). For this reason, the research community has
been investigating security and privacy issues that are related to the network
traffic generated by mobile devices, which could be analyzed to obtain
information useful for a variety of goals (ranging from device security and
network optimization, to fine-grained user profiling).
In this paper, we review the works that contributed to the state of the art
of network traffic analysis targeting mobile devices. In particular, we present
a systematic classification of the works in the literature according to three
criteria: (i) the goal of the analysis; (ii) the point where the network
traffic is captured; and (iii) the targeted mobile platforms. In this survey,
we consider points of capture such as Wi-Fi Access Points, software
simulation, and inside real mobile devices or emulators. For the surveyed
works, we review and compare analysis techniques, validation methods, and
achieved results. We also discuss possible countermeasures, challenges, and
directions for future research on mobile traffic analysis and other
emerging domains (e.g., Internet of Things). We believe our survey will be a
reference work for researchers and practitioners in this research field.
Comment: 55 pages
Machine learning for targeted display advertising: Transfer learning in action
This paper presents a detailed discussion of problem formulation and
data representation issues in the design, deployment, and operation of a
massive-scale machine learning system for targeted display advertising.
Notably, the machine learning system itself is deployed and has been in
continual use for years, for thousands of advertising campaigns (in
contrast to simply having the models from the system be deployed). In
this application, acquiring sufficient data for training from the ideal
sampling distribution is prohibitively expensive. Instead, data are
drawn from surrogate domains and learning tasks, and then transferred
to the target task. We present the design of this multistage transfer
learning system, highlighting the problem formulation aspects. We then
present a detailed experimental evaluation, showing that the different
transfer stages indeed each add value. We next present production
results across a variety of advertising clients from a variety of
industries, illustrating the performance of the system in use. We close
the paper with a collection of lessons learned from the work over half a
decade on this complex, deployed, and broadly used machine learning system.
Statistics Working Papers Series
Using contextual information to understand searching and browsing behavior
There is a great imbalance between the richness of information on the web and the succinctness and poverty of the search requests of web users, making their queries only a partial description of the underlying complex information needs. Finding ways to better leverage contextual information and make search context-aware holds the promise of dramatically improving the search experience of users. We conducted a series of studies to discover, model and utilize contextual information in order to understand and improve users' searching and browsing behavior on the web. Our results capture important aspects of context under the realistic conditions of different online search services, aiming to ensure that our scientific insights and solutions transfer to the operational settings of real world applications.
Sensing Subjective Well-being from Social Media
Subjective Well-being (SWB), which refers to how people experience the quality
of their lives, is of great use to public policy-makers as well as to economic
and sociological research. Traditionally, the measurement of SWB relies on
time-consuming and costly self-report questionnaires. Nowadays, people are
motivated to share their experiences and feelings on social media, so we
propose to sense SWB from the vast user-generated data on social media. By
utilizing 1785 users' social media data with SWB labels, we train machine
learning models that are able to "sense" individual SWB from users' social
media. Our model, which attains state-of-the-art prediction accuracy, can then
be used to identify the SWB of large populations of social media users in a
timely manner at very low cost.
Comment: 12 pages, 1 figure, 2 tables, 10th International Conference, AMT
2014, Warsaw, Poland, August 11-14, 2014. Proceedings