Engineering Crowdsourced Stream Processing Systems
A crowdsourced stream processing system (CSP) is a system that incorporates
crowdsourced tasks in the processing of a data stream. This can be seen as
enabling crowdsourcing work to be applied to a sample of large-scale data at
high speed, or, equivalently, enabling stream processing to employ human
intelligence. It also leads to a substantial expansion of the capabilities of
data processing systems. Engineering a CSP system requires the combination of
human and machine computation elements. From a general systems theory
perspective, this means taking into account inherited as well as emerging
properties from both these elements. In this paper, we position CSP systems
within a broader taxonomy, outline a series of design principles and evaluation
metrics, present an extensible framework for their design, and describe several
design patterns. We showcase the capabilities of CSP systems by performing a
case study that applies our proposed framework to the design and analysis of a
real system (AIDR) that classifies social media messages during time-critical
crisis events. Results show that compared to a pure stream processing system,
AIDR can achieve a higher data classification accuracy, while compared to a
pure crowdsourcing solution, the system makes better use of human workers by
requiring much less manual effort.
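The hybrid routing idea at the heart of a CSP system can be sketched in a few lines: confident machine classifications pass through automatically, while uncertain items are queued for crowd workers. The keyword rule, confidence values, and threshold below are illustrative stand-ins, not AIDR's actual components.

```python
def classify(message):
    """Toy machine classifier returning (label, confidence).
    Hypothetical stand-in for an automatic stream classifier."""
    label = "relevant" if "flood" in message else "irrelevant"
    confidence = 0.9 if "flood" in message else 0.5
    return label, confidence

def process_stream(stream, threshold=0.8):
    """Core CSP hybrid pattern: confident machine labels pass through;
    uncertain items are queued for human (crowd) labeling."""
    machine_labeled, crowd_queue = [], []
    for msg in stream:
        label, conf = classify(msg)
        if conf >= threshold:
            machine_labeled.append((msg, label))
        else:
            crowd_queue.append(msg)  # crowd workers label these
    return machine_labeled, crowd_queue
```

In a real deployment the crowd queue would feed a task platform, and crowd labels would in turn retrain the machine classifier.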
A Differentially Private Weighted Empirical Risk Minimization Procedure and its Application to Outcome Weighted Learning
It is commonplace to use data containing personal information to build
predictive models in the framework of empirical risk minimization (ERM). While
these models can be highly accurate in prediction, results obtained from these
models with the use of sensitive data may be susceptible to privacy attacks.
Differential privacy (DP) is an appealing framework for addressing such data
privacy issues by providing mathematically provable bounds on the privacy loss
incurred when releasing information from sensitive data. Previous work has
primarily concentrated on applying DP to unweighted ERM. We consider an
important generalization to weighted ERM (wERM). In wERM, each individual's
contribution to the objective function can be assigned varying weights. In this
context, we propose the first differentially private wERM algorithm, backed by
a rigorous theoretical proof of its DP guarantees under mild regularity
conditions. Extending the existing DP-ERM procedures to wERM paves a path to
deriving privacy-preserving learning methods for individualized treatment
rules, including the popular outcome weighted learning (OWL). We evaluate the
performance of the DP-wERM application to OWL in a simulation study and in a
real clinical trial of melatonin for sleep health. All empirical results
demonstrate the viability of training OWL models via wERM with DP guarantees
while maintaining sufficiently useful model performance. Therefore, we
recommend practitioners consider implementing the proposed privacy-preserving
OWL procedure in real-world scenarios involving sensitive data.
Comment: 24 pages and 2 figures for the main manuscript, 5 pages and 2 figures for the supplementary material
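One standard route to a differentially private ERM solution is output perturbation: solve the optimization problem, then add calibrated noise to the released coefficients. The sketch below illustrates this for weighted ridge regression; the noise scale is an uncalibrated placeholder, and the paper's actual wERM mechanism and privacy calibration may differ.

```python
import numpy as np

def dp_weighted_ridge(X, y, weights, lam=1.0, noise_scale=0.1, rng=None):
    """Weighted ERM (ridge regression) with output perturbation:
    minimize sum_i w_i (y_i - x_i^T beta)^2 + lam * ||beta||^2,
    then perturb the released coefficients. noise_scale stands in
    for a properly calibrated DP noise level (an assumption here)."""
    rng = np.random.default_rng(rng)
    W = np.diag(weights)  # per-individual weights in the objective
    A = X.T @ W @ X + lam * np.eye(X.shape[1])
    beta = np.linalg.solve(A, X.T @ W @ y)
    return beta + rng.normal(0.0, noise_scale, size=beta.shape)
```

Setting all weights to 1 recovers ordinary (unweighted) DP ridge regression, which is the unweighted ERM case the prior work covers.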
Who’s A Good Decision Maker? Data-Driven Expert Worker Ranking under Unobservable Quality
Evaluation of expert workers by their decision quality has substantial practical value, yet using other expert workers to evaluate decision quality is costly and often infeasible. In this work, we frame the Ranking of Expert workers according to their unobserved decision Quality (REQ) -- without resorting to evaluation by other experts -- as a new Data Science problem. This problem is challenging, as the correct decisions are commonly unobservable and substantial parts of the information available to the decision maker are not available for retrospective decision evaluation. We propose a new machine learning approach to address this problem. We evaluate our method on one dataset representing real expert decisions and on two public datasets, and find that our approach generates highly accurate rankings. Moreover, we observe that our approach's superiority over the baseline is particularly prominent as evaluation settings become increasingly challenging.
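Since the correct decisions are unobservable, a minimal baseline for this ranking problem scores each worker by agreement with a proxy label, here the per-case majority vote across workers. This is only the simplest agreement-based heuristic, not the paper's ML approach.

```python
from collections import Counter

def rank_workers(decisions):
    """Rank experts without ground truth: take the majority decision
    per case as a proxy label, then rank workers by their agreement
    rate with that proxy. decisions: {worker: {case_id: label}}."""
    cases = {c for labels in decisions.values() for c in labels}
    proxy = {}
    for c in cases:
        votes = Counter(d[c] for d in decisions.values() if c in d)
        proxy[c] = votes.most_common(1)[0][0]  # majority label
    scores = {
        w: sum(lbls[c] == proxy[c] for c in lbls) / len(lbls)
        for w, lbls in decisions.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

A learned approach would replace the majority-vote proxy with a model that also exploits case features unavailable to simple voting.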
Thirty Years of Machine Learning: The Road to Pareto-Optimal Wireless Networks
Future wireless networks hold substantial potential to support a broad range
of complex, compelling applications in both military and civilian fields,
where users can enjoy high-rate, low-latency, low-cost, and
reliable information services. Achieving this ambitious goal requires new radio
techniques for adaptive learning and intelligent decision making because of the
complex heterogeneous nature of the network structures and wireless services.
Machine learning (ML) algorithms have achieved great success in supporting big
data analytics, efficient parameter estimation, and interactive decision making.
Hence, in this article, we review the thirty-year history of ML by elaborating
on supervised learning, unsupervised learning, reinforcement learning and deep
learning. Furthermore, we investigate their employment in the compelling
applications of wireless networks, including heterogeneous networks (HetNets),
cognitive radios (CR), Internet of things (IoT), machine to machine networks
(M2M), and so on. This article aims to help readers understand the motivation
and methodology of the various ML algorithms, so that they can be invoked for
hitherto unexplored services and scenarios of future wireless networks.
Comment: 46 pages, 22 figures
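As a concrete instance of reinforcement learning in a cognitive-radio setting, a secondary user can learn which channel is most often free via tabular Q-learning in its stateless (bandit) form. The channel-availability probabilities below are assumed for illustration and do not come from the article.

```python
import random

def q_learning_channel_selection(channel_free_prob, episodes=2000,
                                 alpha=0.1, eps=0.1, seed=0):
    """Stateless Q-learning for channel selection: reward is 1 when the
    sensed channel is free, 0 otherwise. channel_free_prob is an assumed
    toy environment."""
    rng = random.Random(seed)
    q = [0.0] * len(channel_free_prob)
    for _ in range(episodes):
        # epsilon-greedy action selection over channels
        if rng.random() < eps:
            a = rng.randrange(len(q))
        else:
            a = max(range(len(q)), key=q.__getitem__)
        reward = 1.0 if rng.random() < channel_free_prob[a] else 0.0
        q[a] += alpha * (reward - q[a])  # one-step value update
    return q
```

After training, the learned Q-values approximate each channel's availability, so the greedy channel choice converges to the one most likely to be free.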
An Overview on Application of Machine Learning Techniques in Optical Networks
Today's telecommunication networks have become sources of enormous amounts of
widely heterogeneous data. This information can be retrieved from network
traffic traces, network alarms, signal quality indicators, users' behavioral
data, etc. Advanced mathematical tools are required to extract meaningful
information from these network-generated data and to make decisions
pertaining to the proper functioning of the networks. Among these
mathematical tools, Machine Learning (ML) is regarded as one of the most
promising methodological approaches to perform network-data analysis and enable
automated network self-configuration and fault management. The adoption of ML
techniques in the field of optical communication networks is motivated by the
unprecedented growth of network complexity faced by optical networks in the
last few years. Such complexity increase is due to the introduction of a huge
number of adjustable and interdependent system parameters (e.g., routing
configurations, modulation format, symbol rate, coding schemes, etc.) that are
enabled by the usage of coherent transmission/reception technologies, advanced
digital signal processing and compensation of nonlinear effects in optical
fiber propagation. In this paper we provide an overview of the application of
ML to optical communications and networking. We classify and survey relevant
literature dealing with the topic, and we also provide an introductory tutorial
on ML for researchers and practitioners interested in this field. Although a
good number of research papers have recently appeared, the application of ML to
optical networks is still in its infancy: to stimulate further work in this
area, we conclude the paper by proposing possible new research directions.
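As a minimal illustration of supervised ML on network-generated data, a nearest-centroid classifier can estimate quality of transmission (QoT) from lightpath features. The feature choice (path length in km, number of spans) and the toy data are illustrative assumptions, not taken from the survey.

```python
import numpy as np

def train_centroids(features, labels):
    """Fit a nearest-centroid QoT classifier: one centroid per class
    in feature space (e.g., path length, number of spans)."""
    classes = sorted(set(labels))
    mask = np.array(labels)
    return {c: features[mask == c].mean(axis=0) for c in classes}

def predict(centroids, x):
    """Assign the class whose centroid is closest to x."""
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))
```

In practice such a classifier would be trained on monitored signal-quality indicators and used to decide, before provisioning, whether a candidate lightpath will meet its quality target.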
Gaining Insight into Determinants of Physical Activity using Bayesian Network Learning
BNAIC/BeneLearn 202