Leveraging Crowdsourcing Data For Deep Active Learning - An Application: Learning Intents in Alexa
This paper presents a generic Bayesian framework that enables any deep
learning model to actively learn from targeted crowds. Our framework inherits
from recent advances in Bayesian deep learning, and extends existing work by
considering the targeted crowdsourcing approach, where multiple annotators with
unknown expertise contribute an uncontrolled amount (often limited) of
annotations. Our framework leverages the low-rank structure in annotations to
learn individual annotator expertise, which then helps to infer the true labels
from noisy and sparse annotations. It provides a unified Bayesian model to
simultaneously infer the true labels and train the deep learning model in order
to reach an optimal learning efficacy. Finally, our framework exploits the
uncertainty of the deep learning model during prediction as well as the
annotators' estimated expertise to minimize the number of required annotations
and annotators for optimally training the deep learning model.
We evaluate the effectiveness of our framework for intent classification in
Alexa (Amazon's personal assistant), using both synthetic and real-world
datasets. Experiments show that our framework can accurately learn annotator
expertise, infer true labels, and effectively reduce the amount of annotations
in model training as compared to state-of-the-art approaches. We further
discuss the potential of our proposed framework in bridging machine learning
and crowdsourcing towards improved human-in-the-loop systems.
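The loop described above, estimating annotator expertise, inferring true labels from sparse and noisy annotations, and querying where the model is most uncertain, can be illustrated with a deliberately simplified sketch. The iterative weighting scheme and all function names below are illustrative assumptions, not the authors' low-rank Bayesian model.

    import numpy as np

    def estimate_expertise(annotations, n_labels, n_rounds=10):
        """Crude stand-in for the paper's Bayesian inference: alternate between a
        weighted vote over labels and re-estimating each annotator's accuracy.
        annotations: dict mapping (item, annotator) -> label in {0, ..., n_labels-1}."""
        items = sorted({i for i, _ in annotations})
        annotators = sorted({a for _, a in annotations})
        acc = {a: 0.8 for a in annotators}            # prior guess at expertise
        labels = {}
        for _ in range(n_rounds):
            for i in items:                           # weighted vote per item
                votes = np.zeros(n_labels)
                for (it, a), y in annotations.items():
                    if it == i:
                        votes[y] += acc[a]
                labels[i] = int(votes.argmax())
            for a in annotators:                      # accuracy = agreement with consensus
                own = [(i, y) for (i, an), y in annotations.items() if an == a]
                acc[a] = float(np.mean([labels[i] == y for i, y in own])) if own else 0.8
        return labels, acc

    def pick_queries(class_probs, k=5):
        """Select the k items whose predictive class distribution has the highest entropy."""
        entropy = -(class_probs * np.log(class_probs + 1e-12)).sum(axis=1)
        return np.argsort(-entropy)[:k]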
Quality of Information in Mobile Crowdsensing: Survey and Research Challenges
Smartphones have become the most pervasive devices in people's lives, and are
clearly transforming the way we live and perceive technology. Today's
smartphones benefit from almost ubiquitous Internet connectivity and come
equipped with a plethora of inexpensive yet powerful embedded sensors, such as
accelerometer, gyroscope, microphone, and camera. This unique combination has
enabled revolutionary applications based on the mobile crowdsensing paradigm,
such as real-time road traffic monitoring, air and noise pollution monitoring, crime
control, and wildlife monitoring, just to name a few. Differently from prior
sensing paradigms, humans are now the primary actors of the sensing process,
since they become fundamental in retrieving reliable and up-to-date information
about the event being monitored. As humans may behave unreliably or
maliciously, assessing and guaranteeing Quality of Information (QoI) becomes
more important than ever. In this paper, we provide a new framework for
defining and enforcing the QoI in mobile crowdsensing, and analyze in depth the
current state-of-the-art on the topic. We also outline novel research
challenges, along with possible directions of future work.
Comment: To appear in ACM Transactions on Sensor Networks (TOSN).
A traffic classification method using machine learning algorithm
Applying concepts of attack investigation from the IT industry, this work designs a
traffic classification method that combines data mining techniques with a machine
learning algorithm to classify traffic as normal or malicious. This classification
helps in learning about the unknown attacks faced by the IT industry. The notion of
traffic classification is not new; plenty of work has been done to classify network
traffic for heterogeneous applications. Existing techniques (payload-based, port-based,
and statistical) have their own pros and cons, which are discussed later in this paper,
but classification using machine learning techniques is still an open field to explore
and has so far produced very promising results.
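As a concrete illustration of the flow-level classification described above, the sketch below trains an off-the-shelf classifier to separate normal from malicious traffic. The feature set (packets, bytes, duration, destination port), the synthetic data, and the choice of a random forest are assumptions made for illustration, not the specific method proposed in the paper.

    # Hedged sketch: supervised classification of traffic flows into normal/malicious.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import classification_report

    rng = np.random.default_rng(0)
    n = 2000
    # Synthetic flow records: [packets, bytes, duration_s, dst_port]
    X = np.column_stack([
        rng.poisson(20, n), rng.poisson(4000, n),
        rng.exponential(2.0, n), rng.integers(1, 65535, n),
    ])
    # Placeholder ground truth (0 = normal, 1 = malicious) standing in for real labels
    y = (X[:, 0] > 25).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    print(classification_report(y_te, clf.predict(X_te)))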
Utility in WTP space: a tool to address confounding random scale effects in destination choice to the Alps
Destination choice models with individual-specific taste variation have become the presumptive analytical approach in applied nonmarket valuation. Under the usual specification, tastes are represented by coefficients of site attributes that enter utility, and the distribution of these coefficients is estimated. The distribution of willingness-to-pay (WTP) for site attributes is then derived from the estimated distribution of coefficients. Though conceptually appealing, this procedure often results in untenable distributions of willingness to pay. An alternative procedure is to estimate the distribution of willingness to pay directly, through a re-parameterization of the model. We compare hierarchical Bayes and maximum simulated likelihood estimates under both approaches, using data on site choice in the Alps. We find that models parameterized in terms of WTP provide more reasonable estimates for the distribution of WTP, and also fit the data better than models parameterized in terms of attribute coefficients. This approach to parameterizing utility is hence deemed promising for applied nonmarket valuation.
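The re-parameterization referred to above can be written out explicitly; the formulation below is the standard textbook one, not necessarily the exact specification used in this paper. In preference space, the price and attribute coefficients are random and WTP is their ratio; in WTP space, the WTP vector itself is the parameter whose distribution is estimated.

    % Preference space: random price coefficient alpha_n and attribute coefficients beta_n
    U_{nj} = -\alpha_n \, p_{nj} + \beta_n' x_{nj} + \varepsilon_{nj},
    \qquad w_{nk} = \beta_{nk} / \alpha_n \quad \text{(WTP for attribute } k\text{)}.

    % WTP space: the same utility, rewritten so the WTP vector w_n enters directly
    U_{nj} = \alpha_n \left( -p_{nj} + w_n' x_{nj} \right) + \varepsilon_{nj}.

Estimating the distribution of w_n directly avoids specifying it implicitly as a ratio of two random coefficients, which is where the confounding between taste heterogeneity and the random scale alpha_n arises.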
Data Mining in Electronic Commerce
Modern business is rushing toward e-commerce. If the transition is done
properly, it enables better management, new services, lower transaction costs
and better customer relations. Success depends on skilled information
technologists, among whom are statisticians. This paper focuses on some of the
contributions that statisticians are making to help change the business world,
especially through the development and application of data mining methods. This
is a very large area, and the topics we cover are chosen to avoid overlap with
other papers in this special issue, as well as to respect the limitations of
our expertise. Inevitably, electronic commerce has raised and is raising fresh
research problems in a very wide range of statistical areas, and we try to
emphasize those challenges.
Comment: Published at http://dx.doi.org/10.1214/088342306000000204 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics
(http://www.imstat.org).
Ranking relations using analogies in biological and information networks
Analogical reasoning depends fundamentally on the ability to learn and
generalize about relations between objects. We develop an approach to
relational learning which, given a set of pairs of objects
S = {A(1):B(1), A(2):B(2), ..., A(N):B(N)},
measures how well other pairs A:B fit in with the set S. Our work
addresses the following question: is the relation between objects A and B
analogous to those relations found in S? Such questions are
particularly relevant in information retrieval, where an investigator might
want to search for analogous pairs of objects that match the query set of
interest. There are many ways in which objects can be related, making the task
of measuring analogies very challenging. Our approach combines a similarity
measure on function spaces with Bayesian analysis to produce a ranking. It
requires data containing features of the objects of interest and a link matrix
specifying which relationships exist; no further attributes of such
relationships are necessary. We illustrate the potential of our method on text
analysis and information networks. An application on discovering functional
interactions between pairs of proteins is discussed in detail, where we show
that our approach can work in practice even if a small set of protein pairs is
provided.
Comment: Published at http://dx.doi.org/10.1214/09-AOAS321 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
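A toy version of the pair-ranking task above can make the setup concrete: given feature vectors for objects and a query set of linked pairs, candidate pairs are scored by how similar they are, as pairs, to the query set. The averaged RBF kernel below is a simple stand-in for the paper's Bayesian similarity measure on function spaces; all names and data are illustrative.

    import numpy as np

    def rbf(u, v, gamma=0.5):
        return np.exp(-gamma * np.sum((u - v) ** 2))

    def analogy_scores(features, query_pairs, candidate_pairs):
        """features: dict object -> feature vector; *_pairs: lists of (A, B) tuples."""
        query_vecs = [np.concatenate([features[a], features[b]]) for a, b in query_pairs]
        scores = []
        for a, b in candidate_pairs:
            v = np.concatenate([features[a], features[b]])
            scores.append(np.mean([rbf(v, q) for q in query_vecs]))
        return np.array(scores)

    # Example: rank candidate protein pairs against a query set of known interactions
    feats = {name: np.random.default_rng(i).normal(size=4)
             for i, name in enumerate(["p1", "p2", "p3", "p4", "p5", "p6"])}
    print(analogy_scores(feats, [("p1", "p2"), ("p3", "p4")], [("p5", "p6"), ("p1", "p4")]))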
Evaluation of the Performance of the Markov Blanket Bayesian Classifier Algorithm
The Markov Blanket Bayesian Classifier is a recently-proposed algorithm for
construction of probabilistic classifiers. This paper presents an empirical
comparison of the MBBC algorithm with three other Bayesian classifiers: Naive
Bayes, Tree-Augmented Naive Bayes and a general Bayesian network. All of these
are implemented using the K2 framework of Cooper and Herskovits. The
classifiers are compared in terms of their performance (using simple accuracy
measures and ROC curves) and speed, on a range of standard benchmark data sets.
It is concluded that MBBC is competitive in terms of speed and accuracy with
the other algorithms considered.
Comment: 9 pages; Technical Report No. NUIG-IT-011002, Department of
Information Technology, National University of Ireland, Galway (2002).
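The evaluation protocol described above (accuracy, ROC analysis and speed on standard benchmark data sets) can be reproduced with a small harness. Only Naive Bayes is shown below, since MBBC, TAN and general Bayesian networks have no scikit-learn implementation; they would be added as further entries in the models dictionary. This is an assumed harness, not the K2-based setup used in the paper.

    import time
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_predict
    from sklearn.naive_bayes import GaussianNB
    from sklearn.metrics import accuracy_score, roc_auc_score

    X, y = load_breast_cancer(return_X_y=True)   # one standard benchmark data set
    models = {"NaiveBayes": GaussianNB()}
    for name, model in models.items():
        start = time.perf_counter()
        # 10-fold cross-validated class probabilities for the positive class
        proba = cross_val_predict(model, X, y, cv=10, method="predict_proba")[:, 1]
        elapsed = time.perf_counter() - start
        print(name,
              "acc=%.3f" % accuracy_score(y, proba > 0.5),
              "auc=%.3f" % roc_auc_score(y, proba),
              "time=%.2fs" % elapsed)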
A systematic review of software development cost estimation studies
This paper aims to provide a basis for the improvement of software estimation research through a systematic review of previous work. The review identifies 304 software cost estimation papers in 76 journals and classifies the papers according to research topic, estimation approach, research approach, study context and data set. A web-based library of these cost estimation papers is provided to ease the identification of relevant estimation research results. The review results, combined with other knowledge, provide support for recommendations for future software cost estimation research, including: 1) increase the breadth of the search for relevant studies; 2) search manually for relevant papers within a carefully selected set of journals when completeness is essential; 3) conduct more studies on estimation methods commonly used by the software industry; and 4) increase the awareness of how properties of the data sets impact the results when evaluating estimation methods.
- …