Leveraging Crowdsourcing Data For Deep Active Learning - An Application: Learning Intents in Alexa
This paper presents a generic Bayesian framework that enables any deep
learning model to actively learn from targeted crowds. Our framework inherits
from recent advances in Bayesian deep learning, and extends existing work by
considering the targeted crowdsourcing approach, where multiple annotators with
unknown expertise contribute an uncontrolled amount (often limited) of
annotations. Our framework leverages the low-rank structure in annotations to
learn individual annotator expertise, which then helps to infer the true labels
from noisy and sparse annotations. It provides a unified Bayesian model to
simultaneously infer the true labels and train the deep learning model in order
to reach an optimal learning efficacy. Finally, our framework exploits the
uncertainty of the deep learning model during prediction as well as the
annotators' estimated expertise to minimize the number of required annotations
and annotators for optimally training the deep learning model.
We evaluate the effectiveness of our framework for intent classification in
Alexa (Amazon's personal assistant), using both synthetic and real-world
datasets. Experiments show that our framework can accurately learn annotator
expertise, infer true labels, and effectively reduce the amount of annotations
in model training as compared to state-of-the-art approaches. We further
discuss the potential of our proposed framework in bridging machine learning
and crowdsourcing towards improved human-in-the-loop systems.
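The loop described above, estimating annotator expertise, inferring true labels from sparse and noisy annotations, and querying where the model is most uncertain, can be illustrated with a deliberately simplified sketch. The iterative weighting scheme and all function names below are illustrative assumptions, not the authors' low-rank Bayesian model.

    import numpy as np

    def estimate_expertise(annotations, n_labels, n_rounds=10):
        """Crude stand-in for the paper's Bayesian inference: alternate between a
        weighted vote over labels and re-estimating each annotator's accuracy.
        annotations: dict mapping (item, annotator) -> label in {0, ..., n_labels-1}."""
        items = sorted({i for i, _ in annotations})
        annotators = sorted({a for _, a in annotations})
        acc = {a: 0.8 for a in annotators}            # prior guess at expertise
        labels = {}
        for _ in range(n_rounds):
            for i in items:                           # weighted vote per item
                votes = np.zeros(n_labels)
                for (it, a), y in annotations.items():
                    if it == i:
                        votes[y] += acc[a]
                labels[i] = int(votes.argmax())
            for a in annotators:                      # accuracy = agreement with consensus
                own = [(i, y) for (i, an), y in annotations.items() if an == a]
                acc[a] = float(np.mean([labels[i] == y for i, y in own])) if own else 0.8
        return labels, acc

    def pick_queries(class_probs, k=5):
        """Select the k items whose predictive class distribution has the highest entropy."""
        entropy = -(class_probs * np.log(class_probs + 1e-12)).sum(axis=1)
        return np.argsort(-entropy)[:k]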
Quality of Information in Mobile Crowdsensing: Survey and Research Challenges
Smartphones have become the most pervasive devices in people's lives, and are
clearly transforming the way we live and perceive technology. Today's
smartphones benefit from almost ubiquitous Internet connectivity and come
equipped with a plethora of inexpensive yet powerful embedded sensors, such as
accelerometer, gyroscope, microphone, and camera. This unique combination has
enabled revolutionary applications based on the mobile crowdsensing paradigm,
such as real-time road traffic monitoring, air and noise pollution monitoring, crime
control, and wildlife monitoring, just to name a few. Differently from prior
sensing paradigms, humans are now the primary actors of the sensing process,
since they become fundamental in retrieving reliable and up-to-date information
about the event being monitored. As humans may behave unreliably or
maliciously, assessing and guaranteeing Quality of Information (QoI) becomes
more important than ever. In this paper, we provide a new framework for
defining and enforcing the QoI in mobile crowdsensing, and analyze in depth the
current state-of-the-art on the topic. We also outline novel research
challenges, along with possible directions of future work.
Comment: To appear in ACM Transactions on Sensor Networks (TOSN).
A traffic classification method using machine learning algorithm
Applying concepts of attack investigation from the IT industry, this work designs a
traffic classification method that combines data mining techniques with a machine
learning algorithm to classify traffic as normal or malicious. This classification
helps in learning about the unknown attacks faced by the IT industry. The notion of
traffic classification is not new; plenty of work has been done to classify network
traffic for heterogeneous applications. Existing techniques (payload-based, port-based,
and statistical) have their own pros and cons, which are discussed later in this paper,
but classification using machine learning techniques is still an open field to explore
and has so far produced very promising results.
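As a concrete illustration of the flow-level classification described above, the sketch below trains an off-the-shelf classifier to separate normal from malicious traffic. The feature set (packets, bytes, duration, destination port), the synthetic data, and the choice of a random forest are assumptions made for illustration, not the specific method proposed in the paper.

    # Hedged sketch: supervised classification of traffic flows into normal/malicious.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import classification_report

    rng = np.random.default_rng(0)
    n = 2000
    # Synthetic flow records: [packets, bytes, duration_s, dst_port]
    X = np.column_stack([
        rng.poisson(20, n), rng.poisson(4000, n),
        rng.exponential(2.0, n), rng.integers(1, 65535, n),
    ])
    # Placeholder ground truth (0 = normal, 1 = malicious) standing in for real labels
    y = (X[:, 0] > 25).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    print(classification_report(y_te, clf.predict(X_te)))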
Utility in WTP space: a tool to address confounding random scale effects in destination choice to the Alps
Destination choice models with individual-specific taste variation have become the presumptive analytical approach in applied nonmarket valuation. Under the usual specification, tastes are represented by coefficients of site attributes that enter utility, and the distribution of these coefficients is estimated. The distribution of willingness-to-pay (WTP) for site attributes is then derived from the estimated distribution of coefficients. Though conceptually appealing, this procedure often results in untenable distributions of willingness to pay. An alternative procedure is to estimate the distribution of willingness to pay directly, through a re-parameterization of the model. We compare hierarchical Bayes and maximum simulated likelihood estimates under both approaches, using data on site choice in the Alps. We find that models parameterized in terms of WTP provide more reasonable estimates for the distribution of WTP, and also fit the data better than models parameterized in terms of attribute coefficients. This approach to parameterizing utility is hence deemed promising for applied nonmarket valuation.
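The re-parameterization referred to above can be written out explicitly; the formulation below is the standard textbook one, not necessarily the exact specification used in this paper. In preference space, the price and attribute coefficients are random and WTP is their ratio; in WTP space, the WTP vector itself is the parameter whose distribution is estimated.

    % Preference space: random price coefficient alpha_n and attribute coefficients beta_n
    U_{nj} = -\alpha_n \, p_{nj} + \beta_n' x_{nj} + \varepsilon_{nj},
    \qquad w_{nk} = \beta_{nk} / \alpha_n \quad \text{(WTP for attribute } k\text{)}.

    % WTP space: the same utility, rewritten so the WTP vector w_n enters directly
    U_{nj} = \alpha_n \left( -p_{nj} + w_n' x_{nj} \right) + \varepsilon_{nj}.

Estimating the distribution of w_n directly avoids specifying it implicitly as a ratio of two random coefficients, which is where the confounding between taste heterogeneity and the random scale alpha_n arises.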
Data Mining in Electronic Commerce
Modern business is rushing toward e-commerce. If the transition is done
properly, it enables better management, new services, lower transaction costs
and better customer relations. Success depends on skilled information
technologists, among whom are statisticians. This paper focuses on some of the
contributions that statisticians are making to help change the business world,
especially through the development and application of data mining methods. This
is a very large area, and the topics we cover are chosen to avoid overlap with
other papers in this special issue, as well as to respect the limitations of
our expertise. Inevitably, electronic commerce has raised and is raising fresh
research problems in a very wide range of statistical areas, and we try to
emphasize those challenges.
Comment: Published at http://dx.doi.org/10.1214/088342306000000204 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics
(http://www.imstat.org).
Ranking relations using analogies in biological and information networks
Analogical reasoning depends fundamentally on the ability to learn and
generalize about relations between objects. We develop an approach to
relational learning which, given a set of pairs of objects
S = {A(1):B(1), A(2):B(2), ..., A(N):B(N)},
measures how well other pairs A:B fit in with the set S. Our work
addresses the following question: is the relation between objects A and B
analogous to those relations found in S? Such questions are
particularly relevant in information retrieval, where an investigator might
want to search for analogous pairs of objects that match the query set of
interest. There are many ways in which objects can be related, making the task
of measuring analogies very challenging. Our approach combines a similarity
measure on function spaces with Bayesian analysis to produce a ranking. It
requires data containing features of the objects of interest and a link matrix
specifying which relationships exist; no further attributes of such
relationships are necessary. We illustrate the potential of our method on text
analysis and information networks. An application on discovering functional
interactions between pairs of proteins is discussed in detail, where we show
that our approach can work in practice even if a small set of protein pairs is
provided.
Comment: Published at http://dx.doi.org/10.1214/09-AOAS321 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
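A toy version of the pair-ranking task above can make the setup concrete: given feature vectors for objects and a query set of linked pairs, candidate pairs are scored by how similar they are, as pairs, to the query set. The averaged RBF kernel below is a simple stand-in for the paper's Bayesian similarity measure on function spaces; all names and data are illustrative.

    import numpy as np

    def rbf(u, v, gamma=0.5):
        return np.exp(-gamma * np.sum((u - v) ** 2))

    def analogy_scores(features, query_pairs, candidate_pairs):
        """features: dict object -> feature vector; *_pairs: lists of (A, B) tuples."""
        query_vecs = [np.concatenate([features[a], features[b]]) for a, b in query_pairs]
        scores = []
        for a, b in candidate_pairs:
            v = np.concatenate([features[a], features[b]])
            scores.append(np.mean([rbf(v, q) for q in query_vecs]))
        return np.array(scores)

    # Example: rank candidate protein pairs against a query set of known interactions
    feats = {name: np.random.default_rng(i).normal(size=4)
             for i, name in enumerate(["p1", "p2", "p3", "p4", "p5", "p6"])}
    print(analogy_scores(feats, [("p1", "p2"), ("p3", "p4")], [("p5", "p6"), ("p1", "p4")]))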
Evaluation of the Performance of the Markov Blanket Bayesian Classifier Algorithm
The Markov Blanket Bayesian Classifier is a recently-proposed algorithm for
construction of probabilistic classifiers. This paper presents an empirical
comparison of the MBBC algorithm with three other Bayesian classifiers: Naive
Bayes, Tree-Augmented Naive Bayes and a general Bayesian network. All of these
are implemented using the K2 framework of Cooper and Herskovits. The
classifiers are compared in terms of their performance (using simple accuracy
measures and ROC curves) and speed, on a range of standard benchmark data sets.
It is concluded that MBBC is competitive in terms of speed and accuracy with
the other algorithms considered.
Comment: 9 pages; Technical Report No. NUIG-IT-011002, Department of
Information Technology, National University of Ireland, Galway (2002).
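The evaluation protocol described above (accuracy, ROC analysis and speed on standard benchmark data sets) can be reproduced with a small harness. Only Naive Bayes is shown below, since MBBC, TAN and general Bayesian networks have no scikit-learn implementation; they would be added as further entries in the models dictionary. This is an assumed harness, not the K2-based setup used in the paper.

    import time
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_predict
    from sklearn.naive_bayes import GaussianNB
    from sklearn.metrics import accuracy_score, roc_auc_score

    X, y = load_breast_cancer(return_X_y=True)   # one standard benchmark data set
    models = {"NaiveBayes": GaussianNB()}
    for name, model in models.items():
        start = time.perf_counter()
        # 10-fold cross-validated class probabilities for the positive class
        proba = cross_val_predict(model, X, y, cv=10, method="predict_proba")[:, 1]
        elapsed = time.perf_counter() - start
        print(name,
              "acc=%.3f" % accuracy_score(y, proba > 0.5),
              "auc=%.3f" % roc_auc_score(y, proba),
              "time=%.2fs" % elapsed)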
A systematic review of software development cost estimation studies
This paper aims to provide a basis for the improvement of software estimation research through a systematic review of previous work. The review identifies 304 software cost estimation papers in 76 journals and classifies the papers according to research topic, estimation approach, research approach, study context and data set. A web-based library of these cost estimation papers is provided to ease the identification of relevant estimation research results. The review results, combined with other knowledge, provide support for recommendations for future software cost estimation research, including: 1) increase the breadth of the search for relevant studies; 2) search manually for relevant papers within a carefully selected set of journals when completeness is essential; 3) conduct more studies on estimation methods commonly used by the software industry; and 4) increase the awareness of how properties of the data sets impact the results when evaluating estimation methods.
- …