Supervised Collective Classification for Crowdsourcing

Author: Chen Pin-Yu
Cheng Shin-Ming
Chu Fu-Jen
Lien Chia-Wei
Ting Pai-Shun
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 07/09/2015

Crowdsourcing utilizes the wisdom of crowds for collective classification via information (e.g., labels of an item) provided by labelers. Current crowdsourcing algorithms are mainly unsupervised methods that are unaware of the quality of crowdsourced data. In this paper, we propose a supervised collective classification algorithm that aims to identify reliable labelers from the training data (e.g., items with known labels). The reliability (i.e., weighting factor) of each labeler is determined via a saddle point algorithm. The results on several crowdsourced data sets show that supervised methods can achieve better classification accuracy than unsupervised methods, and our proposed method outperforms other algorithms.

Comment: to appear in IEEE Global Communications Conference (GLOBECOM) Workshop on Networking and Collaboration Issues for the Internet of Everything
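The idea of weighting labelers by reliability learned from items with known labels can be illustrated with a minimal sketch. It is not the saddle point algorithm from the paper: as a stand-in, it assumes binary labels in {-1, +1} and sets each labeler's weight to the log-odds of their accuracy on the training items. The function names `estimate_weights` and `weighted_vote` and the clipping constant `eps` are hypothetical.

```python
import math
from collections import defaultdict


def estimate_weights(train_labels, truth, eps=1e-3):
    """Assumed stand-in for reliability estimation (not the paper's method).

    train_labels: {labeler: {item: label}} with labels in {-1, +1}
    truth:        {item: known label} for the training items
    Returns {labeler: log-odds of that labeler's training accuracy}.
    """
    weights = {}
    for labeler, answers in train_labels.items():
        scored = [(item, lab) for item, lab in answers.items() if item in truth]
        if not scored:
            weights[labeler] = 0.0  # no evidence: the labeler gets no say
            continue
        acc = sum(lab == truth[item] for item, lab in scored) / len(scored)
        acc = min(max(acc, eps), 1 - eps)  # clip to keep the log finite
        weights[labeler] = math.log(acc / (1 - acc))
    return weights


def weighted_vote(labels, weights):
    """Classify each item by the sign of the weighted sum of its labels."""
    scores = defaultdict(float)
    for labeler, answers in labels.items():
        w = weights.get(labeler, 0.0)
        for item, lab in answers.items():
            scores[item] += w * lab
    return {item: (1 if s >= 0 else -1) for item, s in scores.items()}


# Toy data: "a" is always right, "b" is always wrong, "c" is right half the time.
truth = {1: 1, 2: -1}
train = {"a": {1: 1, 2: -1}, "b": {1: -1, 2: 1}, "c": {1: 1, 2: 1}}
weights = estimate_weights(train, truth)

# On a new item, a plain majority vote would follow "b" and "c".
new_labels = {"a": {3: -1}, "b": {3: 1}, "c": {3: 1}}
prediction = weighted_vote(new_labels, weights)
```

In this toy case the consistently wrong labeler receives a negative weight, so their vote is effectively inverted, and the weighted vote sides with the reliable labeler against the unweighted majority.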

arXiv.org e-Print Archive

Crossref

Semi-Supervised Sparse Coding

Author: Gao Xin
Wang Jim Jing-Yan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 16/01/2015

Sparse coding approximates the data sample as a sparse linear combination of some basic codewords and uses the sparse codes as new representations. In this paper, we investigate learning discriminative sparse codes by sparse coding in a semi-supervised manner, where only a few training samples are labeled. By using the manifold structure spanned by the data set of both labeled and unlabeled samples and the constraints provided by the labels of the labeled samples, we learn the variable class labels for all the samples. Furthermore, to improve the discriminative ability of the learned sparse codes, we assume that the class labels could be predicted from the sparse codes directly using a linear classifier. By solving the codebook, sparse codes, class labels and classifier parameters simultaneously in a unified objective function, we develop a semi-supervised sparse coding algorithm. Experiments on two real-world pattern recognition problems demonstrate the advantage of the proposed methods over supervised sparse coding methods on partially labeled data sets.
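For orientation, the generic (unsupervised) sparse coding problem that the abstract starts from is commonly written as below. This is the standard textbook form, not the paper's unified semi-supervised objective, which adds manifold, label, and classifier terms not reproduced here. The symbols are assumptions: x_i is a data sample, D the codebook with columns d_k, s_i the sparse code of x_i, and λ a sparsity trade-off parameter.

```latex
\min_{D,\; s_1,\dots,s_n} \;\; \sum_{i=1}^{n} \Bigl( \lVert x_i - D s_i \rVert_2^2 + \lambda \lVert s_i \rVert_1 \Bigr)
\quad \text{s.t.} \quad \lVert d_k \rVert_2 \le 1 \ \text{ for every column } d_k \text{ of } D
```

The squared-error term asks each sample to be reconstructed as a linear combination of codewords, and the ℓ1 penalty pushes most coefficients of s_i to zero, which is what makes the code sparse.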

arXiv.org e-Print Archive

Crossref

Fidelity-Weighted Learning

Author: Dehghani Mostafa
Gouws Stephan
Kamps Jaap
Mehrjou Arash
Schölkopf Bernhard
Publication date: 01/01/2018

Training deep neural networks requires many training samples, but in practice training labels are expensive to obtain and may be of varying quality, as some may be from trusted expert labelers while others might be from heuristics or other sources of weak supervision such as crowd-sourcing. This creates a fundamental quality-versus-quantity trade-off in the learning process. Do we learn from the small amount of high-quality data or the potentially large amount of weakly-labeled data? We argue that if the learner could somehow know and take the label quality into account when learning the data representation, we could get the best of both worlds. To this end, we propose "fidelity-weighted learning" (FWL), a semi-supervised student-teacher approach for training deep neural networks using weakly-labeled data. FWL modulates the parameter updates to a student network (trained on the task we care about) on a per-sample basis according to the posterior confidence of its label quality estimated by a teacher (who has access to the high-quality labels). Both student and teacher are learned from the data. We evaluate FWL on two tasks in information retrieval and natural language processing where we outperform state-of-the-art alternative semi-supervised methods, indicating that our approach makes better use of strong and weak labels, and leads to better task-dependent data representations.

Comment: Published as a conference paper at ICLR 201
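The per-sample modulation of parameter updates can be sketched on a toy problem. This is only an illustration under strong assumptions, not the FWL implementation: the student is a one-dimensional linear model rather than a deep network, and the confidence values in [0, 1] are supplied by hand instead of being estimated by a teacher. The function name `confidence_weighted_sgd` and its parameters are hypothetical.

```python
def confidence_weighted_sgd(samples, confidences, lr=0.1, epochs=500):
    """Fit y ~ w * x + b by SGD on squared error.

    Each sample's update is scaled by its confidence in [0, 1], so a
    label with confidence 0 never moves the parameters and a label with
    confidence 1 gets the full step.
    """
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for (x, y), c in zip(samples, confidences):
            err = (w * x + b) - y      # prediction error on this sample
            w -= lr * c * err * x      # confidence-scaled gradient step
            b -= lr * c * err
    return w, b


# Three trusted samples from y = 2x and two weak labels that are far off.
samples = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (1.0, 10.0), (2.0, -5.0)]
confidences = [1.0, 1.0, 1.0, 0.0, 0.0]  # assumed, hand-set confidences
w, b = confidence_weighted_sgd(samples, confidences)
```

With the two unreliable labels given zero confidence, the fit is driven entirely by the trusted samples. Intermediate confidence values would let weak labels contribute smaller steps rather than none.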

arXiv.org e-Print Archive

MPG.PuRe

Learning From Labeled And Unlabeled Data: An Empirical Study Across Techniques And Domains

Author: Chawla N. V.
Karakoulas Grigoris
Publication venue: 'AI Access Foundation'
Publication date: 09/09/2011

There has been increased interest in devising learning techniques that combine unlabeled data with labeled data, i.e., semi-supervised learning. However, to the best of our knowledge, no study has been performed across various techniques and different types and amounts of labeled and unlabeled data. Moreover, most of the published work on semi-supervised learning techniques assumes that the labeled and unlabeled data come from the same distribution. It is possible for the labeling process to be associated with a selection bias such that the distributions of data points in the labeled and unlabeled sets are different. Not correcting for such bias can result in biased function approximation with potentially poor performance. In this paper, we present an empirical study of various semi-supervised learning techniques on a variety of datasets. We attempt to answer various questions such as the effect of independence or relevance amongst features, the effect of the size of the labeled and unlabeled sets and the effect of noise. We also investigate the impact of sample-selection bias on the semi-supervised learning techniques under study and implement a bivariate probit technique particularly designed to correct for such bias.

arXiv.org e-Print Archive

Crossref

Semi-Supervised Approach to Monitoring Clinical Depressive Symptoms in Social Media

Author: Al-Olimat Hussein S.
Bajaj Goonmeet
Banerjee Tanvi
Ebrahimi Monireh
Pathak Jyotishman
Sheth Amit
Thirunarayan Krishnaprasad
Yazdavar Amir Hossein
Publication date: 01/01/2017

With the rise of social media, millions of people are routinely expressing their moods, feelings, and daily struggles with mental health issues on social media platforms like Twitter. Unlike traditional observational cohort studies conducted through questionnaires and self-reported surveys, we explore the reliable detection of clinical depression from tweets obtained unobtrusively. Based on the analysis of tweets crawled from users with self-reported depressive symptoms in their Twitter profiles, we demonstrate the potential for detecting clinical depression symptoms which emulate the PHQ-9 questionnaire clinicians use today. Our study uses a semi-supervised statistical model to evaluate how the duration of these symptoms and their expression on Twitter (in terms of word usage patterns and topical preferences) align with the medical findings reported via the PHQ-9. Our proactive and automatic screening tool is able to identify clinical depressive symptoms with an accuracy of 68% and precision of 72%.

Comment: 8 pages, Advances in Social Networks Analysis and Mining (ASONAM), 2017 IEEE/ACM International Conference

arXiv.org e-Print Archive

CORE