
    Efficient Learning with Soft Label Information and Multiple Annotators

    Nowadays, large real-world data sets are collected in science, engineering, health care and other fields. These data provide us with a great resource for building automated learning systems. However, for many machine learning applications, data need to be annotated (labeled) by humans before they can be used for learning. Unfortunately, annotation by a human expert is often very time-consuming and costly. As a result, the amount of labeled training data to learn from may be limited, which in turn affects the learning process and the quality of the learned models. In this thesis, we investigate ways of improving the learning process in supervised classification settings in which labels are provided by human annotators. First, we study and propose a new classification learning framework that learns not only from binary class labels but also from soft-label information reflecting the annotator's certainty or belief in the class label. We propose multiple methods, based on regression, max-margin and ranking methodologies, that use the soft-label information to learn better classifiers from smaller training data and hence with smaller annotation effort. We also study our soft-label approach when the examples to be labeled next are selected online using active learning. Second, we study ways of distributing the annotation effort among multiple experts. We develop a new multiple-annotator learning framework that explicitly models and embraces annotator differences and biases in order to learn both a consensus model and annotator-specific models. We demonstrate the benefits and advantages of our frameworks on both UCI data sets and real-world clinical data extracted from Electronic Health Records.
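To give a rough feel for the regression-style use of soft labels described above, the following is a minimal sketch on synthetic data (all names, noise levels, and the ridge formulation are illustrative assumptions, not the thesis's actual algorithms): a classifier fit by regressing on the logit of the annotator's certainty is compared against a baseline fit on hard 0/1 labels alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each example has a true probability p(y=1|x);
# the annotator reports a noisy soft label near p, in addition to the
# hard 0/1 label obtained by thresholding p.
n, d = 200, 5
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
p = 1.0 / (1.0 + np.exp(-X @ w_true))                 # true class probabilities
soft = np.clip(p + rng.normal(scale=0.05, size=n), 0.01, 0.99)
hard = (p > 0.5).astype(float)                         # binary labels only

def ridge_fit(X, y, lam=1e-2):
    """Closed-form ridge regression: solve (X^T X + lam I) w = X^T y."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

# Soft-label learner: regress on the logit of the certainty score.
w_soft = ridge_fit(X, np.log(soft / (1 - soft)))
# Baseline: least-squares fit on the hard labels recoded as +/-1.
w_hard = ridge_fit(X, 2 * hard - 1)

# Evaluate both linear classifiers on fresh data.
X_te = rng.normal(size=(1000, d))
y_te = X_te @ w_true > 0
acc_soft = np.mean((X_te @ w_soft > 0) == y_te)
acc_hard = np.mean((X_te @ w_hard > 0) == y_te)
```

The soft labels carry graded information about how far each example sits from the decision boundary, which is exactly what a hard 0/1 label discards; with small training sets this typically translates into a better-estimated decision boundary.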

    Efficiently and Effectively Learning Models of Similarity from Human Feedback

    Vital to the success of many machine learning tasks is the ability to reason about how objects relate. For this, machine learning methods utilize a model of similarity that describes how objects are to be compared. While traditional methods commonly compare objects as feature vectors using standard measures such as Euclidean distance or cosine similarity, other models of similarity can incorporate auxiliary information beyond what is conveyed through features. To build such models, information must be given about object relationships that is beneficial to the task being considered. In many tasks, such as object recognition, ranking, product recommendation, and data visualization, a model based on human perception can lead to high performance. Other tasks require models that reflect certain domain expertise. In both cases, humans are able to provide information that can be used to build useful models of similarity. This motivates similarity-learning methods that use human feedback to guide the construction of models of similarity. The task of learning similarity from human feedback comes with many practical challenges. In this dissertation we explicitly define these challenges as those of efficiency and effectiveness. Efficiency deals both with making the most of obtained feedback and with reducing the computational run time of the learning algorithms themselves. Effectiveness concerns producing models that accurately reflect the given feedback, but also ensuring that the queries posed to humans are ones they can answer easily and without errors. After defining these challenges, we create novel learning methods that explicitly focus on one or more of them as a means to improve on the state of the art in similarity learning. Specifically, we develop methods for learning models of perceptual similarity, as well as models that reflect domain expertise. In doing so, we enable similarity-learning methods to be practically applied in more real-world problem settings.
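A common form of human feedback in this setting is the relative comparison, or triplet: "object a is more similar to b than to c". As a loose illustration (a synthetic sketch under assumed data and a simple per-feature weighting, not the dissertation's actual methods), one can learn a weighted distance from such triplets with a margin-based stochastic update:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: points in 4-D where only the first two features
# matter for "perceptual" similarity; the other two are noise.  Human
# feedback arrives as triplets (a, b, c): "a is closer to b than to c".
X = rng.normal(size=(100, 4))
w_gt = np.array([1.0, 1.0, 0.0, 0.0])   # ground-truth feature relevance

def dist(w, a, b):
    """Squared Euclidean distance with per-feature weights w."""
    return np.sum(w * (a - b) ** 2)

def make_triplet():
    i, j, k = rng.choice(len(X), size=3, replace=False)
    if dist(w_gt, X[i], X[j]) < dist(w_gt, X[i], X[k]):
        return i, j, k
    return i, k, j

triplets = [make_triplet() for _ in range(500)]

# Learn per-feature weights by stochastic gradient descent on a
# margin-based triplet loss: we want d(a,b) + margin <= d(a,c).
w = np.ones(4)
lr, margin = 0.01, 0.5
for _ in range(20):
    for a, b, c in triplets:
        if dist(w, X[a], X[b]) + margin > dist(w, X[a], X[c]):
            # Subgradient of the violated hinge term with respect to w.
            grad = (X[a] - X[b]) ** 2 - (X[a] - X[c]) ** 2
            w = np.maximum(w - lr * grad, 0.0)   # keep weights non-negative
```

After training, the weights on the perceptually relevant features should dominate those on the noise features, i.e. the learned metric has recovered which features humans actually attend to.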

    Learning classification with auxiliary probabilistic information

    Abstract—Finding ways of incorporating auxiliary information or auxiliary data into the learning process has been a topic of active data mining and machine learning research in recent years. In this work we study and develop a new framework for the classification learning problem in which, in addition to class labels, the learner is provided with auxiliary (probabilistic) information that reflects how strongly the expert feels about the class label. This approach can be extremely useful for many practical classification tasks that rely on subjective label assessment and where the cost of acquiring the additional auxiliary information is negligible compared to the cost of example analysis and labeling. We develop classification algorithms capable of using the auxiliary information to make the learning process more efficient in terms of sample complexity. We demonstrate the benefit of the approach on a number of synthetic and real-world data sets by comparing it to learning with class labels only.
    Keywords: classification learning; sample complexity; learning with auxiliary label information
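Besides regression on certainty values, the auxiliary probabilistic information also admits a ranking-style use: if the expert is more certain about x_i being positive than about x_j, the classifier's score for x_i should exceed its score for x_j. The sketch below (synthetic data and a plain pairwise-hinge update, assumed for illustration rather than taken from the paper) learns a linear scorer from such certainty-ordered pairs:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: the expert attaches a certainty score q in [0, 1]
# for the positive class to each example, not just a 0/1 label.
n, d = 60, 3
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
q = 1.0 / (1.0 + np.exp(-X @ w_true))   # expert certainty scores

# Pairwise hinge updates on certainty-ordered pairs: whenever the more
# certain example does not out-score the less certain one by a margin,
# nudge w toward their difference.
w = np.zeros(d)
lr = 0.05
for _ in range(50):
    for _ in range(200):
        i, j = rng.choice(n, size=2, replace=False)
        if q[i] < q[j]:
            i, j = j, i                  # ensure q[i] >= q[j]
        if X[i] @ w - X[j] @ w < 1.0:    # margin violated
            w += lr * (X[i] - X[j])

# The learned scoring direction should align with the true one.
cos = (w @ w_true) / (np.linalg.norm(w) * np.linalg.norm(w_true))
```

Because the certainty ordering is far finer-grained than the binary labels alone, far fewer labeled examples are needed to pin down the decision boundary, which is the sample-complexity benefit the abstract refers to.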