
    Efficient Learning with Soft Label Information and Multiple Annotators

    Nowadays, large real-world data sets are collected in science, engineering, health care and other fields. These data provide us with a great resource for building automated learning systems. However, for many machine learning applications, data need to be annotated (labeled) by humans before they can be used for learning. Unfortunately, annotation by a human expert is often very time-consuming and costly. As a result, the amount of labeled training data to learn from may be limited, which in turn affects the learning process and the quality of the learned models. In this thesis, we investigate ways of improving the learning process in supervised classification settings in which labels are provided by human annotators. First, we study and propose a new classification learning framework that learns not only from binary class labels but also from soft-label information reflecting the annotator's certainty or belief in the class label. We propose multiple methods, based on regression, max-margin and ranking methodologies, that use the soft-label information to learn better classifiers from smaller training data and hence with smaller annotation effort. We also study our soft-label approach when the examples to be labeled next are selected online using active learning. Second, we study ways of distributing the annotation effort among multiple experts. We develop a new multiple-annotator learning framework that explicitly models and embraces annotator differences and biases in order to learn both a consensus model and annotator-specific models. We demonstrate the benefits and advantages of our frameworks on both UCI data sets and real-world clinical data extracted from Electronic Health Records.
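To give a rough feel for the regression-style use of soft labels described above, the following is a minimal sketch on synthetic data (all names, noise levels, and the ridge formulation are illustrative assumptions, not the thesis's actual algorithms): a classifier fit by regressing on the logit of the annotator's certainty is compared against a baseline fit on hard 0/1 labels alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each example has a true probability p(y=1|x);
# the annotator reports a noisy soft label near p, in addition to the
# hard 0/1 label obtained by thresholding p.
n, d = 200, 5
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
p = 1.0 / (1.0 + np.exp(-X @ w_true))                 # true class probabilities
soft = np.clip(p + rng.normal(scale=0.05, size=n), 0.01, 0.99)
hard = (p > 0.5).astype(float)                         # binary labels only

def ridge_fit(X, y, lam=1e-2):
    """Closed-form ridge regression: solve (X^T X + lam I) w = X^T y."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ y)

# Soft-label learner: regress on the logit of the certainty score.
w_soft = ridge_fit(X, np.log(soft / (1 - soft)))
# Baseline: least-squares fit on the hard labels recoded as +/-1.
w_hard = ridge_fit(X, 2 * hard - 1)

# Evaluate both linear classifiers on fresh data.
X_te = rng.normal(size=(1000, d))
y_te = X_te @ w_true > 0
acc_soft = np.mean((X_te @ w_soft > 0) == y_te)
acc_hard = np.mean((X_te @ w_hard > 0) == y_te)
```

The soft labels carry graded information about how far each example sits from the decision boundary, which is exactly what a hard 0/1 label discards; with small training sets this typically translates into a better-estimated decision boundary.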

    Efficiently and Effectively Learning Models of Similarity from Human Feedback

    Vital to the success of many machine learning tasks is the ability to reason about how objects relate. For this, machine learning methods utilize a model of similarity that describes how objects are to be compared. While traditional methods commonly compare objects as feature vectors using standard measures such as Euclidean distance or cosine similarity, other models of similarity can incorporate auxiliary information beyond what is conveyed through features. To build such models, information must be given about object relationships that is beneficial to the task being considered. In many tasks, such as object recognition, ranking, product recommendation, and data visualization, a model based on human perception can lead to high performance. Other tasks require models that reflect certain domain expertise. In both cases, humans are able to provide information that can be used to build useful models of similarity. This motivates similarity-learning methods that use human feedback to guide the construction of models of similarity. The task of learning similarity from human feedback comes with many practical challenges. In this dissertation we explicitly define these challenges as those of efficiency and effectiveness. Efficiency deals both with making the most of obtained feedback and with reducing the computational run time of the learning algorithms themselves. Effectiveness concerns producing models that accurately reflect the given feedback, but also ensuring that the queries posed to humans are ones they can answer easily and without errors. After defining these challenges, we create novel learning methods that explicitly focus on one or more of them as a means to improve on the state of the art in similarity learning. Specifically, we develop methods for learning models of perceptual similarity, as well as models that reflect domain expertise. In doing so, we enable similarity-learning methods to be practically applied in more real-world problem settings.
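A common form of human feedback in this setting is the relative comparison, or triplet: "object a is more similar to b than to c". As a loose illustration (a synthetic sketch under assumed data and a simple per-feature weighting, not the dissertation's actual methods), one can learn a weighted distance from such triplets with a margin-based stochastic update:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: points in 4-D where only the first two features
# matter for "perceptual" similarity; the other two are noise.  Human
# feedback arrives as triplets (a, b, c): "a is closer to b than to c".
X = rng.normal(size=(100, 4))
w_gt = np.array([1.0, 1.0, 0.0, 0.0])   # ground-truth feature relevance

def dist(w, a, b):
    """Squared Euclidean distance with per-feature weights w."""
    return np.sum(w * (a - b) ** 2)

def make_triplet():
    i, j, k = rng.choice(len(X), size=3, replace=False)
    if dist(w_gt, X[i], X[j]) < dist(w_gt, X[i], X[k]):
        return i, j, k
    return i, k, j

triplets = [make_triplet() for _ in range(500)]

# Learn per-feature weights by stochastic gradient descent on a
# margin-based triplet loss: we want d(a,b) + margin <= d(a,c).
w = np.ones(4)
lr, margin = 0.01, 0.5
for _ in range(20):
    for a, b, c in triplets:
        if dist(w, X[a], X[b]) + margin > dist(w, X[a], X[c]):
            # Subgradient of the violated hinge term with respect to w.
            grad = (X[a] - X[b]) ** 2 - (X[a] - X[c]) ** 2
            w = np.maximum(w - lr * grad, 0.0)   # keep weights non-negative
```

After training, the weights on the perceptually relevant features should dominate those on the noise features, i.e. the learned metric has recovered which features humans actually attend to.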

    Learning classification with auxiliary probabilistic information

    Abstract—Finding ways of incorporating auxiliary information or auxiliary data into the learning process has been a topic of active data mining and machine learning research in recent years. In this work we study and develop a new framework for the classification learning problem in which, in addition to class labels, the learner is provided with auxiliary (probabilistic) information that reflects how strongly the expert feels about the class label. This approach can be extremely useful for many practical classification tasks that rely on subjective label assessment and where the cost of acquiring the additional auxiliary information is negligible compared to the cost of example analysis and labeling. We develop classification algorithms capable of using the auxiliary information to make the learning process more efficient in terms of sample complexity. We demonstrate the benefit of the approach on a number of synthetic and real-world data sets by comparing it to learning with class labels only.
    Keywords: classification learning; sample complexity; learning with auxiliary label information
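Besides regression on certainty values, the auxiliary probabilistic information also admits a ranking-style use: if the expert is more certain about x_i being positive than about x_j, the classifier's score for x_i should exceed its score for x_j. The sketch below (synthetic data and a plain pairwise-hinge update, assumed for illustration rather than taken from the paper) learns a linear scorer from such certainty-ordered pairs:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: the expert attaches a certainty score q in [0, 1]
# for the positive class to each example, not just a 0/1 label.
n, d = 60, 3
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
q = 1.0 / (1.0 + np.exp(-X @ w_true))   # expert certainty scores

# Pairwise hinge updates on certainty-ordered pairs: whenever the more
# certain example does not out-score the less certain one by a margin,
# nudge w toward their difference.
w = np.zeros(d)
lr = 0.05
for _ in range(50):
    for _ in range(200):
        i, j = rng.choice(n, size=2, replace=False)
        if q[i] < q[j]:
            i, j = j, i                  # ensure q[i] >= q[j]
        if X[i] @ w - X[j] @ w < 1.0:    # margin violated
            w += lr * (X[i] - X[j])

# The learned scoring direction should align with the true one.
cos = (w @ w_true) / (np.linalg.norm(w) * np.linalg.norm(w_true))
```

Because the certainty ordering is far finer-grained than the binary labels alone, far fewer labeled examples are needed to pin down the decision boundary, which is the sample-complexity benefit the abstract refers to.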