
    On the discovery of continuous truth: a semi-supervised approach with partial ground truths

    In many applications, information about the same object can be collected from multiple sources. However, these multi-source data are not reported consistently. To address this challenge, truth discovery has emerged as a way to identify the truth for each object from multi-source data. Most existing truth discovery methods assume that ground truths are completely unknown and focus on unsupervised approaches that jointly estimate object truths and source reliabilities. However, in many real-world applications, a subset of ground truths may be available. In this paper, we propose a semi-supervised truth discovery framework to estimate continuous object truths. With the help of ground truths, even a small number of them, the accuracy of truth discovery can be improved. We formulate the semi-supervised truth discovery problem as an optimization task in which object truths and source reliabilities are modeled as variables. The ground truths are modeled as a regularization term whose contribution to the source weight estimation is controlled by a parameter. The experiments show that the proposed method is more accurate and efficient than existing truth discovery methods.
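The formulation described in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a squared-error loss, a log-ratio source weight update of the kind used in some unsupervised truth discovery methods, and a hypothetical parameter `theta` that scales how much the labeled objects contribute to each source's loss. The function and argument names are illustrative only.

```python
import numpy as np

def semi_supervised_truth_discovery(claims, labeled, theta=1.0, n_iter=50, eps=1e-12):
    """Illustrative sketch (assumed update rules, not the paper's exact method).

    claims  : array of shape (n_sources, n_objects) with continuous claims
    labeled : dict {object_index: ground_truth_value} for the known truths
    theta   : assumed knob weighting the labeled objects in the source loss
    """
    claims = np.asarray(claims, dtype=float)
    n_sources, n_objects = claims.shape
    if n_sources < 2:
        raise ValueError("at least two sources are needed to compare reliabilities")

    # Start from the plain average, then pin the known ground truths.
    truths = claims.mean(axis=0)
    is_labeled = np.zeros(n_objects, dtype=bool)
    for idx, value in labeled.items():
        is_labeled[idx] = True
        truths[idx] = value

    weights = np.ones(n_sources)
    for _ in range(n_iter):
        # Per-source squared error against the current truth estimates.
        sq_err = (claims - truths) ** 2
        loss = (sq_err[:, ~is_labeled].sum(axis=1)
                + theta * sq_err[:, is_labeled].sum(axis=1))

        # Sources with a smaller share of the total loss get larger weights.
        weights = -np.log((loss + eps) / (loss.sum() + n_sources * eps))

        # Unlabeled truths become weighted averages; labeled ones stay fixed.
        estimate = weights @ claims / weights.sum()
        truths[~is_labeled] = estimate[~is_labeled]

    return truths, weights
```

Setting `theta` to 0 in this sketch removes the influence of the labeled objects on the source weights, while larger values make agreement with the known truths dominate the reliability estimate.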

    Weighted Semi-Supervised Approaches for Predictive Modeling and Truth Discovery

    Multi-View Learning (MVL) is a framework that combines data from heterogeneous sources efficiently by letting the different views learn from each other, thereby improving the overall prediction for the task. By not merging the data from different views, we preserve the underlying statistical properties of each view and learn from the data in their original feature spaces. Additionally, MVL mitigates the problem of high dimensionality that arises when data from multiple sources are integrated. We have exploited this property of MVL to predict chemical-target and drug-disease associations. Every chemical or drug can be represented in diverse feature spaces that can be treated as multiple views. Similarly, multi-task learning (MTL) frameworks enable the joint learning of related tasks, which improves the overall performance of the tasks compared with learning them individually. This allows us to learn related targets and related diseases together. An empirical study has been carried out on the combined effects of multi-view multi-task learning (MVMTL) for predicting chemical-target interactions and drug-disease associations. The first half of the thesis focuses on two methods that closely resemble MVMTL. We first explain the weighted Multi-View Learning (wMVL) framework, which systematically learns from heterogeneous data sources by weighting the views according to their predictive power. We extend this work to include multi-task learning and formulate the second method, called Multi-Task with weighted Multi-View Learning (MTwMVL). The performance of these two methods has been evaluated on cheminformatics data sets. We change gears in the second part of this thesis and turn to truth discovery (TD). Truth discovery closely resembles a multi-view setting, but the two differ strongly in certain aspects.
While the underlying assumption in multi-view learning is that the different views have label consistency, truth finding differs in its setup: the main objective is to find the true value of an object given that different sources might conflict with each other and claim different values for that object. The sources can be considered as views, and the primary strategy in truth finding is to estimate the reliability of each source and its contribution to the truth. Many methods address various challenges and aspects of truth discovery; in this thesis we look at TD in a semi-supervised setting. As the third contribution of this dissertation, we adopt a semi-supervised truth discovery framework in which we consider the labeled objects and unlabeled objects as two closely related tasks, one with strong labels and the other with weak labels. We show that a small set of ground truths helps achieve better accuracy than the unsupervised methods.

    FSL-BM: Fuzzy Supervised Learning with Binary Meta-Feature for Classification

    This paper introduces a novel real-time Fuzzy Supervised Learning with Binary Meta-Feature (FSL-BM) algorithm for big data classification tasks. The study of real-time algorithms addresses several major concerns, namely accuracy, memory consumption, the ability to stretch assumptions, and time complexity. Attaining a fast computational model that provides fuzzy logic and supervised learning is one of the main challenges in machine learning. In this research paper, we present the FSL-BM algorithm as an efficient solution for supervised learning with fuzzy logic processing, using a binary meta-feature representation together with Hamming distance and a hash function to relax assumptions. While many studies over the last decade have focused on reducing time complexity and increasing accuracy, the novel contribution of the proposed solution is the integration of Hamming distance, a hash function, binary meta-features, and binary classification to provide a real-time supervised method. The Hash Table (HT) component gives fast access to existing indices and therefore allows new indices to be generated in constant time, which supersedes existing fuzzy supervised algorithms with better or comparable results. To summarize, the main contribution of this technique for real-time fuzzy supervised learning is to represent hypotheses through binary input as a meta-feature space and to create the Fuzzy Supervised Hash table to train and validate the model. Comment: FICC201
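The ingredients named in this abstract (binary features, a hash table, Hamming distance, fuzzy output) can be combined in a small sketch. This is not the FSL-BM algorithm, whose details the abstract does not give. It only assumes that binary feature vectors are packed into integer keys, stored in a hash table with label counts, and that unseen inputs fall back to stored keys within a hypothetical `max_distance` in Hamming distance. All class and function names are illustrative.

```python
from collections import Counter, defaultdict

def to_key(bits):
    """Pack a sequence of 0/1 values into a single integer hash key."""
    key = 0
    for b in bits:
        key = (key << 1) | int(b)
    return key

def hamming(a, b):
    """Number of bit positions in which two integer keys differ."""
    return bin(a ^ b).count("1")

class BinaryHashClassifier:
    """Hypothetical sketch: hash table of binary keys with label counts."""

    def __init__(self, max_distance=1):
        self.max_distance = max_distance    # assumed neighbourhood radius
        self.table = defaultdict(Counter)   # key -> Counter of labels

    def fit(self, X, y):
        for bits, label in zip(X, y):
            self.table[to_key(bits)][label] += 1
        return self

    def memberships(self, bits):
        """Return label -> membership degree in [0, 1] for one input."""
        key = to_key(bits)
        if key in self.table:
            # Exact hit: a single average-case constant-time lookup.
            counts = self.table[key]
        else:
            # Fallback: pool the labels of stored keys within the radius.
            counts = Counter()
            for stored, label_counts in self.table.items():
                if hamming(key, stored) <= self.max_distance:
                    counts.update(label_counts)
        total = sum(counts.values())
        return {label: n / total for label, n in counts.items()} if total else {}
```

In this sketch only the exact-match path is constant time; the fallback scans every stored key, so a real-time system would need a different neighbour search.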