31,116 research outputs found

Node Classification in Uncertain Graphs

Author: Aggarwal Charu
Dallachiesa Michele
Palpanas Themis
Publication date: 01/01/2014

In many real applications that use and analyze networked data, the links in the network graph may be erroneous, or derived from probabilistic techniques. In such cases, the node classification problem can be challenging, since the unreliability of the links may affect the final results of the classification process. If the information about link reliability is not used explicitly, the classification accuracy in the underlying network may be affected adversely. In this paper, we focus on situations that require the analysis of the uncertainty that is present in the graph structure. We study the novel problem of node classification in uncertain graphs, by treating uncertainty as a first-class citizen. We propose two techniques based on a Bayes model and automatic parameter selection, and show that the incorporation of uncertainty in the classification process as a first-class citizen is beneficial. We experimentally evaluate the proposed approach using different real data sets, and study the behavior of the algorithms under different conditions. The results demonstrate the effectiveness and efficiency of our approach.
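To make the role of link reliability concrete, here is a minimal sketch of a neighbour-vote classifier in which each labelled neighbour contributes its edge's existence probability rather than a full vote. This is an illustrative simplification, not the Bayes model proposed in the paper; the function name, the toy graph, and the additive scoring rule are all assumptions.

```python
from collections import defaultdict


def classify_node(node, edges, labels, priors, use_probabilities=True):
    """Return the class with the largest (prior + neighbour support) score.

    edges:  dict mapping an undirected pair (u, v) to the probability
            that the link exists (hypothetical toy representation)
    labels: dict mapping already-labelled nodes to their class
    priors: dict mapping each class to a prior score
    """
    scores = defaultdict(float)
    for label, prior in priors.items():
        scores[label] = prior
    for (u, v), probability in edges.items():
        if node not in (u, v):
            continue
        neighbour = v if u == node else u
        if neighbour in labels:
            # Uncertain links count for less; ignoring uncertainty gives a full vote.
            weight = probability if use_probabilities else 1.0
            scores[labels[neighbour]] += weight
    return max(scores, key=scores.get)


# Toy uncertain graph: one reliable link to a "red" node,
# two unreliable links to "blue" nodes.
EDGES = {("a", "x"): 0.9, ("b", "x"): 0.2, ("c", "x"): 0.3}
LABELS = {"a": "red", "b": "blue", "c": "blue"}
PRIORS = {"red": 0.5, "blue": 0.5}

print(classify_node("x", EDGES, LABELS, PRIORS))
print(classify_node("x", EDGES, LABELS, PRIORS, use_probabilities=False))
```

In this toy graph the two calls disagree: a plain majority vote follows the two unreliable links, while the probability-weighted score follows the single reliable one.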

arXiv.org e-Print Archive

CiteSeerX

Crossref

Adversarial Data Programming: Using GANs to Relax the Bottleneck of Curated Labeled Data

Author: Balasubramanian Vineeth N
Pal Arghya
Publication date: 01/01/2018

The paucity of large, curated, hand-labeled training data for every domain of interest is a major bottleneck in the deployment of machine learning models in computer vision and other fields. Recent work (Data Programming) has shown how distant supervision signals in the form of labeling functions can be used to obtain labels for given data in near-constant time. In this work, we present Adversarial Data Programming (ADP), an adversarial methodology that generates data as well as a curated aggregated label, given a set of weak labeling functions. We validated our method on the MNIST, Fashion MNIST, CIFAR 10 and SVHN datasets, and it outperformed many state-of-the-art models. We conducted extensive experiments to study its usefulness, and showed how the proposed ADP framework can be used for transfer learning as well as multi-task learning, where data from two domains are generated simultaneously using the framework along with the label information. Our future work will involve understanding the theoretical implications of this new framework from a game-theoretic perspective, as well as exploring the performance of the method on more complex datasets. Comment: CVPR 2018 main conference paper
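As a rough illustration of what "weak labeling functions" look like, the sketch below defines three hypothetical heuristics over hand-made image statistics and aggregates them with a simple majority vote. The feature names, thresholds, and the majority-vote aggregation are assumptions for illustration only; ADP itself learns the aggregated label adversarially and is not reproduced here.

```python
from collections import Counter

ABSTAIN = -1  # a labeling function may decline to vote


# Hypothetical labeling functions over toy image statistics.
def lf_bright(x):
    return 1 if x["mean_intensity"] > 0.6 else ABSTAIN


def lf_dark(x):
    return 0 if x["mean_intensity"] < 0.3 else ABSTAIN


def lf_edges(x):
    return 1 if x["edge_density"] > 0.5 else 0


LABELING_FUNCTIONS = [lf_bright, lf_dark, lf_edges]


def majority_label(x, labeling_functions=LABELING_FUNCTIONS):
    """Aggregate the non-abstaining votes; return ABSTAIN on no votes or a tie."""
    votes = [lf(x) for lf in labeling_functions]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


print(majority_label({"mean_intensity": 0.8, "edge_density": 0.7}))
print(majority_label({"mean_intensity": 0.1, "edge_density": 0.2}))
print(majority_label({"mean_intensity": 0.8, "edge_density": 0.2}))
```

The third example is a tie between two conflicting heuristics, which is the kind of disagreement a learned aggregation model is meant to resolve.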

arXiv.org e-Print Archive

Crossref

Research Archive of Indian Institute of Technology Hyderabad

Measuring Possible Future Selves: Using Natural Language Processing for Automated Analysis of Posts about Life Concerns

Author: Gokbag Birkan
Publication venue: 'The Ohio State University Libraries'
Publication date: 01/05/2020

Individuals have specific perceptions regarding their lives pertaining to how well they are doing in particular life domains, what their ideas are, and what they want to pursue in the future. These concepts are called possible future selves (PFS), a schema that contains the ideas of people, who they currently are, and who they wish to be in the future. The goal of this research project is to create a program to capture PFS using natural language processing. This program will allow automated analysis to measure people's perceptions and goals in a particular life domain and assess their view of the importance regarding their thoughts on each part of their PFS. The data used in this study were adopted from Kennard, Willis, Robinson, and Knobloch-Westerwick (2015), in which 214 women, aged 21 to 35 years, viewed magazine portrayals of women in gender-congruent and gender-incongruent roles. The participants were prompted to write about their PFS with the questions: "Over the past 7 days, how much have you thought about your current life situation and your future? What were your thoughts? How much have you thought about your goals in life and your relationships? What were your thoughts?" The text PFS responses were then coded by human coders for mentions of different life domains and for the emotions explicitly expressed in the text. Combinations of machine learning techniques were utilized to show the robustness of machine learning in predicting PFS. Long Short-Term Memory networks (LSTM), Convolutional Neural Networks (CNN), and decision trees were used in the ensemble learning of the machine learning model. Two different training and evaluation methods were used to find the best machine learning approach for analyzing PFS. The machine learning approach was found to be successful in predicting PFS with high accuracy, labeling a person's concerns over PFS the same as human coders have done in The Allure of Aphrodite.
While the models were inaccurate in spotting some measures, for example labeling a person's career concern in the present with around 60% accuracy, they were accurate in finding a concern in a person's past romantic life, with above 95% accuracy. Overall, the accuracy was found to be around 83% for life-domain concerns. Undergraduate Research Scholarship by the College of Engineering. No embargo. Academic Major: Computer Science and Engineering

KnowledgeBank at OSU

GOGGLES: Automatic Image Labeling with Affinity Coding

Author: Chaba Sanya
Chau Duen Horng
Chu Xu
Das Nilaksh
Gandhi Sakshi
Wu Renzhi
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 03/03/2020

Generating large labeled training data is becoming the biggest bottleneck in building and deploying supervised machine learning models. Recently, the data programming paradigm has been proposed to reduce the human cost in labeling training data. However, data programming relies on designing labeling functions, which still requires significant domain expertise. Also, it is prohibitively difficult to write labeling functions for image datasets as it is hard to express domain knowledge using raw features for images (pixels). We propose affinity coding, a new domain-agnostic paradigm for automated training data labeling. The core premise of affinity coding is that the affinity scores of instance pairs belonging to the same class on average should be higher than those of pairs belonging to different classes, according to some affinity functions. We build the GOGGLES system that implements affinity coding for labeling image datasets by designing a novel set of reusable affinity functions for images, and propose a novel hierarchical generative model for class inference using a small development set. We compare GOGGLES with existing data programming systems on 5 image labeling tasks from diverse domains. GOGGLES achieves labeling accuracies ranging from a minimum of 71% to a maximum of 98% without requiring any extensive human annotation. In terms of end-to-end performance, GOGGLES outperforms the state-of-the-art data programming system Snuba by 21% and a state-of-the-art few-shot learning technique by 5%, and is only 7% away from the fully supervised upper bound. Comment: Published at 2020 ACM SIGMOD International Conference on Management of Data
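The core premise of affinity coding can be checked on toy data with a few lines of code. The sketch below uses cosine similarity as a stand-in affinity function and compares the mean affinity of same-class pairs with that of different-class pairs. The feature vectors, labels, and choice of cosine similarity are assumptions for illustration; the affinity functions and generative model used in GOGGLES are not reproduced here.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_affinities(features, labels, affinity=cosine):
    """Return (mean same-class affinity, mean different-class affinity)."""
    same, different = [], []
    for i, j in combinations(range(len(features)), 2):
        score = affinity(features[i], features[j])
        (same if labels[i] == labels[j] else different).append(score)
    return sum(same) / len(same), sum(different) / len(different)


# Hypothetical 2-D feature vectors for four instances in two classes.
FEATURES = [(1.0, 0.1), (0.9, 0.2), (0.1, 1.0), (0.2, 0.9)]
CLASSES = [0, 0, 1, 1]

same_mean, different_mean = mean_affinities(FEATURES, CLASSES)
print(same_mean > different_mean)
```

When the premise holds, as it does for these vectors, the affinity matrix carries class information even though no labeling function was written.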

arXiv.org e-Print Archive

Crossref

Transforming Graph Representations for Statistical Relational Learning

Author: Aha David W.
McDowell Luke K.
Neville Jennifer
Rossi Ryan A.
Publication date: 01/01/2012

Relational data representations have become an increasingly important topic due to the recent proliferation of network datasets (e.g., social, biological, information networks) and a corresponding increase in the application of statistical relational learning (SRL) algorithms to these domains. In this article, we examine a range of representation issues for graph-based relational data. Since the choice of relational data representation for the nodes, links, and features can dramatically affect the capabilities of SRL algorithms, we survey approaches and opportunities for relational representation transformation designed to improve the performance of these algorithms. This leads us to introduce an intuitive taxonomy for data representation transformations in relational domains that incorporates link transformation and node transformation as symmetric representation tasks. In particular, the transformation tasks for both nodes and links include (i) predicting their existence, (ii) predicting their label or type, (iii) estimating their weight or importance, and (iv) systematically constructing their relevant features. We motivate our taxonomy through detailed examples and use it to survey and compare competing approaches for each of these tasks. We also discuss general conditions for transforming links, nodes, and features. Finally, we highlight challenges that remain to be addressed.
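As one small example of task (i) for links, predicting their existence, the sketch below scores each unlinked node pair by its number of common neighbours, a widely used baseline. The adjacency representation and the toy graph are assumptions, and the article surveys many approaches; this is not presented as the one it recommends.

```python
from itertools import combinations


def common_neighbour_scores(adjacency):
    """Score every unlinked node pair by how many neighbours it shares.

    adjacency: dict mapping each node to the set of its neighbours
               (undirected graph, hypothetical toy representation)
    """
    scores = {}
    for u, v in combinations(sorted(adjacency), 2):
        if v in adjacency[u]:
            continue  # the link already exists, nothing to predict
        shared = adjacency[u] & adjacency[v]
        if shared:
            scores[(u, v)] = len(shared)
    return scores


# Toy graph: "a" and "d" are not linked but share neighbours "b" and "c".
ADJACENCY = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}

print(common_neighbour_scores(ADJACENCY))
```

Pairs with higher scores are candidates for links to add to the representation before an SRL algorithm is applied.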

arXiv.org e-Print Archive

CiteSeerX

Cross-Modal Data Programming Enables Rapid Medical Machine Learning

Author: Dunnmon Jared
Goldman Roger
Khandwala Nishith
Lee-Messer Christopher
Lungren Matthew
Markert Matthew
Ratner Alexander
Rubin Daniel
Ré Christopher
Saab Khaled
Sagreiya Hersh
Publication date: 26/03/2019

Labeling training datasets has become a key barrier to building medical machine learning models. One strategy is to generate training labels programmatically, for example by applying natural language processing pipelines to text reports associated with imaging studies. We propose cross-modal data programming, which generalizes this intuitive strategy in a theoretically-grounded way that enables simpler, clinician-driven input, reduces required labeling time, and improves with additional unlabeled data. In this approach, clinicians generate training labels for models defined over a target modality (e.g. images or time series) by writing rules over an auxiliary modality (e.g. text reports). The resulting technical challenge consists of estimating the accuracies and correlations of these rules; we extend a recent unsupervised generative modeling technique to handle this cross-modal setting in a provably consistent way. Across four applications in radiography, computed tomography, and electroencephalography, and using only several hours of clinician time, our approach matches or exceeds the efficacy of physician-months of hand-labeling with statistical significance, demonstrating a fundamentally faster and more flexible way of building machine learning models in medicine.
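The cross-modal idea, rules written over text reports that yield labels for the paired images, can be sketched as follows. The rule names, keywords, file names, and report texts are hypothetical, and the sketch keeps a label only when all non-abstaining rules agree. That unanimity check is a deliberately crude substitute for the generative model the paper uses to estimate rule accuracies and correlations.

```python
ABSTAIN, NORMAL, ABNORMAL = -1, 0, 1


# Hypothetical clinician-style rules over the report text (auxiliary modality).
def rule_mentions_fracture(report):
    text = report.lower()
    if "fracture" in text and "no fracture" not in text:
        return ABNORMAL
    return ABSTAIN


def rule_reads_normal(report):
    text = report.lower()
    if "no acute" in text or "unremarkable" in text:
        return NORMAL
    return ABSTAIN


RULES = [rule_mentions_fracture, rule_reads_normal]


def label_images(pairs, rules=RULES):
    """Map image ids (target modality) to labels derived from paired reports.

    pairs: iterable of (image_id, report_text)
    Images whose rules all abstain, or disagree, are left unlabelled.
    """
    labels = {}
    for image_id, report in pairs:
        votes = {rule(report) for rule in rules} - {ABSTAIN}
        if len(votes) == 1:
            labels[image_id] = votes.pop()
    return labels


PAIRS = [
    ("img_001.png", "Displaced fracture of the distal radius."),
    ("img_002.png", "No acute abnormality. Lungs unremarkable."),
    ("img_003.png", "Limited study."),
]

print(label_images(PAIRS))
```

The resulting labels would then be used to train a model that sees only the images, so no report is needed at prediction time.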

arXiv.org e-Print Archive

eScholarship - University of California