31,116 research outputs found
Node Classification in Uncertain Graphs
In many real applications that use and analyze networked data, the links in
the network graph may be erroneous, or derived from probabilistic techniques.
In such cases, the node classification problem can be challenging, since the
unreliability of the links may affect the final results of the classification
process. If the information about link reliability is not used explicitly, the
classification accuracy in the underlying network may be affected adversely. In
this paper, we focus on situations that require the analysis of the uncertainty
that is present in the graph structure. We study the novel problem of node
classification in uncertain graphs, by treating uncertainty as a first-class
citizen. We propose two techniques based on a Bayes model and automatic
parameter selection, and show that the incorporation of uncertainty in the
classification process as a first-class citizen is beneficial. We
experimentally evaluate the proposed approach using different real data sets,
and study the behavior of the algorithms under different conditions. The
results demonstrate the effectiveness and efficiency of our approach
Adversarial Data Programming: Using GANs to Relax the Bottleneck of Curated Labeled Data
Paucity of large curated hand-labeled training data for every
domain-of-interest forms a major bottleneck in the deployment of machine
learning models in computer vision and other fields. Recent work (Data
Programming) has shown how distant supervision signals in the form of labeling
functions can be used to obtain labels for given data in near-constant time. In
this work, we present Adversarial Data Programming (ADP), which presents an
adversarial methodology to generate data as well as a curated aggregated label
has given a set of weak labeling functions. We validated our method on the
MNIST, Fashion MNIST, CIFAR 10 and SVHN datasets, and it outperformed many
state-of-the-art models. We conducted extensive experiments to study its
usefulness, as well as showed how the proposed ADP framework can be used for
transfer learning as well as multi-task learning, where data from two domains
are generated simultaneously using the framework along with the label
information. Our future work will involve understanding the theoretical
implications of this new framework from a game-theoretic perspective, as well
as explore the performance of the method on more complex datasets.Comment: CVPR 2018 main conference pape
Measuring Possible Future Selves: Using Natural Language Processing for Automated Analysis of Posts about Life Concerns
Individuals have specific perceptions regarding their lives pertaining to how well they are doing in particular life domains, what their ideas are, and what to pursue in the future. These concepts are called possible future selves (PFS), a schema that contains the ideas of people, who they currently are, and who they wish to be in the future. The goal of this research project is to create a program to capture PFS using natural language processing. This program will allow automated analysis to measure people's perceptions and goals in a particular life domain and assess their view of the importance regarding their thoughts on each part of their PFS.
The data used in this study were adopted from Kennard, Willis, Robinson, and Knobloch-Westerwick (2015) in which 214 women, aged between 21-35 years, viewed magazine portrayals of women in gender-congruent and gender-incongruent roles. The participants were prompted to write about their PFS with the questions: "Over the past 7 days, how much have you thought about your current life situation and your future? What were your thoughts? How much have you thought about your goals in life and your relationships? What were your thoughts?" The text PFS responses were then coded for mentions of different life domains and the emotions explicitly expressed from the text-data by human coders.
Combinations of machine learning techniques were utilized to show the robustness of machine learning in predicting PFS. Long Short-Term Memory networks (LSTM), Convolutional Neural Networks (CNN), and decision trees were used in the ensemble learning of the machine learning model. Two different training and evaluation methods were used to find the most optimal machine learning approach in analyzing PFS.
The machine learning approach was found successful in predicting PFS with high accuracy, labeling a person's concerns over PFS the same as human coders have done in The Allure of Aphrodite. While the models were inaccurate in spotting some measures, for example labeling a person's career concern in the present with around 60% accuracy, it was accurate finding a concern in a person's past romantic life with above 95% accuracy. Overall, the accuracy was found to be around 83% for life-domain concerns.Undergraduate Research Scholarship by the College of EngineeringNo embargoAcademic Major: Computer Science and Engineerin
GOGGLES: Automatic Image Labeling with Affinity Coding
Generating large labeled training data is becoming the biggest bottleneck in
building and deploying supervised machine learning models. Recently, the data
programming paradigm has been proposed to reduce the human cost in labeling
training data. However, data programming relies on designing labeling functions
which still requires significant domain expertise. Also, it is prohibitively
difficult to write labeling functions for image datasets as it is hard to
express domain knowledge using raw features for images (pixels).
We propose affinity coding, a new domain-agnostic paradigm for automated
training data labeling. The core premise of affinity coding is that the
affinity scores of instance pairs belonging to the same class on average should
be higher than those of pairs belonging to different classes, according to some
affinity functions. We build the GOGGLES system that implements affinity coding
for labeling image datasets by designing a novel set of reusable affinity
functions for images, and propose a novel hierarchical generative model for
class inference using a small development set.
We compare GOGGLES with existing data programming systems on 5 image labeling
tasks from diverse domains. GOGGLES achieves labeling accuracies ranging from a
minimum of 71% to a maximum of 98% without requiring any extensive human
annotation. In terms of end-to-end performance, GOGGLES outperforms the
state-of-the-art data programming system Snuba by 21% and a state-of-the-art
few-shot learning technique by 5%, and is only 7% away from the fully
supervised upper bound.Comment: Published at 2020 ACM SIGMOD International Conference on Management
of Dat
Transforming Graph Representations for Statistical Relational Learning
Relational data representations have become an increasingly important topic
due to the recent proliferation of network datasets (e.g., social, biological,
information networks) and a corresponding increase in the application of
statistical relational learning (SRL) algorithms to these domains. In this
article, we examine a range of representation issues for graph-based relational
data. Since the choice of relational data representation for the nodes, links,
and features can dramatically affect the capabilities of SRL algorithms, we
survey approaches and opportunities for relational representation
transformation designed to improve the performance of these algorithms. This
leads us to introduce an intuitive taxonomy for data representation
transformations in relational domains that incorporates link transformation and
node transformation as symmetric representation tasks. In particular, the
transformation tasks for both nodes and links include (i) predicting their
existence, (ii) predicting their label or type, (iii) estimating their weight
or importance, and (iv) systematically constructing their relevant features. We
motivate our taxonomy through detailed examples and use it to survey and
compare competing approaches for each of these tasks. We also discuss general
conditions for transforming links, nodes, and features. Finally, we highlight
challenges that remain to be addressed
Cross-Modal Data Programming Enables Rapid Medical Machine Learning
Labeling training datasets has become a key barrier to building medical
machine learning models. One strategy is to generate training labels
programmatically, for example by applying natural language processing pipelines
to text reports associated with imaging studies. We propose cross-modal data
programming, which generalizes this intuitive strategy in a
theoretically-grounded way that enables simpler, clinician-driven input,
reduces required labeling time, and improves with additional unlabeled data. In
this approach, clinicians generate training labels for models defined over a
target modality (e.g. images or time series) by writing rules over an auxiliary
modality (e.g. text reports). The resulting technical challenge consists of
estimating the accuracies and correlations of these rules; we extend a recent
unsupervised generative modeling technique to handle this cross-modal setting
in a provably consistent way. Across four applications in radiography, computed
tomography, and electroencephalography, and using only several hours of
clinician time, our approach matches or exceeds the efficacy of
physician-months of hand-labeling with statistical significance, demonstrating
a fundamentally faster and more flexible way of building machine learning
models in medicine
- …