Overcoming uncertainty for within-network relational machine learning
People increasingly communicate through email and social networks to maintain friendships and conduct business, as well as share online content such as pictures, videos and products. Relational machine learning (RML) utilizes a set of observed attributes and network structure to predict corresponding labels for items; for example, to predict individuals engaged in securities fraud, we can utilize phone calls and workplace information to make joint predictions over the individuals. However, in large-scale, partially observed network domains, missing labels and edges can significantly impact standard relational machine learning methods by introducing bias into the learning and inference processes. In this thesis, we identify the effects on parameter estimation, correct the biases, and model the uncertainty of the missing data to improve predictive performance. In particular, we investigate this issue across a variety of modeling scenarios and prediction problems. First, we introduce the Transitive Chung Lu (TCL) random graph model for modeling the conditional distribution of edges given a partially observed network. This model fits within a class of scalable generative graph models with scalable sampling processes that we generalize to model distributions of networks with correlated attribute variables via Attributed Graph Models. Second, we utilize TCL to incorporate edge probabilities into relational learning and inference models for partially observed network domains. As part of this work, we give a linear time algorithm to perform variational inference over a squared network. We apply the resulting semi-supervised model, Probabilistic Relational EM (PR-EM), to the Active Exploration domain to iteratively locate positive examples in partially observed networks. Due to the sampling process, this domain exhibits extreme bias for learning and inference: we show that PR-EM operates with high accuracy despite the difficult domain.
Third, we investigate the performance of Relational EM methods for semi-supervised relational learning in partially labeled networks and find that fixed point estimates have considerable approximation errors during learning and inference. To solve this, we propose two stochastic methods, Relational Stochastic EM and Relational Data Augmentation, for semi-supervised relational learning and demonstrate that these approaches improve over the Relational EM method. Fourth, we improve on existing semi-supervised learning methods by imposing hard constraints on the inference steps, allowing semi-supervised methods to learn using better approximations during learning and inference for partially labeled networks. In particular, we find that we can correct for the approximated parameter learning errors during the collective inference step by imposing a Maximum Entropy constraint. We find that this correction allows us to utilize a better approximation over the unlabeled data. In addition, we prove that, given an allowable error, this method adds only a constant overhead to the original collective inference method. Overall, all of the methods presented in this thesis have provable subquadratic runtimes. We demonstrate each on large-scale networks, in some cases including networks with millions of vertices and/or edges. Across all these approaches, we show that incorporating the uncertainty into the modeling process improves modeling and predictive performance.
Choosing between Auctions and Negotiations in Online B2B Markets for IT Services: The Effect of Prior Relationships and Performance
The choice of contract allocation mechanism in procurement affects such aspects of transactions as information exchange between buyer and supplier, supplier competition, pricing and, eventually, performance. In this study we investigate the buyer’s choice between reverse auctions and bilateral negotiations as an allocation mechanism for IT services contracts. Prior studies of allocation mechanism choice focused on factors pertaining to the discrete exchange situation, such as contract complexity or availability of suppliers. We broaden the research by focusing on buyers’ past exchange relationships with vendors. Based on the literature on the economics of contracting and agency theory, we hypothesize that prior repeat interaction with vendors favors the use of negotiations over auctions in the next transaction, while the need to explore the marketplace due to the buyer’s inexperience or dissatisfaction with the vendor’s performance in the most recent project leads to the use of auctions instead of negotiations. We find support for these hypotheses in a longitudinal dataset of 2,081 IT projects realized by 91 repeat buyers at a leading online services marketplace over a period of eight years. Taken together, the results show that research on B2B auctions and negotiations should move beyond discrete instances and instead analyze them in the context of the individual firm’s history and supplier strategy. Keywords: outsourcing; IT services; online marketplace; reverse auctions
The analysis of social network data: an exciting frontier for statisticians
The catalyst for this paper is the recent interest in the relationship between social networks and an individual's health, which has arisen following a series of papers by Nicholas Christakis and James Fowler on person-to-person spread of health behaviors. In this issue, they provide a detailed explanation of their methods that offers insights, justifications, and responses to criticisms [1]. In this paper, we introduce some of the key statistical methods used in social network analysis and indicate where those used by Christakis and Fowler (CF) fit into the general framework. The intent is to provide the background necessary for readers to make their own evaluation of the work by CF and understand the challenges of research involving social networks. We entertain possible solutions to some of the difficulties encountered in accounting for confounding effects in analyses of peer effects and provide comments on the contributions of CF.
Scalable Text and Link Analysis with Mixed-Topic Link Models
Many data sets contain rich information about objects, as well as pairwise
relations between them. For instance, in networks of websites, scientific
papers, and other documents, each node has content consisting of a collection
of words, as well as hyperlinks or citations to other nodes. In order to
perform inference on such data sets, and make predictions and recommendations,
it is useful to have models that are able to capture the processes which
generate the text at each node and the links between them. In this paper, we
combine classic ideas in topic modeling with a variant of the mixed-membership
block model recently developed in the statistical physics community. The
resulting model has the advantage that its parameters, including the mixture of
topics of each document and the resulting overlapping communities, can be
inferred with a simple and scalable expectation-maximization algorithm. We test
our model on three data sets, performing unsupervised topic classification and
link prediction. For both tasks, our model outperforms several existing
state-of-the-art methods, achieving higher accuracy with significantly less
computation, analyzing a data set with 1.3 million words and 44 thousand links
in a few minutes. Comment: 11 pages, 4 figures
Predicting Semantic Relations using Global Graph Properties
Semantic graphs, such as WordNet, are resources which curate natural language
on two distinguishable layers. On the local level, individual relations between
synsets (semantic building blocks) such as hypernymy and meronymy enhance our
understanding of the words used to express their meanings. Globally, analysis
of graph-theoretic properties of the entire net sheds light on the structure of
human language as a whole. In this paper, we combine global and local
properties of semantic graphs through the framework of Max-Margin Markov Graph
Models (M3GM), a novel extension of Exponential Random Graph Model (ERGM) that
scales to large multi-relational graphs. We demonstrate how such global
modeling improves performance on the local task of predicting semantic
relations between synsets, yielding new state-of-the-art results on the WN18RR
dataset, a challenging version of WordNet link prediction in which "easy"
reciprocal cases are removed. In addition, the M3GM model identifies
multirelational motifs that are characteristic of well-formed lexical semantic
ontologies. Comment: EMNLP 201