Variational Bayes via Propositionalization
We propose a unified approach to VB (variational Bayes) in
symbolic-statistical modeling via propositionalization.
By propositionalization we mean, broadly, expressing and
computing probabilistic models such as BNs (Bayesian
networks) and PCFGs (probabilistic context free grammars)
in terms of propositional logic that considers
propositional variables as binary random variables.
Our proposal is motivated by three observations. The
first one is that PPC (propositionalized probability
computation), i.e. probability computation formalized in
a propositional setting, has turned out to be general and
efficient when variable values are sparsely
interdependent. Examples include (discrete) BNs, PCFGs
and, more generally, PRISM, a Turing-complete logic
programming language with EM learning ability that we have
been developing, which computes probabilities using graphically
represented AND/OR boolean formulas. Efficiency of PPC is
classically attested by the Inside-Outside algorithm in
the case of PCFGs and by recent PPC approaches in the case
of BNs such as the one by Darwiche et al. that exploits
probability and CSI (context specific independence).
Dechter et al. also revealed that PPC is a general
computation scheme for BNs by their formulation of AND/OR
search spaces.
Second, while VB has been around for some time as a
practically effective approach to Bayesian modeling, its
use is still somewhat restricted to simple models such as
BNs and HMMs (hidden Markov models) though its usefulness
is established through a variety of applications from
model selection to prediction. On the other hand, it has
already been proved that VB can be extended to PCFGs and is
efficiently implementable using dynamic programming. Note
that PCFGs are just one class of PPC and much more general
PPC is realized by PRISM. Accordingly, if VB is extended to
PRISM's PPC, we will obtain VB for general probabilistic
models, far wider than BNs and PCFGs.
The last observation is that once VB becomes available in
PRISM, it saves us a lot of time and energy. First we do
not have to derive a new VB algorithm from scratch for
each model and implement it. All we have to do is
write a probabilistic model at the predicate level. The rest
of the work will be carried out automatically in a unified
manner by the PRISM system, as happens in the case of EM
learning. Deriving and implementing a VB algorithm is a
tedious error-prone process, and ensuring its correctness
would be difficult beyond PCFGs without formal semantics.
PRISM augmented with VB will completely eliminate such
needs and make it easy to explore and test new Bayesian
models by helping the user cope with data sparseness and
avoid over-fitting.
Learning Heterogeneous Similarity Measures for Hybrid-Recommendations in Meta-Mining
The notion of meta-mining has appeared recently and extends the traditional
meta-learning in two ways. First it does not learn meta-models that provide
support only for the learning algorithm selection task but ones that support
the whole data-mining process. In addition it abandons the so-called black-box
approach to algorithm description followed in meta-learning. Now, in addition to
the datasets, algorithms and workflows also have descriptors. For the
latter two, these descriptions are semantic, describing properties of the
algorithms. With the availability of descriptors both for datasets and data
mining workflows the traditional modelling techniques followed in
meta-learning, typically based on classification and regression algorithms, are
no longer appropriate. Instead we are faced with a problem the nature of which
is much more similar to the problems that appear in recommendation systems. The
most important meta-mining requirements are that suggestions should use only
dataset and workflow descriptors, and that the cold-start problem be handled,
e.g. providing workflow suggestions for new datasets.
In this paper we take a different view on the meta-mining modelling problem
and treat it as a recommender problem. In order to account for the meta-mining
specificities we derive a novel metric-based-learning recommender approach. Our
method learns two homogeneous metrics, one in the dataset and one in the
workflow space, and a heterogeneous one in the dataset-workflow space. All
learned metrics reflect similarities established from the dataset-workflow
preference matrix. We demonstrate our method on meta-mining over biological
(microarray datasets) problems. The application of our method is not limited to
the meta-mining problem; its formulation is general enough that it can be
applied to problems with similar requirements.
Modeling Complex Networks For (Electronic) Commerce
NYU, Stern School of Business, IOMS Department, Center for Digital Economy Researc
Learning classifiers from linked data
The emergence of many interlinked, physically distributed, and autonomously maintained linked data sources amounts to the rapid growth of the Linked Open Data (LOD) cloud, which offers unprecedented opportunities for predictive modeling and knowledge discovery from such data. However, existing machine learning approaches are limited in their applicability because it is neither desirable nor feasible to gather all of the data in a centralized location for analysis due to access, memory, bandwidth, or computational restrictions. In some applications, additional schema such as subclass hierarchies may be available and exploited by the learner. Furthermore, in other applications, the attributes that are relevant for specific prediction tasks are not known a priori and hence need to be discovered by the algorithm. Against this background, we present a series of approaches that attempt to address such scenarios. First, we show how to learn Relational Bayesian Classifiers (RBCs) from a single but remote data store using statistical queries, and we extend this to the setting where the attributes that are relevant for prediction are not known a priori, by selectively crawling the data store for attributes of interest. Next, we introduce an algorithm for learning classifiers from a remote data store enriched with subclass hierarchies. Our algorithm encodes the constraints specified in a subclass hierarchy using latent variables in a directed graphical model, and adopts the Variational Bayesian EM approach to efficiently learn parameters. In retrospect, we observe that in learning from linked data it is often useful to represent an instance as tuples of bags of attribute values. With this inspiration, we introduce, formulate, and present solutions for a novel type of learning problem which we call distributional instance classification. Finally, building up from the foundations, we consider the problem of learning predictive models from multiple interlinked data stores.
We introduce a distributed learning framework, identify three special cases of linked data fragmentation, and then describe effective strategies for learning predictive models in each case. Further, we consider a novel application of a matrix reconstruction technique from the field of Computerized Tomography to approximate the statistics needed by the learning algorithm from projections using count queries, thus dramatically reducing the amount of information transmitted from the remote data sources to the learner.
Representation Learning for Words and Entities
This thesis presents new methods for unsupervised learning of distributed
representations of words and entities from text and knowledge bases. The first
algorithm presented in the thesis is a multi-view algorithm for learning
representations of words called Multiview Latent Semantic Analysis (MVLSA). By
incorporating up to 46 different types of co-occurrence statistics for the same
vocabulary of English words, I show that MVLSA outperforms other
state-of-the-art word embedding models. Next, I focus on learning entity
representations for search and recommendation and present the second method of
this thesis, Neural Variational Set Expansion (NVSE). NVSE is also an
unsupervised learning method, but it is based on the Variational Autoencoder
framework. Evaluations with human annotators show that NVSE can facilitate
better search and recommendation of information gathered from noisy, automatic
annotation of unstructured natural language corpora. Finally, I move from
unstructured data and focus on structured knowledge graphs. I present novel
approaches for learning embeddings of vertices and edges in a knowledge graph
that obey logical constraints.
Comment: PhD thesis, Machine Learning, Natural Language Processing,
Representation Learning, Knowledge Graphs, Entities, Word Embeddings, Entity
Embedding
Classification in Networked Data: A Toolkit and a Univariate Case Study
This paper is about classifying entities that are interlinked with entities for which the class is
known. After surveying prior work, we present NetKit, a modular toolkit for classification in networked
data, and a case-study of its application to networked data used in prior machine learning
research. NetKit is based on a node-centric framework in which classifiers comprise a local classifier,
a relational classifier, and a collective inference procedure. Various existing node-centric
relational learning algorithms can be instantiated with appropriate choices for these components,
and new combinations of components realize new algorithms. The case study focuses on univariate
network classification, for which the only information used is the structure of class linkage in
the network (i.e., only links and some class labels). To our knowledge, no work previously has
evaluated systematically the power of class-linkage alone for classification in machine learning
benchmark data sets. The results demonstrate that very simple network-classification models perform
quite well—well enough that they should be used regularly as baseline classifiers for studies
of learning with networked data. The simplest method (which performs remarkably well) highlights
the close correspondence between several existing methods introduced for different purposes—that
is, Gaussian-field classifiers, Hopfield networks, and relational-neighbor classifiers. The case study
also shows that there are two sets of techniques that are preferable in different situations, namely
when few versus many labels are known initially. We also demonstrate that link selection plays an
important role similar to traditional feature selection.
NYU, Stern School of Business, IOMS Department, Center for Digital Economy Researc
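As a rough illustration of the univariate network classification described in the NetKit abstract above, the following Python sketch implements a weighted-vote, relational-neighbor style classifier with a simple iterative relaxation. It is not NetKit's code: the function name `relational_neighbor`, the edge-list format, the fixed iteration count, and the toy graph are all assumptions made for illustration, and edge weights are assumed to be positive.

```python
from collections import defaultdict


def relational_neighbor(edges, known, classes, iterations=20):
    """Estimate class probabilities for unlabeled nodes from links alone.

    edges:   iterable of (u, v, weight) tuples, treated as undirected
    known:   dict mapping some nodes to their known class label
    classes: list of all class labels
    """
    # Build a weighted, undirected adjacency map.
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w

    # Known nodes get a one-hot distribution; unknown nodes start uniform.
    probs = {}
    for n in adj:
        if n in known:
            probs[n] = {c: 1.0 if c == known[n] else 0.0 for c in classes}
        else:
            probs[n] = {c: 1.0 / len(classes) for c in classes}

    # Repeatedly replace each unknown node's estimate with the
    # weighted average of its neighbors' current estimates.
    for _ in range(iterations):
        updated = {}
        for n in adj:
            if n in known:
                continue
            total = sum(adj[n].values())
            updated[n] = {
                c: sum(w * probs[m][c] for m, w in adj[n].items()) / total
                for c in classes
            }
        probs.update(updated)
    return probs


# Toy chain a - b - c - d, with only the two end nodes labeled.
edges = [("a", "b", 1.0), ("b", "c", 1.0), ("c", "d", 1.0)]
probs = relational_neighbor(edges, {"a": "x", "d": "y"}, ["x", "y"])
```

In this toy chain, node `b` leans toward class `x` and node `c` toward class `y`, purely because of which labeled node each one sits closer to. No node attributes are used, which is the sense in which the setting is univariate.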
Representation Learning for Words and Entities
This thesis presents new methods for unsupervised learning of distributed representations of words and entities from text and knowledge bases. The first algorithm presented in the thesis is a multi-view algorithm for learning representations of words called Multiview LSA (MVLSA). Through experiments on close to 50 different views, I show that MVLSA outperforms other state-of-the-art word embedding models. After that, I focus on learning entity representations for search and recommendation and present the second algorithm of this thesis, called Neural Variational Set Expansion (NVSE). NVSE is also an unsupervised learning method, but it is based on the Variational Autoencoder framework. Evaluations with human annotators show that NVSE can facilitate better search and recommendation of information gathered from noisy, automatic annotation of unstructured natural language corpora. Finally, I move from unstructured data and focus on structured knowledge graphs. I present novel approaches for learning embeddings of vertices and edges in a knowledge graph that obey logical constraints.