The Bregman Variational Dual-Tree Framework
Graph-based methods provide a powerful tool set for many non-parametric
frameworks in Machine Learning. In general, the memory and computational
complexity of these methods is quadratic in the number of examples in the data, which quickly makes them infeasible for moderate to large scale datasets. A
significant effort to find more efficient solutions to the problem has been
made in the literature. One of the state-of-the-art methods that has been
recently introduced is the Variational Dual-Tree (VDT) framework. Despite some
of its unique features, VDT is currently restricted to Euclidean spaces
where the Euclidean distance quantifies the similarity. In this paper, we
extend the VDT framework beyond the Euclidean distance to more general Bregman
divergences that include the Euclidean distance as a special case. By
exploiting the properties of the general Bregman divergence, we show how the
new framework can maintain all the pivotal features of the VDT framework and
yet significantly improve its performance in non-Euclidean domains. We apply
the proposed framework to different text categorization problems and
demonstrate its benefits over the original VDT. Comment: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI2013).
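For readers unfamiliar with the notion, a Bregman divergence is generated by a strictly convex function phi as D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>. A minimal NumPy sketch (illustrative only, not the paper's implementation) shows how the squared Euclidean distance and the KL divergence arise as special cases:

```python
import numpy as np

def bregman(x, y, phi, grad_phi):
    """General Bregman divergence D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    return phi(x) - phi(y) - np.dot(grad_phi(y), x - y)

# phi(x) = ||x||^2 recovers the squared Euclidean distance.
sq_norm = lambda x: np.dot(x, x)
grad_sq = lambda x: 2 * x

# phi(x) = sum_i x_i log x_i (negative entropy) recovers the KL divergence
# for x, y on the probability simplex.
neg_entropy = lambda x: np.sum(x * np.log(x))
grad_neg_entropy = lambda x: np.log(x) + 1

x = np.array([0.2, 0.3, 0.5])
y = np.array([0.4, 0.4, 0.2])

print(bregman(x, y, sq_norm, grad_sq))               # equals ||x - y||^2
print(bregman(x, y, neg_entropy, grad_neg_entropy))  # equals KL(x || y) since both sum to 1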
NON-PARAMETRIC GRAPH-BASED METHODS FOR LARGE SCALE PROBLEMS
The notion of similarity between observations plays a fundamental role in many Machine Learning and Data Mining algorithms. In many of these methods, the fundamental problem of prediction, which is making assessments and/or inferences about future observations from past ones, boils down to how "similar" the future cases are to the already observed ones. However, similarity is not always obtained through the traditional distance metrics. Data-driven similarity metrics, in particular, come into play where the traditional absolute metrics are not sufficient for the task at hand due to the special structure of the observed data. A common approach for computing data-driven similarity is to aggregate the local absolute similarities (which are not data-driven and can be computed in closed form) to infer a global data-driven similarity value between any pair of observations. Graph-based methods offer a natural framework to do so. By incorporating these methods, many Machine Learning algorithms that are designed to work with absolute distances can be applied to problems with data-driven distances. This makes graph-based methods very effective tools for many real-world problems.
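As a concrete illustration of the aggregation idea described above (a minimal sketch using standard tooling, not code from the thesis): local Euclidean similarities define a k-nearest-neighbor graph, and shortest-path distances over that graph then act as a global, data-driven dissimilarity.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

# Toy data: points along a curved one-dimensional manifold in the plane.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.5, 3 * np.pi, 200))
X = np.column_stack([t * np.cos(t), t * np.sin(t)])

# Local absolute similarities: a k-nearest-neighbor graph whose edge
# weights are plain Euclidean distances (computable in closed form).
knn = kneighbors_graph(X, n_neighbors=10, mode="distance")

# Global data-driven dissimilarity: aggregate the local edges by taking
# shortest paths through the graph.
D = shortest_path(knn, method="D", directed=False)

# The graph distance between the endpoints follows the data manifold,
# unlike the straight-line Euclidean distance.
print(D[0, -1], np.linalg.norm(X[0] - X[-1]))
```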
In this thesis, the major problem that I want to address is the scalability of graph-based methods. With the rise of large-scale, high-dimensional datasets in many real-world applications, many Machine Learning algorithms do not scale up well when applied to these problems. Graph-based methods are no exception. Both the large number of observations and the high dimensionality hurt graph-based methods, computationally and statistically. While the large number of observations imposes more of a computational problem, the high-dimensionality problem has more of a statistical nature. In this thesis, I address both of these issues in depth and review the common solutions proposed for them in the literature. Moreover, for each of these problems, I propose novel solutions with experimental results depicting the merits of the proposed algorithms. Finally, I discuss the contribution of the proposed work from a broader viewpoint and draw some future directions for the current work.
Learning Nonlinear Loop Invariants with Gated Continuous Logic Networks (Extended Version)
Verifying real-world programs often requires inferring loop invariants with
nonlinear constraints. This is especially true in programs that perform many
numerical operations, such as control systems for avionics or industrial
plants. Recently, data-driven methods for loop invariant inference have shown
promise, especially on linear invariants. However, applying data-driven
inference to nonlinear loop invariants is challenging due to the large number
and magnitude of high-order terms, the potential for overfitting on a small
number of samples, and the large space of possible inequality bounds.
In this paper, we introduce a new neural architecture for general SMT
learning, the Gated Continuous Logic Network (G-CLN), and apply it to nonlinear
loop invariant learning. G-CLNs extend the Continuous Logic Network (CLN)
architecture with gating units and dropout, which allow the model to robustly
learn general invariants over large numbers of terms. To address overfitting
that arises from finite program sampling, we introduce fractional sampling---a
sound relaxation of loop semantics to continuous functions that facilitates
unbounded sampling on the real domain. We additionally design a new CLN activation
function, the Piecewise Biased Quadratic Unit (PBQU), for naturally learning
tight inequality bounds.
We incorporate these methods into a nonlinear loop invariant inference system
that can learn general nonlinear loop invariants. We evaluate our system on a
benchmark of nonlinear loop invariants and show it solves 26 out of 27
problems, 3 more than prior work, with an average runtime of 53.3 seconds. We
further demonstrate the generic learning ability of G-CLNs by solving all 124
problems in the linear Code2Inv benchmark. We also perform a quantitative stability evaluation and show G-CLNs have a convergence rate of 97.5% on quadratic problems, a 39.2% improvement over CLN models.
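The gating and activation ideas can be sketched concretely. The snippet below is only one reading of the abstract: the exact gate placement and the PBQU formula (including its constants) are assumptions, not the authors' definitions. The intent is that a gate near zero drops a clause from a conjunction, and a PBQU-style activation peaks where an inequality bound t(x) <= 0 is tight, rewarding tight bounds.

```python
import numpy as np

def gated_and(truth_values, gates):
    """Gated product t-norm conjunction (assumed form): a gate g_i near 1
    keeps clause i; a gate near 0 replaces it with 'true' (value 1)."""
    t = np.asarray(truth_values, dtype=float)
    g = np.asarray(gates, dtype=float)
    return np.prod(g * t + (1.0 - g))

def pbqu(t, k=1.0, bias=10.0):
    """PBQU-style activation (assumed form): truth value is maximal (1.0)
    exactly when the bound t(x) <= 0 is tight (t == 0) and decays
    quadratically, more steeply on the violated side (t > 0)."""
    t = np.asarray(t, dtype=float)
    return np.where(t <= 0,
                    1.0 / (1.0 + k * t**2),          # satisfied: gentle decay
                    1.0 / (1.0 + bias * k * t**2))   # violated: steep decay

# A bound that is tight on the samples scores near 1; gating can switch an
# irrelevant clause off without disturbing the rest of the learned formula.
samples_t = np.array([0.0, -0.05, -0.1])   # t(x) on sampled program states
tight_score = pbqu(samples_t).mean()
print(gated_and([tight_score, 0.2], gates=[1.0, 0.0]))  # ~= tight_score
```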
Machine Learning at Microsoft with ML.NET
Machine Learning is transitioning from an art and science into a technology
available to every developer. In the near future, every application on every
platform will incorporate trained models to encode data-based decisions that
would be impossible for developers to author. This presents a significant
engineering challenge, since currently data science and modeling are largely
decoupled from standard software development processes. This separation makes
incorporating machine learning capabilities inside applications unnecessarily
costly and difficult, and furthermore discourages developers from embracing ML
in the first place. In this paper we present ML.NET, a framework developed at
Microsoft over the last decade in response to the challenge of making it easy
to ship machine learning models in large software applications. We present its
architecture, and illuminate the application demands that shaped it.
Specifically, we introduce DataView, the core data abstraction of ML.NET, which
allows it to capture full predictive pipelines efficiently and consistently
across training and inference lifecycles. We close the paper with a
surprisingly favorable performance study of ML.NET compared to more recent
entrants, and a discussion of some lessons learned.
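ML.NET itself is a .NET library, but the DataView idea, a lazily evaluated, schema-aware view whose fitted transforms replay identically at inference, can be illustrated with a toy Python analogue (a conceptual sketch only, not the ML.NET API):

```python
from dataclasses import dataclass, field

@dataclass
class Normalize:
    """A fittable transform: learns statistics during training,
    then applies the exact same computation at inference time."""
    column: str
    mean: float = 0.0
    scale: float = 1.0

    def fit(self, rows):
        vals = [r[self.column] for r in rows]
        self.mean = sum(vals) / len(vals)
        self.scale = (max(vals) - min(vals)) or 1.0
        return self

    def __call__(self, row):
        out = dict(row)
        out[self.column] = (row[self.column] - self.mean) / self.scale
        return out

@dataclass
class Pipeline:
    transforms: list = field(default_factory=list)

    def fit(self, rows):
        for t in self.transforms:
            t.fit(rows)
            rows = [t(r) for r in rows]
        return self

    def view(self, rows):
        # Lazy "view": rows are transformed on demand, never materialized up front.
        for r in rows:
            for t in self.transforms:
                r = t(r)
            yield r

train = [{"price": 10.0}, {"price": 20.0}, {"price": 40.0}]
pipe = Pipeline([Normalize("price")]).fit(train)
print(list(pipe.view([{"price": 25.0}])))  # same math at inference as in training
```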
Latent Variable Model for Learning in Pairwise Markov Networks
Pairwise Markov Networks (PMNs) are an important class of Markov networks which, due to their simplicity, are widely used in many applications such as image analysis, bioinformatics, and sensor networks. However, learning Markov networks from data is a challenging task; there are many possible structures one must consider, and each of these structures comes with its own parameters, making it easy to overfit the model with limited data. To deal with this problem, recent learning methods build upon L1 regularization to express a bias towards sparse network structures. In this paper, we propose a new and more flexible framework that lets us bias the structure; it can, for example, encode a preference for networks with certain local substructures which as a whole exhibit some special global structure. We experiment with and show the benefit of our framework on two types of problems: learning of modular networks and learning of traffic network models.
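To ground the L1-regularization baseline the abstract refers to (a standard sketch, not the paper's method): for Gaussian pairwise Markov networks, sparse structure can be learned with the graphical lasso, where zeros in the estimated precision matrix correspond to absent edges.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# Ground-truth chain-structured Gaussian pairwise Markov network:
# nonzero off-diagonals of the precision matrix are the graph's edges.
prec = np.eye(4)
for i in range(3):
    prec[i, i + 1] = prec[i + 1, i] = 0.4

rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(4), np.linalg.inv(prec), size=2000)

# L1-penalized maximum likelihood (graphical lasso); alpha controls sparsity.
model = GraphicalLasso(alpha=0.05).fit(X)
edges = (np.abs(model.precision_) > 1e-3) & ~np.eye(4, dtype=bool)
print(edges.astype(int))  # recovers the chain structure
```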