Challenges in Disentangling Independent Factors of Variation
We study the problem of building models that disentangle independent factors
of variation. Such models could be used to encode features that can efficiently
be used for classification and to transfer attributes between different images
in image synthesis. As data we use a weakly labeled training set. Our weak
labels indicate what single factor has changed between two data samples,
although the relative value of the change is unknown. This labeling is of
particular interest as it may be readily available without annotation costs. To
make use of weak labels we introduce an autoencoder model and train it through
constraints on image pairs and triplets. We formally prove that without
additional knowledge there is no guarantee that two images with the same factor
of variation will be mapped to the same feature. We call this issue the
reference ambiguity. Moreover, we show the role of the feature dimensionality
and adversarial training. We demonstrate experimentally that the proposed model
can successfully transfer attributes on several datasets, but also show cases
where the reference ambiguity occurs.
Comment: Submitted to ICLR 201
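The attribute transfer described in this abstract can be illustrated with a feature swap. The following is only a minimal sketch, not the authors' model: a random orthogonal matrix stands in for a trained autoencoder, and the names `encode`, `decode`, `dim` and `n_varying` are hypothetical. It shows how a feature split into a varying chunk and a common chunk lets one sample take its varying factor from another.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_varying = 6, 2  # hypothetical sizes: feature length, size of the varying chunk

# Stand-in for a trained autoencoder (an assumption for illustration only):
# a random orthogonal matrix, so that decoding exactly inverts encoding.
q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))

def encode(x):
    """Map a sample to (varying chunk, common chunk)."""
    f = q.T @ x
    return f[:n_varying], f[n_varying:]

def decode(varying, common):
    """Map the two feature chunks back to sample space."""
    return q @ np.concatenate([varying, common])

x1, x2 = rng.standard_normal(dim), rng.standard_normal(dim)
v1, c1 = encode(x1)
v2, c2 = encode(x2)

reconstruction = decode(v1, c1)  # plain autoencoding of x1
transferred = decode(v2, c1)     # x1 with the varying chunk taken from x2
```

The sketch does no training and says nothing about the reference ambiguity; it only makes the swap operation concrete.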
Explainable Text Classification in Legal Document Review: A Case Study of Explainable Predictive Coding
In today's legal environment, lawsuits and regulatory investigations require
companies to embark upon increasingly intensive data-focused engagements to
identify, collect and analyze large quantities of data. When documents are
staged for review, the process can require companies to dedicate an
extraordinary level of resources, both in human effort and in
technology-based techniques to intelligently
sift through data. For several years, attorneys have been using a variety of
tools to conduct this exercise, and more recently they have accepted the use
of machine learning techniques like text classification to efficiently cull
massive volumes of data to identify responsive documents for use in these
matters. In recent years, a group of AI and Machine Learning researchers have
been actively researching Explainable AI. In an explainable AI system, actions
or decisions are human-understandable. In typical legal 'document review'
scenarios, a document can be identified as responsive, as long as one or more
of the text snippets in a document are deemed responsive. In these scenarios,
if predictive coding can be used to locate these responsive snippets, then
attorneys could easily evaluate the model's document classification decision.
When deployed with defined and explainable results, predictive coding can
drastically enhance the overall quality and speed of the document review
process by reducing the time it takes to review documents. The authors of this
paper propose the concept of explainable predictive coding and simple
explainable predictive coding methods to locate responsive snippets within
responsive documents. We also report our preliminary experimental results using
the data from an actual legal matter that entailed this type of document
review.
Comment: 2018 IEEE International Conference on Big Dat
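The rule stated in this abstract, that a document is responsive as long as at least one of its snippets is responsive, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: the `TERM_WEIGHTS` table is a hypothetical stand-in for a trained text classifier, and the snippet size and threshold are arbitrary.

```python
# Hypothetical term weights standing in for a trained text classifier.
TERM_WEIGHTS = {"merger": 2.0, "pricing": 1.5, "confidential": 1.0}

def split_into_snippets(text, size=8):
    """Cut a document into consecutive snippets of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def score_snippet(snippet):
    """Sum the weights of known terms, ignoring case and trailing punctuation."""
    words = snippet.lower().split()
    return sum(TERM_WEIGHTS.get(w.strip(".,;:"), 0.0) for w in words)

def explain_document(text, threshold=2.0, size=8):
    """Return (is_responsive, responsive snippets sorted by descending score)."""
    scored = [(score_snippet(s), s) for s in split_into_snippets(text, size)]
    responsive = [(score, s) for score, s in scored if score >= threshold]
    return bool(responsive), sorted(responsive, reverse=True)

doc = ("Lunch is at noon on Friday. Please keep the merger pricing details "
       "confidential until the board votes next week.")
is_responsive, evidence = explain_document(doc)
```

Returning the scored snippets alongside the document-level decision is what would let a reviewer check why the document was flagged.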
Visualizing and Understanding Sum-Product Networks
Sum-Product Networks (SPNs) are recently introduced deep tractable
probabilistic models by which several kinds of inference queries can be
answered exactly and in tractable time. Up to now, they have been largely
used as black-box density estimators, assessed only by comparing their
likelihood scores. In this paper we explore and exploit the inner
representations learned by SPNs. We do this with a threefold aim: first we want
to get a better understanding of the inner workings of SPNs; secondly, we seek
additional ways to evaluate one SPN model and compare it against other
probabilistic models, providing diagnostic tools to practitioners; lastly, we
want to empirically evaluate how good and meaningful the extracted
representations are, as in a classic Representation Learning framework. In
order to do so we revise their interpretation as deep neural networks and we
propose to exploit several visualization techniques on their node activations
and network outputs under different types of inference queries. To investigate
these models as feature extractors, we plug some SPNs, learned in a greedy
unsupervised fashion on image datasets, in supervised classification learning
tasks. We extract several embedding types from node activations by filtering
nodes by their type, by their associated feature abstraction level and by their
scope. In a thorough empirical comparison we show them to be competitive
with those generated by popular feature extractors such as Restricted Boltzmann
Machines. Finally, we investigate embeddings generated from random
probabilistic marginal queries as means to compare other tractable
probabilistic models on a common ground, extending our experiments to Mixtures
of Trees.
Comment: Machine Learning Journal paper (First Online), 24 page
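To make the claim about exact, tractable inference concrete, here is a toy SPN over two binary variables. It is a minimal sketch with made-up weights and leaf probabilities, and the helper names (`leaf`, `product`, `weighted_sum`) are assumptions for illustration. A marginal query is answered in one bottom-up pass by letting the leaves of unobserved variables evaluate to 1.

```python
def leaf(var, p_true):
    """Bernoulli leaf; evaluates to 1.0 when its variable is marginalized out."""
    def evaluate(evidence):
        value = evidence.get(var)
        if value is None:
            return 1.0
        return p_true if value else 1.0 - p_true
    return evaluate

def product(*children):
    """Product node over children with disjoint scopes."""
    def evaluate(evidence):
        result = 1.0
        for child in children:
            result *= child(evidence)
        return result
    return evaluate

def weighted_sum(weighted_children):
    """Sum node: a mixture of children that share the same scope."""
    def evaluate(evidence):
        return sum(w * child(evidence) for w, child in weighted_children)
    return evaluate

# Toy network: a mixture of two fully factorized distributions over A and B.
spn = weighted_sum([
    (0.6, product(leaf("A", 0.9), leaf("B", 0.2))),
    (0.4, product(leaf("A", 0.1), leaf("B", 0.7))),
])

joint = spn({"A": True, "B": False})  # P(A=1, B=0)
marginal_a = spn({"A": True})         # P(A=1), with B summed out
total = spn({})                       # partition function, 1.0 for a normalized SPN
```

The values computed at the inner nodes during such a pass are the node activations that the abstract proposes to visualize and use as embeddings.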
Tensor Contraction Layers for Parsimonious Deep Nets
Tensors offer a natural representation for many kinds of data frequently
encountered in machine learning. Images, for example, are naturally represented
as third order tensors, where the modes correspond to height, width, and
channels. Tensor methods are noted for their ability to discover
multi-dimensional dependencies, and tensor decompositions, in particular, have
been used to produce compact low-rank approximations of data. In this paper, we
explore the use of tensor contractions as neural network layers and investigate
several ways to apply them to activation tensors. Specifically, we propose the
Tensor Contraction Layer (TCL), the first attempt to incorporate tensor
contractions as end-to-end trainable neural network layers. Applied to existing
networks, TCLs reduce the dimensionality of the activation tensors and thus the
number of model parameters. We evaluate the TCL on the task of image
recognition, augmenting two popular networks (AlexNet, VGG). The resulting
models are trainable end-to-end. Applying the TCL to the task of image
recognition, using the CIFAR100 and ImageNet datasets, we evaluate the effect
of parameter reduction via tensor contraction on performance. We demonstrate
significant model compression without significant impact on the accuracy and,
in some cases, improved performance.
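The contraction this abstract describes can be sketched with NumPy. This is a minimal illustration, not the authors' implementation: the function name, the tensor sizes and the random factor matrices are assumptions, and in a real network the factors would be learned end-to-end. Each non-batch mode of the activation tensor is multiplied by a factor matrix that maps it to a smaller size.

```python
import numpy as np

def tensor_contraction_layer(x, factors):
    """Contract each non-batch mode of x with a factor matrix.

    x:       activations of shape (batch, d1, d2, d3)
    factors: three matrices of shape (r_k, d_k), with r_k <= d_k
    returns: tensor of shape (batch, r1, r2, r3)
    """
    w1, w2, w3 = factors
    return np.einsum("bijk,pi,qj,rk->bpqr", x, w1, w2, w3)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8, 16))  # batch of 4 activation tensors
factors = [rng.standard_normal((r, d)) for r, d in [(4, 8), (4, 8), (8, 16)]]
y = tensor_contraction_layer(x, factors)  # shape (4, 4, 4, 8)

# Parameter count, assuming a dense layer with n_out units follows (hypothetical).
n_out = 10
params_before = 8 * 8 * 16 * n_out                              # dense on the raw tensor
params_after = 4 * 4 * 8 * n_out + sum(w.size for w in factors)  # dense on y, plus factors
```

With these illustrative sizes the dense layer shrinks from 10240 weights to 1280, plus 192 weights for the factors themselves.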