Dynamic Feature Acquisition Using Denoising Autoencoders
In real-world scenarios, different features have different acquisition costs
at test-time, which necessitates cost-aware methods to optimize the cost and
performance trade-off. This paper introduces a novel and scalable approach for
cost-aware feature acquisition at test-time. The method incrementally asks for
features based on the available context, i.e., the feature values that are
already known. The
proposed method is based on sensitivity analysis in neural networks and density
estimation using denoising autoencoders with binary representation layers. In
the proposed architecture, a denoising autoencoder is used to handle unknown
features (i.e., features that are yet to be acquired), and the sensitivity of
predictions with respect to each unknown feature is used as a context-dependent
measure of informativeness. We evaluated the proposed method on eight different
real-world datasets as well as one synthesized dataset and compared its
performance with several other approaches in the literature. According to the
results, the suggested method is capable of efficiently acquiring features at
test-time in a cost- and context-aware fashion.
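
As a rough illustration of the acquisition step described above, the sketch below imputes the not-yet-acquired features with a denoising autoencoder and ranks them by the gradient magnitude of the prediction with respect to each one, normalized by cost. The stub networks, sizes, and the confidence-based score are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for a pre-trained denoising autoencoder and classifier.
dae = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 10))
classifier = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 2))

def next_feature_to_acquire(x, known_mask, costs):
    """Pick the unknown feature the prediction is most sensitive to, per unit cost.
    x: (n_features,) current values; known_mask: bool mask; costs: (n_features,)."""
    x = x.clone().requires_grad_(True)
    imputed = dae(x * known_mask)                # DAE fills in the unacquired features
    x_filled = torch.where(known_mask, x, imputed)
    confidence = classifier(x_filled).max()      # score of the current best class
    confidence.backward()
    sensitivity = x.grad.abs()                   # |d prediction / d feature_i|
    sensitivity[known_mask] = float("-inf")      # never re-acquire a known feature
    return int(torch.argmax(sensitivity / costs))

x = torch.zeros(10)                      # unacquired features start at a placeholder value
known = torch.zeros(10, dtype=torch.bool)
known[:3] = True                         # pretend the first three features were already bought
print(next_feature_to_acquire(x, known, costs=torch.ones(10)))
```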
Deep Embedding Forest: Forest-based Serving with Deep Embedding Features
Deep Neural Networks (DNN) have demonstrated superior ability to extract high
level embedding vectors from low level features. Despite the success, the
serving time is still the bottleneck due to expensive run-time computation of
multiple layers of dense matrices. GPGPU-, FPGA-, or ASIC-based serving systems
require additional hardware that is not in the mainstream design of most
commercial applications. In contrast, tree- or forest-based models are widely
adopted because of low serving cost, but heavily depend on carefully engineered
features. This work proposes a Deep Embedding Forest model that benefits from
the best of both worlds. The model consists of a number of embedding layers and
a forest/tree layer. The former maps high-dimensional (hundreds of thousands to
millions of dimensions) and heterogeneous low-level features to lower-dimensional
(thousands of dimensions) vectors, and the latter ensures fast serving.
Built on top of a representative DNN model called Deep Crossing, and two
forest/tree-based models including XGBoost and LightGBM, a two-step Deep
Embedding Forest algorithm is demonstrated to achieve on-par or slightly better
performance than the DNN counterpart, with only a fraction of the serving time
on conventional hardware. After comparing with a joint optimization algorithm
called partial fuzzification, also proposed in this paper, it is concluded that
the two-step Deep Embedding Forest achieves near-optimal performance.
Experiments based on large-scale data sets (up to 1 billion samples) from a
major sponsored search engine prove the efficacy of the proposed model.
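
A minimal sketch of the two-step recipe described above: train a neural network end to end, freeze its embedding layers, and fit a tree ensemble on the embeddings so that serving only needs a small forward pass plus tree traversals. The toy data, network sizes, and the use of scikit-learn's GradientBoostingClassifier in place of XGBoost/LightGBM are assumptions for illustration.

```python
import torch
import torch.nn as nn
from sklearn.ensemble import GradientBoostingClassifier

# Toy data standing in for high-dimensional, heterogeneous features.
X = torch.randn(2000, 100)
y = (X[:, :10].sum(dim=1) > 0).long()

embed = nn.Sequential(nn.Linear(100, 16), nn.ReLU())   # "embedding layers"
head = nn.Linear(16, 2)                                # scoring layers used only during step 1
opt = torch.optim.Adam(list(embed.parameters()) + list(head.parameters()), lr=1e-2)

# Step 1: train the neural network end to end.
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(head(embed(X)), y)
    loss.backward()
    opt.step()

# Step 2: freeze the embedding layers and fit a tree ensemble on their output,
# so serving needs only the cheap embedding forward pass plus tree traversals.
with torch.no_grad():
    Z = embed(X).numpy()
forest = GradientBoostingClassifier(n_estimators=100).fit(Z, y.numpy())
print("training accuracy of the forest on embeddings:", forest.score(Z, y.numpy()))
```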
Structured Recommendation
Current recommender systems largely focus on static, unstructured content. In
many scenarios, we would like to recommend content that has structure, such as
a trajectory of points of interest in a city, or a playlist of songs. Dubbed
Structured Recommendation, this problem differs from the typical structured
prediction problem in that there are multiple correct answers for a given
input. Motivated by trajectory recommendation, we focus on sequential
structures but, in contrast to classical Viterbi decoding, require that valid
predictions be sequences with no repeated elements. We propose an approach to
sequence recommendation based on the structured support vector machine. For
prediction, we modify the inference procedure to avoid predicting loops in the
sequence. For training, we modify the objective function to account for the
existence of multiple ground truths for a given input. We also modify the
loss-augmented inference procedure to exclude the known ground truths.
Experiments on real-world trajectory recommendation datasets show the benefits
of our approach over existing, non-structured recommendation approaches.
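
To make the decoding constraint concrete, here is a small sketch of loop-free sequence prediction: a beam search over items that never revisits an element, standing in for the paper's modified SSVM inference. The unary and pairwise scores are random placeholders for a learned scoring function.

```python
import numpy as np

# Random placeholders for learned scores over 6 points of interest, length-4 sequences.
rng = np.random.default_rng(0)
n_items, length, beam_width = 6, 4, 3
unary = rng.normal(size=(length, n_items))        # score of item j at position t
pairwise = rng.normal(size=(n_items, n_items))    # transition score from item i to item j

# Beam search that enforces the "no repeated elements" constraint at every step.
beams = [((j,), unary[0, j]) for j in range(n_items)]
beams = sorted(beams, key=lambda c: c[1], reverse=True)[:beam_width]
for t in range(1, length):
    candidates = []
    for seq, score in beams:
        for j in range(n_items):
            if j in seq:                           # forbid loops: skip already-visited items
                continue
            candidates.append((seq + (j,), score + unary[t, j] + pairwise[seq[-1], j]))
    beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]

print("best loop-free sequence:", beams[0][0])
```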
Controlling Search in Very Large Commonsense Knowledge Bases: A Machine Learning Approach
Very large commonsense knowledge bases (KBs) often have thousands to millions
of axioms, of which relatively few are relevant for answering any given query.
A large number of irrelevant axioms can easily overwhelm resolution-based
theorem provers. Therefore, methods that help the reasoner identify useful
inference paths form an essential part of large-scale reasoning systems. In
this paper, we describe two ordering heuristics for optimization of reasoning
in such systems. First, we discuss how decision trees can be used to select
inference steps that are more likely to succeed. Second, we identify a small
set of problem instance features that suffice to guide searches away from
intractable regions of the search space. We show the efficacy of these
techniques via experiments on thousands of queries from the Cyc KB. Results
show that these methods lead to an order of magnitude reduction in inference
time.
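
As an illustration of the first heuristic, the sketch below trains a decision tree on features of past inference steps and uses the predicted success probability to order the reasoner's agenda. The three features (axiom depth, number of open literals, predicate arity) and the toy history are assumptions, not the features used with the Cyc KB.

```python
from sklearn.tree import DecisionTreeClassifier

# Toy training history: feature vectors of past inference steps and whether
# each step eventually contributed to a successful proof.
X_history = [[2, 3, 2], [5, 8, 3], [1, 2, 2], [6, 9, 4], [2, 4, 1], [7, 7, 3]]
y_history = [1, 0, 1, 0, 1, 0]
clf = DecisionTreeClassifier(max_depth=3).fit(X_history, y_history)

def order_agenda(candidate_steps):
    """candidate_steps: list of (step_id, feature_vector). Most promising steps first."""
    probs = clf.predict_proba([feats for _, feats in candidate_steps])[:, 1]
    ranked = sorted(zip(candidate_steps, probs), key=lambda z: -z[1])
    return [step_id for (step_id, _feats), _p in ranked]

print(order_agenda([("expand_axiom_17", [2, 3, 2]), ("expand_axiom_902", [6, 8, 3])]))
```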
Video Summarization with Attention-Based Encoder-Decoder Networks
This paper addresses the problem of supervised video summarization by
formulating it as a sequence-to-sequence learning problem, where the input is a
sequence of original video frames and the output is a keyshot sequence. Our key
idea is to learn a deep summarization network with an attention mechanism that
mimics the way humans select keyshots. To this end, we propose a novel
video summarization framework named Attentive encoder-decoder networks for
Video Summarization (AVS), in which the encoder uses a Bidirectional Long
Short-Term Memory (BiLSTM) to encode the contextual information among the input
video frames. As for the decoder, two attention-based LSTM networks are
explored by using additive and multiplicative objective functions,
respectively. Extensive experiments are conducted on two video summarization
benchmark datasets, i.e., SumMe and TVSum. The results demonstrate the
superiority of the proposed AVS-based approaches over the state-of-the-art
approaches, with remarkable improvements from 0.8% to 3% on the two datasets,
respectively.
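
The two decoder variants mentioned above differ only in how attention scores between a decoder state and the BiLSTM encoder outputs are computed. The sketch below shows the standard additive and multiplicative forms over per-frame encoder states; the dimensions and untrained layers are illustrative assumptions rather than the AVS architecture itself.

```python
import torch
import torch.nn as nn

# Illustrative dimensions: T frames, encoder/decoder hidden size 256.
T, enc_dim, dec_dim, attn_dim = 50, 256, 256, 128
enc_out = torch.randn(T, enc_dim)      # one BiLSTM output per video frame
dec_state = torch.randn(dec_dim)       # current decoder hidden state

# Multiplicative (bilinear / dot-product style) attention scores.
W_m = nn.Linear(dec_dim, enc_dim, bias=False)
mult_scores = enc_out @ W_m(dec_state)                                     # (T,)

# Additive (Bahdanau style) attention scores.
W_e, W_d, v = nn.Linear(enc_dim, attn_dim), nn.Linear(dec_dim, attn_dim), nn.Linear(attn_dim, 1)
add_scores = v(torch.tanh(W_e(enc_out) + W_d(dec_state))).squeeze(-1)      # (T,)

for name, scores in [("multiplicative", mult_scores), ("additive", add_scores)]:
    weights = torch.softmax(scores, dim=0)     # per-frame importance weights
    context = weights @ enc_out                # attention-weighted summary of the video
    print(name, "context vector shape:", tuple(context.shape))
```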
Statistical inference for template-based protein structure prediction
Protein structure prediction is one of the most important problems in
computational biology. The most successful computational approach, also called
template-based modeling, identifies templates with solved crystal structures
for the query proteins and constructs three-dimensional models based on
sequence/structure alignments. Although substantial effort has been made to
improve protein sequence alignment, the accuracy of alignments between
distantly related proteins is still unsatisfactory. In this thesis, I will
introduce a number of statistical machine learning methods to build accurate
alignments between a protein sequence and its template structures, especially
for proteins having only distantly related templates. For a protein with only
one good template, we develop a regression-tree-based Conditional Random Field
(CRF) model for pairwise protein sequence/structure alignment. By learning a
nonlinear threading scoring function, we are able to leverage the correlation
among different sequence and structural features. We also introduce an
information-theoretic measure to guide the learning algorithm to better exploit
the structural features for low-homology proteins with little evolutionary
information in their sequence profile. For a protein with multiple good
templates, we design a probabilistic consistency approach to thread the protein
to all templates simultaneously. By minimizing the discordance between the
pairwise alignments of the protein and templates, we are able to construct a
multiple sequence/structure alignment, which leads to better structure
predictions than any single-template-based prediction.
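
To make the idea of a learned, nonlinear match-scoring function concrete, here is a small sketch in which a regression tree predicts a match score for each (sequence position, template position) pair and a Needleman-Wunsch-style dynamic program aligns them. The synthetic features, targets, and gap penalty are assumptions; the thesis uses profile and structural features inside a CRF rather than this simplified pipeline.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Train a regression tree as a stand-in for a learned, nonlinear match scorer.
# A real threading scorer would use profile similarity, secondary-structure
# agreement, solvent accessibility, and similar features.
rng = np.random.default_rng(0)
pair_feats = rng.normal(size=(500, 3))
match_quality = pair_feats @ np.array([1.0, 0.5, 0.2]) + rng.normal(scale=0.1, size=500)
scorer = DecisionTreeRegressor(max_depth=4).fit(pair_feats, match_quality)

def align_score(pairwise_feats, gap=-1.0):
    """Global alignment score with tree-predicted match scores.
    pairwise_feats: (n, m, 3) features for every (sequence, template) position pair."""
    n, m, _ = pairwise_feats.shape
    S = scorer.predict(pairwise_feats.reshape(n * m, -1)).reshape(n, m)
    D = np.zeros((n + 1, m + 1))
    D[1:, 0] = gap * np.arange(1, n + 1)        # leading gaps in the template
    D[0, 1:] = gap * np.arange(1, m + 1)        # leading gaps in the sequence
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = max(D[i - 1, j - 1] + S[i - 1, j - 1],   # match, scored by the tree
                          D[i - 1, j] + gap,                   # gap in the template
                          D[i, j - 1] + gap)                   # gap in the sequence
    return D[n, m]

print(align_score(rng.normal(size=(8, 10, 3))))
```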
How Can We Know What Language Models Know?
Recent work has presented intriguing results examining the knowledge
contained in language models (LM) by having the LM fill in the blanks of
prompts such as "Obama is a _ by profession". These prompts are usually
manually created, and quite possibly sub-optimal; another prompt such as "Obama
worked as a _" may result in more accurately predicting the correct profession.
Because of this, given an inappropriate prompt, we might fail to retrieve facts
that the LM does know, and thus any given prompt only provides a lower bound
estimate of the knowledge contained in an LM. In this paper, we attempt to more
accurately estimate the knowledge contained in LMs by automatically discovering
better prompts to use in this querying process. Specifically, we propose
mining-based and paraphrasing-based methods to automatically generate
high-quality and diverse prompts, as well as ensemble methods to combine
answers from different prompts. Extensive experiments on the LAMA benchmark for
extracting relational knowledge from LMs demonstrate that our methods can
improve accuracy from 31.1% to 39.6%, providing a tighter lower bound on what
LMs know. We have released the code and the resulting LM Prompt And Query
Archive (LPAQA) at https://github.com/jzbjyb/LPAQA.
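
As a rough illustration of prompt ensembling, the sketch below queries a masked LM with several paraphrased prompts for the same fact and averages the candidate scores across prompts. The three hand-written prompts and the choice of bert-base-uncased are illustrative assumptions, not the prompts mined and released in LPAQA.

```python
from collections import defaultdict
from transformers import pipeline

# bert-base-uncased is used here only as a convenient masked LM.
fill = pipeline("fill-mask", model="bert-base-uncased")
prompts = [
    "Obama is a [MASK] by profession.",
    "Obama worked as a [MASK].",
    "Obama's profession is [MASK].",
]

# Simple ensemble: average each candidate token's score over all prompts.
scores = defaultdict(float)
for prompt in prompts:
    for cand in fill(prompt, top_k=10):
        scores[cand["token_str"]] += cand["score"] / len(prompts)

print(sorted(scores.items(), key=lambda kv: -kv[1])[:5])
```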
Declarative Recursive Computation on an RDBMS, or, Why You Should Use a Database For Distributed Machine Learning
A number of popular systems, most notably Google's TensorFlow, have been
implemented from the ground up to support machine learning tasks. We consider
how to make a very small set of changes to a modern relational database
management system (RDBMS) to make it suitable for distributed learning
computations. Changes include adding better support for recursion, and
optimization and execution of very large compute plans. We also show that there
are key advantages to using an RDBMS as a machine learning platform. In
particular, learning based on a database management system allows for trivial
scaling to large data sets and especially large models, where different
computational units operate on different parts of a model that may be too large
to fit into RAM.
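
One way to picture the role of recursion here is to express an iterative learning loop directly as a recursive SQL query. The sketch below (a minimal stand-in, not the system described in the paper) runs gradient descent on a one-dimensional objective, (x - 3)^2, inside a SQLite recursive CTE.

```python
import sqlite3

# Gradient descent written as a recursive CTE: each row of the recursion is one
# iteration, so the "training loop" runs entirely inside the database engine.
conn = sqlite3.connect(":memory:")
final = conn.execute("""
    WITH RECURSIVE gd(step, x) AS (
        SELECT 0, 0.0
        UNION ALL
        SELECT step + 1,
               x - 0.1 * 2 * (x - 3.0)   -- x <- x - lr * f'(x), with lr = 0.1
        FROM gd
        WHERE step < 50
    )
    SELECT step, x FROM gd ORDER BY step DESC LIMIT 1;
""").fetchone()
print(final)   # x converges towards the minimiser 3.0
```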
Collective Entity Disambiguation with Structured Gradient Tree Boosting
We present a gradient-tree-boosting-based structured learning model for
jointly disambiguating named entities in a document. Gradient tree boosting is
a widely used machine learning algorithm that underlies many top-performing
natural language processing systems. Surprisingly, most prior work limits the
use of gradient tree boosting to regular classification or regression
problems, despite the structured nature of language. To the best of our
knowledge, our work is the first one that employs the structured gradient tree
boosting (SGTB) algorithm for collective entity disambiguation. By defining
global features over previous disambiguation decisions and jointly modeling
them with local features, our system is able to produce globally optimized
entity assignments for mentions in a document. Exact inference is prohibitively
expensive for our globally normalized model. To solve this problem, we propose
Bidirectional Beam Search with Gold path (BiBSG), an approximate inference
algorithm that is a variant of the standard beam search algorithm. BiBSG makes
use of global information from both past and future to perform better local
search. Experiments on standard benchmark datasets show that SGTB significantly
improves upon published results. Specifically, SGTB outperforms the previous
state-of-the-art neural system by nearly 1% absolute accuracy on the popular
AIDA-CoNLL dataset.
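
The sketch below illustrates the collective flavor of the inference: each mention keeps a beam of partial entity assignments, scored by a local mention-entity score plus a global coherence score with the entities already chosen for earlier mentions. The random scores stand in for the boosted trees, and the bidirectional and gold-path refinements of BiBSG are omitted; this is an assumption-laden simplification, not the paper's algorithm.

```python
import numpy as np

# Random stand-ins for scores that the boosted trees would produce.
rng = np.random.default_rng(0)
n_mentions, n_candidates, beam_width = 4, 5, 3
local = rng.normal(size=(n_mentions, n_candidates))        # mention-entity compatibility
coherence = rng.normal(size=(n_candidates, n_candidates))  # entity-entity relatedness

# Left-to-right beam search over joint assignments: the global term couples each
# new decision to all entities already assigned to earlier mentions.
beams = [((), 0.0)]
for m in range(n_mentions):
    expanded = []
    for assigned, score in beams:
        for e in range(n_candidates):
            global_part = sum(coherence[prev, e] for prev in assigned)
            expanded.append((assigned + (e,), score + local[m, e] + global_part))
    beams = sorted(expanded, key=lambda c: c[1], reverse=True)[:beam_width]

print("best joint entity assignment:", beams[0][0])
```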
Towards Prediction Explainability through Sparse Communication
Explainability is a topic of growing importance in NLP. In this work, we
provide a unified perspective of explainability as a communication problem
between an explainer and a layperson about a classifier's decision. We use this
framework to compare several prior approaches for extracting explanations,
including gradient methods, representation erasure, and attention mechanisms,
in terms of their communication success. In addition, we reinterpret these
methods in the light of classical feature selection, and we use this as
inspiration to propose new embedded methods for explainability, through the use
of selective, sparse attention. Experiments in text classification, natural
language entailment, and machine translation, using different configurations of
explainers and laypeople (including both machines and humans), reveal an
advantage of attention-based explainers over gradient and erasure methods.
Furthermore, human evaluation experiments show promising results with post-hoc
explainers trained to optimize communication success and faithfulness.
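
To make the communication framing concrete, here is a small sketch in which an explainer scores tokens, keeps only a sparse subset (top-k here, a crude stand-in for sparsemax-style selective attention), and a separate "layperson" classifier must decide from that subset alone. All modules, sizes, and the top-k selection rule are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative modules: an explainer that scores tokens and a "layperson"
# classifier that only ever sees the handful of tokens the explainer selects.
vocab_size, dim, k = 1000, 64, 3
emb = nn.Embedding(vocab_size, dim)
relevance = nn.Linear(dim, 1)      # explainer: per-token relevance score
layperson = nn.Linear(dim, 2)      # layperson: classifies from selected tokens only

tokens = torch.randint(0, vocab_size, (12,))    # one example sentence (token ids)
E = emb(tokens)                                 # (12, dim)
scores = relevance(E).squeeze(-1)               # (12,)

# Sparse "message": keep only the k highest-scoring tokens and summarise them.
kept = torch.topk(scores, k).indices
message = E[kept].mean(dim=0)
logits = layperson(message)
print("kept token positions:", kept.tolist(), "layperson logits:", logits.tolist())
```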