Dynamic Feature Acquisition Using Denoising Autoencoders
In real-world scenarios, different features have different acquisition costs
at test-time, which necessitates cost-aware methods to optimize the cost and
performance trade-off. This paper introduces a novel and scalable approach for
cost-aware feature acquisition at test-time. The method incrementally asks for
features based on the available context, i.e., the feature values that are
already known. The
proposed method is based on sensitivity analysis in neural networks and density
estimation using denoising autoencoders with binary representation layers. In
the proposed architecture, a denoising autoencoder is used to handle unknown
features (i.e., features that are yet to be acquired), and the sensitivity of
predictions with respect to each unknown feature is used as a context-dependent
measure of informativeness. We evaluated the proposed method on eight different
real-world datasets as well as one synthesized dataset and compared its
performance with several other approaches in the literature. According to the
results, the suggested method is capable of efficiently acquiring features at
test-time in a cost- and context-aware fashion.
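
As a rough illustration of the acquisition step described above, the sketch below imputes the not-yet-acquired features with a denoising autoencoder and ranks them by the gradient magnitude of the prediction with respect to each one, normalized by cost. The stub networks, sizes, and the confidence-based score are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for a pre-trained denoising autoencoder and classifier.
dae = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 10))
classifier = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 2))

def next_feature_to_acquire(x, known_mask, costs):
    """Pick the unknown feature the prediction is most sensitive to, per unit cost.
    x: (n_features,) current values; known_mask: bool mask; costs: (n_features,)."""
    x = x.clone().requires_grad_(True)
    imputed = dae(x * known_mask)                # DAE fills in the unacquired features
    x_filled = torch.where(known_mask, x, imputed)
    confidence = classifier(x_filled).max()      # score of the current best class
    confidence.backward()
    sensitivity = x.grad.abs()                   # |d prediction / d feature_i|
    sensitivity[known_mask] = float("-inf")      # never re-acquire a known feature
    return int(torch.argmax(sensitivity / costs))

x = torch.zeros(10)                      # unacquired features start at a placeholder value
known = torch.zeros(10, dtype=torch.bool)
known[:3] = True                         # pretend the first three features were already bought
print(next_feature_to_acquire(x, known, costs=torch.ones(10)))
```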
Deep Embedding Forest: Forest-based Serving with Deep Embedding Features
Deep Neural Networks (DNN) have demonstrated superior ability to extract high
level embedding vectors from low level features. Despite the success, the
serving time is still the bottleneck due to expensive run-time computation of
multiple layers of dense matrices. GPGPU-, FPGA-, or ASIC-based serving systems
require additional hardware that is not in the mainstream design of most
commercial applications. In contrast, tree- or forest-based models are widely
adopted because of low serving cost, but heavily depend on carefully engineered
features. This work proposes a Deep Embedding Forest model that benefits from
the best of both worlds. The model consists of a number of embedding layers and
a forest/tree layer. The former maps high-dimensional (hundreds of thousands to
millions of dimensions) and heterogeneous low-level features to lower-dimensional
(thousands of dimensions) vectors, and the latter ensures fast serving.
Built on top of a representative DNN model called Deep Crossing, and two
forest/tree-based models including XGBoost and LightGBM, a two-step Deep
Embedding Forest algorithm is demonstrated to achieve on-par or slightly better
performance than the DNN counterpart, with only a fraction of the serving time
on conventional hardware. After comparing with a joint optimization algorithm
called partial fuzzification, also proposed in this paper, it is concluded that
the two-step Deep Embedding Forest achieves near-optimal performance.
Experiments based on large-scale data sets (up to 1 billion samples) from a
major sponsored search engine prove the efficacy of the proposed model.
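
A minimal sketch of the two-step recipe described above: train a neural network end to end, freeze its embedding layers, and fit a tree ensemble on the embeddings so that serving only needs a small forward pass plus tree traversals. The toy data, network sizes, and the use of scikit-learn's GradientBoostingClassifier in place of XGBoost/LightGBM are assumptions for illustration.

```python
import torch
import torch.nn as nn
from sklearn.ensemble import GradientBoostingClassifier

# Toy data standing in for high-dimensional, heterogeneous features.
X = torch.randn(2000, 100)
y = (X[:, :10].sum(dim=1) > 0).long()

embed = nn.Sequential(nn.Linear(100, 16), nn.ReLU())   # "embedding layers"
head = nn.Linear(16, 2)                                # scoring layers used only during step 1
opt = torch.optim.Adam(list(embed.parameters()) + list(head.parameters()), lr=1e-2)

# Step 1: train the neural network end to end.
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(head(embed(X)), y)
    loss.backward()
    opt.step()

# Step 2: freeze the embedding layers and fit a tree ensemble on their output,
# so serving needs only the cheap embedding forward pass plus tree traversals.
with torch.no_grad():
    Z = embed(X).numpy()
forest = GradientBoostingClassifier(n_estimators=100).fit(Z, y.numpy())
print("training accuracy of the forest on embeddings:", forest.score(Z, y.numpy()))
```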
Structured Recommendation
Current recommender systems largely focus on static, unstructured content. In
many scenarios, we would like to recommend content that has structure, such as
a trajectory of points of interest in a city, or a playlist of songs. Dubbed
Structured Recommendation, this problem differs from the typical structured
prediction problem in that there are multiple correct answers for a given
input. Motivated by trajectory recommendation, we focus on sequential
structures but, in contrast to classical Viterbi decoding, require that valid
predictions be sequences with no repeated elements. We propose an approach to
sequence recommendation based on the structured support vector machine. For
prediction, we modify the inference procedure to avoid predicting loops in the
sequence. For training, we modify the objective function to account for the
existence of multiple ground truths for a given input. We also modify the
loss-augmented inference procedure to exclude the known ground truths.
Experiments on real-world trajectory recommendation datasets show the benefits
of our approach over existing, non-structured recommendation approaches.
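
To make the decoding constraint concrete, here is a small sketch of loop-free sequence prediction: a beam search over items that never revisits an element, standing in for the paper's modified SSVM inference. The unary and pairwise scores are random placeholders for a learned scoring function.

```python
import numpy as np

# Random placeholders for learned scores over 6 points of interest, length-4 sequences.
rng = np.random.default_rng(0)
n_items, length, beam_width = 6, 4, 3
unary = rng.normal(size=(length, n_items))        # score of item j at position t
pairwise = rng.normal(size=(n_items, n_items))    # transition score from item i to item j

# Beam search that enforces the "no repeated elements" constraint at every step.
beams = [((j,), unary[0, j]) for j in range(n_items)]
beams = sorted(beams, key=lambda c: c[1], reverse=True)[:beam_width]
for t in range(1, length):
    candidates = []
    for seq, score in beams:
        for j in range(n_items):
            if j in seq:                           # forbid loops: skip already-visited items
                continue
            candidates.append((seq + (j,), score + unary[t, j] + pairwise[seq[-1], j]))
    beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]

print("best loop-free sequence:", beams[0][0])
```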
Controlling Search in Very Large Commonsense Knowledge Bases: A Machine Learning Approach
Very large commonsense knowledge bases (KBs) often have thousands to millions
of axioms, of which relatively few are relevant for answering any given query.
A large number of irrelevant axioms can easily overwhelm resolution-based
theorem provers. Therefore, methods that help the reasoner identify useful
inference paths form an essential part of large-scale reasoning systems. In
this paper, we describe two ordering heuristics for optimization of reasoning
in such systems. First, we discuss how decision trees can be used to select
inference steps that are more likely to succeed. Second, we identify a small
set of problem instance features that suffice to guide searches away from
intractable regions of the search space. We show the efficacy of these
techniques via experiments on thousands of queries from the Cyc KB. Results
show that these methods lead to an order of magnitude reduction in inference
time.
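
As an illustration of the first heuristic, the sketch below trains a decision tree on features of past inference steps and uses the predicted success probability to order the reasoner's agenda. The three features (axiom depth, number of open literals, predicate arity) and the toy history are assumptions, not the features used with the Cyc KB.

```python
from sklearn.tree import DecisionTreeClassifier

# Toy training history: feature vectors of past inference steps and whether
# each step eventually contributed to a successful proof.
X_history = [[2, 3, 2], [5, 8, 3], [1, 2, 2], [6, 9, 4], [2, 4, 1], [7, 7, 3]]
y_history = [1, 0, 1, 0, 1, 0]
clf = DecisionTreeClassifier(max_depth=3).fit(X_history, y_history)

def order_agenda(candidate_steps):
    """candidate_steps: list of (step_id, feature_vector). Most promising steps first."""
    probs = clf.predict_proba([feats for _, feats in candidate_steps])[:, 1]
    ranked = sorted(zip(candidate_steps, probs), key=lambda z: -z[1])
    return [step_id for (step_id, _feats), _p in ranked]

print(order_agenda([("expand_axiom_17", [2, 3, 2]), ("expand_axiom_902", [6, 8, 3])]))
```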
Video Summarization with Attention-Based Encoder-Decoder Networks
This paper addresses the problem of supervised video summarization by
formulating it as a sequence-to-sequence learning problem, where the input is a
sequence of original video frames and the output is a keyshot sequence. Our key
idea is to learn a deep summarization network with an attention mechanism that
mimics the way humans select keyshots. To this end, we propose a novel
video summarization framework named Attentive encoder-decoder networks for
Video Summarization (AVS), in which the encoder uses a Bidirectional Long
Short-Term Memory (BiLSTM) to encode the contextual information among the input
video frames. As for the decoder, two attention-based LSTM networks are
explored by using additive and multiplicative objective functions,
respectively. Extensive experiments are conducted on two video summarization
benchmark datasets, i.e., SumMe and TVSum. The results demonstrate the
superiority of the proposed AVS-based approaches over the state-of-the-art
approaches, with remarkable improvements from 0.8% to 3% on the two datasets,
respectively.
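
The two decoder variants mentioned above differ only in how attention scores between a decoder state and the BiLSTM encoder outputs are computed. The sketch below shows the standard additive and multiplicative forms over per-frame encoder states; the dimensions and untrained layers are illustrative assumptions rather than the AVS architecture itself.

```python
import torch
import torch.nn as nn

# Illustrative dimensions: T frames, encoder/decoder hidden size 256.
T, enc_dim, dec_dim, attn_dim = 50, 256, 256, 128
enc_out = torch.randn(T, enc_dim)      # one BiLSTM output per video frame
dec_state = torch.randn(dec_dim)       # current decoder hidden state

# Multiplicative (bilinear / dot-product style) attention scores.
W_m = nn.Linear(dec_dim, enc_dim, bias=False)
mult_scores = enc_out @ W_m(dec_state)                                     # (T,)

# Additive (Bahdanau style) attention scores.
W_e, W_d, v = nn.Linear(enc_dim, attn_dim), nn.Linear(dec_dim, attn_dim), nn.Linear(attn_dim, 1)
add_scores = v(torch.tanh(W_e(enc_out) + W_d(dec_state))).squeeze(-1)      # (T,)

for name, scores in [("multiplicative", mult_scores), ("additive", add_scores)]:
    weights = torch.softmax(scores, dim=0)     # per-frame importance weights
    context = weights @ enc_out                # attention-weighted summary of the video
    print(name, "context vector shape:", tuple(context.shape))
```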
Statistical inference for template-based protein structure prediction
Protein structure prediction is one of the most important problems in
computational biology. The most successful computational approach, also called
template-based modeling, identifies templates with solved crystal structures
for the query proteins and constructs three-dimensional models based on
sequence/structure alignments. Although substantial effort has been made to
improve protein sequence alignment, the accuracy of alignments between
distantly related proteins is still unsatisfactory. In this thesis, I will
introduce a number of statistical machine learning methods to build accurate
alignments between a protein sequence and its template structures, especially
for proteins having only distantly related templates. For a protein with only
one good template, we develop a regression-tree-based Conditional Random Field
(CRF) model for pairwise protein sequence/structure alignment. By learning a
nonlinear threading scoring function, we are able to leverage the correlation
among different sequence and structural features. We also introduce an
information-theoretic measure to guide the learning algorithm to better exploit
the structural features for low-homology proteins with little evolutionary
information in their sequence profile. For a protein with multiple good
templates, we design a probabilistic consistency approach to thread the protein
to all templates simultaneously. By minimizing the discordance between the
pairwise alignments of the protein and templates, we are able to construct a
multiple sequence/structure alignment, which leads to better structure
predictions than any single-template-based prediction.
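
To make the idea of a learned, nonlinear match-scoring function concrete, here is a small sketch in which a regression tree predicts a match score for each (sequence position, template position) pair and a Needleman-Wunsch-style dynamic program aligns them. The synthetic features, targets, and gap penalty are assumptions; the thesis uses profile and structural features inside a CRF rather than this simplified pipeline.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Train a regression tree as a stand-in for a learned, nonlinear match scorer.
# A real threading scorer would use profile similarity, secondary-structure
# agreement, solvent accessibility, and similar features.
rng = np.random.default_rng(0)
pair_feats = rng.normal(size=(500, 3))
match_quality = pair_feats @ np.array([1.0, 0.5, 0.2]) + rng.normal(scale=0.1, size=500)
scorer = DecisionTreeRegressor(max_depth=4).fit(pair_feats, match_quality)

def align_score(pairwise_feats, gap=-1.0):
    """Global alignment score with tree-predicted match scores.
    pairwise_feats: (n, m, 3) features for every (sequence, template) position pair."""
    n, m, _ = pairwise_feats.shape
    S = scorer.predict(pairwise_feats.reshape(n * m, -1)).reshape(n, m)
    D = np.zeros((n + 1, m + 1))
    D[1:, 0] = gap * np.arange(1, n + 1)        # leading gaps in the template
    D[0, 1:] = gap * np.arange(1, m + 1)        # leading gaps in the sequence
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = max(D[i - 1, j - 1] + S[i - 1, j - 1],   # match, scored by the tree
                          D[i - 1, j] + gap,                   # gap in the template
                          D[i, j - 1] + gap)                   # gap in the sequence
    return D[n, m]

print(align_score(rng.normal(size=(8, 10, 3))))
```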
How Can We Know What Language Models Know?
Recent work has presented intriguing results examining the knowledge
contained in language models (LM) by having the LM fill in the blanks of
prompts such as "Obama is a _ by profession". These prompts are usually
manually created, and quite possibly sub-optimal; another prompt such as "Obama
worked as a _" may result in more accurately predicting the correct profession.
Because of this, given an inappropriate prompt, we might fail to retrieve facts
that the LM does know, and thus any given prompt only provides a lower bound
estimate of the knowledge contained in an LM. In this paper, we attempt to more
accurately estimate the knowledge contained in LMs by automatically discovering
better prompts to use in this querying process. Specifically, we propose
mining-based and paraphrasing-based methods to automatically generate
high-quality and diverse prompts, as well as ensemble methods to combine
answers from different prompts. Extensive experiments on the LAMA benchmark for
extracting relational knowledge from LMs demonstrate that our methods can
improve accuracy from 31.1% to 39.6%, providing a tighter lower bound on what
LMs know. We have released the code and the resulting LM Prompt And Query
Archive (LPAQA) at https://github.com/jzbjyb/LPAQA.
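
As a rough illustration of prompt ensembling, the sketch below queries a masked LM with several paraphrased prompts for the same fact and averages the candidate scores across prompts. The three hand-written prompts and the choice of bert-base-uncased are illustrative assumptions, not the prompts mined and released in LPAQA.

```python
from collections import defaultdict
from transformers import pipeline

# bert-base-uncased is used here only as a convenient masked LM.
fill = pipeline("fill-mask", model="bert-base-uncased")
prompts = [
    "Obama is a [MASK] by profession.",
    "Obama worked as a [MASK].",
    "Obama's profession is [MASK].",
]

# Simple ensemble: average each candidate token's score over all prompts.
scores = defaultdict(float)
for prompt in prompts:
    for cand in fill(prompt, top_k=10):
        scores[cand["token_str"]] += cand["score"] / len(prompts)

print(sorted(scores.items(), key=lambda kv: -kv[1])[:5])
```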
Declarative Recursive Computation on an RDBMS, or, Why You Should Use a Database For Distributed Machine Learning
A number of popular systems, most notably Google's TensorFlow, have been
implemented from the ground up to support machine learning tasks. We consider
how to make a very small set of changes to a modern relational database
management system (RDBMS) to make it suitable for distributed learning
computations. Changes include adding better support for recursion, and
optimization and execution of very large compute plans. We also show that there
are key advantages to using an RDBMS as a machine learning platform. In
particular, learning based on a database management system allows for trivial
scaling to large data sets and especially large models, where different
computational units operate on different parts of a model that may be too large
to fit into RAM.
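
One way to picture the role of recursion here is to express an iterative learning loop directly as a recursive SQL query. The sketch below (a minimal stand-in, not the system described in the paper) runs gradient descent on a one-dimensional objective, (x - 3)^2, inside a SQLite recursive CTE.

```python
import sqlite3

# Gradient descent written as a recursive CTE: each row of the recursion is one
# iteration, so the "training loop" runs entirely inside the database engine.
conn = sqlite3.connect(":memory:")
final = conn.execute("""
    WITH RECURSIVE gd(step, x) AS (
        SELECT 0, 0.0
        UNION ALL
        SELECT step + 1,
               x - 0.1 * 2 * (x - 3.0)   -- x <- x - lr * f'(x), with lr = 0.1
        FROM gd
        WHERE step < 50
    )
    SELECT step, x FROM gd ORDER BY step DESC LIMIT 1;
""").fetchone()
print(final)   # x converges towards the minimiser 3.0
```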
Collective Entity Disambiguation with Structured Gradient Tree Boosting
We present a gradient-tree-boosting-based structured learning model for
jointly disambiguating named entities in a document. Gradient tree boosting is
a widely used machine learning algorithm that underlies many top-performing
natural language processing systems. Surprisingly, most prior work limits the
use of gradient tree boosting to regular classification or regression
problems, despite the structured nature of language. To the best of our
knowledge, our work is the first one that employs the structured gradient tree
boosting (SGTB) algorithm for collective entity disambiguation. By defining
global features over previous disambiguation decisions and jointly modeling
them with local features, our system is able to produce globally optimized
entity assignments for mentions in a document. Exact inference is prohibitively
expensive for our globally normalized model. To solve this problem, we propose
Bidirectional Beam Search with Gold path (BiBSG), an approximate inference
algorithm that is a variant of the standard beam search algorithm. BiBSG makes
use of global information from both past and future to perform better local
search. Experiments on standard benchmark datasets show that SGTB significantly
improves upon published results. Specifically, SGTB outperforms the previous
state-of-the-art neural system by nearly 1% absolute accuracy on the popular
AIDA-CoNLL dataset.
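
The sketch below illustrates the collective flavor of the inference: each mention keeps a beam of partial entity assignments, scored by a local mention-entity score plus a global coherence score with the entities already chosen for earlier mentions. The random scores stand in for the boosted trees, and the bidirectional and gold-path refinements of BiBSG are omitted; this is an assumption-laden simplification, not the paper's algorithm.

```python
import numpy as np

# Random stand-ins for scores that the boosted trees would produce.
rng = np.random.default_rng(0)
n_mentions, n_candidates, beam_width = 4, 5, 3
local = rng.normal(size=(n_mentions, n_candidates))        # mention-entity compatibility
coherence = rng.normal(size=(n_candidates, n_candidates))  # entity-entity relatedness

# Left-to-right beam search over joint assignments: the global term couples each
# new decision to all entities already assigned to earlier mentions.
beams = [((), 0.0)]
for m in range(n_mentions):
    expanded = []
    for assigned, score in beams:
        for e in range(n_candidates):
            global_part = sum(coherence[prev, e] for prev in assigned)
            expanded.append((assigned + (e,), score + local[m, e] + global_part))
    beams = sorted(expanded, key=lambda c: c[1], reverse=True)[:beam_width]

print("best joint entity assignment:", beams[0][0])
```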
Towards Prediction Explainability through Sparse Communication
Explainability is a topic of growing importance in NLP. In this work, we
provide a unified perspective of explainability as a communication problem
between an explainer and a layperson about a classifier's decision. We use this
framework to compare several prior approaches for extracting explanations,
including gradient methods, representation erasure, and attention mechanisms,
in terms of their communication success. In addition, we reinterpret these
methods in the light of classical feature selection, and we use this as
inspiration to propose new embedded methods for explainability, through the use
of selective, sparse attention. Experiments in text classification, natural
language entailment, and machine translation, using different configurations of
explainers and laypeople (including both machines and humans), reveal an
advantage of attention-based explainers over gradient and erasure methods.
Furthermore, human evaluation experiments show promising results with post-hoc
explainers trained to optimize communication success and faithfulness.
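
To make the communication framing concrete, here is a small sketch in which an explainer scores tokens, keeps only a sparse subset (top-k here, a crude stand-in for sparsemax-style selective attention), and a separate "layperson" classifier must decide from that subset alone. All modules, sizes, and the top-k selection rule are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative modules: an explainer that scores tokens and a "layperson"
# classifier that only ever sees the handful of tokens the explainer selects.
vocab_size, dim, k = 1000, 64, 3
emb = nn.Embedding(vocab_size, dim)
relevance = nn.Linear(dim, 1)      # explainer: per-token relevance score
layperson = nn.Linear(dim, 2)      # layperson: classifies from selected tokens only

tokens = torch.randint(0, vocab_size, (12,))    # one example sentence (token ids)
E = emb(tokens)                                 # (12, dim)
scores = relevance(E).squeeze(-1)               # (12,)

# Sparse "message": keep only the k highest-scoring tokens and summarise them.
kept = torch.topk(scores, k).indices
message = E[kept].mean(dim=0)
logits = layperson(message)
print("kept token positions:", kept.tolist(), "layperson logits:", logits.tolist())
```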