Neural Networks Compression for Language Modeling
In this paper, we consider several compression techniques for the language
modeling problem based on recurrent neural networks (RNNs). It is known that
conventional RNNs, e.g., LSTM-based networks for language modeling, are characterized by high space complexity or substantial inference time.
This problem is especially acute for mobile applications, in which constant interaction with a remote server is impractical. Using the Penn Treebank (PTB) dataset, we compare pruning, quantization, low-rank factorization, and tensor train decomposition for LSTM networks in terms of model size and suitability for fast inference.
Comment: Keywords: LSTM, RNN, language modeling, low-rank factorization,
pruning, quantization. Published by Springer in the LNCS series, 7th
International Conference on Pattern Recognition and Machine Intelligence,
201
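As an illustration of one of the techniques compared above, the following is a minimal sketch of low-rank factorization via truncated SVD applied to a single LSTM-sized weight matrix. The matrix shape and rank are hypothetical, chosen only to show the parameter saving; they are not the paper's configuration.

```python
import numpy as np

# Hypothetical LSTM gate weight matrix (hidden size 650, 4 gates stacked);
# the dimensions are illustrative only.
rng = np.random.default_rng(0)
W = rng.standard_normal((4 * 650, 650))

# Truncated SVD: W ~ U_r diag(s_r) V_r^T with rank r << min(m, n).
r = 64
U, s, Vt = np.linalg.svd(W, full_matrices=False)
W_lowrank = (U[:, :r] * s[:r]) @ Vt[:r, :]

orig_params = W.size
lowrank_params = U[:, :r].size + r + Vt[:r, :].size
print(f"original params: {orig_params}, low-rank params: {lowrank_params}, "
      f"compression: {orig_params / lowrank_params:.1f}x, "
      f"relative error: {np.linalg.norm(W - W_lowrank) / np.linalg.norm(W):.3f}")
```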
Compression of Recurrent Neural Networks for Efficient Language Modeling
Recurrent neural networks have proved to be an effective method for
statistical language modeling. However, in practice their memory and run-time
complexity are usually too large to be implemented in real-time offline mobile
applications. In this paper we consider several compression techniques for
recurrent neural networks, including Long Short-Term Memory models. We pay particular attention to the high-dimensional output problem caused by the very large vocabulary size. We focus on compression methods that are effective in the context of on-device deployment: pruning, quantization, and matrix
decomposition approaches (low-rank factorization and tensor train
decomposition, in particular). For each model we investigate the trade-off
between its size, suitability for fast inference and perplexity. We propose a
general pipeline for applying the most suitable methods to compress recurrent
neural networks for language modeling. Our experimental study on the Penn Treebank (PTB) dataset shows that the best results in terms of speed and the compression-perplexity balance are obtained with matrix decomposition techniques.
Comment: 25 pages, 3 tables, 4 figures
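To make the pruning and quantization stages of such a pipeline concrete, here is a minimal sketch of magnitude pruning followed by symmetric 8-bit quantization. The sparsity level, bit width, and matrix shape are assumptions for illustration and do not reproduce the paper's pipeline.

```python
import numpy as np

def magnitude_prune(w, sparsity=0.9):
    """Zero out the smallest-magnitude weights (one pruning variant among several)."""
    k = int(sparsity * w.size)
    thresh = np.partition(np.abs(w).ravel(), k)[k]
    return np.where(np.abs(w) < thresh, 0.0, w)

def uniform_quantize(w, bits=8):
    """Symmetric uniform quantization to `bits`-bit integers with a single scale."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    q = np.round(w / scale).astype(np.int8 if bits == 8 else np.int32)
    return q, scale  # store ints plus the scale; dequantize as q * scale at inference

rng = np.random.default_rng(0)
w = rng.standard_normal((2600, 650)).astype(np.float32)   # illustrative LSTM-sized matrix
pruned = magnitude_prune(w, sparsity=0.9)
q, scale = uniform_quantize(pruned, bits=8)
print("nonzeros after pruning:", np.count_nonzero(pruned),
      "max dequantization error:", np.abs(pruned - q.astype(np.float32) * scale).max())
```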
Tensor-Train Recurrent Neural Networks for Video Classification
Recurrent Neural Networks and their variants have shown promising performance in sequence modeling tasks such as Natural Language Processing.
These models, however, turn out to be impractical and difficult to train when
exposed to very high-dimensional inputs due to the large input-to-hidden weight
matrix. This may have prevented RNNs' large-scale application in tasks that
involve very high input dimensions such as video modeling; current approaches
reduce the input dimensions using various feature extractors. To address this
challenge, we propose a new, more general and efficient approach by factorizing
the input-to-hidden weight matrix using Tensor-Train decomposition which is
trained simultaneously with the weights themselves. We test our model on
classification tasks using multiple real-world video datasets and achieve performance competitive with state-of-the-art models, even though our model
architecture is orders of magnitude less complex. We believe that the proposed
approach provides a novel and fundamental building block for modeling
high-dimensional sequential data with RNN architectures and opens up many
possibilities to transfer the expressive and advanced architectures from other
domains such as NLP to modeling high-dimensional sequential data.
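The following sketch illustrates the core idea of a Tensor-Train factorized input-to-hidden map with two cores: the input is contracted with the TT cores directly, so the full weight matrix is never materialized. The mode sizes and TT-rank are hypothetical, and the paper's construction generalizes to more cores.

```python
import numpy as np

# Toy TT-matrix factorization of an input-to-hidden weight matrix; a minimal
# sketch of the idea, not the paper's implementation. All sizes are illustrative.
I1, I2 = 16, 32      # hidden dimension 512, factored into two modes
J1, J2 = 64, 64      # input dimension 4096, factored into two modes
r = 4                # TT-rank

rng = np.random.default_rng(0)
G1 = rng.standard_normal((1, I1, J1, r)) * 0.02   # first TT core
G2 = rng.standard_normal((r, I2, J2, 1)) * 0.02   # second TT core

# The full matrix would hold I1*I2*J1*J2 = 2,097,152 parameters;
# the TT cores hold only I1*J1*r + I2*J2*r = 12,288.
x = rng.standard_normal(J1 * J2)
X = x.reshape(J1, J2)

# Contract the input with the cores directly, never materializing the full matrix.
y = np.einsum('aijb,bklc,jl->ik', G1, G2, X).reshape(I1 * I2)

# Sanity check against the explicitly reconstructed matrix.
W = np.einsum('aijb,bklc->ikjl', G1, G2).reshape(I1 * I2, J1 * J2)
print(np.allclose(y, W @ x))  # True
```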
Statistical Machine Translation Features with Multitask Tensor Networks
We present a three-pronged approach to improving Statistical Machine
Translation (SMT), building on recent success in the application of neural
networks to SMT. First, we propose new features based on neural networks to
model various non-local translation phenomena. Second, we augment the
architecture of the neural network with tensor layers that capture important
higher-order interaction among the network units. Third, we apply multitask
learning to estimate the neural network parameters jointly. Each of our
proposed methods results in significant improvements that are complementary.
The overall improvement is +2.7 and +1.8 BLEU points for Arabic-English and
Chinese-English translation over a state-of-the-art system that already
includes neural network features.
Comment: 11 pages (9 content + 2 references), 2 figures, accepted to ACL 2015 as a long paper
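As a rough illustration of what a tensor layer adds over a standard feed-forward layer, the sketch below implements one common form of bilinear tensor layer, in which each output unit scores a pair of hidden vectors with its own interaction matrix. The shapes and the exact functional form are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

def tensor_layer(h1, h2, W, V, b):
    """One common form of tensor layer: each output unit k applies a bilinear
    form h1^T W[k] h2 plus a standard linear term on the concatenation [h1; h2]."""
    bilinear = np.einsum('i,kij,j->k', h1, W, h2)       # higher-order interaction
    linear = V @ np.concatenate([h1, h2]) + b           # first-order terms
    return np.tanh(bilinear + linear)

rng = np.random.default_rng(0)
d, k = 50, 10                                  # hidden size and slice count (illustrative)
h1, h2 = rng.standard_normal(d), rng.standard_normal(d)
W = rng.standard_normal((k, d, d)) * 0.01
V = rng.standard_normal((k, 2 * d)) * 0.01
b = np.zeros(k)
print(tensor_layer(h1, h2, W, V, b).shape)     # (10,)
```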
A Quantum Many-body Wave Function Inspired Language Modeling Approach
The recently proposed quantum language model (QLM) aims at a principled approach to modeling term dependencies by applying quantum probability theory. The latest development toward a more effective QLM has adopted word embeddings as a kind of global dependency information and integrated the quantum-inspired ideas into a neural network architecture. While these
quantum-inspired LMs are theoretically more general and also practically
effective, they have two major limitations. First, they have not taken into
account the interaction among words with multiple meanings, which is common and
important in understanding natural language text. Second, the integration of
the quantum-inspired LM with the neural network was done mainly for effective training of parameters, yet lacked a theoretical foundation accounting for such integration. To address these two issues, in this paper we propose a Quantum Many-body Wave Function (QMWF) inspired language modeling approach. The QMWF-inspired LM adopts the tensor product to model the aforementioned interactions among words. It also enables us to reveal the inherent necessity of using a Convolutional Neural Network (CNN) in QMWF language modeling.
Furthermore, our approach delivers a simple algorithm to represent and match
text/sentence pairs. A systematic evaluation shows the effectiveness of the proposed QMWF-LM algorithm, in comparison with state-of-the-art quantum-inspired LMs and several CNN-based methods, on three typical Question Answering (QA) datasets.
Comment: 10 pages, 4 figures, CIKM
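A minimal sketch of the tensor-product idea mentioned above, not the paper's full QMWF model: each word is a unit vector over a small "semantic basis", and a sequence is represented in the product space by the tensor product of its word vectors. The word vectors and the basis dimension below are made up for illustration.

```python
import numpy as np
from functools import reduce

def word_state(v):
    """Normalize a word representation to a unit vector over the semantic basis."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

# Hypothetical 3-dimensional word states for a 3-word sequence.
words = [word_state([0.8, 0.2, 0.1]),
         word_state([0.1, 0.9, 0.3]),
         word_state([0.5, 0.5, 0.5])]

# Joint representation of the sequence: a rank-1 tensor in R^{3x3x3}.
joint = reduce(np.multiply.outer, words)
print(joint.shape)   # (3, 3, 3)

# Matching two sequences reduces to an inner product of their joint tensors,
# which for rank-1 tensors factorizes into products of word-level similarities.
other = reduce(np.multiply.outer, reversed(words))
print(np.tensordot(joint, other, axes=3))
```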
A Tensor Based Data Model for Polystore: An Application to Social Networks Data
In this article, we show how tensors, as mathematical objects, can be used to build a multi-paradigm model for storing social data in data warehouses. From an architectural point of view, our approach makes it possible to link different storage systems (a polystore) and limits the impact of the ETL tools that perform the model transformations required to feed different analysis algorithms. Systems can therefore take advantage of multiple data models, both in terms of query execution performance and in terms of the semantic expressiveness of the data representation. The proposed model achieves logical independence between the data and the programs implementing analysis algorithms. With a concrete case study on
message virality on Twitter during the French presidential election of 2017, we
highlight some of the contributions of our model.
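A minimal sketch of the tensor view of social interaction data; the article's actual model and its mapping onto polystore back-ends are richer. Here, hypothetical retweet events are stored as a sparse 3-way tensor indexed by (retweeting user, original author, time bucket), and different analysis engines could consume different slices of it.

```python
from collections import defaultdict

# Hypothetical retweet events: (retweeting user, original author, time bucket).
events = [("alice", "bob", 0), ("carol", "bob", 0), ("alice", "bob", 1)]

users = sorted({u for e in events for u in e[:2]})
uidx = {u: i for i, u in enumerate(users)}

tensor = defaultdict(float)          # sparse, coordinate-style storage
for src, dst, t in events:
    tensor[(uidx[src], uidx[dst], t)] += 1.0

# A relational engine could answer "virality per time bucket" by aggregating
# over the user modes; a graph engine could read each time slice as an adjacency matrix.
per_bucket = defaultdict(float)
for (i, j, t), v in tensor.items():
    per_bucket[t] += v
print(dict(per_bucket))   # {0: 2.0, 1: 1.0}
```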
Tensor network language model
We propose a new statistical model suitable for machine learning of systems with long-distance correlations, such as natural languages. The model is based on a directed acyclic graph decorated with multi-linear tensor maps at the vertices and vector spaces on the edges, called a tensor network. Such tensor networks
have been previously employed for effective numerical computation of the
renormalization group flow on the space of effective quantum field theories and
lattice models of statistical mechanics. We provide an explicit algebro-geometric analysis of the parameter moduli space for tree graphs, and discuss model properties and applications such as statistical translation.
Comment: 21 pages
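A minimal sketch of a tree-shaped tensor network over a short sentence, meant only to illustrate the general construction rather than the paper's model: word vectors sit on the leaf edges, and each internal vertex carries a 3-way tensor that merges two child vectors into one parent vector. The dimension, tensors, and readout are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                                  # dimension of every edge space
words = [rng.standard_normal(d) for _ in range(4)]     # hypothetical word embeddings

def merge(T, left, right):
    """Contract a vertex tensor T of shape (d, d, d) with its two child vectors."""
    return np.einsum('ijk,j,k->i', T, left, right)

T_left, T_right, T_root = (rng.standard_normal((d, d, d)) * 0.1 for _ in range(3))
v_left = merge(T_left, words[0], words[1])
v_right = merge(T_right, words[2], words[3])
root = merge(T_root, v_left, v_right)

# A scalar score for the sentence, e.g. via a readout vector on the root edge.
readout = rng.standard_normal(d)
print(root @ readout)
```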
Knowledge Graph Embeddings and Explainable AI
Knowledge graph embeddings are now a widely adopted approach to knowledge
representation in which entities and relationships are embedded in vector
spaces. In this chapter, we introduce the reader to the concept of knowledge
graph embeddings by explaining what they are, how they can be generated and how
they can be evaluated. We summarize the state-of-the-art in this field by
describing the approaches that have been introduced to represent knowledge in
the vector space. In relation to knowledge representation, we consider the
problem of explainability, and discuss models and methods for explaining
predictions obtained via knowledge graph embeddings.
Comment: Federico Bianchi, Gaetano Rossiello, Luca Costabello, Matteo Palmonari, Pasquale Minervini, Knowledge Graph Embeddings and Explainable AI.
In: Ilaria Tiddi, Freddy Lecue, Pascal Hitzler (eds.), Knowledge Graphs for
eXplainable AI -- Foundations, Applications and Challenges. Studies on the
Semantic Web, IOS Press, Amsterdam, 202
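For readers new to the topic, the sketch below shows two classic scoring functions for knowledge graph embeddings, TransE and DistMult, which surveys of this kind typically cover; the embedding dimension, entities, and triple are made up for illustration, and the chapter itself discusses many more models.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50
# Hypothetical, randomly initialized embeddings (in practice these are learned).
entities = {"Paris": rng.standard_normal(dim), "France": rng.standard_normal(dim)}
relations = {"capitalOf": rng.standard_normal(dim)}

def transe_score(h, r, t):
    """TransE: a triple (h, r, t) is plausible if head + relation is close to tail."""
    return -float(np.linalg.norm(entities[h] + relations[r] - entities[t]))

def distmult_score(h, r, t):
    """DistMult: a trilinear product of head, relation, and tail embeddings."""
    return float(np.sum(entities[h] * relations[r] * entities[t]))

print(transe_score("Paris", "capitalOf", "France"),
      distmult_score("Paris", "capitalOf", "France"))
```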
TensorFlow: A system for large-scale machine learning
TensorFlow is a machine learning system that operates at large scale and in
heterogeneous environments. TensorFlow uses dataflow graphs to represent
computation, shared state, and the operations that mutate that state. It maps
the nodes of a dataflow graph across many machines in a cluster, and within a
machine across multiple computational devices, including multicore CPUs,
general-purpose GPUs, and custom designed ASICs known as Tensor Processing
Units (TPUs). This architecture gives flexibility to the application developer:
whereas in previous "parameter server" designs the management of shared state
is built into the system, TensorFlow enables developers to experiment with
novel optimizations and training algorithms. TensorFlow supports a variety of
applications, with particularly strong support for training and inference on
deep neural networks. Several Google services use TensorFlow in production; we have released it as an open-source project, and it has become widely used for
machine learning research. In this paper, we describe the TensorFlow dataflow
model in contrast to existing systems, and demonstrate the compelling
performance that TensorFlow achieves for several real-world applications.
Comment: 18 pages, 9 figures; v2 has a spelling correction in the metadata
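A minimal sketch of the dataflow idea using the public TensorFlow 2.x API (the paper describes the original graph-construction-era system): a Python function is traced into a graph of operations over tensors and shared variable state, which the runtime can then place across the available devices.

```python
import tensorflow as tf  # TensorFlow 2.x

# Shared, mutable state lives in a variable; ops that read or update it become
# nodes in the dataflow graph alongside the purely functional ops.
w = tf.Variable(tf.random.normal([4, 2]), name="w")

@tf.function
def step(x):
    # Traced into a graph: matmul and relu become dataflow nodes operating on tensors.
    return tf.nn.relu(tf.matmul(x, w))

x = tf.constant([[1.0, 2.0, 3.0, 4.0]])
print(step(x).numpy())

# Inspect the traced graph: each entry is an operation in the dataflow model.
graph = step.get_concrete_function(x).graph
print([op.type for op in graph.get_operations()][:8])
```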
Coupled Recurrent Models for Polyphonic Music Composition
This paper introduces a novel recurrent model for music composition that is
tailored to the structure of polyphonic music. We propose an efficient new
conditional probabilistic factorization of musical scores, viewing a score as a
collection of concurrent, coupled sequences, i.e., voices. To model the
conditional distributions, we borrow ideas from both convolutional and
recurrent neural models; we argue that these ideas are natural for capturing
music's pitch invariances, temporal structure, and polyphony. We train models
for single-voice and multi-voice composition on 2,300 scores from the
KernScores dataset.
Comment: 13 pages; long version of the paper appearing in ISMIR 201
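A toy sketch of the coupled-sequence factorization described above: within each time step the voices are generated in a fixed order, each conditioned on everything generated so far. The stand-in conditional distribution below is random and only illustrates the sampling order, not the paper's learned convolutional-recurrent model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_voices, n_steps, n_pitches = 4, 8, 16   # illustrative sizes

def note_distribution(history):
    """Stand-in for a learned conditional p(note | all previously generated notes)."""
    logits = rng.standard_normal(n_pitches) * 0.1
    return np.exp(logits) / np.exp(logits).sum()

score = np.zeros((n_voices, n_steps), dtype=int)
history = []
for t in range(n_steps):
    for v in range(n_voices):                  # fixed voice order within each step
        p = note_distribution(history)
        score[v, t] = rng.choice(n_pitches, p=p)
        history.append((v, t, score[v, t]))
print(score)
```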