Attention Is All You Need for Chinese Word Segmentation
Taking the greedy decoding algorithm as given, this work focuses on
further strengthening the model itself for Chinese word segmentation (CWS),
which results in an even faster and more accurate CWS model. Our model
consists of an attention-only stacked encoder and a lightweight decoder for
the greedy segmentation, plus two highway connections for smoother training, in
which the encoder is composed of a newly proposed Transformer variant,
Gaussian-masked Directional (GD) Transformer, and a biaffine attention scorer.
With the effective encoder design, our model only needs to take unigram
features for scoring. Our model is evaluated on the SIGHAN Bakeoff benchmark
datasets. The experimental results show that, with the highest segmentation
speed, the proposed model achieves new state-of-the-art or comparable
performance against strong baselines under the strict closed test setting.
Comment: 11 pages, to appear in EMNLP 2020 as a long paper
The Foundations of Deep Learning with a Path Towards General Intelligence
Like any field of empirical science, AI may be approached axiomatically. We
formulate requirements for a general-purpose, human-level AI system in terms of
postulates. We review the methodology of deep learning, examining the explicit
and tacit assumptions in deep learning research. Deep learning methodology
seeks to overcome limitations in traditional machine learning research as it
combines facets of model richness, generality, and practical applicability. The
methodology so far has produced outstanding results due to a productive synergy
of function approximation, under plausible assumptions of irreducibility and
the efficiency of the back-propagation family of algorithms. We examine these
winning traits of deep learning, and also observe the various known failure
modes of deep learning. We conclude by giving recommendations on how to extend
deep learning methodology to cover the postulates of general-purpose AI,
including modularity and cognitive architecture. We also relate deep learning
to advances in theoretical neuroscience research.
Comment: Submitted to AGI 201
Event Representation Learning Enhanced with External Commonsense Knowledge
Prior work has proposed effective methods to learn event representations that
can capture syntactic and semantic information from text corpora, demonstrating
their effectiveness for downstream tasks such as script event prediction. On
the other hand, events extracted from raw text lack commonsense knowledge,
such as the intents and emotions of the event participants, which are useful
for distinguishing event pairs when there are only subtle differences in their
surface realizations. To address this issue, this paper proposes to leverage
external commonsense knowledge about the intent and sentiment of the event.
Experiments on three event-related tasks, i.e., event similarity, script event
prediction, and stock market prediction, show that our model obtains much better
event embeddings for the tasks, achieving a 78% improvement on the hard
similarity task, yielding more precise inferences on subsequent events under
given contexts, and better accuracy in predicting the volatility of the stock
market.
Hyperbolic Representation Learning for Fast and Efficient Neural Question Answering
The dominant neural architectures in question answer retrieval are based on
recurrent or convolutional encoders configured with complex word matching
layers. Given that recent architectural innovations are mostly new word
interaction layers or attention-based matching mechanisms, it seems to be a
well-established fact that these components are mandatory for good performance.
Unfortunately, the memory and computation costs incurred by these complex
mechanisms are undesirable for practical applications. As such, this paper
tackles the question of whether it is possible to achieve competitive
performance with simple neural architectures. We propose a simple but novel
deep learning architecture for fast and efficient question-answer ranking and
retrieval. More specifically, our proposed model, HyperQA, is a
parameter-efficient neural network that outperforms other parameter-intensive
models such as Attentive Pooling BiLSTMs and Multi-Perspective CNNs on multiple
QA benchmarks. The novelty behind HyperQA is a pairwise ranking
objective that models the relationship between question and answer embeddings
in hyperbolic space instead of Euclidean space. This empowers our model with a
self-organizing ability and enables automatic discovery of latent hierarchies
while learning embeddings of questions and answers. Our model requires no
feature engineering, no similarity matrix matching, no complicated attention
mechanisms, and no over-parameterized layers, yet it outperforms or remains
competitive with many models that have these functionalities on multiple
benchmarks.
Comment: Accepted at WSDM 201
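To make the idea of a pairwise ranking objective in hyperbolic space more concrete, here is a minimal sketch. The abstract does not say which model of hyperbolic space or which loss form is used, so the Poincaré ball distance and the margin hinge loss below are assumptions, and the function names (`poincare_distance`, `pairwise_hinge_loss`) are hypothetical.

```python
import math


def poincare_distance(u, v, eps=1e-9):
    """Distance between two points inside the unit ball (Poincare ball model).

    Assumption: the abstract only says "hyperbolic space"; the Poincare ball
    is one common choice and is used here purely for illustration.
    """
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # Clamp the denominator so points very close to the boundary stay finite.
    denom = max((1.0 - sq_u) * (1.0 - sq_v), eps)
    return math.acosh(1.0 + 2.0 * sq_diff / denom)


def pairwise_hinge_loss(question, positive, negative, margin=1.0):
    """Margin ranking loss: the correct answer should sit closer to the
    question than the incorrect one by at least `margin` (assumed form)."""
    d_pos = poincare_distance(question, positive)
    d_neg = poincare_distance(question, negative)
    return max(0.0, margin + d_pos - d_neg)


# Toy 2-D embeddings (made-up values, all inside the unit ball).
q = [0.0, 0.0]
good_answer = [0.1, 0.0]
bad_answer = [0.9, 0.0]
loss_ok = pairwise_hinge_loss(q, good_answer, bad_answer)
loss_swapped = pairwise_hinge_loss(q, bad_answer, good_answer)
```

In this toy setup the loss is zero when the good answer is already much closer to the question than the bad one, and positive when the two are swapped, which is the behavior a ranking objective of this kind relies on.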
Utilizing FastText for Venue Recommendation
Venue recommendation systems model the past interactions (i.e., check-ins) of
the users and recommend venues. Traditional recommendation systems employ
collaborative filtering, content-based filtering or matrix factorization.
Recently, vector space embedding and deep learning algorithms have also been
used for recommendation. In this work, I propose a method for recommending top-k
venues by utilizing the sequential nature of check-ins and a recent vector space
embedding method, namely FastText. The proposed method forms groups of
check-ins, learns the vector space representations of the venues, and utilizes
the learned embeddings to make venue recommendations. I measure the performance
of the proposed method using a Foursquare check-in dataset. The results show
that the proposed method performs better than the state-of-the-art methods.
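The grouping and recommendation steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the author's implementation: the abstract does not say how check-ins are grouped or how embeddings are ranked, so the time-gap rule, the cosine-similarity ranking, and every name here (`group_checkins`, `recommend_top_k`, `max_gap_hours`) are hypothetical. The embedding training itself is omitted, and the vectors below are made-up toy values standing in for learned ones.

```python
import math
from collections import defaultdict


def group_checkins(checkins, max_gap_hours=6.0):
    """Split each user's check-ins into ordered groups of venue ids.

    `checkins` is an iterable of (user_id, timestamp_in_hours, venue_id).
    Assumption: a new group starts when the gap between consecutive
    check-ins of the same user exceeds `max_gap_hours`.
    """
    by_user = defaultdict(list)
    for user, ts, venue in checkins:
        by_user[user].append((ts, venue))

    groups = []
    for events in by_user.values():
        events.sort()  # chronological order per user
        current, last_ts = [], None
        for ts, venue in events:
            if last_ts is not None and ts - last_ts > max_gap_hours:
                groups.append(current)
                current = []
            current.append(venue)
            last_ts = ts
        if current:
            groups.append(current)
    return groups


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend_top_k(query_venue, embeddings, k=2):
    """Rank all other venues by cosine similarity to the query venue."""
    q = embeddings[query_venue]
    scored = [(cosine(q, vec), v) for v, vec in embeddings.items() if v != query_venue]
    scored.sort(reverse=True)
    return [v for _, v in scored[:k]]


checkins = [("u1", 0, "cafe"), ("u1", 1, "museum"), ("u1", 30, "bar"), ("u2", 5, "cafe")]
groups = group_checkins(checkins)

# Toy vectors standing in for embeddings learned from the groups.
toy_embeddings = {"cafe": [1.0, 0.0], "bakery": [0.9, 0.1], "gym": [0.0, 1.0]}
top = recommend_top_k("cafe", toy_embeddings, k=1)
```

Each group plays the role of a "sentence" of venue ids that an embedding method could be trained on; the ranking step then only needs the resulting vectors.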
An Auto-Encoder Matching Model for Learning Utterance-Level Semantic Dependency in Dialogue Generation
Generating semantically coherent responses is still a major challenge in
dialogue generation. Unlike in conventional text generation tasks, the
mapping between inputs and responses in conversations is more complicated,
which demands an understanding of utterance-level semantic dependency,
a relation between the overall meanings of inputs and outputs. To address this
problem, we propose an Auto-Encoder Matching (AEM) model to learn such
dependency. The model contains two auto-encoders and one mapping module. The
auto-encoders learn the semantic representations of inputs and responses, and
the mapping module learns to connect the utterance-level representations.
Experimental results from automatic and human evaluations demonstrate that our
model is capable of generating responses of high coherence and fluency compared
to baseline models. The code is available at https://github.com/lancopku/AMM
Comment: Accepted by EMNLP 201
CNN-based Dual-Chain Models for Knowledge Graph Learning
Knowledge graph learning plays a critical role in integrating domain-specific
knowledge bases when deploying machine learning and data mining models in
practice. Existing methods for knowledge graph learning primarily focus on
modeling the relations among entities as translations among the relations and
entities, and many of these methods are not able to handle zero-shot problems,
in which new entities emerge. In this paper, we present a new convolutional
neural network (CNN)-based dual-chain model. Unlike translation-based methods,
in our model, interactions among relations and entities are directly captured
via CNN over their embeddings. Moreover, a secondary chain of learning is
conducted simultaneously to incorporate additional information and to enable
better performance. We also present an extension of this model, which
incorporates descriptions of entities and learns a second set of entity
embeddings from the descriptions. As a result, the extended model is able to
effectively handle zero-shot problems. We conducted comprehensive experiments,
comparing our methods with 15 methods on 8 benchmark datasets. Extensive
experimental results demonstrate that our proposed methods match or
outperform the state-of-the-art results on knowledge graph learning, and
outperform other methods on zero-shot problems. In addition, our methods
applied to real-world biomedical data are able to produce results that conform
to expert domain knowledge.
Low Rank Regularization: A Review
Low rank regularization, in essence, involves introducing a low rank or
approximately low rank assumption for the matrix we aim to learn, and it has
achieved great success in many fields including machine learning, data mining,
and computer vision. Over the last decade, much progress has been made in
theories and practical applications. Nevertheless, the overlap between
them is slight. In order to construct a bridge between practical
applications and theoretical research, in this paper we provide a comprehensive
survey of low rank regularization. We first review several traditional machine
learning models using low rank regularization, and then show how they (or their
variants) are applied to practical issues, such as non-rigid structure
from motion and image denoising. Subsequently, we summarize the regularizers
and optimization methods that achieve great success in traditional machine
learning tasks but are rarely seen in solving practical issues. Finally, we
provide a discussion and comparison of some representative regularizers,
including convex and non-convex relaxations. Extensive experimental results
demonstrate that non-convex regularizers can provide a large advantage over the
nuclear norm, the regularizer widely used in solving practical issues.
Comment: 16 pages, 4 figures, 4 tables
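For readers unfamiliar with the terms, the convex relaxation mentioned above can be written out as follows. The symbols are assumptions for illustration: f is a generic data-fitting loss, λ > 0 a regularization weight, and σ_i(X) the i-th singular value of an m × n matrix X. The Schatten-p quasi-norm is shown only as one common example of a non-convex relaxation; the abstract does not say which regularizers the survey compares.

```latex
% Rank-regularized problem and its nuclear-norm (convex) relaxation
\min_{X} \; f(X) + \lambda \operatorname{rank}(X)
\quad\longrightarrow\quad
\min_{X} \; f(X) + \lambda \|X\|_{*},
\qquad
\|X\|_{*} = \sum_{i=1}^{\min(m,n)} \sigma_i(X)

% One common non-convex alternative (assumed example): Schatten-p quasi-norm
\|X\|_{S_p}^{p} = \sum_{i=1}^{\min(m,n)} \sigma_i(X)^{p}, \qquad 0 < p < 1
```

The nuclear norm penalizes all singular values equally, whereas a penalty with p < 1 shrinks large singular values less, which is one intuition for why non-convex regularizers can approximate the rank more closely.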
Predicting the Semantic Textual Similarity with Siamese CNN and LSTM
Semantic Textual Similarity (STS) is the basis of many applications in
Natural Language Processing (NLP). Our system combines convolutional and
recurrent neural networks to measure the semantic similarity of sentences. It
uses a convolutional network to take account of the local context of words and an
LSTM to consider the global context of sentences. This combination of networks
helps to preserve the relevant information of sentences and improves the
calculation of the similarity between sentences. Our model has achieved good
results and is competitive with the best state-of-the-art systems.
Science Driven Innovations Powering Mobile Product: Cloud AI vs. Device AI Solutions on Smart Device
Recent years have witnessed the increasing popularity of mobile devices (such
as the iPhone) due to the convenience that they bring to people's lives. On the
one hand, rich user profiling and behavior data (at the per-app,
app-interaction, and system-interaction levels) from heterogeneous information
sources make it possible to provide much better services (such as recommendation
and advertisement targeting) to customers, which further drives revenue through
understanding users' behaviors and improving user engagement. On the other
hand, to delight customers, intelligent personal assistants (such as Amazon
Alexa, Google Home, and Google Now) are highly desirable for providing real-time
audio, video, and image recognition, natural language understanding, a
comfortable user interface, satisfactory recommendations, and effective
advertisement targeting.
This paper presents the research efforts we have conducted on mobile devices,
which aim to provide much smarter and more convenient services by leveraging
statistics and big data science, machine learning and deep learning, and user
modeling and marketing techniques to bring significant user growth, user
engagement, and satisfaction on mobile devices. The newly developed
features are built on either the cloud side or the device side and work
together harmoniously to enhance the current service with the purpose of
increasing users' happiness. We illustrate how we design these new features from
the system and algorithm perspectives using different case studies, through
which one can easily understand how science-driven innovations help to provide
much better service in technology and bring more revenue lift in business. In
the meantime, these research efforts have made clear scientific contributions
and have been published in top venues, and they play an increasingly important
role in mobile AI products.