Attention Is All You Need for Chinese Word Segmentation

Authors: Duan Sufeng; Zhao Hai
Publication date: 06/10/2020

Taking the greedy decoding algorithm as given, this work focuses on further strengthening the model itself for Chinese word segmentation (CWS), which results in an even faster and more accurate CWS model. Our model consists of an attention-only stacked encoder and a lightweight decoder for greedy segmentation, plus two highway connections for smoother training, in which the encoder is composed of a newly proposed Transformer variant, the Gaussian-masked Directional (GD) Transformer, and a biaffine attention scorer. With this encoder design, our model needs only unigram features for scoring. Our model is evaluated on the SIGHAN Bakeoff benchmark datasets. The experimental results show that, with the highest segmentation speed, the proposed model achieves new state-of-the-art or comparable performance against strong baselines under the strict closed test setting.

Comment: 11 pages, to appear in EMNLP 2020 as a long paper

arXiv.org e-Print Archive

The Foundations of Deep Learning with a Path Towards General Intelligence

Author: Özkural Eray
Publication date: 22/06/2018

Like any field of empirical science, AI may be approached axiomatically. We formulate requirements for a general-purpose, human-level AI system in terms of postulates. We review the methodology of deep learning, examining the explicit and tacit assumptions in deep learning research. Deep learning methodology seeks to overcome limitations in traditional machine learning research as it combines facets of model richness, generality, and practical applicability. The methodology has so far produced outstanding results due to a productive synergy of function approximation, under plausible assumptions of irreducibility, and the efficiency of the back-propagation family of algorithms. We examine these winning traits of deep learning, and also observe its various known failure modes. We conclude by giving recommendations on how to extend deep learning methodology to cover the postulates of general-purpose AI, including modularity and cognitive architecture. We also relate deep learning to advances in theoretical neuroscience research.

Comment: Submitted to AGI 201

arXiv.org e-Print Archive

Event Representation Learning Enhanced with External Commonsense Knowledge

Authors: Ding Xiao; Duan Junwen; Li Zhongyang; Liao Kuo; Liu Ting
Publication date: 24/06/2020

Prior work has proposed effective methods to learn event representations that can capture syntactic and semantic information over a text corpus, demonstrating their effectiveness for downstream tasks such as script event prediction. On the other hand, events extracted from raw texts lack commonsense knowledge, such as the intents and emotions of the event participants, which is useful for distinguishing event pairs when there are only subtle differences in their surface realizations. To address this issue, this paper proposes to leverage external commonsense knowledge about the intent and sentiment of the event. Experiments on three event-related tasks, i.e., event similarity, script event prediction and stock market prediction, show that our model obtains much better event embeddings for the tasks, achieving a 78% improvement on the hard similarity task, yielding more precise inferences on subsequent events under given contexts, and better accuracy in predicting the volatility of the stock market.

arXiv.org e-Print Archive

Hyperbolic Representation Learning for Fast and Efficient Neural Question Answering

Authors: Hui Siu Cheung; Tay Yi; Tuan Luu Anh
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 23/11/2017

The dominant neural architectures in question answer retrieval are based on recurrent or convolutional encoders configured with complex word matching layers. Given that recent architectural innovations are mostly new word interaction layers or attention-based matching mechanisms, it seems to be a well-established fact that these components are mandatory for good performance. Unfortunately, the memory and computation costs incurred by these complex mechanisms are undesirable for practical applications. As such, this paper tackles the question of whether it is possible to achieve competitive performance with simple neural architectures. We propose a simple but novel deep learning architecture for fast and efficient question-answer ranking and retrieval. More specifically, our proposed model, HyperQA, is a parameter-efficient neural network that outperforms other parameter-intensive models such as Attentive Pooling BiLSTMs and Multi-Perspective CNNs on multiple QA benchmarks. The novelty behind HyperQA is a pairwise ranking objective that models the relationship between question and answer embeddings in hyperbolic space instead of Euclidean space. This empowers our model with a self-organizing ability and enables automatic discovery of latent hierarchies while learning embeddings of questions and answers. Our model requires no feature engineering, no similarity matrix matching, no complicated attention mechanisms and no over-parameterized layers, and yet it outperforms or remains competitive with many models that have these functionalities on multiple benchmarks.

Comment: Accepted at WSDM 201

arXiv.org e-Print Archive
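
The HyperQA abstract above describes a pairwise ranking objective computed over distances in hyperbolic space. The following is a minimal sketch of that idea, not the authors' implementation: it assumes the Poincaré ball model of hyperbolic space and a standard hinge loss, neither of which the abstract specifies, and the function names `poincare_distance` and `pairwise_hinge_loss` are hypothetical.

```python
import math


def poincare_distance(u, v):
    """Distance between two points strictly inside the unit ball
    (Poincare ball model; the choice of model is an assumption)."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


def pairwise_hinge_loss(question, positive, negative, margin=1.0):
    """Hinge loss that reaches zero once the correct answer is closer to
    the question than the wrong answer by at least `margin`."""
    d_pos = poincare_distance(question, positive)
    d_neg = poincare_distance(question, negative)
    return max(0.0, margin + d_pos - d_neg)


if __name__ == "__main__":
    q = [0.0, 0.0]
    good, bad = [0.1, 0.0], [0.9, 0.0]
    print(pairwise_hinge_loss(q, good, bad))  # correct answer ranked first
    print(pairwise_hinge_loss(q, bad, good))  # ranking violated, loss > 0
```

Distances grow rapidly near the boundary of the ball, which is the property usually credited with letting such embeddings represent hierarchies compactly.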

Utilizing FastText for Venue Recommendation

Author: Ozsoy Makbule Gulcin
Publication date: 14/05/2020

Venue recommendation systems model the past interactions (i.e., check-ins) of users and recommend venues. Traditional recommendation systems employ collaborative filtering, content-based filtering or matrix factorization. Recently, vector space embedding and deep learning algorithms have also been used for recommendation. In this work, I propose a method for recommending top-k venues by utilizing the sequential nature of check-ins and a recent vector space embedding method, namely FastText. The proposed method forms groups of check-ins, learns vector space representations of the venues and uses the learned embeddings to make venue recommendations. I measure the performance of the proposed method using a Foursquare check-in dataset. The results show that the proposed method performs better than the state-of-the-art methods.

arXiv.org e-Print Archive

An Auto-Encoder Matching Model for Learning Utterance-Level Semantic Dependency in Dialogue Generation

Authors: Lin Junyang; Luo Liangchen; Sun Xu; Xu Jingjing; Zeng Qi
Publication date: 27/08/2018

Generating semantically coherent responses is still a major challenge in dialogue generation. Unlike in conventional text generation tasks, the mapping between inputs and responses in conversations is more complicated, which demands an understanding of utterance-level semantic dependency, a relation between the overall meanings of inputs and outputs. To address this problem, we propose an Auto-Encoder Matching (AEM) model to learn such dependency. The model contains two auto-encoders and one mapping module. The auto-encoders learn the semantic representations of inputs and responses, and the mapping module learns to connect the utterance-level representations. Experimental results from automatic and human evaluations demonstrate that our model is capable of generating responses of high coherence and fluency compared to baseline models. The code is available at https://github.com/lancopku/AMM

Comment: Accepted by EMNLP 201

arXiv.org e-Print Archive

CNN-based Dual-Chain Models for Knowledge Graph Learning

Authors: Min Renqiang; Ning Xia; Peng Bo
Publication date: 26/11/2019

Knowledge graph learning plays a critical role in integrating domain-specific knowledge bases when deploying machine learning and data mining models in practice. Existing methods for knowledge graph learning primarily focus on modeling the relations among entities as translations among the relations and entities, and many of these methods are not able to handle zero-shot problems, in which new entities emerge. In this paper, we present a new convolutional neural network (CNN)-based dual-chain model. Unlike translation-based methods, our model captures interactions among relations and entities directly via a CNN over their embeddings. Moreover, a secondary chain of learning is conducted simultaneously to incorporate additional information and to enable better performance. We also present an extension of this model, which incorporates descriptions of entities and learns a second set of entity embeddings from the descriptions. As a result, the extended model is able to effectively handle zero-shot problems. We conducted comprehensive experiments, comparing our methods with 15 methods on 8 benchmark datasets. Extensive experimental results demonstrate that our proposed methods achieve or outperform the state-of-the-art results on knowledge graph learning, and outperform other methods on zero-shot problems. In addition, our methods applied to real-world biomedical data are able to produce results that conform to expert domain knowledge.

arXiv.org e-Print Archive

Low Rank Regularization: A Review

Authors: Hu Zhanxuan; Li Xuelong; Nie Feiping; Wang Rong
Publication date: 09/12/2020

Low rank regularization, in essence, involves introducing a low rank or approximately low rank assumption for the matrix we aim to learn, and it has achieved great success in many fields including machine learning, data mining and computer vision. Over the last decade, much progress has been made in both theory and practical applications. Nevertheless, the overlap between the two remains slight. In order to build a bridge between practical applications and theoretical research, in this paper we provide a comprehensive survey of low rank regularization. We first review several traditional machine learning models using low rank regularization, and then show applications of these models (or their variants) in solving practical issues, such as non-rigid structure from motion and image denoising. Subsequently, we summarize the regularizers and optimization methods that achieve great success in traditional machine learning tasks but are rarely seen in solving practical issues. Finally, we provide a discussion and comparison of some representative regularizers, including convex and non-convex relaxations. Extensive experimental results demonstrate that non-convex regularizers can provide a large advantage over the nuclear norm, the regularizer widely used in solving practical issues.

Comment: 16 pages, 4 figures, 4 tables

arXiv.org e-Print Archive
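
The abstract above names the nuclear norm as the widely used low rank regularizer without defining it. As a reminder in standard notation (the symbols $f$, $\lambda$, $\tau$ and $\sigma_i$ are assumed here and do not come from the listing), the nuclear norm is the sum of singular values, a typical regularized problem adds it to a data-fit term, and its proximal operator is singular value soft-thresholding:

```latex
% Nuclear norm of X with singular values \sigma_1(X) \ge \sigma_2(X) \ge \dots \ge 0
\|X\|_{*} = \sum_{i} \sigma_i(X)

% Generic low rank regularized problem (f is an assumed data-fit term)
\min_{X} \; f(X) + \lambda \, \|X\|_{*}, \qquad \lambda > 0

% Proximal step for Y = U \,\mathrm{diag}(\sigma_1, \dots, \sigma_r)\, V^{\top}
\operatorname{prox}_{\tau \|\cdot\|_{*}}(Y)
  = U \,\mathrm{diag}\bigl(\max(\sigma_i - \tau,\, 0)\bigr)\, V^{\top}
```

Because every singular value is shrunk by the same amount $\tau$, large singular values are penalized as much as small ones; non-convex relaxations of the kind the survey compares are generally designed to shrink large singular values less.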

Predicting the Semantic Textual Similarity with Siamese CNN and LSTM

Authors: Huet Stéphane; Linhares Andréa Carneiro; Pontes Elvys Linhares; Torres-Moreno Juan-Manuel
Publication date: 24/10/2018

Semantic Textual Similarity (STS) is the basis of many applications in Natural Language Processing (NLP). Our system combines convolutional and recurrent neural networks to measure the semantic similarity of sentences. It uses a convolutional network to take account of the local context of words and an LSTM to consider the global context of sentences. This combination of networks helps to preserve the relevant information of sentences and improves the calculation of the similarity between sentences. Our model has achieved good results and is competitive with the best state-of-the-art systems.

arXiv.org e-Print Archive

Science Driven Innovations Powering Mobile Product: Cloud AI vs. Device AI Solutions on Smart Device

Author: Kong Deguang
Publication date: 20/11/2017

Recent years have witnessed the increasing popularity of mobile devices (such as the iPhone) due to the convenience they bring to human lives. Rich user profiling and behavior data (at the per-app level, app-interaction level and system-interaction level) from heterogeneous information sources make it possible to provide much better services (such as recommendation and advertisement targeting) to customers, which further drives revenue through understanding users' behaviors and improving user engagement. In order to delight customers, intelligent personal assistants (such as Amazon Alexa, Google Home and Google Now) are highly desirable to provide real-time audio, video and image recognition, natural language understanding, a comfortable user interaction interface, satisfactory recommendation and effective advertisement targeting. This paper presents the research efforts we have conducted on mobile devices, which aim to provide much smarter and more convenient services by leveraging statistics and big data science, machine learning and deep learning, user modeling and marketing techniques to bring significant user growth, engagement and satisfaction (and happiness) on mobile devices. The newly developed features are built on either the cloud side or the device side, working together harmoniously to enhance the current service with the purpose of increasing users' happiness. We illustrate how we design these new features from the system and algorithm perspectives using different case studies, through which one can easily understand how science-driven innovations help to provide much better service in technology and bring more revenue lift in business. In the meantime, these research efforts have made clear scientific contributions and have been published in top venues, which are playing increasingly important roles for mobile AI products.

arXiv.org e-Print Archive