115 research outputs found

Truly Multi-modal YouTube-8M Video Classification with Video, Audio, and Text

Author: Chandrasekhar Vijay
Chen Nancy
Cheung Ngai Man
D'Haro Luis Fernando
Fang Yuan
Kim Seokhwan
Kuan Kingsley
Lin Jie
Manek Gaurav
Piliouras Georgios
Ravaut Mathieu
Song Sibo
Tuan Luu Anh
Wang Zhe
Zeng Zeng
Zhu Hongyuan
Publication date: 09/07/2017

The YouTube-8M video classification challenge requires teams to classify 0.7 million videos into one or more of 4,716 classes. In this Kaggle competition, we placed in the top 3% out of 650 participants using released video and audio features. Beyond that, we extend the original competition by including text information in the classification, making this a truly multi-modal approach with vision, audio and text. The newly introduced text data is termed YouTube-8M-Text. We present a classification framework for the joint use of text, visual and audio features, and conduct an extensive set of experiments to quantify the benefit that this additional mode brings. The inclusion of text yields state-of-the-art results, e.g. 86.7% GAP on the YouTube-8M-Text validation dataset.

Comment: 8 pages, Accepted to CVPR'17 Workshop on YouTube-8M Large-Scale Video Understanding

arXiv.org e-Print Archive
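
The GAP (Global Average Precision) figure quoted in the abstract above can be made concrete with a small example. The following is a minimal sketch, assuming the formulation commonly described for the YouTube-8M challenge: keep the top-k predictions per video (k = 20 in the challenge), pool them across all videos, sort by confidence, and compute average precision over the pooled list. The function name and input layout are hypothetical, not taken from the paper.

```python
def global_average_precision(predictions, labels, top_k=20):
    """Sketch of GAP.

    predictions: list (one entry per video) of dicts {class_id: score}
    labels:      list (one entry per video) of sets of true class ids
    """
    pooled = []           # (score, is_correct) pairs from all videos
    total_positives = 0   # number of ground-truth labels over all videos
    for scores, truth in zip(predictions, labels):
        total_positives += len(truth)
        # Keep only the top_k most confident predictions for this video.
        top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        pooled.extend((score, cls in truth) for cls, score in top)

    if total_positives == 0:
        return 0.0

    # Rank the pooled predictions by confidence, highest first.
    pooled.sort(key=lambda pair: pair[0], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, (_, is_correct) in enumerate(pooled, start=1):
        if is_correct:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / total_positives
```

Because predictions are pooled before ranking, a confident mistake on one video lowers the precision credited to correct predictions on every other video that rank below it.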

Job Prediction: From Deep Neural Network Models to Applications

Author: Nguyen Anh Gia-Tuan
Nguyen Ngan Luu-Thuy
Van Huynh Tin
Van Nguyen Kiet
Publication date: 31/01/2020

Determining which job suits a student or job seeker from job descriptions, such as the required knowledge and skills, is difficult, and employers likewise must find ways to choose candidates who match the jobs they offer. In this paper, we focus on studying job prediction using different deep neural network models, including TextCNN, Bi-GRU-LSTM-CNN, and Bi-GRU-CNN, with various pre-trained word embeddings on the IT Job dataset. In addition, we propose a simple and effective ensemble model combining different deep neural network models. The experimental results show that our proposed ensemble model achieves the highest result, with an F1 score of 72.71%. Moreover, we analyze these experimental results to gain insights into this problem and find better solutions in the future.

Comment: Accepted by IEEE RIVF 2020 Conference

arXiv.org e-Print Archive
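
The abstract above does not say how the ensemble combines its member models. As an illustration only, here is a minimal sketch that assumes soft voting (averaging the class probabilities of each model and taking the argmax); the paper's actual combination rule may differ, and the function name is hypothetical.

```python
def ensemble_predict(model_probs):
    """Soft-voting sketch.

    model_probs: list with one entry per model; each entry is a list of
                 per-sample class-probability rows of equal length.
    Returns the predicted class index for every sample.
    """
    n_models = len(model_probs)
    n_samples = len(model_probs[0])
    predictions = []
    for i in range(n_samples):
        # Collect the i-th sample's probability row from every model.
        rows = [probs[i] for probs in model_probs]
        # Average each class probability across models.
        avg = [sum(col) / n_models for col in zip(*rows)]
        # Pick the class with the highest averaged probability.
        predictions.append(max(range(len(avg)), key=avg.__getitem__))
    return predictions
```

Soft voting lets one very confident model outweigh two mildly confident ones, which is the main behavioral difference from majority voting.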

DeepWear: Adaptive Local Offloading for On-Wearable Deep Learning

Author: Huang Feifan
Liu Xuanzhe
Pushp Saumay
Qian Feng
Xu Mengwei
Zhu Mengze
Publication date: 30/03/2020

Due to their on-body and ubiquitous nature, wearables can generate a wide range of unique sensor data, creating countless opportunities for deep learning tasks. We propose DeepWear, a deep learning (DL) framework for wearable devices that improves performance and reduces the energy footprint. DeepWear strategically offloads DL tasks from a wearable device to its paired handheld device through a local network. Compared to remote-cloud-based offloading, DeepWear requires no Internet connectivity, consumes less energy, and is robust to privacy breaches. DeepWear provides various novel techniques such as context-aware offloading, strategic model partition, and pipelining support to efficiently utilize the processing capacity of nearby paired handhelds. Deployed as a user-space library, DeepWear offers developer-friendly APIs that are as simple as those in traditional DL libraries such as TensorFlow. We have implemented DeepWear on the Android OS and evaluated it on COTS smartphones and smartwatches with real DL models. DeepWear brings up to 5.08X and 23.0X execution speedup, as well as 53.5% and 85.5% energy saving, compared to wearable-only and handheld-only strategies, respectively.

arXiv.org e-Print Archive
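
To illustrate the kind of trade-off an offloading decision has to weigh, here is a deliberately simplified sketch that considers latency only. It is not DeepWear's policy: the abstract describes context-aware offloading and model partitioning, which this example does not attempt to model. All names and parameters are hypothetical.

```python
def should_offload(local_ms, handheld_ms, payload_kb, link_kb_per_ms):
    """Latency-only sketch: offload when running on the handheld,
    including the time to ship the input over the local link,
    is faster than running on the wearable itself."""
    transfer_ms = payload_kb / link_kb_per_ms  # time to send the input
    return handheld_ms + transfer_ms < local_ms
```

For example, with 500 ms of on-wearable inference, 50 ms on the handheld, and a 100 KB input over a 1 KB/ms link, offloading costs 150 ms in total and is the better choice; if the wearable needed only 100 ms, local execution would win.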

Fine-Tuning BERT for Sentiment Analysis of Vietnamese Reviews

Author: Luong Ngoc Hoang
Ngo Quoc Hung
Nguyen Quoc Thai
Nguyen Thoai Linh
Publication date: 20/11/2020

Sentiment analysis is an important task in the field of Natural Language Processing (NLP), in which users' feedback data on a specific issue are evaluated and analyzed. Many deep learning models have been proposed to tackle this task, including the recently-introduced Bidirectional Encoder Representations from Transformers (BERT) model. In this paper, we experiment with two BERT fine-tuning methods for the sentiment analysis task on datasets of Vietnamese reviews: 1) a method that uses only the [CLS] token as the input for an attached feed-forward neural network, and 2) another method in which all BERT output vectors are used as the input for classification. Experimental results on two datasets show that models using BERT slightly outperform other models using GloVe and FastText. Also, regarding the datasets employed in this study, our proposed BERT fine-tuning method produces a model with better performance than the original BERT fine-tuning method.

arXiv.org e-Print Archive
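
The two fine-tuning variants described in the abstract above differ in which encoder outputs feed the classifier. The sketch below contrasts them on plain Python lists standing in for BERT's per-token output vectors. Mean pooling is used here as an assumed way of consuming all output vectors; the abstract does not specify how the paper aggregates them, and all function names are hypothetical.

```python
def cls_features(token_vectors):
    """Variant 1: use only the first ([CLS]) token's output vector."""
    return list(token_vectors[0])


def all_token_features(token_vectors):
    """Variant 2 (assumed form): average every token's output vector."""
    n = len(token_vectors)
    return [sum(dim) / n for dim in zip(*token_vectors)]


def linear_classifier(features, weights, bias):
    """A single linear layer producing one logit per class."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]
```

Either feature vector can be passed to the same classifier head, so the two variants can be compared without changing anything else in the pipeline.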

An External Knowledge Enhanced Multi-label Charge Prediction Approach with Label Number Learning

Author: Lin Li
Wei Duan
Publication date: 03/07/2019

Multi-label charge prediction is the task of predicting the corresponding accusations for legal cases, and it has recently become a hot topic. However, current studies handle the label number with rough methods: they manually set parameters to select the number of labels, which affects the final prediction quality. We propose an external knowledge enhanced multi-label charge prediction approach that has two phases. One is a charge label prediction phase with external knowledge from law provisions; the other is a number learning phase with a purpose-designed number learning network (NLN). Our approach, enhanced by external knowledge, can automatically adjust the threshold to obtain the label number of law cases. It combines the output probabilities of samples and their corresponding label numbers to get the final prediction results. In experiments, our approach is connected to some state-of-the-art deep learning models. By testing on the biggest published Chinese law dataset, we find that our approach improves on these models. We further conduct experiments on multi-label samples from the dataset. In terms of macro-F1, our approach improves the baselines by 3%-5%; in terms of micro-F1, the improvement is 5%-15%. The experimental results show the effectiveness of our approach for multi-label charge prediction.

arXiv.org e-Print Archive
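
The contrast drawn in the abstract above, between a manually set threshold and a learned label number, can be sketched as follows. This assumes the predicted label number is simply used to keep the k most probable charges; the paper's NLN and its exact combination rule are not reproduced here, and the function names are hypothetical.

```python
def select_by_threshold(probs, threshold=0.5):
    """Baseline: keep every label whose probability reaches a fixed cutoff."""
    return sorted(i for i, p in enumerate(probs) if p >= threshold)


def select_by_label_number(probs, label_number):
    """Keep the `label_number` most probable labels, where `label_number`
    would come from a separate number-predicting model."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return sorted(ranked[:label_number])
```

With probabilities `[0.45, 0.10, 0.48, 0.05]`, a 0.5 threshold returns no labels at all, while a predicted label number of 2 returns labels 0 and 2.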

Incorporating Effective Global Information via Adaptive Gate Attention for Text Classification

Author: Li Qing
Li Xianming
Li Zongxi
Xie Haoran
Zhao Yingbin
Publication date: 22/02/2020

The dominant text classification studies focus on training classifiers using textual instances only or introducing external knowledge (e.g., hand-crafted features and domain expert knowledge). In contrast, some corpus-level statistical features, like word frequency and distribution, are not well exploited. Our work shows that such simple statistical information can enhance classification performance both efficiently and significantly compared with several baseline models. In this paper, we propose a classifier with a gate mechanism named Adaptive Gate Attention model with Global Information (AGA+GI), in which the adaptive gate mechanism incorporates global statistical features into latent semantic features and the attention layer captures dependency relationships within the sentence. To alleviate the overfitting issue, we propose a novel Leaky Dropout mechanism to improve generalization ability and performance stability. Our experiments show that the proposed method can achieve better accuracy than CNN-based and RNN-based approaches without global information on several benchmarks.

arXiv.org e-Print Archive
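
The abstract above does not give the gate's formula. Purely to illustrate what a gate that blends two feature sources looks like, here is a generic element-wise sketch in which a sigmoid of the semantic feature decides how much of the semantic value versus the global statistic passes through. This is an assumed, textbook-style gate, not the AGA+GI definition, and the names are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(semantic, global_stats):
    """Blend each semantic feature h with its global statistic s:
    g = sigmoid(h); fused = g * h + (1 - g) * s."""
    fused = []
    for h, s in zip(semantic, global_stats):
        g = sigmoid(h)  # gate value in (0, 1)
        fused.append(g * h + (1.0 - g) * s)
    return fused
```

A strongly positive semantic feature drives the gate toward 1 and is passed through almost unchanged, while a weak one lets the global statistic dominate.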

"Wait, I'm Still Talking!" Predicting the Dialogue Interaction Behavior Using Imagine-Then-Arbitrate Model

Author: Chen Haiqing
Ji Feng
Kang Xiaoming
Li Guodun
Lin Zehao
Zhang Yin
Publication date: 26/05/2020

Producing natural and accurate responses like human beings is the ultimate goal of intelligent dialogue agents. So far, most past works concentrate on selecting or generating one pertinent and fluent response according to the current query and its context. These models work in a one-to-one environment, making one response to one utterance each round. However, in real human-human conversations, people often send several short messages in sequence for readability instead of one long message in a single turn. Messages therefore do not end with an explicit ending signal, which is crucial for agents to decide when to reply. So the first step for an intelligent dialogue agent is not replying but deciding whether it should reply at the moment. To address this issue, we propose a novel Imagine-then-Arbitrate (ITA) neural dialogue model to help the agent decide whether to wait or to respond directly. Our method has two imaginator modules and an arbitrator module. The two imaginators learn the agent's and the user's speaking styles respectively and generate possible utterances which, combined with the dialogue history, serve as the input of the arbitrator. The arbitrator then decides whether to wait or to respond to the user directly. To verify the performance and effectiveness of our method, we prepared two dialogue datasets and compared our approach with several popular models. Experimental results show that our model performs well on the ending prediction issue and outperforms baseline models.

arXiv.org e-Print Archive

Explicit Interaction Model towards Text Classification

Author: Chin Zhaozheng
Du Cunxiao
Feng Fuli
Gan Tian
Nie Liqiang
Zhu Lei
Publication date: 23/11/2018

Text classification is one of the fundamental tasks in natural language processing. Recently, deep neural networks have achieved promising performance in the text classification task compared to shallow models. Despite their significance, deep models ignore fine-grained classification clues (matching signals between words and classes), since their classifications mainly rely on text-level representations. To address this problem, we introduce the interaction mechanism to incorporate word-level matching signals into the text classification task. In particular, we design a novel framework, EXplicit interAction Model (dubbed EXAM), equipped with the interaction mechanism. We validate the proposed approach on several benchmark datasets, including both multi-label and multi-class text classification tasks. Extensive experimental results demonstrate the superiority of the proposed method. As a byproduct, we have released the code and parameter settings to facilitate further research.

Comment: 8 pages

arXiv.org e-Print Archive
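
The word-level matching signals mentioned in the abstract above can be pictured as a matrix of scores between every word representation and every class representation. The sketch below computes such a matrix with dot products and then sums each class column into a logit. The summation is a stand-in chosen for brevity; the abstract does not describe how EXAM aggregates the signals, and the function names are hypothetical.

```python
def interaction_matrix(word_vectors, class_vectors):
    """Entry [i][j] is the dot product between word i and class j,
    i.e. how strongly that word matches that class."""
    return [
        [sum(w * c for w, c in zip(word, cls)) for cls in class_vectors]
        for word in word_vectors
    ]


def class_logits(matrix):
    """Assumed aggregation: sum the matching scores of all words per class."""
    return [sum(column) for column in zip(*matrix)]
```

Unlike a classifier that first squeezes the text into a single vector, this keeps a separate score per word and class, so an individual word that strongly signals one class remains visible to the aggregation step.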

POG: Personalized Outfit Generation for Fashion Recommendation at Alibaba iFashion

Author: Chen Wen
Guo Cheng
Guo Xin
Huang Pipei
Li Chao
Pfadler Andreas
Sun Fei
Xu Jiaming
Zhao Binqiang
Zhao Huan
Publication date: 19/05/2019

Increasing demand for fashion recommendation raises a lot of challenges for online shopping platforms and fashion communities. In particular, there exist two requirements for fashion outfit recommendation: the Compatibility of the generated fashion outfits, and the Personalization in the recommendation process. In this paper, we demonstrate that these two requirements can be satisfied by building a bridge between outfit generation and recommendation. Through large-scale data analysis, we observe that people have similar tastes in individual items and outfits. Therefore, we propose a Personalized Outfit Generation (POG) model, which connects user preferences regarding individual items and outfits with a Transformer architecture. Extensive offline and online experiments provide strong quantitative evidence that our method outperforms alternative methods regarding both compatibility and personalization metrics. Furthermore, we deploy POG on a platform named Dida in Alibaba to generate personalized outfits for the users of the online application iFashion. This work represents a first step towards an industrial-scale fashion outfit generation and recommendation solution, which goes beyond generating outfits based on explicit queries, or merely recommending from existing outfit pools. As part of this work, we release a large-scale dataset consisting of 1.01 million outfits with rich context information, and 0.28 billion user click actions from 3.57 million users. To the best of our knowledge, this dataset is the largest publicly available fashion-related dataset, and the first to provide user behaviors relating to both outfits and fashion items.

Comment: To appear in KDD 201

arXiv.org e-Print Archive

How Does NLP Benefit Legal System: A Summary of Legal Artificial Intelligence

Author: Liu Zhiyuan
Sun Maosong
Tu Cunchao
Xiao Chaojun
Zhang Tianyang
Zhong Haoxi
Publication date: 18/05/2020

Legal Artificial Intelligence (LegalAI) focuses on applying the technology of artificial intelligence, especially natural language processing, to benefit tasks in the legal domain. In recent years, LegalAI has rapidly drawn increasing attention from both AI researchers and legal professionals, as it benefits the legal system by liberating legal professionals from a maze of paperwork. Legal professionals often think about how to solve tasks with rule-based and symbol-based methods, while NLP researchers concentrate more on data-driven and embedding methods. In this paper, we introduce the history, the current state, and the future directions of research in LegalAI. We illustrate the tasks from the perspectives of legal professionals and NLP researchers and show several representative applications in LegalAI. We conduct experiments and provide an in-depth analysis of the advantages and disadvantages of existing works to explore possible future directions. You can find the implementation of our work at https://github.com/thunlp/CLAIM.

Comment: Accepted by ACL 202

arXiv.org e-Print Archive