115 research outputs found
Truly Multi-modal YouTube-8M Video Classification with Video, Audio, and Text
The YouTube-8M video classification challenge requires teams to classify 0.7
million videos into one or more of 4,716 classes. In this Kaggle competition,
we placed in the top 3% out of 650 participants using released video and audio
features. Beyond that, we extend the original competition by including text
information in the classification, making this a truly multi-modal approach
with vision, audio, and text. The newly introduced text data is termed
YouTube-8M-Text. We present a classification framework for the joint use of
text, visual and audio features, and conduct an extensive set of experiments to
quantify the benefit that this additional modality brings. The inclusion of
text yields state-of-the-art results, e.g. 86.7% GAP on the YouTube-8M-Text
validation dataset.
Comment: 8 pages, Accepted to CVPR'17 Workshop on YouTube-8M Large-Scale Video Understanding
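A minimal sketch of the kind of joint classification framework described, assuming simple late fusion: per-modality features are concatenated and fed to a shared multi-label head. The 1024-d video and 128-d audio dimensions match the released YouTube-8M features; the 300-d text dimension and the fusion MLP are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Concatenate video, audio, and text features; score all classes jointly."""
    def __init__(self, video_dim=1024, audio_dim=128, text_dim=300,
                 hidden_dim=2048, num_classes=4716):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(video_dim + audio_dim + text_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, video, audio, text):
        x = torch.cat([video, audio, text], dim=-1)
        return torch.sigmoid(self.fuse(x))  # independent per-class probabilities

model = LateFusionClassifier()
probs = model(torch.randn(2, 1024), torch.randn(2, 128), torch.randn(2, 300))
print(probs.shape)  # torch.Size([2, 4716])
```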
Job Prediction: From Deep Neural Network Models to Applications
Determining a suitable job for a student or a person looking for work based
on job descriptions, such as the required knowledge and skills, is difficult;
likewise, employers must find ways to choose candidates that match the jobs
they offer. In this paper, we focus on studying job
prediction using different deep neural network models including TextCNN,
Bi-GRU-LSTM-CNN, and Bi-GRU-CNN with various pre-trained word embeddings on the
IT Job dataset. In addition, we propose a simple and effective ensemble model
combining the different deep neural network models. The experimental results
show that our ensemble model achieves the best performance, with an F1 score
of 72.71%. Moreover, we analyze these results to gain insights into the
problem and to guide better solutions in the future.
Comment: Accepted by the IEEE RIVF 2020 Conference
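A minimal sketch of the ensembling idea: average the class-probability outputs of the individual networks and take the argmax. The `predict_proba` interface is an assumption; the abstract does not specify the combination rule.

```python
import numpy as np

def ensemble_predict(models, inputs):
    """Average per-class probabilities across models, then pick the top class."""
    probs = np.mean([m.predict_proba(inputs) for m in models], axis=0)
    return probs.argmax(axis=-1)
```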
DeepWear: Adaptive Local Offloading for On-Wearable Deep Learning
Due to their on-body and ubiquitous nature, wearables can generate a wide
range of unique sensor data creating countless opportunities for deep learning
tasks. We propose DeepWear, a deep learning (DL) framework for wearable devices
to improve the performance and reduce the energy footprint. DeepWear
strategically offloads DL tasks from a wearable device to its paired handheld
device over a local network. Compared to remote-cloud-based offloading,
DeepWear requires no Internet connectivity, consumes less energy, and is more
robust to privacy breaches. DeepWear provides various novel techniques, such as
context-aware offloading, strategic model partition, and pipelining support to
efficiently utilize the processing capacity from nearby paired handhelds.
Deployed as a user-space library, DeepWear offers developer-friendly APIs that
are as simple as those in traditional DL libraries such as TensorFlow. We have
implemented DeepWear on the Android OS and evaluated it on COTS smartphones and
smartwatches with real DL models. DeepWear brings up to 5.08X and 23.0X
execution speedup, as well as 53.5% and 85.5% energy saving compared to
wearable-only and handheld-only strategies, respectively.
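An illustrative sketch of the context-aware offloading decision, under a simple latency cost model: offload a DL task only when handheld execution plus Bluetooth transfer beats local execution. The cost model and its constants are assumptions for illustration, not DeepWear's actual API.

```python
def should_offload(local_latency_s, handheld_latency_s, transfer_bytes,
                   bluetooth_bps=2_000_000, handheld_available=True):
    """Offload only if remote compute plus data transfer is faster than local."""
    if not handheld_available:
        return False  # no paired handheld in range: run on the wearable
    transfer_s = transfer_bytes * 8 / bluetooth_bps
    return handheld_latency_s + transfer_s < local_latency_s
```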
Fine-Tuning BERT for Sentiment Analysis of Vietnamese Reviews
Sentiment analysis is an important task in the field of Natural Language
Processing (NLP), in which users' feedback data on a specific issue are
evaluated and analyzed. Many deep learning models have been proposed to tackle
this task, including the recently introduced Bidirectional Encoder
Representations from Transformers (BERT) model. In this paper, we experiment
with two BERT fine-tuning methods for the sentiment analysis task on datasets
of Vietnamese reviews: 1) a method that uses only the [CLS] token as the input
for an attached feed-forward neural network, and 2) another method in which
all BERT output vectors are used as the input for classification. Experimental
results on two datasets show that models using BERT slightly outperform other
models using GloVe and FastText. Also, on the datasets employed in this study,
our proposed BERT fine-tuning method produces a model with better performance
than the original BERT fine-tuning method.
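A minimal sketch of the two fine-tuning variants described, using the Hugging Face transformers library. The pooling choice is the only difference between them; mean-pooling is one plausible reading of "all BERT output vectors are used", and the multilingual checkpoint name is an assumption.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class BertClassifier(nn.Module):
    def __init__(self, num_labels, use_all_tokens=False,
                 model_name="bert-base-multilingual-cased"):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)
        self.use_all_tokens = use_all_tokens
        self.head = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        hidden = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        if self.use_all_tokens:
            # Variant 2: pool all output vectors (mean over non-padding tokens).
            mask = attention_mask.unsqueeze(-1).float()
            feat = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
        else:
            # Variant 1: feed only the [CLS] token to the classification head.
            feat = hidden[:, 0]
        return self.head(feat)
```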
An External Knowledge Enhanced Multi-label Charge Prediction Approach with Label Number Learning
Multi-label charge prediction is the task of predicting the corresponding
accusations for legal cases, and it has recently become a hot topic. However,
current studies handle the label number with rough methods: they manually set
parameters to select label numbers, which affects the final prediction
quality. We propose an external knowledge enhanced multi-label charge
prediction approach with two phases. The first is a charge label prediction
phase using external knowledge from law provisions; the second is a number
learning phase built around a number learning network (NLN). Enhanced by
external knowledge, our approach automatically adjusts the threshold to obtain
the label number of law cases. It combines the output probabilities of samples
and their corresponding label numbers to produce the final prediction results.
In experiments, we connect our approach to several state-of-the-art deep
learning models. Testing on the largest published Chinese law dataset, we find
that our approach improves these models. We further conduct experiments on
multi-label samples from the dataset. In terms of macro-F1, our approach
improves the baselines by 3%-5%; in terms of micro-F1, the improvement is
5%-15%. The experimental results show the effectiveness of our approach for
multi-label charge prediction.
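An illustrative sketch of the combination step: for each case, keep the top-k charges, where k is the label number predicted by the number learning network. This is one plausible reading of how probabilities and label numbers are combined, not the paper's released code.

```python
import torch

def predict_charges(label_probs, predicted_counts):
    """label_probs: (batch, num_labels); predicted_counts: (batch,) integers."""
    predictions = []
    for probs, k in zip(label_probs, predicted_counts):
        # Keep exactly k charges: the k labels with the highest probability.
        predictions.append(torch.topk(probs, int(k)).indices.tolist())
    return predictions

probs = torch.tensor([[0.9, 0.2, 0.7, 0.1]])
print(predict_charges(probs, torch.tensor([2])))  # [[0, 2]]
```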
Incorporating Effective Global Information via Adaptive Gate Attention for Text Classification
The dominant text classification studies focus on training classifiers using
textual instances only or introducing external knowledge (e.g., hand-crafted
features and domain expert knowledge). In contrast, some corpus-level
statistical features, like word frequency and distribution, are not well
exploited. Our work shows that such simple statistical information can enhance
classification performance both efficiently and significantly compared with
several baseline models. In this paper, we propose a classifier with gate
mechanism named Adaptive Gate Attention model with Global Information (AGA+GI),
in which the adaptive gate mechanism incorporates global statistical features
into latent semantic features and the attention layer captures dependency
relationships within the sentence. To alleviate the overfitting issue, we
propose a novel Leaky Dropout mechanism to improve generalization ability and
performance stability. Our experiments show that the proposed method can
achieve better accuracy than CNN-based and RNN-based approaches without global
information on several benchmarks.
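A minimal sketch of the gating idea, assuming the gate blends the projected global statistical features with the latent semantic features dimension-wise; the exact fusion form in AGA+GI may differ.

```python
import torch
import torch.nn as nn

class AdaptiveGate(nn.Module):
    def __init__(self, sem_dim, stat_dim):
        super().__init__()
        self.proj = nn.Linear(stat_dim, sem_dim)     # map global stats into the semantic space
        self.gate = nn.Linear(sem_dim * 2, sem_dim)  # gate conditioned on both views

    def forward(self, semantic, stats):
        s = self.proj(stats)
        g = torch.sigmoid(self.gate(torch.cat([semantic, s], dim=-1)))
        # Per-dimension trade-off between semantic and global information.
        return g * semantic + (1 - g) * s
```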
"Wait, I'm Still Talking!" Predicting the Dialogue Interaction Behavior Using Imagine-Then-Arbitrate Model
Producing natural and accurate responses like human beings is the ultimate
goal of intelligent dialogue agents. So far, most past work concentrates on
selecting or generating one pertinent and fluent response according to the
current query and its context. These models work in a one-to-one environment,
making one response to one utterance each round. However, in real human-human
conversations, humans often send several short messages in sequence for
readability, rather than one long message per turn. Thus, messages do not end
with an explicit ending signal, which is crucial for agents to decide when to
reply. The first step for an intelligent dialogue agent is therefore not
replying, but deciding whether it should reply at the moment. To address this
issue, we propose a novel Imagine-then-Arbitrate (ITA) neural dialogue model
to help the agent decide whether to wait or to make a response directly. Our
method has two imaginator modules and an arbitrator module. The two
imaginators learn the agent's and the user's speaking styles, respectively,
and generate possible utterances that, combined with the dialogue history,
serve as the input to the arbitrator. The arbitrator then decides whether to
wait or to respond to the user directly. To verify the performance and
effectiveness of our method, we prepared two dialogue datasets and compared
our approach with several popular models. Experimental results show that our
model performs well on the ending-prediction task and outperforms the baseline
models.
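An illustrative sketch of the arbitration step: the encoded dialogue history and the two imaginators' candidate utterances are concatenated and classified as "wait" or "respond". The encoders and feature layout are stand-ins; the abstract does not fix them.

```python
import torch
import torch.nn as nn

class Arbitrator(nn.Module):
    def __init__(self, enc_dim):
        super().__init__()
        # Inputs: history encoding + agent-imaginator and user-imaginator encodings.
        self.decide = nn.Linear(enc_dim * 3, 2)

    def forward(self, history_enc, agent_enc, user_enc):
        x = torch.cat([history_enc, agent_enc, user_enc], dim=-1)
        return self.decide(x)  # logits over {wait, respond}
```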
Explicit Interaction Model towards Text Classification
Text classification is one of the fundamental tasks in natural language
processing. Recently, deep neural networks have achieved promising performance
in the text classification task compared to shallow models. Despite the
significance of deep models, they ignore fine-grained classification clues
(matching signals between words and classes), since their classifications
mainly rely on text-level representations. To address this problem, we
introduce an interaction mechanism to incorporate word-level matching signals
into the text classification task. In particular, we design a novel framework,
the EXplicit interAction Model (dubbed EXAM), equipped with the interaction
mechanism. We validate the proposed approach on several benchmark datasets
covering both multi-label and multi-class text classification tasks. Extensive
experimental results demonstrate the superiority of the proposed method. As a
byproduct, we have released the code and parameter settings to facilitate
further research.
Comment: 8 pages
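A minimal sketch of the interaction mechanism: word-level matching signals computed as dot products between word representations and learned class representations, then aggregated into class scores. The fixed sequence length and the linear aggregation are simplifying assumptions.

```python
import torch
import torch.nn as nn

class InteractionClassifier(nn.Module):
    def __init__(self, word_dim, num_classes, seq_len):
        super().__init__()
        self.class_emb = nn.Parameter(torch.randn(num_classes, word_dim))
        self.aggregate = nn.Linear(seq_len, 1)  # class score from its row of signals

    def forward(self, words):  # words: (batch, seq_len, word_dim)
        # Interaction matrix: matching signal between every word and every class.
        interaction = torch.einsum("bsd,cd->bcs", words, self.class_emb)
        return self.aggregate(interaction).squeeze(-1)  # (batch, num_classes)
```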
POG: Personalized Outfit Generation for Fashion Recommendation at Alibaba iFashion
Increasing demand for fashion recommendation poses many challenges for
online shopping platforms and fashion communities. In particular, there exist
two requirements for fashion outfit recommendation: the Compatibility of the
generated fashion outfits, and the Personalization in the recommendation
process. In this paper, we demonstrate these two requirements can be satisfied
via building a bridge between outfit generation and recommendation. Through
large-scale data analysis, we observe that people have similar tastes in individual
items and outfits. Therefore, we propose a Personalized Outfit Generation (POG)
model, which connects user preferences regarding individual items and outfits
with a Transformer architecture. Extensive offline and online experiments provide
strong quantitative evidence that our method outperforms alternative methods
regarding both compatibility and personalization metrics. Furthermore, we
deploy POG on a platform named Dida in Alibaba to generate personalized outfits
for the users of the online application iFashion.
This work represents a first step towards an industrial-scale fashion outfit
generation and recommendation solution, which goes beyond generating outfits
based on explicit queries, or merely recommending from existing outfit pools.
As part of this work, we release a large-scale dataset consisting of 1.01
million outfits with rich context information, and 0.28 billion user click
actions from 3.57 million users. To the best of our knowledge, this dataset is
the largest publicly available fashion-related dataset, and the first to
provide user behaviors relating to both outfits and fashion items.
Comment: To appear in KDD 2019
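An illustrative sketch of the core idea: encode a user's clicked items together with a partial outfit in one Transformer encoder, so attention can relate the personalization (clicks) and compatibility (outfit) signals, then score candidate next items. All dimensions and the scoring head are assumptions, not POG's actual configuration.

```python
import torch
import torch.nn as nn

class OutfitGenerator(nn.Module):
    def __init__(self, num_items=10000, item_dim=128):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, item_dim)
        layer = nn.TransformerEncoderLayer(d_model=item_dim, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.score = nn.Linear(item_dim, num_items)

    def forward(self, clicked_ids, outfit_ids):
        # One sequence holds both the click history and the outfit built so far.
        seq = torch.cat([clicked_ids, outfit_ids], dim=1)
        h = self.encoder(self.item_emb(seq))
        return self.score(h[:, -1])  # logits over candidate next items
```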
How Does NLP Benefit Legal System: A Summary of Legal Artificial Intelligence
Legal Artificial Intelligence (LegalAI) focuses on applying the technology of
artificial intelligence, especially natural language processing, to benefit
tasks in the legal domain. In recent years, LegalAI has rapidly drawn
increasing attention from both AI researchers and legal professionals, as
LegalAI can benefit the legal system by liberating legal professionals from a
maze of paperwork. Legal professionals often approach tasks with rule-based
and symbol-based methods, while NLP researchers concentrate more on
data-driven and embedding methods. In this paper, we introduce the history,
the current state, and the future directions of research in LegalAI. We
illustrate the tasks from the perspectives of both legal professionals and NLP
researchers and present several representative applications of LegalAI. We
conduct experiments and provide an in-depth analysis of the advantages and
disadvantages of existing works to explore possible future directions. The
implementation of our work is available at https://github.com/thunlp/CLAIM.
Comment: Accepted by ACL 2020