Multi-Label Zero-Shot Human Action Recognition via Joint Latent Ranking Embedding
Human action recognition refers to automatically recognizing human actions in a
video clip. In practice, a video stream often contains multiple human actions.
Such a stream is often weakly annotated with a set of relevant human action
labels at a global level, rather than having each label assigned to the specific
video episode that corresponds to a single action; this leads to a multi-label
learning problem. Furthermore, there are many meaningful human actions in
reality, but it would be extremely difficult to collect and annotate video clips
for every one of them, which leads to a zero-shot learning scenario. To the best
of our knowledge, no prior work has addressed all of these issues together in
human action recognition. In this paper, we formulate a real-world human action
recognition task as a multi-label zero-shot learning problem and propose a
framework that tackles it holistically. Our framework handles the unknown
temporal boundaries between different actions for multi-label learning and
exploits side information about the semantic relationships between different
human actions for knowledge transfer. Consequently, our framework leads to a
joint latent ranking embedding for multi-label zero-shot human action
recognition. A novel neural architecture consisting of two component models,
together with an alternating learning algorithm, is proposed to learn this joint
latent ranking embedding. Multi-label zero-shot recognition is then performed by
measuring the relatedness scores of action labels to a test video clip in the
joint latent visual and semantic embedding spaces. We evaluate our framework
under different settings, including a novel data split scheme designed
specifically for evaluating multi-label zero-shot learning, on two datasets:
Breakfast and Charades. The experimental results demonstrate the effectiveness
of our framework.
Comment: 27 pages, 10 figures and 7 tables. Technical report submitted to a
journal. More experimental results/references were added and typos were
corrected.
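The scoring step described above, in which labels are ranked by their relatedness to a test video in a shared latent space, can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the paper: each component model is reduced to a hypothetical linear projection (`W_vis` for video features, `W_sem` for label semantic embeddings such as word vectors), cosine similarity serves as the relatedness score, and a generic pairwise hinge loss stands in for the ranking objective. The authors' two-component architecture and alternating training are not reproduced here.

```python
import math

# Hypothetical sketch: each component model is reduced to a linear projection
# into a shared latent space (W_vis for video features, W_sem for label
# semantic embeddings). This is NOT the paper's architecture.

def project(W, x):
    """Compute the matrix-vector product W @ x, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_labels(video_feat, label_embs, W_vis, W_sem):
    """Score every label, seen or unseen, against one video and sort best-first."""
    z_video = project(W_vis, video_feat)
    scores = {name: cosine(z_video, project(W_sem, emb))
              for name, emb in label_embs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

def pairwise_ranking_loss(scores, relevant, margin=0.1):
    """Generic hinge ranking loss: every relevant label should outscore every
    irrelevant label by at least `margin`; violations are summed."""
    loss = 0.0
    for pos in relevant:
        for neg, s_neg in scores.items():
            if neg not in relevant:
                loss += max(0.0, margin - scores[pos] + s_neg)
    return loss

# Toy example with identity projections and made-up 2-D embeddings.
identity = [[1.0, 0.0], [0.0, 1.0]]
labels = {"pour milk": [1.0, 0.0], "fry egg": [0.0, 1.0], "cut bun": [0.7, 0.7]}
ranked = rank_labels([1.0, 0.2], labels, identity, identity)
scores = dict(ranked)
print([name for name, _ in ranked])  # ['pour milk', 'cut bun', 'fry egg']
print(pairwise_ranking_loss(scores, {"pour milk", "cut bun"}))  # 0.0
```

Because an unseen label only needs a semantic embedding to be scored, the same `rank_labels` call covers both seen and zero-shot labels; a multi-label prediction would then keep the top-ranked labels or those above a threshold.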
Deep Learning for Technical Document Classification
In large technology companies, the requirements for managing and organizing
technical documents created by engineers and managers have increased
dramatically in recent years, which has led to a higher demand for more
scalable, accurate, and automated document classification. Prior studies have
focused only on processing text for classification, whereas technical documents
often contain multimodal information. To leverage this multimodal information and
improve classification performance, this paper presents a novel multimodal deep
learning architecture, TechDoc, which utilizes three types of information: the
natural language text and the descriptive images within documents, and the
associations among documents. The architecture combines a convolutional neural
network, a recurrent neural network, and a graph neural network through an
integrated training process. We applied the architecture to a large multimodal
technical document database and trained the model to classify documents
according to the hierarchical International Patent Classification (IPC) system.
Our results show that TechDoc achieves higher classification accuracy than
unimodal methods and other state-of-the-art benchmarks. The trained model can
potentially be scaled to millions of real-world multimodal technical documents,
which is useful for data and knowledge management in large technology companies
and organizations.
Comment: 16 pages, 8 figures, 9 tables
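How the three information sources might be combined can be sketched as follows, under loudly stated assumptions: the text and image vectors are fixed toy stand-ins for the outputs of the RNN and CNN branches, the two modalities are fused by simple concatenation, and the document associations are used through a simplified mean-aggregation step in the spirit of a graph neural network layer (no learned weights or nonlinearity). The classifier weights, class count, and document names are hypothetical; this is not TechDoc's actual design.

```python
import math

# Hypothetical stand-ins: in TechDoc the text and image vectors would come
# from the RNN and CNN branches; here they are fixed toy vectors.

def local_features(text_vec, image_vec):
    """Fuse the two in-document modalities by simple concatenation."""
    return list(text_vec) + list(image_vec)

def aggregate(local, adjacency, doc):
    """Simplified graph step: average a document's vector with its linked documents'."""
    group = [doc] + adjacency.get(doc, [])
    dim = len(local[doc])
    return [sum(local[d][i] for d in group) / len(group) for i in range(dim)]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(W, b, x):
    """Linear classifier head followed by softmax over (hypothetical) classes."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + bias for row, bias in zip(W, b)]
    return softmax(logits)

text = {"A": [0.9, 0.1], "B": [0.8, 0.3], "C": [0.1, 0.9]}
image = {"A": [0.5], "B": [0.7], "C": [0.2]}
local = {d: local_features(text[d], image[d]) for d in text}
adjacency = {"A": ["B"], "B": ["A"]}  # hypothetical links; C is isolated
W = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.0]]  # two hypothetical classes
b = [0.0, 0.0]
probs = {d: classify(W, b, aggregate(local, adjacency, d)) for d in local}
```

Here documents A and B are linked, so each is classified from their shared average, while the isolated document C is classified from its own fused vector alone. In a trained system the projections, aggregation weights, and classifier would all be learned jointly rather than fixed by hand.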
Multilabel Classification for News Article Using Long Short-Term Memory
Multilabel text classification is the task of assigning a text to one or more categories. As in other machine learning tasks, multilabel classification performance is limited when little labeled data is available, which makes it difficult to capture semantic relationships. This setting calls for a multi-label text classification technique that can assign four labels to news articles. Deep learning is proposed as a method for solving problems in multi-label text classification. Seven proposed Long Short-Term Memory (LSTM) models, grouped into four types (1-layer, 2-layer, and 3-layer LSTM, and Bidirectional LSTM), are compared on large-scale datasets to show that LSTM can achieve good performance in multi-label text classification. The results show that the 2-layer LSTM model obtained a training accuracy of 96% and the highest testing accuracy of all models, at 94.3%. Model 3, a 1-layer LSTM, obtained average precision, recall, and F1-score values of 94%, matching its training accuracy. This indicates that model 3 with a 1-layer LSTM performs better in both the training and testing processes. The comparison among the seven proposed LSTM models shows that model 3 with a 1-layer LSTM is the best model.
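The abstract reports average precision, recall, and F1-score for a four-label task without stating how the model's outputs become label sets or how the averages are taken. The sketch below assumes a common scheme: an independent sigmoid per label thresholded at 0.5, followed by macro averaging over labels. Both choices, and the toy logits standing in for the LSTM's final layer, are illustrative assumptions rather than the authors' setup.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_multilabel(logits, threshold=0.5):
    """Apply an independent sigmoid per label and threshold each one, so an
    article can receive zero, one, or several labels."""
    return [[1 if sigmoid(z) >= threshold else 0 for z in row] for row in logits]

def macro_prf(y_true, y_pred):
    """Macro-averaged precision, recall and F1 over labels (multi-hot rows).
    Undefined ratios (zero denominators) are counted as 0.0 by convention."""
    n_labels = len(y_true[0])
    p_sum = r_sum = f_sum = 0.0
    for j in range(n_labels):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for truth, pred in pairs if truth == 1 and pred == 1)
        fp = sum(1 for truth, pred in pairs if truth == 0 and pred == 1)
        fn = sum(1 for truth, pred in pairs if truth == 1 and pred == 0)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        p_sum, r_sum, f_sum = p_sum + prec, r_sum + rec, f_sum + f1
    return p_sum / n_labels, r_sum / n_labels, f_sum / n_labels

# Toy batch: 3 articles x 4 labels; logits stand in for the LSTM's final layer.
logits = [[2.0, -1.0, 0.5, -3.0],
          [-2.0, 1.5, -0.5, 0.2],
          [0.1, -0.1, 3.0, -1.0]]
y_true = [[1, 0, 1, 0],
          [0, 1, 0, 0],
          [0, 0, 1, 0]]
y_pred = predict_multilabel(logits)
precision, recall, f1 = macro_prf(y_true, y_pred)
print(precision, recall)  # 0.625 0.75
```

Macro averaging weights every label equally, so a single poorly predicted label (the fourth one in this toy batch) pulls all three averages down; micro averaging, which pools true positives, false positives, and false negatives across labels before dividing, would instead weight labels by how often they occur.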
- …