Learning label dependency for multi-label classification
University of Technology Sydney, Faculty of Engineering and Information Technology.
Multi-label classification is an important topic in the field of machine learning. In many real applications, there exist potential dependencies or correlations between labels, and exploiting this underlying knowledge can effectively improve learning performance. Therefore, how to learn and utilize the dependencies between labels has become one of the key issues of multi-label classification.
This thesis first summarizes existing work and analyses its advantages and disadvantages. Several effective methods for multi-label classification are then proposed, focusing on ways of exploiting various types of label dependencies. The main contributions of this thesis include:
(1) A method that uses a tree-structured restricted Bayesian network to represent the dependency structure of labels is proposed. This work is inspired by the ClassifierChain method. Compared with ClassifierChain, the advantage of our method is that the dependencies between labels are represented by a Bayesian network rather than a randomly selected chain, so more appropriate label dependencies can be determined. Furthermore, an ensemble learning technique is used to construct and combine multiple tree-structured Bayesian networks, so that the mutual dependencies between labels can be fully exploited and the final model is more robust. The experimental results verify the effectiveness of these methods: compared with other baselines, they show better performance because more appropriate label dependencies are captured.
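The tree-structured dependency skeleton described above can be sketched with a Chow-Liu-style construction: compute the empirical mutual information between every pair of labels and keep a maximum spanning tree over that graph. This is a toy illustration under assumptions (the label matrix is invented, and the thesis's exact construction may differ):

```python
import numpy as np

# Toy label matrix: rows = instances, columns = binary labels.
Y = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
])

def mutual_information(a, b):
    """Empirical mutual information between two binary label columns."""
    mi = 0.0
    for va in (0, 1):
        for vb in (0, 1):
            p_ab = np.mean((a == va) & (b == vb))
            p_a, p_b = np.mean(a == va), np.mean(b == vb)
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

n_labels = Y.shape[1]
mi = np.zeros((n_labels, n_labels))
for i in range(n_labels):
    for j in range(i + 1, n_labels):
        mi[i, j] = mi[j, i] = mutual_information(Y[:, i], Y[:, j])

# Prim's algorithm: a maximum spanning tree over the mutual-information
# graph yields a tree-structured dependency skeleton over the labels.
in_tree, edges = [0], []
while len(in_tree) < n_labels:
    i, j = max(((i, j) for i in in_tree
                for j in range(n_labels) if j not in in_tree),
               key=lambda e: mi[e[0], e[1]])
    edges.append((i, j))
    in_tree.append(j)

print(edges)
```

An ensemble version would repeat this from different root labels (or on bootstrap samples) and combine the resulting trees.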
(2) A common strategy for exploiting label dependencies is, for every label, to identify the labels it depends on and use those labels as auxiliary features in the training phase. The issues with this strategy are that the influence of label dependencies can be diluted by the existing features, and that indirect label dependencies are not taken into consideration. Therefore, a new learning paradigm that separates the influence of existing features from that of labels is introduced, which intensifies the impact of label dependencies. Moreover, a method that models the propagation of label dependencies as a Random Walk with Restart (RWR) process is proposed. In this method, label dependencies are encoded as a graph, and the dynamic and indirect dependencies between labels are utilized through the RWR process over the label graph. The experimental results validate this method, showing that it outperforms other baselines in terms of learning a label ranking.
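A minimal sketch of RWR propagation over a label graph follows; the dependency weights, restart constant, and graph size are illustrative assumptions, not values from the thesis:

```python
import numpy as np

# Toy label-dependency graph: W[i, j] = strength of the dependency from
# label j to label i, e.g. estimated from co-occurrence counts.
W = np.array([
    [0.0, 0.8, 0.1],
    [0.8, 0.0, 0.3],
    [0.1, 0.3, 0.0],
])
W = W / W.sum(axis=0, keepdims=True)  # column-normalize into a transition matrix

def rwr(restart, c=0.3, tol=1e-10):
    """Random Walk with Restart: iterate r = (1 - c) * W @ r + c * restart."""
    r = restart.copy()
    for _ in range(1000):
        r_next = (1 - c) * W @ r + c * restart
        if np.abs(r_next - r).max() < tol:
            return r_next
        r = r_next
    return r

# Restart vector concentrated on label 0: the stationary scores rank the
# other labels by both direct and indirect dependency on label 0.
scores = rwr(np.array([1.0, 0.0, 0.0]))
print(scores)
```

Because the walk keeps moving along edges before restarting, label 2 receives mass from label 0 indirectly through label 1, which is exactly the indirect-dependency effect the abstract describes.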
(3) Based on the above method, a method that takes multiple factors into consideration when learning label dependencies is proposed. In this method, the dependency between two labels is characterized from different perspectives and is determined by learning a linear combination of multiple measures. A particular loss function is designed, so that the optimal label dependencies, i.e., the dependency matrix in the RWR process, can be obtained by minimizing the loss function. The advantages of this method include: a) label dependencies are measured and combined from different perspectives, and b) label dependencies that are optimal with respect to a particular loss function are obtained. The experimental results indicate that, given an explicit loss function, this method learns a better label ranking than the previous one.
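The idea of learning an optimal linear combination of dependency measures can be illustrated with a squared loss, for which the combination weights reduce to a least-squares problem. This is a deliberate simplification of the thesis's loss function, and the measure matrices are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two toy dependency measures between 4 labels, e.g. co-occurrence
# frequency and label-embedding similarity (values are invented).
M1 = rng.random((4, 4))
M2 = rng.random((4, 4))

# "Target" dependencies, synthesized here so the optimal combination is known.
T = 0.7 * M1 + 0.3 * M2

# Minimize ||w1*M1 + w2*M2 - T||^2: a linear least-squares problem over
# the flattened measure matrices.
A = np.stack([M1.ravel(), M2.ravel()], axis=1)
w, *_ = np.linalg.lstsq(A, T.ravel(), rcond=None)

# The learned dependency matrix would then drive the RWR process.
W = w[0] * M1 + w[1] * M2
print(w)
```

With a ranking-oriented loss as in the thesis, the closed-form solution is replaced by iterative optimization, but the structure of the problem (learn weights over fixed measure matrices) is the same.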
(4) A novel method that learns a label ranking by exploiting preferences between true labels and other labels is proposed. In this method, the original instance space and label space are mapped into a low-dimensional space using a matrix factorization technique. One advantage of the method is therefore that the number of labels is greatly reduced, so problems with massive label sets can be handled efficiently. Moreover, a loss function is formulated based on the assumption that an instance's explicitly given true labels should be ranked before the labels that are not provided explicitly. It is then used to guide the matrix factorization and label ranking learning. The advantage of this novel assumption is that it alleviates the issue with the traditional assumption that a label not given explicitly cannot be a true label. Therefore, this method is also applicable to partially labelled data. Its effectiveness is validated by the experimental results, which show that, for a given instance, it ranks the explicitly given labels well before the other labels.
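A sketch of matrix factorization guided by a pairwise preference loss, in the spirit described above; BPR-style stochastic updates are used here as a stand-in for the thesis's exact optimization, and all data are toy values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy relevance matrix: rows = instances, columns = labels.
# 1 = explicitly given true label; 0 = not provided (possibly still true).
Y = np.array([
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
])

n, L, k = Y.shape[0], Y.shape[1], 2
U = 0.1 * rng.standard_normal((n, k))   # low-dimensional instance factors
V = 0.1 * rng.standard_normal((L, k))   # low-dimensional label factors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Pairwise preference loss: each explicitly given label p should be
# ranked above each non-given label q for the same instance.
lr = 0.05
for _ in range(2000):
    for i in range(n):
        for p in np.flatnonzero(Y[i] == 1):
            for q in np.flatnonzero(Y[i] == 0):
                g = sigmoid(-(U[i] @ (V[p] - V[q])))  # pair-loss gradient weight
                u = U[i].copy()
                U[i] += lr * g * (V[p] - V[q])
                V[p] += lr * g * u
                V[q] -= lr * g * u

scores = U @ V.T   # per-instance label ranking scores
```

Because only preferences between given and non-given labels are penalized, a non-given label is never forced to be negative, which is what makes this style of loss suitable for partially labelled data.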
In summary, this thesis has proposed several effective methods that exploit label dependencies from different perspectives, and their effectiveness has been validated by experiments. These achievements lay a good foundation for further research and applications.
Resource Constrained Structured Prediction
We study the problem of structured prediction under test-time budget
constraints. We propose a novel approach applicable to a wide range of
structured prediction problems in computer vision and natural language
processing. Our approach seeks to adaptively generate computationally costly
features during test-time in order to reduce the computational cost of
prediction while maintaining prediction performance. We show that training the
adaptive feature generation system can be reduced to a series of structured
learning problems, resulting in efficient training using existing structured
learning algorithms. This framework provides theoretical justification for
several existing heuristic approaches found in the literature. We evaluate our
proposed adaptive system on two structured prediction tasks, optical character
recognition (OCR) and dependency parsing, and show strong performance in
reducing feature costs without degrading accuracy.
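The adaptive idea (pay for expensive features only on uncertain inputs) can be sketched for a plain binary classifier. This is a simplification under stated assumptions: the paper handles structured outputs and learns the acquisition policy, whereas here the policy is a fixed confidence margin and the weights are invented:

```python
import numpy as np

def predict_adaptive(x_cheap, get_costly, w_cheap, w_full, margin=1.0):
    """Return (label, cost_paid). get_costly() is invoked only when the
    cheap model's confidence margin is too small."""
    score = x_cheap @ w_cheap
    if abs(score) >= margin:            # confident: skip the costly features
        return int(score > 0), 0.0
    x_full = np.concatenate([x_cheap, get_costly()])
    return int(x_full @ w_full > 0), 1.0

w_cheap = np.array([1.0, -1.0])
w_full = np.array([1.0, -1.0, 2.0])

# Confident case: the costly feature extractor is never called.
label1, cost1 = predict_adaptive(np.array([2.0, -1.0]),
                                 lambda: np.array([5.0]), w_cheap, w_full)
# Ambiguous case: the feature cost is paid to make the decision.
label2, cost2 = predict_adaptive(np.array([0.2, 0.1]),
                                 lambda: np.array([5.0]), w_cheap, w_full)
```

Averaged over a test set, the total `cost_paid` is the quantity a test-time budget constrains, and the margin (here fixed, in the paper learned) trades it off against accuracy.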
Improved Relation Extraction with Feature-Rich Compositional Embedding Models
Compositional embedding models build a representation (or embedding) for a
linguistic structure based on its component word embeddings. We propose a
Feature-rich Compositional Embedding Model (FCM) for relation extraction that
is expressive, generalizes to new domains, and is easy-to-implement. The key
idea is to combine (unlexicalized) hand-crafted features with learned word
embeddings. The model is able to directly tackle the difficulties met by
traditional compositional embedding models, such as handling arbitrary types
of sentence annotations and utilizing global information for composition. We
test the proposed model on two relation extraction tasks, and demonstrate that
our model outperforms both previous compositional models and traditional
feature-rich models on the ACE 2005 relation extraction task and the SemEval
2010 relation classification task. The combination of our model and a
log-linear classifier with hand-crafted features gives state-of-the-art
results. Comment: 12 pages for EMNLP 201
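The core FCM composition, summing outer products of per-word hand-crafted feature vectors and word embeddings and then scoring with a per-label parameter matrix, can be sketched as follows (all dimensions, feature choices, and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

d_feat, d_emb, n_labels = 3, 4, 2

# Per-word binary hand-crafted features (e.g. "is head of entity 1",
# "lies on the dependency path") and word embeddings for a 2-word span.
F = np.array([[1, 0, 1],
              [0, 1, 0]], dtype=float)          # (words, d_feat)
E = rng.standard_normal((2, d_emb))             # (words, d_emb)

# FCM-style composition: sum over words of the outer product
# feature ⊗ embedding, then score against one matrix per relation label.
T = rng.standard_normal((n_labels, d_feat, d_emb))
phi = np.einsum('wf,we->fe', F, E)              # substructure representation
scores = np.einsum('lfe,fe->l', T, phi)
probs = np.exp(scores) / np.exp(scores).sum()   # softmax over relation labels
```

The outer product is what lets an unlexicalized feature gate an entire embedding dimension, which is how the model stays expressive while generalizing across domains.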
Neural End-to-End Learning for Computational Argumentation Mining
We investigate neural techniques for end-to-end computational argumentation
mining (AM). We frame AM both as a token-based dependency parsing and as a
token-based sequence tagging problem, including a multi-task learning setup.
Contrary to models that operate on the argument component level, we find that
framing AM as dependency parsing leads to subpar performance results. In
contrast, less complex (local) tagging models based on BiLSTMs perform robustly
across classification scenarios, being able to catch long-range dependencies
inherent to the AM problem. Moreover, we find that jointly learning 'natural'
subtasks, in a multi-task learning setup, improves performance. Comment: To be published at ACL 201
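Framing AM as token-based sequence tagging means assigning each token a tag encoding argument-component membership; a minimal BIO illustration follows (the tag inventory and sentence are assumptions for illustration, not the paper's exact scheme):

```python
# One sentence framed as token-level BIO tagging of argument components.
tokens = ["Smoking", "should", "be", "banned", "because",
          "it", "harms", "health"]
tags = ["B-Claim", "I-Claim", "I-Claim", "I-Claim", "O",
        "B-Premise", "I-Premise", "I-Premise"]

def decode_components(tokens, tags):
    """Group BIO tags back into (component type, text span) pairs."""
    comps, ctype, words = [], None, []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if ctype is not None:
                comps.append((ctype, " ".join(words)))
            ctype, words = tag[2:], [tok]
        elif tag.startswith("I-") and tag[2:] == ctype:
            words.append(tok)
        else:                        # "O" tag or an inconsistent "I-" tag
            if ctype is not None:
                comps.append((ctype, " ".join(words)))
            ctype, words = None, []
    if ctype is not None:
        comps.append((ctype, " ".join(words)))
    return comps

components = decode_components(tokens, tags)
```

A multi-task setup in this framing would add further per-token tagging heads (e.g. component type as one task and relation information as another) over a shared encoder.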
Better, Faster, Stronger Sequence Tagging Constituent Parsers
Sequence tagging models for constituent parsing are faster, but less accurate
than other types of parsers. In this work, we address the following weaknesses
of such constituent parsers: (a) high error rates around closing brackets of
long constituents, (b) large label sets, leading to sparsity, and (c) error
propagation arising from greedy decoding. To effectively close brackets, we
train a model that learns to switch between tagging schemes. To reduce
sparsity, we decompose the label set and use multi-task learning to jointly
learn to predict sublabels. Finally, we mitigate issues from greedy decoding
through auxiliary losses and sentence-level fine-tuning with policy gradient.
Combining these techniques, we clearly surpass the performance of sequence
tagging constituent parsers on the English and Chinese Penn Treebanks, and
reduce their parsing time even further. On the SPMRL datasets, we observe even
greater improvements across the board, including a new state of the art on
Basque, Hebrew, Polish and Swedish. Comment: NAACL 2019 (long papers). Contains corrigendum.
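The label-set decomposition can be illustrated with composite tags that pair a relative tree level with a nonterminal: predicting the pair jointly needs one class per combination, while two multi-task heads need only the sum of the two inventories. The tag format and inventories below are illustrative assumptions:

```python
from itertools import product

# Composite sequence-tagging labels for constituent parsing pair a
# relative tree level with a nonterminal, written here as "level~NT".
levels = ["-2", "-1", "0", "1", "2"]
nonterminals = ["S", "NP", "VP", "PP", "SBAR"]

# Joint prediction: one output class per combination -> sparsity.
joint_labels = [f"{lv}~{nt}" for lv, nt in product(levels, nonterminals)]

# Decomposed prediction: two heads with len(levels) + len(nonterminals)
# classes in total, recombined after tagging.
def decompose(tag):
    lv, nt = tag.split("~")
    return lv, nt

def recombine(lv, nt):
    return f"{lv}~{nt}"
```

With realistic treebank inventories the gap is far larger than 25 vs. 10, which is why decomposing the label set reduces sparsity.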
A Joint Model for Definition Extraction with Syntactic Connection and Semantic Consistency
Definition Extraction (DE) is one of the well-known topics in Information
Extraction that aims to identify terms and their corresponding definitions in
unstructured texts. This task can be formalized either as a sentence
classification task (i.e., containing term-definition pairs or not) or a
sequential labeling task (i.e., identifying the boundaries of the terms and
definitions). Previous work on DE has focused on only one of the two
approaches, failing to model the inter-dependencies between the two tasks. In
this work, we propose a novel model for DE that simultaneously performs the two
tasks in a single framework to benefit from their inter-dependencies. Our model
features deep learning architectures to exploit the global structures of the
input sentences as well as the semantic consistencies between the terms and the
definitions, thereby improving the quality of the representation vectors for
DE. Besides the joint inference between sentence classification and sequential
labeling, the proposed model is fundamentally different from the prior work for
DE in that the prior work has only employed the local structures of the input
sentences (i.e., word-to-word relations), and not yet considered the semantic
consistencies between terms and definitions. In order to implement these novel
ideas, our model presents a multi-task learning framework that employs graph
convolutional neural networks and predicts the dependency paths between the
terms and the definitions. We also seek to enforce the consistency between the
representations of the terms and definitions both globally (i.e., increasing
semantic consistency between the representations of the entire sentences and
the terms/definitions) and locally (i.e., promoting the similarity between the
representations of the terms and the definitions).
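The two DE formulations, and the kind of inter-task consistency a joint model can exploit, can be illustrated as follows (the sentence, tag names, and constraint are assumptions for illustration, not the paper's exact scheme):

```python
# One sentence under the two Definition Extraction formulations.
sentence = ["A", "stack", "is", "a", "last-in,", "first-out", "collection"]

# (a) Sentence classification: does it contain a term-definition pair?
sentence_label = True
# (b) Sequence labeling: mark the term and definition boundaries.
tags = ["O", "B-Term", "O", "B-Def", "I-Def", "I-Def", "I-Def"]

def consistent(sentence_label, tags):
    """Inter-task constraint: a positive sentence should contain both a
    term span and a definition span; a negative one should contain neither."""
    has_term = any(t.endswith("-Term") for t in tags)
    has_def = any(t.endswith("-Def") for t in tags)
    if sentence_label:
        return has_term and has_def
    return not has_term and not has_def
```

A joint model trains both outputs against a shared representation so that decisions like this constraint are made together rather than reconciled after the fact.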