
    Learning label dependency for multi-label classification

    University of Technology Sydney. Faculty of Engineering and Information Technology.
    Multi-label classification is an important topic in the field of machine learning. In many real applications there exist potential dependencies or correlations between labels, and exploiting this underlying knowledge can effectively improve learning performance. How to learn and utilize the dependencies between labels has therefore become one of the key issues in multi-label classification. This thesis first summarizes existing work and analyses its advantages and disadvantages. Several effective methods for multi-label classification are then proposed, each focusing on a different way of exploiting label dependencies. The contributions of this thesis are as follows.
    (1) A method that uses a tree-structured restricted Bayesian network to represent the dependency structure of labels is proposed. This work is inspired by the ClassifierChain method. Compared with ClassifierChain, the advantage of our method is that the dependencies between labels are represented by a Bayesian network rather than a randomly selected chain, so more appropriate label dependencies can be determined. Furthermore, ensemble learning is used to construct and combine multiple tree-structured Bayesian networks, so that the mutual dependencies between labels are fully exploited and the final model is more robust. The experimental results verify the effectiveness of these methods: compared with other baselines, they show better performance because more appropriate label dependencies are captured.
    (2) A common strategy for exploiting label dependencies is, for every label, to identify the labels it depends on and use them as auxiliary features in the training phase. The issues with this strategy are that the influence of label dependencies can be suppressed by the existing features, and that indirect label dependencies are not taken into consideration. Therefore, a new learning paradigm that separates the influence of existing features from that of labels is introduced, which intensifies the impact of label dependencies. Moreover, a method that models the propagation of label dependencies as a Random Walk with Restart (RWR) process is proposed. In this method, label dependencies are encoded as a graph, and the dynamic and indirect dependencies between labels are exploited through the RWR process over the label graph (a sketch of this propagation follows the abstract). The experimental results validate this method, showing that it outperforms other baselines at learning a label ranking.
    (3) Building on the above method, a method that takes multiple factors into consideration when learning label dependencies is proposed. The dependency between two labels is characterized from different perspectives and determined by learning a linear combination of multiple measures. A particular loss function is designed so that the optimal label dependencies, i.e., the dependency matrix of the RWR process, can be obtained by minimizing it. The advantages of this method are that (a) label dependencies are measured and combined from different perspectives, and (b) the label dependencies obtained are optimal with respect to a particular loss function. The experimental results indicate that, given an explicit loss function, this method learns a better label ranking than the previous one.
    (4) A novel method that learns a label ranking by exploiting preferences between true labels and other labels is proposed. In this method, the original instance space and label space are mapped into a low-dimensional space using matrix factorization. One advantage of this method is that the number of labels is greatly reduced, so problems with massive label sets can now be handled efficiently. Moreover, a loss function is formulated based on the assumption that an instance's explicitly given true labels should be ranked before all other labels; it guides the matrix factorization and the label ranking learning. The advantage of this assumption is that it avoids the issue with the traditional assumption that a label not given explicitly cannot be a true label, so the method is also applicable to partially labelled data. Its effectiveness is validated by experiments showing that it ranks the explicitly given labels of an instance well before the other labels.
    In summary, this thesis proposes several effective methods that exploit label dependencies from different perspectives, and their effectiveness has been validated by experiments. These achievements lay a good foundation for further research and applications.
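
    Contribution (2) hinges on RWR propagation over a label graph. The following is a minimal sketch of that idea, assuming a label co-occurrence matrix as the graph and a base classifier's per-label scores as the restart distribution; all names, values, and the restart probability are illustrative, not the thesis's implementation.

        import numpy as np

        def rwr_scores(W, seed, restart=0.3, tol=1e-8, max_iter=1000):
            """Propagate label relevance over a label-dependency graph.

            W    : (L, L) nonnegative matrix of pairwise label dependencies
                   (assumed to have no all-zero columns)
            seed : (L,) initial relevance of each label for an instance,
                   e.g. the per-label scores of a base classifier
            """
            P = W / W.sum(axis=0, keepdims=True)   # column-normalize into transition probabilities
            r = seed / seed.sum()                  # restart distribution
            p = r.copy()
            for _ in range(max_iter):
                p_next = (1 - restart) * P @ p + restart * r
                if np.abs(p_next - p).sum() < tol:
                    break
                p = p_next
            return p  # stationary scores; sorting them yields a label ranking

        # Toy example: labels 0 and 2 depend on each other strongly.
        W = np.array([[0.0, 0.2, 0.8],
                      [0.2, 0.0, 0.1],
                      [0.8, 0.1, 0.0]])
        base = np.array([0.9, 0.1, 0.0])  # base classifier favours label 0
        print(rwr_scores(W, base))        # label 2 is pulled up via its dependency on label 0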

    Resource Constrained Structured Prediction

    We study the problem of structured prediction under test-time budget constraints. We propose a novel approach applicable to a wide range of structured prediction problems in computer vision and natural language processing. Our approach seeks to adaptively generate computationally costly features at test time in order to reduce the computational cost of prediction while maintaining prediction performance. We show that training the adaptive feature generation system can be reduced to a series of structured learning problems, resulting in efficient training using existing structured learning algorithms. This framework provides theoretical justification for several existing heuristic approaches found in the literature. We evaluate our proposed adaptive system on two structured prediction tasks, optical character recognition (OCR) and dependency parsing, and show strong reductions in feature cost without degrading accuracy.
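
    As a rough illustration of the test-time idea only (buying costly features while a budget remains), here is a hedged sketch. The greedy gain-per-cost policy, the callables, and the costs are assumptions for exposition; they stand in for the paper's learned system, whose training is reduced to structured learning.

        def predict_with_budget(x, cheap_model, rich_model, gain_estimator,
                                feature_costs, budget):
            """Start from cheap features; buy costly ones while budget remains."""
            acquired, spent = set(), 0.0
            while True:
                # Rank unbought, affordable features by estimated gain per unit cost.
                candidates = [(gain_estimator(x, acquired, f) / c, f, c)
                              for f, c in feature_costs.items()
                              if f not in acquired and spent + c <= budget]
                if not candidates:
                    break
                best_ratio, f, c = max(candidates)
                if best_ratio <= 0:   # no remaining feature is expected to help
                    break
                acquired.add(f)
                spent += c
            model = rich_model if acquired else cheap_model
            return model(x, acquired)

        # Toy usage with stand-in callables.
        costs = {"pos_tags": 1.0, "parse_feats": 5.0}
        gain = lambda x, acq, f: 0.4 if f == "pos_tags" else 0.1
        model = lambda x, acq: f"prediction using {sorted(acq) or ['base']}"
        print(predict_with_budget("sentence", model, model, gain, costs, budget=3.0))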

    Improved Relation Extraction with Feature-Rich Compositional Embedding Models

    Compositional embedding models build a representation (or embedding) for a linguistic structure based on its component word embeddings. We propose a Feature-rich Compositional Embedding Model (FCM) for relation extraction that is expressive, generalizes to new domains, and is easy to implement. The key idea is to combine (unlexicalized) hand-crafted features with learned word embeddings. The model is able to directly tackle the difficulties met by traditional compositional embedding models, such as handling arbitrary types of sentence annotations and utilizing global information for composition. We test the proposed model on two relation extraction tasks, and demonstrate that our model outperforms both previous compositional models and traditional feature-rich models on the ACE 2005 relation extraction task and the SemEval 2010 relation classification task. The combination of our model and a log-linear classifier with hand-crafted features gives state-of-the-art results. Comment: 12 pages, EMNLP 2015
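
    The combination of hand-crafted features and embeddings can be sketched as follows: each word contributes the outer product of its binary feature vector and its embedding, and a per-label parameter tensor scores the summed result. This is a minimal numpy sketch of that scoring idea; the dimensions, random data, and softmax readout are illustrative assumptions, not the paper's exact parameterization.

        import numpy as np

        rng = np.random.default_rng(0)
        n_feats, emb_dim, n_labels, sent_len = 6, 4, 3, 5

        F = rng.integers(0, 2, size=(sent_len, n_feats))   # binary hand-crafted features per word
        E = rng.normal(size=(sent_len, emb_dim))           # word embeddings
        T = rng.normal(size=(n_labels, n_feats, emb_dim))  # per-label parameters

        # Sum of outer products f_i (x) e_i over the sentence, then score per label.
        phi = np.einsum('if,ie->fe', F, E)                 # (n_feats, emb_dim)
        scores = np.einsum('lfe,fe->l', T, phi)            # one score per relation label
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        print(probs)                                       # distribution over relation labels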

    Neural End-to-End Learning for Computational Argumentation Mining

    We investigate neural techniques for end-to-end computational argumentation mining (AM). We frame AM both as a token-based dependency parsing problem and as a token-based sequence tagging problem, including a multi-task learning setup. Contrary to models that operate on the argument component level, we find that framing AM as dependency parsing leads to subpar performance. In contrast, less complex (local) tagging models based on BiLSTMs perform robustly across classification scenarios and are able to catch the long-range dependencies inherent to the AM problem. Moreover, we find that jointly learning 'natural' subtasks in a multi-task learning setup improves performance. Comment: To be published at ACL 2017
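
    A minimal PyTorch sketch of the tagging framing: each token receives a tag from a BIO-style inventory over argument components. The vocabulary size, dimensions, and the five-tag set are assumptions for illustration; the paper's full setup also covers the dependency-parsing and multi-task variants.

        import torch
        import torch.nn as nn

        class BiLSTMTagger(nn.Module):
            def __init__(self, vocab_size, emb_dim=100, hidden=128, n_tags=5):
                super().__init__()
                self.emb = nn.Embedding(vocab_size, emb_dim)
                self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                                    bidirectional=True)
                # e.g. O, B-Claim, I-Claim, B-Premise, I-Premise
                self.out = nn.Linear(2 * hidden, n_tags)

            def forward(self, token_ids):              # (batch, seq_len)
                h, _ = self.lstm(self.emb(token_ids))  # (batch, seq_len, 2*hidden)
                return self.out(h)                     # per-token tag logits

        tagger = BiLSTMTagger(vocab_size=1000)
        logits = tagger(torch.randint(0, 1000, (2, 7)))  # two sentences of 7 tokens
        print(logits.shape)                              # torch.Size([2, 7, 5])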

    Better, Faster, Stronger Sequence Tagging Constituent Parsers

    Sequence tagging models for constituent parsing are faster, but less accurate, than other types of parsers. In this work, we address the following weaknesses of such constituent parsers: (a) high error rates around closing brackets of long constituents, (b) large label sets, leading to sparsity, and (c) error propagation arising from greedy decoding. To effectively close brackets, we train a model that learns to switch between tagging schemes. To reduce sparsity, we decompose the label set and use multi-task learning to jointly predict sublabels. Finally, we mitigate issues from greedy decoding through auxiliary losses and sentence-level fine-tuning with policy gradient. Combining these techniques, we clearly surpass the performance of sequence tagging constituent parsers on the English and Chinese Penn Treebanks, and reduce their parsing time even further. On the SPMRL datasets, we observe even greater improvements across the board, including a new state of the art on Basque, Hebrew, Polish and Swedish. Comment: NAACL 2019 (long papers). Contains corrigendum
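
    The label-decomposition idea can be illustrated with a toy sketch. The tag format below (a relative-depth sublabel and a nonterminal sublabel joined by '@') is assumed here for exposition, following common sequence-tagging encodings of constituency trees; the point is that two small sublabel spaces replace one multiplicatively large joint space.

        def decompose(tag):
            """'2@NP' -> (depth sublabel '2', constituent sublabel 'NP')."""
            depth, nonterminal = tag.split('@')
            return depth, nonterminal

        combined_tags = ['1@S', '2@NP', '-1@VP', '2@NP']
        depth_task, const_task = zip(*(decompose(t) for t in combined_tags))

        print(sorted(set(combined_tags)))                         # joint label space
        print(sorted(set(depth_task)), sorted(set(const_task)))   # two small spaces,
                                                                  # each learned as its own task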

    A Joint Model for Definition Extraction with Syntactic Connection and Semantic Consistency

    Definition Extraction (DE) is one of the well-known topics in Information Extraction; it aims to identify terms and their corresponding definitions in unstructured text. The task can be formalized either as a sentence classification task (i.e., deciding whether a sentence contains a term-definition pair) or as a sequence labeling task (i.e., identifying the boundaries of the terms and definitions). Previous work on DE has focused on only one of the two approaches, failing to model the inter-dependencies between the two tasks. In this work, we propose a novel model for DE that performs the two tasks simultaneously in a single framework, benefiting from their inter-dependencies. Our model features deep learning architectures that exploit the global structure of the input sentences as well as the semantic consistencies between the terms and the definitions, thereby improving the quality of the representation vectors for DE. Besides the joint inference between sentence classification and sequence labeling, the proposed model differs fundamentally from prior work on DE in that prior work has employed only the local structure of the input sentences (i.e., word-to-word relations) and has not considered the semantic consistencies between terms and definitions. To implement these ideas, our model presents a multi-task learning framework that employs graph convolutional neural networks and predicts the dependency paths between the terms and the definitions. We also seek to enforce the consistency between the representations of the terms and definitions both globally (i.e., increasing the semantic consistency between the representations of the entire sentence and the terms/definitions) and locally (i.e., promoting the similarity between the representations of the terms and the definitions).
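
    As an illustration of the graph-convolutional component, here is a minimal numpy sketch of one GCN layer over a sentence's dependency graph, the kind of building block used to exploit global sentence structure. The adjacency matrix, sizes, and mean-aggregation are toy assumptions, not the paper's exact architecture.

        import numpy as np

        def gcn_layer(A, H, W):
            """One GCN layer: aggregate neighbours, transform, apply ReLU.

            A : (n, n) adjacency of the dependency graph (self-loops included)
            H : (n, d_in) token representations
            W : (d_in, d_out) layer weights
            """
            deg = A.sum(axis=1, keepdims=True)        # per-token neighbour counts
            return np.maximum(0, (A / deg) @ H @ W)   # mean-aggregate, project, ReLU

        n, d_in, d_out = 4, 8, 8
        rng = np.random.default_rng(1)
        A = np.eye(n) + np.array([[0, 1, 0, 0],       # toy 4-token dependency graph
                                  [1, 0, 1, 1],
                                  [0, 1, 0, 0],
                                  [0, 1, 0, 0]])
        H = rng.normal(size=(n, d_in))
        out = gcn_layer(A, H, rng.normal(size=(d_in, d_out)))
        print(out.shape)                              # (4, 8): contextualized token vectors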