On Classification with Bags, Groups and Sets
Many classification problems can be difficult to formulate directly in terms
of the traditional supervised setting, where both training and test samples are
individual feature vectors. In some cases, samples are better
described by sets of feature vectors, labels are available only for sets
rather than individual samples, or individual labels are available but
not independent. To better deal with such problems, several
extensions of supervised learning have been proposed, in which training
and/or test objects are sets of feature vectors. However, because these
extensions were proposed rather independently of each other, their mutual
similarities and differences have not yet been mapped out. In this work, we
provide an overview of such learning scenarios, propose a taxonomy to
illustrate the relationships between them, and discuss directions for further
research in these areas.
Meta-Prediction for Collective Classification
When data instances are inter-related, as are nodes in a social network or hyperlink graph, algorithms for collective classification (CC) can significantly improve accuracy. Recently, an algorithm for CC named Cautious ICA (ICAC) was shown to improve accuracy compared to the popular ICA algorithm. ICAC improves performance by initially favoring its more confident predictions during collective inference. In this paper, we introduce ICAMC, a new algorithm that outperforms ICAC when the attributes that describe each node are not highly predictive. ICAMC learns a meta-classifier that identifies which node label predictions are most likely to be correct. We show that this approach significantly increases accuracy on a range of real and synthetic data sets. We also describe new features for the meta-classifier and demonstrate that a simple search can identify an effective feature set that increases accuracy.
A Two-Stream Mutual Attention Network for Semi-supervised Biomedical Segmentation with Noisy Labels
Learning-based methods suffer from a deficiency of clean
annotations, especially in biomedical segmentation. Although many
semi-supervised methods have been proposed to provide extra training data,
automatically generated labels are usually too noisy to retrain models
effectively. In this paper, we propose a Two-Stream Mutual Attention Network
(TSMAN) that weakens the influence of back-propagated gradients caused by
incorrect labels, thereby rendering the network robust to unclean data. The
proposed TSMAN consists of two sub-networks that are connected by three types
of attention models in different layers. The target of each attention model is
to indicate potentially incorrect gradients in a certain layer for both
sub-networks by analyzing their inferred features using the same input. In
order to achieve this purpose, the attention models are designed based on the
propagation analysis of noisy gradients at different layers. This allows the
attention models to effectively discover incorrect labels and weaken their
influence during the parameter updating process. By exchanging multi-level
features within the two-stream architecture, the effects of noisy labels in
each sub-network are reduced by decreasing the updating gradients. Furthermore,
a hierarchical distillation is developed to provide more reliable pseudo labels
for unlabeled data, which further boosts the performance of our retrained
TSMAN. The experiments using both the HVSMR 2016 and BRATS 2015 benchmarks
demonstrate that our semi-supervised learning framework surpasses the
state-of-the-art fully-supervised results.
Restricted set classification: Who is there?
We consider a problem where a set X of N objects (instances) coming from c classes have to be classified simultaneously. A restriction is imposed on X in that the maximum possible number of objects from each class is known, hence we dubbed the problem who-is-there? We compare three approaches to this problem: (1) independent classification whereby each object is labelled in the class with the largest posterior probability; (2) a greedy approach which enforces the restriction; and (3) a theoretical approach which, in addition, maximises the likelihood of the label assignment, implemented through the Hungarian assignment algorithm. Our experimental study consists of two parts. The first part includes a custom-made chess data set where the pieces on the chess board must be recognised together from an image of the board. In the second part, we simulate the restricted set classification scenario using 96 datasets from a recently collated repository (University of Santiago de Compostela, USC). Our results show that the proposed approach (3) outperforms approaches (1) and (2).
Spanish Ministry of Economy and Competitiveness through project TIN 2015-67534-
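The three approaches compared in this abstract can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names (`independent`, `greedy`, `exhaustive`), the toy posterior matrix, and the class limits are all hypothetical. In particular, approach (3) is shown here as a brute-force search over feasible assignments rather than the Hungarian algorithm the abstract mentions. The brute-force search finds the same maximum-likelihood assignment on tiny inputs but does not scale.

```python
from itertools import permutations
from math import log


def independent(posteriors):
    """Approach (1): label each object with its highest-posterior class."""
    return [max(range(len(row)), key=row.__getitem__) for row in posteriors]


def greedy(posteriors, max_per_class):
    """Approach (2): commit the most confident (object, class) pairs first,
    skipping any class whose limit has already been reached."""
    remaining = list(max_per_class)
    labels = [None] * len(posteriors)
    pairs = sorted(
        ((p, i, k) for i, row in enumerate(posteriors) for k, p in enumerate(row)),
        reverse=True,
    )
    for _, i, k in pairs:
        if labels[i] is None and remaining[k] > 0:
            labels[i] = k
            remaining[k] -= 1
    return labels


def exhaustive(posteriors, max_per_class):
    """Approach (3), brute-force stand-in: among all assignments that respect
    the class limits, return the one with the largest sum of log-posteriors.
    Assumes strictly positive posteriors and enough class slots for all objects."""
    slots = [k for k, limit in enumerate(max_per_class) for _ in range(limit)]
    best, best_score = None, float("-inf")
    for candidate in set(permutations(slots, len(posteriors))):
        score = sum(log(posteriors[i][k]) for i, k in enumerate(candidate))
        if score > best_score:
            best, best_score = list(candidate), score
    return best


# Hypothetical toy data: two objects, three classes, at most one object per class.
posteriors = [
    [0.60, 0.39, 0.01],
    [0.59, 0.21, 0.20],
]
limits = [1, 1, 1]

print(independent(posteriors))         # [0, 0]  violates the limit on class 0
print(greedy(posteriors, limits))      # [0, 1]  joint likelihood 0.60 * 0.21
print(exhaustive(posteriors, limits))  # [1, 0]  joint likelihood 0.39 * 0.59
```

On this toy input the greedy labelling satisfies the restriction but is not the most likely one, which is the gap approach (3) is meant to close. For realistic set sizes, an assignment solver would replace the permutation loop.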
Synthetic generators for simulating social networks
An application area of increasing importance is creating agent-based simulations to model human societies. One component of developing these simulations is the ability to generate realistic human social networks. Online social networking websites, such as Facebook, Google+, and Twitter, have increased in popularity in the last decade. Despite the increase in online social networking tools and the importance of studying human behavior in these networks, collecting data directly from these networks is not always feasible due to privacy concerns. Previous work in this area has primarily been limited to 1) network generators that aim to duplicate a small subset of the original network's properties and 2) problem-specific generators for applications such as the evaluation of community detection algorithms. In this thesis, we extended two synthetic network generators to enable them to duplicate the properties of a specific dataset. In the first generator, we consider feature similarity and label homophily among individuals when forming links. The second generator is designed to handle multiplex networks that contain different link types. We evaluate the performance of both generators on existing real-world social network datasets and compare our methods with a related synthetic network generator. In this thesis, we demonstrate that the proposed synthetic network generators are both time efficient and require only limited parameter optimization.
Transforming Graph Representations for Statistical Relational Learning
Relational data representations have become an increasingly important topic
due to the recent proliferation of network datasets (e.g., social, biological,
information networks) and a corresponding increase in the application of
statistical relational learning (SRL) algorithms to these domains. In this
article, we examine a range of representation issues for graph-based relational
data. Since the choice of relational data representation for the nodes, links,
and features can dramatically affect the capabilities of SRL algorithms, we
survey approaches and opportunities for relational representation
transformation designed to improve the performance of these algorithms. This
leads us to introduce an intuitive taxonomy for data representation
transformations in relational domains that incorporates link transformation and
node transformation as symmetric representation tasks. In particular, the
transformation tasks for both nodes and links include (i) predicting their
existence, (ii) predicting their label or type, (iii) estimating their weight
or importance, and (iv) systematically constructing their relevant features. We
motivate our taxonomy through detailed examples and use it to survey and
compare competing approaches for each of these tasks. We also discuss general
conditions for transforming links, nodes, and features. Finally, we highlight
challenges that remain to be addressed.