Multi-view Co-training for microRNA Prediction
MicroRNAs (miRNAs) are short, non-coding RNAs involved in cell regulation at post-transcriptional and translational levels. Numerous computational predictors of miRNA have been developed that generally classify miRNA based on either sequence- or expression-based features. While these methods are highly effective, they require large labelled training data sets, which are often not available for many species. Simultaneously, emerging high-throughput wet-lab experimental procedures are producing large unlabelled data sets of genomic sequence and RNA expression profiles. Existing methods use supervised machine learning and are therefore unable to leverage these unlabelled data. In this paper, we design and develop a multi-view co-training approach for the classification of miRNA to maximize the utility of unlabelled training data by taking advantage of multiple views of the problem. Starting with only 10 labelled training samples, co-training is shown to significantly (p < 0.01) increase classification accuracy of both sequence- and expression-based classifiers, without requiring any new labelled training data. After 11 iterations of co-training, the expression-based view of miRNA classification experiences an average increase in AUPRC of 15.81% over six species, compared to 11.90% for self-training and 4.84% for passive learning. Similar results are observed for sequence-based classifiers, with increases of 46.47%, 39.53% and 29.43% for co-training, self-training, and passive learning, respectively. The final co-trained sequence- and expression-based classifiers are integrated into a final confidence-based classifier, which shows improved performance compared to both the expression (1.5%, p = 0.021) and sequence (3.7%, p = 0.006) views. This study represents the first application of multi-view co-training to miRNA prediction and shows great promise, particularly for understudied species with few available training data.
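The co-training idea described in this abstract can be illustrated with a minimal, generic sketch. It is not the authors' pipeline: the nearest-centroid classifier, the shared labelled pool, the function names (`centroid_fit`, `centroid_predict`, `co_train`) and the parameter `k` are all assumptions made for illustration. Two feature views of the same samples take turns pseudo-labelling the unlabelled samples they are most confident about.

```python
def centroid_fit(X, y):
    """Fit a nearest-centroid classifier: return one mean vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def centroid_predict(model, x):
    """Return (label, confidence) for one sample.

    Confidence is the gap between the squared distances to the two
    nearest centroids, so the model must contain at least two classes.
    """
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, m)), c) for c, m in model.items()
    )
    (d0, c0), (d1, _) = dists[0], dists[1]
    return c0, d1 - d0


def co_train(view_a, view_b, labels, n_iter=11, k=2):
    """Generic co-training loop over two views of the same samples.

    labels[i] is a class label for labelled samples and None otherwise.
    In each iteration, each view trains on the shared labelled pool and
    pseudo-labels its k most confident unlabelled samples (hypothetical
    choice of k; the abstract does not specify this detail).
    """
    labels = list(labels)  # work on a copy; the caller's list is untouched
    for _ in range(n_iter):
        for view in (view_a, view_b):
            known = [i for i, l in enumerate(labels) if l is not None]
            unknown = [i for i, l in enumerate(labels) if l is None]
            if not unknown:
                return labels
            model = centroid_fit(
                [view[i] for i in known], [labels[i] for i in known]
            )
            preds = [(centroid_predict(model, view[i]), i) for i in unknown]
            preds.sort(key=lambda p: p[0][1], reverse=True)
            for (label, _), i in preds[:k]:
                labels[i] = label  # pseudo-label joins the shared pool
    return labels
```

Here `view_a` and `view_b` stand in for the sequence-based and expression-based feature vectors; any classifier that reports a confidence score could replace the nearest-centroid model.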
Recent Advances in Transfer Learning for Cross-Dataset Visual Recognition: A Problem-Oriented Perspective
This paper takes a problem-oriented perspective and presents a comprehensive
review of transfer learning methods, both shallow and deep, for cross-dataset
visual recognition. Specifically, it categorises the cross-dataset recognition
into seventeen problems based on a set of carefully chosen data and label
attributes. Such a problem-oriented taxonomy has allowed us to examine how
different transfer learning approaches tackle each problem and how well each
problem has been researched to date. The comprehensive problem-oriented review
of the advances in transfer learning with respect to the problem has not only
revealed the challenges in transfer learning for visual recognition, but also
the problems (e.g. eight of the seventeen problems) that have been scarcely
studied. This survey not only presents an up-to-date technical review for
researchers, but also a systematic approach and a reference for a machine
learning practitioner to categorise a real problem and to look up a
possible solution accordingly.
Semi-Supervised Learning with Scarce Annotations
While semi-supervised learning (SSL) algorithms provide an efficient way to
make use of both labelled and unlabelled data, they generally struggle when the
number of annotated samples is very small. In this work, we consider the
problem of SSL multi-class classification with very few labelled instances. We
introduce two key ideas. The first is a simple but effective one: we leverage
the power of transfer learning among different tasks and self-supervision to
initialize a good representation of the data without making use of any label.
The second idea is a new algorithm for SSL that can make good use of such a
pre-trained representation.
The algorithm works by alternating two phases, one fitting the labelled
points and one fitting the unlabelled ones, with carefully-controlled
information flow between them. The benefits are greatly reduced overfitting of
the labelled data and the avoidance of issues with balancing labelled and
unlabelled losses during training. We show empirically that this method can
successfully train competitive models with as few as 10 labelled data points
per class. More generally, we show that the idea of bootstrapping features using
self-supervised learning always improves SSL on standard benchmarks. We show
that our algorithm works increasingly well compared to other methods when
refining from other tasks or datasets.
Comment: Workshop on Deep Vision, CVPR 202
Multi-Atlas Segmentation using Partially Annotated Data: Methods and Annotation Strategies
Multi-atlas segmentation is a widely used tool in medical image analysis,
providing robust and accurate results by learning from annotated atlas
datasets. However, the availability of fully annotated atlas images for
training is limited due to the time required for the labelling task.
Segmentation methods requiring only a proportion of each atlas image to be
labelled could therefore reduce the workload on expert raters tasked with
annotating atlas images. To address this issue, we first re-examine the
labelling problem common in many existing approaches and formulate its solution
in terms of a Markov Random Field energy minimisation problem on a graph
connecting atlases and the target image. This provides a unifying framework for
multi-atlas segmentation. We then show how modifications in the graph
configuration of the proposed framework enable the use of partially annotated
atlas images and investigate different partial annotation strategies. The
proposed method was evaluated on two Magnetic Resonance Imaging (MRI) datasets
for hippocampal and cardiac segmentation. Experiments were aimed at
(1) recreating existing segmentation techniques with the proposed framework and
(2) demonstrating the potential of employing sparsely annotated atlas data for
multi-atlas segmentation.
Transfer Learning for Speech and Language Processing
Transfer learning is a vital technique that generalizes models trained for
one setting or task to other settings or tasks. For example, in speech
recognition, an acoustic model trained for one language can be used to
recognize speech in another language, with little or no re-training data.
Transfer learning is closely related to multi-task learning (cross-lingual vs.
multilingual), and is traditionally studied under the name of 'model adaptation'.
Recent advances in deep learning show that transfer learning becomes much
easier and more effective with high-level abstract features learned by deep
models, and the 'transfer' can be conducted not only between data distributions
and data types, but also between model structures (e.g., shallow nets and deep
nets) or even model types (e.g., Bayesian models and neural models). This
review paper summarizes some recent prominent research in this direction,
particularly for speech and language processing. We also report some results
from our group and highlight the potential of this very interesting research
field.
Comment: 13 pages, APSIPA 201
Semi-Supervised Learning, Causality and the Conditional Cluster Assumption
While the success of semi-supervised learning (SSL) is still not fully
understood, Schölkopf et al. (2012) have established a link to the principle
of independent causal mechanisms. They conclude that SSL should be impossible
when predicting a target variable from its causes, but possible when predicting
it from its effects. Since both these cases are somewhat restrictive, we extend
their work by considering classification using cause and effect features at the
same time, such as predicting disease from both risk factors and symptoms.
While standard SSL exploits information contained in the marginal distribution
of all inputs (to improve the estimate of the conditional distribution of the
target given inputs), we argue that in our more general setting we should use
information in the conditional distribution of effect features given causal
features. We explore how this insight generalises the previous understanding,
and how it relates to and can be exploited algorithmically for SSL.
Comment: 36th Conference on Uncertainty in Artificial Intelligence (2020)
(Previously presented at the NeurIPS 2019 workshop "Do the right thing":
machine learning and causal inference for improved decision making,
Vancouver, Canada.)
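The distinction drawn in this abstract can be written out in probabilistic terms. The notation below is an assumption made for illustration and is not taken from the paper: X_C denotes the cause features, X_E the effect features, and Y the target. The first line is the chain rule, which holds for any joint distribution; the second follows by summing over Y.

```latex
\begin{align}
  P(Y, X_E \mid X_C) &= P(Y \mid X_C)\, P(X_E \mid Y, X_C) \\
  P(X_E \mid X_C)    &= \sum_{y} P(Y = y \mid X_C)\, P(X_E \mid Y = y, X_C)
\end{align}
```

Read this way, unlabelled pairs (x_C, x_E) are samples from the left-hand side of the second line, a mixture whose weights involve P(Y | X_C). This is one plausible reading of why the conditional distribution of effect features given causal features, and not only the marginal of all inputs, may carry information about the target.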