14,603 research outputs found
Learning from the past with experiment databases
Thousands of Machine Learning research papers contain experimental comparisons that usually have been conducted with a single focus of interest, and detailed results are usually lost after publication. Once past experiments are collected in experiment databases they allow for additional and possibly much broader investigation. In this paper, we show how to use such a repository to answer various interesting research questions about learning algorithms and to verify a number of recent studies. Alongside performing elaborate comparisons and rankings of algorithms, we also investigate the effects of algorithm parameters and data properties, and study the learning curves and bias-variance profiles of algorithms to gain deeper insights into their behavior
PRESISTANT: Learning based assistant for data pre-processing
Data pre-processing is one of the most time consuming and relevant steps in a
data analysis process (e.g., classification task). A given data pre-processing
operator (e.g., transformation) can have positive, negative or zero impact on
the final result of the analysis. Expert users have the required knowledge to
find the right pre-processing operators. However, when it comes to non-experts,
they are overwhelmed by the amount of pre-processing operators and it is
challenging for them to find operators that would positively impact their
analysis (e.g., increase the predictive accuracy of a classifier). Existing
solutions either assume that users have expert knowledge, or they recommend
pre-processing operators that are only "syntactically" applicable to a dataset,
without taking into account their impact on the final analysis. In this work,
we aim at providing assistance to non-expert users by recommending data
pre-processing operators that are ranked according to their impact on the final
analysis. We developed a tool PRESISTANT, that uses Random Forests to learn the
impact of pre-processing operators on the performance (e.g., predictive
accuracy) of 5 different classification algorithms, such as J48, Naive Bayes,
PART, Logistic Regression, and Nearest Neighbor. Extensive evaluations on the
recommendations provided by our tool, show that PRESISTANT can effectively help
non-experts in order to achieve improved results in their analytical tasks
Convolutional Sparse Kernel Network for Unsupervised Medical Image Analysis
The availability of large-scale annotated image datasets and recent advances
in supervised deep learning methods enable the end-to-end derivation of
representative image features that can impact a variety of image analysis
problems. Such supervised approaches, however, are difficult to implement in
the medical domain where large volumes of labelled data are difficult to obtain
due to the complexity of manual annotation and inter- and intra-observer
variability in label assignment. We propose a new convolutional sparse kernel
network (CSKN), which is a hierarchical unsupervised feature learning framework
that addresses the challenge of learning representative visual features in
medical image analysis domains where there is a lack of annotated training
data. Our framework has three contributions: (i) We extend kernel learning to
identify and represent invariant features across image sub-patches in an
unsupervised manner. (ii) We initialise our kernel learning with a layer-wise
pre-training scheme that leverages the sparsity inherent in medical images to
extract initial discriminative features. (iii) We adapt a multi-scale spatial
pyramid pooling (SPP) framework to capture subtle geometric differences between
learned visual features. We evaluated our framework in medical image retrieval
and classification on three public datasets. Our results show that our CSKN had
better accuracy when compared to other conventional unsupervised methods and
comparable accuracy to methods that used state-of-the-art supervised
convolutional neural networks (CNNs). Our findings indicate that our
unsupervised CSKN provides an opportunity to leverage unannotated big data in
medical imaging repositories.Comment: Accepted by Medical Image Analysis (with a new title 'Convolutional
Sparse Kernel Network for Unsupervised Medical Image Analysis'). The
manuscript is available from following link
(https://doi.org/10.1016/j.media.2019.06.005
Pre and Post-hoc Diagnosis and Interpretation of Malignancy from Breast DCE-MRI
We propose a new method for breast cancer screening from DCE-MRI based on a
post-hoc approach that is trained using weakly annotated data (i.e., labels are
available only at the image level without any lesion delineation). Our proposed
post-hoc method automatically diagnosis the whole volume and, for positive
cases, it localizes the malignant lesions that led to such diagnosis.
Conversely, traditional approaches follow a pre-hoc approach that initially
localises suspicious areas that are subsequently classified to establish the
breast malignancy -- this approach is trained using strongly annotated data
(i.e., it needs a delineation and classification of all lesions in an image).
Another goal of this paper is to establish the advantages and disadvantages of
both approaches when applied to breast screening from DCE-MRI. Relying on
experiments on a breast DCE-MRI dataset that contains scans of 117 patients,
our results show that the post-hoc method is more accurate for diagnosing the
whole volume per patient, achieving an AUC of 0.91, while the pre-hoc method
achieves an AUC of 0.81. However, the performance for localising the malignant
lesions remains challenging for the post-hoc method due to the weakly labelled
dataset employed during training.Comment: Submitted to Medical Image Analysi
Personal life event detection from social media
Creating video clips out of personal content from social media is on the rise. MuseumOfMe, Facebook Lookback, and Google Awesome are some popular examples. One core challenge to the creation of such life summaries is the identification of personal events, and their time frame. Such videos can greatly benefit from automatically distinguishing between social media content that is about someone's own wedding from that week, to an old wedding, or to that of a friend. In this paper, we describe our approach for identifying a number of common personal life events from social media content (in this paper we have used Twitter for our test), using multiple feature-based classifiers. Results show that combination of linguistic and social interaction features increases overall classification accuracy of most of the events while some events are relatively more difficult than others (e.g. new born with mean precision of .6 from all three models)
Artificial intelligence and computer-aided diagnosis in colonoscopy: current evidence and future directions
Computer-aided diagnosis offers a promising solution to reduce variation in colonoscopy performance. Pooled miss rates for polyps are as high as 22%, and associated interval colorectal cancers after colonoscopy are of concern. Optical biopsy, whereby in-vivo classification of polyps based on enhanced imaging replaces histopathology, has not been incorporated into routine practice because it is limited by interobserver variability and generally only meets accepted standards in expert settings. Real-time decision-support software has been developed to detect and characterise polyps, and also to offer feedback on the technical quality of inspection. Some of the current algorithms, particularly with recent advances in artificial intelligence techniques, match human expert performance for optical biopsy. In this Review, we summarise the evidence for clinical applications of computer-aided diagnosis and artificial intelligence in colonoscopy
- …