Zoom-in-Net: Deep Mining Lesions for Diabetic Retinopathy Detection
We propose a convolutional neural network based algorithm for simultaneously
diagnosing diabetic retinopathy and highlighting suspicious regions. Our
contributions are twofold: 1) a network termed Zoom-in-Net, which mimics the
zoom-in process of a clinician examining retinal images. Trained with only
image-level supervision, Zoom-in-Net generates attention maps that
highlight suspicious regions and accurately predicts the disease level based
on both the whole image and its high-resolution suspicious patches. 2) Only
four bounding boxes generated from the automatically learned attention maps are
enough to cover 80% of the lesions labeled by an experienced ophthalmologist,
which shows good localization ability of the attention maps. By clustering
features at high response locations on the attention maps, we discover
meaningful clusters which contain potential lesions in diabetic retinopathy.
Experiments show that our algorithm outperforms the state-of-the-art methods on
two datasets, EyePACS and Messidor.
Comment: accepted by MICCAI 201
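The zoom-in step described above, generating a few bounding boxes from high-response locations on an attention map, can be sketched as follows. This is a toy illustration, not the paper's implementation: the box size, the greedy pick-and-suppress scheme, and the `attention_boxes` helper are all assumptions.

```python
import numpy as np

def attention_boxes(attn, k=4, box=32):
    """Pick the k highest-response locations on a 2-D attention map and
    return square bounding boxes (x0, y0, x1, y1) around them.
    Greedy with suppression: after each pick, the chosen region is
    masked out so the next box covers a distinct area."""
    a = attn.astype(float).copy()
    h, w = a.shape
    boxes = []
    for _ in range(k):
        y, x = np.unravel_index(np.argmax(a), a.shape)
        x0, y0 = max(0, x - box // 2), max(0, y - box // 2)
        x1, y1 = min(w, x0 + box), min(h, y0 + box)
        boxes.append((int(x0), int(y0), int(x1), int(y1)))
        a[y0:y1, x0:x1] = -np.inf  # suppress this region for the next pick
    return boxes

# toy attention map with two hot spots
attn = np.zeros((64, 64))
attn[10, 10] = 1.0
attn[50, 40] = 0.9
boxes = attention_boxes(attn, k=2, box=16)
```

In the paper's setting the cropped patches would then be re-examined at full resolution, mirroring a clinician zooming in on a suspicious area.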
LSTM Networks for Data-Aware Remaining Time Prediction of Business Process Instances
Predicting the completion time of business process instances would be a very
helpful aid when managing processes under service level agreement constraints.
The ability to know in advance the trend of running process instances would
allow business managers to react in time, in order to prevent delays or
undesirable situations. However, making such accurate forecasts is not easy:
many factors may influence the required time to complete a process instance. In
this paper, we propose an approach based on deep Recurrent Neural Networks
(specifically LSTMs) that can exploit arbitrary information associated with
individual events in order to produce as accurate a prediction as possible of
the completion time of running instances. Experiments on real-world datasets
confirm the quality of our proposal.
Comment: Article accepted for publication in 2017 IEEE Symposium on Deep Learning (IEEE DL'17) @ SSC
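One way to frame the learning problem described above is to turn each completed instance into (event-prefix, remaining-time) training pairs for a sequence model: after each prefix of events, the target is the time still needed to finish. The sketch below is an assumed data-preparation step, not the paper's code; `make_training_pairs` and the per-event feature vectors are hypothetical.

```python
import numpy as np

def make_training_pairs(trace):
    """From one completed process instance, given as a list of
    (timestamp, feature_vector) events, build (prefix, remaining_time)
    pairs suitable for training a sequence model such as an LSTM."""
    end = trace[-1][0]  # completion timestamp
    pairs = []
    for i in range(1, len(trace)):
        # all event features observed so far, as a (i, n_features) array
        prefix = np.array([feats for _, feats in trace[:i]], dtype=float)
        remaining = end - trace[i - 1][0]  # time left after the i-th event
        pairs.append((prefix, remaining))
    return pairs

# toy instance: three events at t=0, 2, 5 with two-dimensional attributes
trace = [(0.0, [1, 0]), (2.0, [0, 1]), (5.0, [1, 1])]
pairs = make_training_pairs(trace)
```

Feeding such variable-length prefixes to an LSTM with a regression head is one natural way to realize the "data-aware" prediction the abstract describes.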
Learning Fast and Slow: PROPEDEUTICA for Real-time Malware Detection
In this paper, we introduce and evaluate PROPEDEUTICA, a novel methodology
and framework for efficient and effective real-time malware detection,
leveraging the best of conventional machine learning (ML) and deep learning
(DL) algorithms. In PROPEDEUTICA, all software processes in the system start
execution subjected to a conventional ML detector for fast classification. If a
piece of software receives a borderline classification, it is subjected to
further analysis via a more computationally expensive but more accurate DL
method, our newly proposed DEEPMALWARE algorithm. Further, we introduce delays
to the execution of software subjected to deep learning analysis as a way to
"buy time" for DL analysis and to rate-limit the impact of possible malware in
the system. We evaluated PROPEDEUTICA with a set of 9,115 malware samples and
877 commonly used benign software samples from various categories for the
Windows OS. Our results show that the false positive rate for conventional ML
methods can reach 20%, and for modern DL methods it is usually below 6%.
However, the classification time for DL can be 100X longer than for conventional
ML methods. PROPEDEUTICA improved the detection F1-score from 77.54% (conventional
ML method) to 90.25%, and reduced the detection time by 54.86%. Further, the
percentage of software subjected to DL analysis was approximately 40% on
average. Moreover, the application of delays to software subjected to DL
analysis reduced the detection time by approximately 10%. Finally, we found and discussed a
discrepancy between the detection accuracy offline (analysis after all traces
are collected) and on-the-fly (analysis in tandem with trace collection). Our
insights show that conventional ML and modern DL-based malware detectors in
isolation cannot meet the needs of efficient and effective malware detection:
high accuracy, low false positive rate, and short classification time.
Comment: 17 pages, 7 figures
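The fast/slow routing at the heart of the scheme described above can be sketched as a simple threshold rule: confident fast-classifier scores decide immediately, and only borderline scores pay the cost of the slower DL model. The thresholds and the `triage` helper below are illustrative assumptions, not the paper's actual decision logic.

```python
def triage(fast_score, low=0.3, high=0.7, slow_classifier=None):
    """Two-stage detection sketch. `fast_score` is the fast (conventional ML)
    classifier's malware probability. Clear-cut scores are decided at once;
    borderline scores are escalated to a slower, more accurate classifier.
    Returns (label, which_stage_decided)."""
    if fast_score < low:
        return "benign", "fast"
    if fast_score > high:
        return "malware", "fast"
    # borderline case: invoke the expensive DL model (here a stand-in callable)
    label = slow_classifier() if slow_classifier else "unknown"
    return label, "slow"
```

Because most processes receive clear-cut fast scores, only a fraction of software (about 40% in the paper's evaluation) incurs the expensive DL analysis.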
Highly comparative feature-based time-series classification
A highly comparative, feature-based approach to time series classification is
introduced that uses an extensive database of algorithms to extract thousands
of interpretable features from time series. These features are derived from
across the scientific time-series analysis literature, and include summaries of
time series in terms of their correlation structure, distribution, entropy,
stationarity, scaling properties, and fits to a range of time-series models.
After computing thousands of features for each time series in a training set,
those that are most informative of the class structure are selected using
greedy forward feature selection with a linear classifier. The resulting
feature-based classifiers automatically learn the differences between classes
using a reduced number of time-series properties, and circumvent the need to
calculate distances between time series. Representing time series in this way
yields an orders-of-magnitude dimensionality reduction, allowing the method
to perform well on very large datasets containing long time series or time
series of different lengths. For many of the datasets studied, classification
performance exceeded that of conventional instance-based classifiers, including
one-nearest-neighbor classifiers using Euclidean distance and dynamic time
warping. Most importantly, the selected features provide an understanding
of the properties of the dataset, insight that can guide further scientific
investigation.
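The pipeline described above, computing many interpretable features per series and then greedily selecting the most class-informative ones, can be sketched in miniature. Everything here is a scaled-down assumption: three features stand in for the paper's thousands, and a nearest-centroid rule stands in for its linear classifier.

```python
import numpy as np

def features(ts):
    """A tiny stand-in for a large feature library: mean, standard
    deviation, and lag-1 autocorrelation of the series."""
    x = np.asarray(ts, dtype=float)
    ac1 = np.corrcoef(x[:-1], x[1:])[0, 1]
    return np.array([x.mean(), x.std(), ac1])

def centroid_accuracy(Xs, y):
    """Accuracy of assigning each row to the nearest class centroid."""
    cents = {c: Xs[y == c].mean(axis=0) for c in np.unique(y)}
    pred = [min(cents, key=lambda c: np.linalg.norm(row - cents[c]))
            for row in Xs]
    return float(np.mean(np.array(pred) == y))

def greedy_select(X, y, n_keep=1):
    """Greedy forward selection: repeatedly add the feature whose
    inclusion best separates the classes under the centroid rule."""
    chosen, remaining = [], list(range(X.shape[1]))
    for _ in range(n_keep):
        best, best_acc = None, -1.0
        for f in remaining:
            acc = centroid_accuracy(X[:, chosen + [f]], y)
            if acc > best_acc:
                best, best_acc = f, acc
        chosen.append(best)
        remaining.remove(best)
    return chosen

# two classes: noisy sines (strongly autocorrelated) vs white noise
rng = np.random.default_rng(0)
t = np.arange(100)
X = np.array(
    [features(np.sin(t / 5) + 0.1 * rng.normal(size=100)) for _ in range(10)]
    + [features(rng.normal(size=100)) for _ in range(10)])
y = np.array([0] * 10 + [1] * 10)
chosen = greedy_select(X, y, n_keep=1)
```

A single well-chosen feature already separates these two classes, which mirrors the abstract's point: a reduced set of interpretable properties can both classify well and explain what distinguishes the classes.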