Weakly supervised deep learning for the detection of domain generation algorithms
Domain generation algorithms (DGAs) have become commonplace in malware that seeks to establish command and control communication between an infected machine and the botmaster. DGAs dynamically and consistently generate large volumes of malicious domain names, only a few of which are registered by the botmaster within a short time window around their generation time and subsequently resolved when the malware on the infected machine tries to access them. Deep neural networks that can classify domain names as benign or malicious are of great interest in the real-time defense against DGAs. In contrast with traditional machine learning models, deep networks do not rely on human-engineered features. Instead, they can learn features automatically from data, provided that they are supplied with sufficiently large amounts of suitable training data. Obtaining cleanly labeled ground truth data is difficult and time-consuming. Heuristically labeled data could potentially provide a source of training data for weakly supervised training of DGA detectors. We propose a set of heuristics for automatically labeling domain names monitored in real traffic, and then train and evaluate classifiers with the proposed heuristically labeled dataset. We show through experiments on a dataset with 50 million domain names that such heuristically labeled data is very useful in practice to improve the predictive accuracy of deep learning-based DGA classifiers, and that these deep neural networks significantly outperform a random forest classifier with human-engineered features.
COCO_TS Dataset: Pixel-level Annotations Based on Weak Supervision for Scene Text Segmentation
The absence of large scale datasets with pixel-level supervisions is a
significant obstacle for the training of deep convolutional networks for scene
text segmentation. For this reason, synthetic data generation is normally
employed to enlarge the training dataset. Nonetheless, synthetic data cannot
reproduce the complexity and variability of natural images. In this paper, a
weakly supervised learning approach is used to reduce the shift between
training on real and synthetic data. Pixel-level supervisions for a text
detection dataset (i.e. where only bounding-box annotations are available) are
generated. In particular, the COCO-Text-Segmentation (COCO_TS) dataset, which
provides pixel-level supervisions for the COCO-Text dataset, is created and
released. The generated annotations are used to train a deep convolutional
neural network for semantic segmentation. Experiments show that the proposed
dataset can be used instead of synthetic data, allowing us to use only a
fraction of the training samples and significantly improving performance.
Deep learning in remote sensing: a review
Standing at the paradigm shift towards data-intensive science, machine
learning techniques are becoming increasingly important. In particular, as a
major breakthrough in the field, deep learning has proven to be an extremely
powerful tool in many fields. Shall we embrace deep learning as the key to all?
Or, should we resist a 'black-box' solution? There are controversial opinions
in the remote sensing community. In this article, we analyze the challenges of
using deep learning for remote sensing data analysis, review the recent
advances, and provide resources to make deep learning in remote sensing
ridiculously simple to start with. More importantly, we encourage remote sensing
scientists to bring their expertise into deep learning, and to use it as an
implicit general model to tackle unprecedented large-scale influential
challenges, such as climate change and urbanization.
Comment: Accepted for publication IEEE Geoscience and Remote Sensing Magazin
Weakly-supervised localization of diabetic retinopathy lesions in retinal fundus images
Convolutional neural networks (CNNs) show impressive performance for image
classification and detection, extending heavily to the medical image domain.
Nevertheless, medical experts are sceptical of these predictions, as the
nonlinear multilayer structure that produces a classification outcome is not
directly graspable. Recently, approaches have been presented that help the user
understand the discriminative regions within an image that are decisive for
the CNN's choice of a certain class. Although these approaches could help to
build trust in the CNN's predictions, they have rarely been shown to work with
medical image data, which often poses a challenge because the decision for a class
relies on different lesion areas scattered around the entire image. Using the
DiaretDB1 dataset, we show that on retina images different lesion areas
fundamental for diabetic retinopathy are detected on an image level with high
accuracy, comparable to or exceeding that of supervised methods. At the lesion level, we
achieve few false positives with high sensitivity, even though the network is
trained solely on image-level labels, which do not include information about
existing lesions. Classifying between diseased and healthy images, we achieve
an AUC of 0.954 on DiaretDB1.
Comment: Accepted in Proc. IEEE International Conference on Image Processing
(ICIP), 201