Active Mining of Parallel Video Streams
The practicality of a video surveillance system is adversely limited by the
number of queries that can be placed on human resources and their vigilance in
response. To transcend this limitation, a major effort under way is to include
software that fully (or at least semi-) automatically mines video footage,
reducing the burden imposed on the system. Herein, we propose a semi-supervised
incremental learning framework for evolving visual streams in order to develop
a robust and flexible track classification system. Our proposed method learns
from consecutive batches by updating an ensemble at each step. It tries to
strike a balance between the performance of the system and the amount of data
that needs to be labelled. As no restrictive assumptions are imposed, the
system can address many practical problems in an evolving multi-camera
scenario, such as concept drift, class evolution, and video streams of varying
length, which have not been addressed before. Experiments were performed on
synthetic as well as real-world visual data in non-stationary environments,
showing high accuracy with fairly little human collaboration.
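The batch-wise, semi-supervised ensemble update described above can be sketched as follows. This is an illustrative toy, not the paper's method: the base learner, the agreement-based query rule, and the pseudo-labelling step are all stand-in assumptions chosen for brevity.

```python
import numpy as np

class CentroidModel:
    """Trivial base learner (nearest class centroid) standing in for any classifier."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(-1)
        return self.classes_[d.argmin(axis=1)]

class BatchEnsemble:
    """Keep at most `max_models` base models; classify by majority vote; query the
    oracle only for the samples the current ensemble agrees on least."""
    def __init__(self, max_models=5, query_fraction=0.2):
        self.models = []
        self.max_models = max_models
        self.query_fraction = query_fraction

    def predict(self, X):
        votes = np.array([m.predict(X) for m in self.models])
        return np.array([np.bincount(col).argmax() for col in votes.T])

    def partial_fit(self, X, oracle_labels):
        """Process one batch of the stream; returns how many labels were queried."""
        if not self.models:
            query_idx = np.arange(len(X))          # cold start: label everything
            y_train = oracle_labels
        else:
            votes = np.array([m.predict(X) for m in self.models])
            agreement = (votes == votes[0]).mean(axis=0)
            n_query = max(1, int(self.query_fraction * len(X)))
            query_idx = np.argsort(agreement)[:n_query]   # least consensus first
            y_train = self.predict(X)              # pseudo-labels from the ensemble
            y_train[query_idx] = oracle_labels[query_idx]  # oracle fixes the hard ones
        self.models.append(CentroidModel().fit(X, y_train))
        self.models = self.models[-self.max_models:]       # forget oldest under drift
        return len(query_idx)
```

Dropping the oldest model bounds memory and lets the ensemble track concept drift, while the agreement-based query rule keeps the labelling budget to a fixed fraction of each batch after the first.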
Machine learning based hyperspectral image analysis: A survey
Hyperspectral sensors enable the study of the chemical properties of scene
materials remotely for the purpose of identification, detection, and chemical
composition analysis of objects in the environment. Hence, hyperspectral images
captured from earth observing satellites and aircraft have been increasingly
important in agriculture, environmental monitoring, urban planning, mining, and
defense. Machine learning algorithms due to their outstanding predictive power
have become a key tool for modern hyperspectral image analysis. Therefore, a
solid understanding of machine learning techniques have become essential for
remote sensing researchers and practitioners. This paper reviews and compares
recent machine learning-based hyperspectral image analysis methods published in
literature. We organize the methods by the image analysis task and by the type
of machine learning algorithm, and present a two-way mapping between the image
analysis tasks and the types of machine learning algorithms that can be applied
to them. The paper is comprehensive in coverage of both hyperspectral image
analysis tasks and machine learning algorithms. The image analysis tasks
considered are land cover classification, target detection, unmixing, and
physical parameter estimation. The machine learning algorithms covered are
Gaussian models, linear regression, logistic regression, support vector
machines, Gaussian mixture models, latent linear models, sparse linear models,
ensemble learning, directed graphical models, undirected graphical models,
clustering, Gaussian processes, Dirichlet processes, and deep learning. We also
discuss the open challenges in the field of hyperspectral image analysis and
explore possible future directions.
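As a concrete instance of the task/algorithm pairing the survey maps out, consider land cover classification with a per-class Gaussian model: each pixel's spectrum (one value per band) is treated as a feature vector and assigned to the class under which it is most likely. The diagonal covariance, the function names, and the synthetic spectra below are illustrative assumptions, not from the paper.

```python
import numpy as np

def fit_gaussian_classifier(X, y):
    """Per-class Gaussian MLE over spectral bands (diagonal covariance for brevity)."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (Xc.mean(axis=0), Xc.var(axis=0) + 1e-6)  # mean, variance per band
    return params

def predict(params, X):
    """Assign each pixel spectrum to the class with the highest log-likelihood."""
    classes = list(params)
    scores = np.stack(
        [(-0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var)).sum(axis=1)
         for mu, var in (params[c] for c in classes)],
        axis=1,
    )
    return np.array(classes)[scores.argmax(axis=1)]
```

The same interface generalizes to the other algorithms in the survey's mapping: only the `fit`/`predict` pair changes, while the pixels-as-spectra feature representation stays fixed.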
Diversity in Machine Learning
Machine learning methods have achieved good performance and have been widely
applied in various real-world applications. They can learn models adaptively
and thus better fit the specific requirements of different tasks. Generally, a
good machine learning system is composed of plentiful training data, a good
model training process, and accurate inference. Many factors affect the
performance of the machine learning process, among which diversity is an
important one. Diversity helps each stage contribute to an overall good
machine learning result: diversity of the training data ensures that the data
provide more discriminative information for the model; diversity of the
learned model (diversity in the parameters of each model or diversity among
different base models) makes each parameter/model capture unique or
complementary information; and diversity in inference provides multiple
choices, each of which corresponds to a specific plausible locally optimal
result. Even though diversity plays an important role in the machine learning
process, there has been no systematic analysis of diversification in machine
learning systems. In this paper, we systematically summarize methods for data
diversification, model diversification, and inference diversification in the
machine learning process, respectively. In addition, we survey typical
applications where diversity techniques have improved machine learning
performance, including remote sensing imaging tasks, machine translation,
camera relocalization, image segmentation, object detection, topic modeling,
and others. Finally, we discuss some challenges of diversity techniques in
machine learning and point out some directions for future work.
Comment: Accepted by IEEE Access
Radiological images and machine learning: trends, perspectives, and prospects
The application of machine learning to radiological images is an increasingly
active research area that is expected to grow in the next five to ten years.
Recent advances in machine learning have the potential to recognize and
classify complex patterns from different radiological imaging modalities such
as x-rays, computed tomography, magnetic resonance imaging and positron
emission tomography imaging. In many applications, machine learning based
systems have shown comparable performance to human decision-making. The
applications of machine learning are the key ingredients of future clinical
decision making and monitoring systems. This review covers the fundamental
concepts behind various machine learning techniques and their applications in
several radiological imaging areas, such as medical image segmentation, brain
function studies and neurological disease diagnosis, as well as computer-aided
systems, image registration, and content-based image retrieval systems.
We also briefly discuss current challenges and future directions regarding the
application of machine learning in radiological imaging. By giving insight
into how to take advantage of machine learning powered applications, we expect
that clinicians will be able to prevent and diagnose diseases more accurately
and efficiently.
Comment: 13 figures
Unsupervised High-level Feature Learning by Ensemble Projection for Semi-supervised Image Classification and Image Clustering
This paper investigates the problem of image classification with limited or
no annotations, but abundant unlabeled data. The setting exists in many tasks
such as semi-supervised image classification, image clustering, and image
retrieval. Unlike previous methods, which develop or learn sophisticated
regularizers for classifiers, our method learns a new image representation by
exploiting the distribution patterns of all available data for the task at
hand. Particularly, a rich set of visual prototypes are sampled from all
available data, and are taken as surrogate classes to train discriminative
classifiers; images are projected via the classifiers; the projected values,
similarities to the prototypes, are stacked to build the new feature vector.
The training set is noisy. Hence, in the spirit of ensemble learning we create
a set of such training sets which are all diverse, leading to diverse
classifiers. The method is dubbed Ensemble Projection (EP). EP captures not
only the characteristics of individual images, but also the relationships among
images. It is conceptually simple and computationally efficient, yet effective
and flexible. Experiments on eight standard datasets show that: (1) EP
outperforms previous methods for semi-supervised image classification; (2) EP
produces promising results for self-taught image classification, where
unlabeled samples are a random collection of images rather than being from the
same distribution as the labeled ones; and (3) EP improves over the original
features for image clustering. The code of the method is available on the
project page.
Comment: 22 pages, 8 figures
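The core construction of Ensemble Projection can be sketched compactly: sample prototype points as surrogate classes, score every image against them, and stack the scores across several diverse prototype sets into a new feature vector. The paper trains discriminative classifiers on the surrogate classes; the RBF similarity below is a simplifying stand-in, and all names are illustrative.

```python
import numpy as np

def ensemble_projection(X, n_sets=10, n_prototypes=5, seed=0):
    """EP sketch: each of `n_sets` rounds samples `n_prototypes` surrogate classes
    from the unlabeled data, projects every sample onto them, and the stacked
    projections form the new representation."""
    rng = np.random.default_rng(seed)
    features = []
    for _ in range(n_sets):
        protos = X[rng.choice(len(X), n_prototypes, replace=False)]
        # Stand-in "classifier": RBF similarity of every sample to each prototype.
        d2 = ((X[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
        sims = np.exp(-d2 / (2 * d2.mean() + 1e-12))
        features.append(sims)
    return np.hstack(features)   # shape: (n_samples, n_sets * n_prototypes)
```

Because each round uses a different random prototype set, the stacked representation encodes relationships among images (who is close to which surrogate class) rather than only per-image appearance, which is exactly what makes it useful for the semi-supervised and clustering settings above.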
A Bayesian Perspective of Statistical Machine Learning for Big Data
Statistical Machine Learning (SML) refers to a body of algorithms and methods
by which computers are allowed to discover important features of input data
sets which are often very large in size. The very task of feature discovery
from data is essentially the meaning of the keyword `learning' in SML.
Theoretical justifications for the effectiveness of the SML algorithms are
underpinned by sound principles from different disciplines, such as Computer
Science and Statistics. The theoretical underpinnings particularly justified by
statistical inference methods are collectively termed statistical learning
theory.
This paper provides a review of SML from a Bayesian decision theoretic point
of view -- where we argue that many SML techniques are closely connected to
making inference by using the so called Bayesian paradigm. We discuss many
important SML techniques such as supervised and unsupervised learning, deep
learning, online learning and Gaussian processes especially in the context of
very large data sets where these are often employed. We present a dictionary
which maps the key concepts of SML from Computer Science and Statistics. We
illustrate the SML techniques with three moderately large data sets where we
also discuss many practical implementation issues. The review is thus
especially targeted at statisticians and computer scientists aspiring to
understand and apply SML to moderately large to big data sets.
Comment: 26 pages, 3 figures, review paper
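A small worked example of the Bayesian paradigm the review advocates: with a Gaussian prior on the weights and Gaussian noise, linear regression has a closed-form Gaussian posterior, so "learning" is literally posterior inference. This is a generic textbook construction, not code from the paper; the function name and hyperparameters are illustrative.

```python
import numpy as np

def bayes_linreg_posterior(X, y, alpha=1.0, noise_var=0.25):
    """Conjugate Bayesian linear regression.

    Prior: w ~ N(0, I / alpha).  Likelihood: y ~ N(X w, noise_var * I).
    Returns the posterior mean and covariance of the weights."""
    A = alpha * np.eye(X.shape[1]) + X.T @ X / noise_var   # posterior precision
    cov = np.linalg.inv(A)
    mean = cov @ X.T @ y / noise_var
    return mean, cov
```

The posterior covariance quantifies what point-estimate methods discard: it shrinks as data accumulate, which is one way the Bayesian view connects naturally to the large-data regimes the review discusses.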
ITCM: A Real Time Internet Traffic Classifier Monitor
The continual growth of high speed networks is a challenge for real-time
network analysis systems. The real time traffic classification is an issue for
corporations and ISPs (Internet Service Providers). This work presents the
design and implementation of a real time flow-based network traffic
classification system. The classifier monitor acts as a pipeline consisting of
three modules: packet capture and pre-processing, flow reassembly, and
classification with Machine Learning (ML). The modules are built as concurrent
processes with well defined data interfaces between them so that any module can
be improved and updated independently. In this pipeline, the flow reassembly
function becomes the performance bottleneck. In this implementation, an
efficient reassembly method was used, resulting in an average delivery delay of
approximately 0.49 seconds. For the classification module, the performances
of the K-Nearest Neighbor (KNN), C4.5 Decision Tree, Naive Bayes (NB), Flexible
Naive Bayes (FNB) and AdaBoost Ensemble Learning Algorithm are compared in
order to validate our approach.
Comment: 16 pages, 3 figures, 7 tables, International Journal of Computer
Science & Information Technology (IJCSIT), Vol 6, No 6, December 201
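The three-stage pipeline with well-defined interfaces between modules can be sketched with threads joined by queues. The stages, the flow key, and the threshold "classifier" below are all simplifying assumptions standing in for ITCM's real capture, reassembly, and ML modules.

```python
import queue
import threading

def run_pipeline(packets):
    """Three concurrent stages (capture -> flow reassembly -> classification)
    joined by queues, so any module can be swapped out independently."""
    q1, q2, out = queue.Queue(), queue.Queue(), []
    SENTINEL = None

    def capture():
        for p in packets:          # stand-in for live packet capture
            q1.put(p)
        q1.put(SENTINEL)

    def reassemble():
        flows = {}
        while (p := q1.get()) is not SENTINEL:
            flows.setdefault(p["flow"], []).append(p["size"])
        for flow_id, sizes in flows.items():
            q2.put({"flow": flow_id, "mean_size": sum(sizes) / len(sizes)})
        q2.put(SENTINEL)

    def classify():
        # Stand-in for the ML module: threshold on mean packet size.
        while (f := q2.get()) is not SENTINEL:
            f["label"] = "bulk" if f["mean_size"] > 1000 else "interactive"
            out.append(f)

    threads = [threading.Thread(target=t) for t in (capture, reassemble, classify)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

The queue boundaries are what make reassembly the natural bottleneck to measure: each stage's throughput can be profiled, and a stage replaced, without touching its neighbours.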
Learning Representations for Outlier Detection on a Budget
The problem of detecting a small number of outliers in a large dataset is an
important task in many fields from fraud detection to high-energy physics. Two
approaches have emerged to tackle this problem: unsupervised and supervised.
Supervised approaches require a sufficient amount of labeled data and are
challenged by novel types of outliers and inherent class imbalance, whereas
unsupervised methods do not take advantage of available labeled training
examples and often exhibit poorer predictive performance. We propose BORE (a
Bagged Outlier Representation Ensemble) which uses unsupervised outlier scoring
functions (OSFs) as features in a supervised learning framework. BORE is able
to adapt to arbitrary OSF feature representations, to the imbalance in labeled
data as well as to prediction-time constraints on computational cost. We
demonstrate the good performance of BORE compared to a variety of competing
methods in the non-budgeted and the budgeted outlier detection problem on 12
real-world datasets.
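The central idea of BORE, using unsupervised outlier scoring functions as features for a bagged supervised learner, can be sketched in a few lines. The two OSFs, the centroid-based base learner, and the stratified bootstrap are illustrative assumptions, not the paper's exact components.

```python
import numpy as np

def osf_features(X):
    """Two simple unsupervised outlier scoring functions used as features:
    distance to the data mean, and distance to the 3rd nearest neighbour."""
    d_mean = np.linalg.norm(X - X.mean(axis=0), axis=1)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    kth_nn = np.sort(D, axis=1)[:, 3]        # column 0 is distance to self
    return np.column_stack([d_mean, kth_nn])

def bore_predict(scores, y, scores_new, n_bags=15, seed=0):
    """Bag simple centroid scorers over bootstrap resamples of the labeled OSF
    scores and average their votes (0 = inlier, 1 = outlier)."""
    rng = np.random.default_rng(seed)
    in_idx, out_idx = np.where(y == 0)[0], np.where(y == 1)[0]
    votes = np.zeros(len(scores_new))
    for _ in range(n_bags):
        # Stratified bootstrap so the rare outlier class is never empty.
        b_in = rng.choice(in_idx, len(in_idx))
        b_out = rng.choice(out_idx, len(out_idx))
        mu_in, mu_out = scores[b_in].mean(axis=0), scores[b_out].mean(axis=0)
        votes += (np.linalg.norm(scores_new - mu_out, axis=1)
                  < np.linalg.norm(scores_new - mu_in, axis=1))
    return (votes / n_bags > 0.5).astype(int)
```

Because the supervised layer only ever sees OSF scores, swapping in cheaper or more expensive scoring functions changes the feature matrix but not the learner, which is what makes prediction-time budget constraints easy to accommodate.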
Structure fusion based on graph convolutional networks for semi-supervised classification
Because of the diversity and complexity of multi-view data in semi-supervised
classification, most existing graph convolutional networks focus on
constructing the network architecture or preserving the salient graph
structure, and ignore the contribution of the complete graph structure to
semi-supervised classification. To mine a more complete distribution structure
from multi-view data while considering both specificity and commonality, we
propose structure fusion based on graph convolutional networks (SF-GCN) to
improve the performance of semi-supervised classification. SF-GCN not only
retains the specific characteristics of each view's data through spectral
embedding, but also captures the common structure of multi-view data through a
distance metric between multi-graph structures. Assuming a linear relationship
between multi-graph structures, we construct the optimization function of the
structure fusion model by balancing a specificity loss and a commonality loss.
By solving this function, we simultaneously obtain the fused spectral
embedding of the multi-view data and the fused structure, which serves as the
adjacency matrix input to graph convolutional networks for semi-supervised
classification. Experiments demonstrate that SF-GCN outperforms
state-of-the-art methods on three challenging citation network datasets: Cora,
Citeseer, and Pubmed.
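The last step of the pipeline, feeding a fused adjacency matrix into a GCN, can be sketched as follows. The fixed fusion weights stand in for SF-GCN's learned specificity/commonality balance; the GCN normalization and layer are the standard Kipf-Welling form, and all names are illustrative.

```python
import numpy as np

def fuse_structures(adjs, weights):
    """Weighted fusion of per-view adjacency matrices into one structure
    (a fixed-weight stand-in for SF-GCN's optimized fusion)."""
    weights = np.asarray(weights, dtype=float) / np.sum(weights)
    return sum(w * A for w, A in zip(weights, adjs))

def normalize_adj(A):
    """Symmetric GCN normalization: D^{-1/2} (A + I) D^{-1/2}."""
    A_hat = A + np.eye(len(A))
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return d_inv_sqrt @ A_hat @ d_inv_sqrt

def gcn_layer(A_fused, X, W):
    """One GCN propagation step with ReLU: relu(norm(A) @ X @ W)."""
    return np.maximum(0, normalize_adj(A_fused) @ X @ W)
```

Each view contributes its own edges to the fused matrix, so information present in only one view's graph still reaches the propagation step, which is the "complete graph structure" the abstract argues single-structure GCNs miss.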
Variational Adversarial Active Learning
Active learning aims to develop label-efficient algorithms by sampling the
most representative queries to be labeled by an oracle. We describe a
pool-based semi-supervised active learning algorithm that implicitly learns
this sampling mechanism in an adversarial manner. Unlike conventional active
learning algorithms, our approach is task agnostic, i.e., it does not depend on
the performance of the task for which we are trying to acquire labeled data.
Our method learns a latent space using a variational autoencoder (VAE) and an
adversarial network trained to discriminate between unlabeled and labeled data.
The mini-max game between the VAE and the adversarial network is played such
that while the VAE tries to trick the adversarial network into predicting that
all data points are from the labeled pool, the adversarial network learns how
to discriminate between dissimilarities in the latent space. We extensively
evaluate our method on various image classification and semantic segmentation
benchmark datasets and establish a new state of the art on several of them.
Our results demonstrate that our adversarial approach learns an effective
low-dimensional latent space in large-scale settings and provides a
computationally efficient sampling method. Our code is available at
https://github.com/sinhasam/vaal.
Comment: First two authors contributed equally, listed alphabetically.
Accepted as Oral at ICCV 201
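The sampling rule at the heart of this approach is simple once the VAE and discriminator are trained: query the unlabeled points whose latent codes the discriminator is most confident are *not* from the labeled pool. The sketch below shows only that selection step; the toy distance-based discriminator stands in for the trained adversarial network, and all names are illustrative.

```python
import numpy as np

def vaal_select(latent_unlabeled, discriminator, budget):
    """Adversarial active-learning selection step: `discriminator` returns the
    probability that a latent code comes from the labeled pool; the unlabeled
    points scored lowest look least like anything labeled, so query them first."""
    p_labeled = discriminator(latent_unlabeled)
    return np.argsort(p_labeled)[:budget]
```

Because the rule depends only on the latent codes and the discriminator, not on the downstream classifier's predictions, the selection is task agnostic in exactly the sense the abstract describes.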