9,952 research outputs found
Machine Learning in Automated Text Categorization
The automated categorization (or classification) of texts into predefined
categories has witnessed a booming interest in the last ten years, due to the
increased availability of documents in digital form and the ensuing need to
organize them. In the research community the dominant approach to this problem
is based on machine learning techniques: a general inductive process
automatically builds a classifier by learning, from a set of preclassified
documents, the characteristics of the categories. The advantages of this
approach over the knowledge engineering approach (consisting in the manual
definition of a classifier by domain experts) are a very good effectiveness,
considerable savings in terms of expert manpower, and straightforward
portability to different domains. This survey discusses the main approaches to
text categorization that fall within the machine learning paradigm. We will
discuss in detail issues pertaining to three different problems, namely
document representation, classifier construction, and classifier evaluation.Comment: Accepted for publication on ACM Computing Survey
Histopathological image analysis : a review
Over the past decade, dramatic increases in computational power and improvement in image analysis algorithms have allowed the development of powerful computer-assisted analytical approaches to radiological data. With the recent advent of whole slide digital scanners, tissue histopathology slides can now be digitized and stored in digital image form. Consequently, digitized tissue histopathology has now become amenable to the application of computerized image analysis and machine learning techniques. Analogous to the role of computer-assisted diagnosis (CAD) algorithms in medical imaging to complement the opinion of a radiologist, CAD algorithms have begun to be developed for disease detection, diagnosis, and prognosis prediction to complement the opinion of the pathologist. In this paper, we review the recent state of the art CAD technology for digitized histopathology. This paper also briefly describes the development and application of novel image analysis technology for a few specific histopathology related problems being pursued in the United States and Europe
Efficient Clustering on Riemannian Manifolds: A Kernelised Random Projection Approach
Reformulating computer vision problems over Riemannian manifolds has
demonstrated superior performance in various computer vision applications. This
is because visual data often forms a special structure lying on a lower
dimensional space embedded in a higher dimensional space. However, since these
manifolds belong to non-Euclidean topological spaces, exploiting their
structures is computationally expensive, especially when one considers the
clustering analysis of massive amounts of data. To this end, we propose an
efficient framework to address the clustering problem on Riemannian manifolds.
This framework implements random projections for manifold points via kernel
space, which can preserve the geometric structure of the original space, but is
computationally efficient. Here, we introduce three methods that follow our
framework. We then validate our framework on several computer vision
applications by comparing against popular clustering methods on Riemannian
manifolds. Experimental results demonstrate that our framework maintains the
performance of the clustering whilst massively reducing computational
complexity by over two orders of magnitude in some cases
Hyperspectral Unmixing Overview: Geometrical, Statistical, and Sparse Regression-Based Approaches
Imaging spectrometers measure electromagnetic energy scattered in their
instantaneous field view in hundreds or thousands of spectral channels with
higher spectral resolution than multispectral cameras. Imaging spectrometers
are therefore often referred to as hyperspectral cameras (HSCs). Higher
spectral resolution enables material identification via spectroscopic analysis,
which facilitates countless applications that require identifying materials in
scenarios unsuitable for classical spectroscopic analysis. Due to low spatial
resolution of HSCs, microscopic material mixing, and multiple scattering,
spectra measured by HSCs are mixtures of spectra of materials in a scene. Thus,
accurate estimation requires unmixing. Pixels are assumed to be mixtures of a
few materials, called endmembers. Unmixing involves estimating all or some of:
the number of endmembers, their spectral signatures, and their abundances at
each pixel. Unmixing is a challenging, ill-posed inverse problem because of
model inaccuracies, observation noise, environmental conditions, endmember
variability, and data set size. Researchers have devised and investigated many
models searching for robust, stable, tractable, and accurate unmixing
algorithms. This paper presents an overview of unmixing methods from the time
of Keshava and Mustard's unmixing tutorial [1] to the present. Mixing models
are first discussed. Signal-subspace, geometrical, statistical, sparsity-based,
and spatial-contextual unmixing algorithms are described. Mathematical problems
and potential solutions are described. Algorithm characteristics are
illustrated experimentally.Comment: This work has been accepted for publication in IEEE Journal of
Selected Topics in Applied Earth Observations and Remote Sensin
A survey on the application of deep learning for code injection detection
Abstract Code injection is one of the top cyber security attack vectors in the modern world. To overcome the limitations of conventional signature-based detection techniques, and to complement them when appropriate, multiple machine learning approaches have been proposed. While analysing these approaches, the surveys focus predominantly on the general intrusion detection, which can be further applied to specific vulnerabilities. In addition, among the machine learning steps, data preprocessing, being highly critical in the data analysis process, appears to be the least researched in the context of Network Intrusion Detection, namely in code injection. The goal of this survey is to fill in the gap through analysing and classifying the existing machine learning techniques applied to the code injection attack detection, with special attention to Deep Learning. Our analysis reveals that the way the input data is preprocessed considerably impacts the performance and attack detection rate. The proposed full preprocessing cycle demonstrates how various machine-learning-based approaches for detection of code injection attacks take advantage of different input data preprocessing techniques. The most used machine learning methods and preprocessing stages have been also identified
- …