821 research outputs found
A cDNA Microarray Gene Expression Data Classifier for Clinical Diagnostics Based on Graph Theory
Despite great advances in discovering cancer molecular profiles, the proper application of microarray technology to routine clinical diagnostics is still a challenge. Current practices in the classification of microarrays' data show two main limitations: the reliability of the training data sets used to build the classifiers, and the classifiers' performances, especially when the sample to be classified does not belong to any of the available classes. In this case, state-of-the-art algorithms usually produce a high rate of false positives that, in real diagnostic applications, are unacceptable. To address this problem, this paper presents a new cDNA microarray data classification algorithm based on graph theory and is able to overcome most of the limitations of known classification methodologies. The classifier works by analyzing gene expression data organized in an innovative data structure based on graphs, where vertices correspond to genes and edges to gene expression relationships. To demonstrate the novelty of the proposed approach, the authors present an experimental performance comparison between the proposed classifier and several state-of-the-art classification algorithm
On the use of reproducing kernel Hilbert spaces in functional classification
The H\'ajek-Feldman dichotomy establishes that two Gaussian measures are
either mutually absolutely continuous with respect to each other (and hence
there is a Radon-Nikodym density for each measure with respect to the other
one) or mutually singular. Unlike the case of finite dimensional Gaussian
measures, there are non-trivial examples of both situations when dealing with
Gaussian stochastic processes. This paper provides:
(a) Explicit expressions for the optimal (Bayes) rule and the minimal
classification error probability in several relevant problems of supervised
binary classification of mutually absolutely continuous Gaussian processes. The
approach relies on some classical results in the theory of Reproducing Kernel
Hilbert Spaces (RKHS).
(b) An interpretation, in terms of mutual singularity, for the "near perfect
classification" phenomenon described by Delaigle and Hall (2012). We show that
the asymptotically optimal rule proposed by these authors can be identified
with the sequence of optimal rules for an approximating sequence of
classification problems in the absolutely continuous case.
(c) A new model-based method for variable selection in binary classification
problems, which arises in a very natural way from the explicit knowledge of the
RN-derivatives and the underlying RKHS structure. Different classifiers might
be used from the selected variables. In particular, the classical, linear
finite-dimensional Fisher rule turns out to be consistent under some standard
conditions on the underlying functional model
Ensemble of one-class classifiers for personal risk detection based on wearable sensor data
This study introduces the One-Class K-means with Randomly-projected features Algorithm (OCKRA). OCKRA is an ensemble of one-class classifiers built over multiple projections of a dataset according to random feature subsets. Algorithms found in the literature spread over a wide range of applications where ensembles of one-class classifiers have been satisfactorily applied; however, none is oriented to the area under our study: personal risk detection. OCKRA has been designed with the aim of improving the detection performance in the problem posed by the Personal RIsk DEtection(PRIDE) dataset. PRIDE was built based on 23 test subjects, where the data for each user were captured using a set of sensors embedded in a wearable band. The performance of OCKRA was compared against support vector machine and three versions of the Parzen window classifier. On average, experimental results show that OCKRA outperformed the other classifiers for at least 0.53% of the area under the curve (AUC). In addition, OCKRA achieved an AUC above 90% for more than 57% of the users
A Graph-Based Semi-Supervised k Nearest-Neighbor Method for Nonlinear Manifold Distributed Data Classification
Nearest Neighbors (NN) is one of the most widely used supervised
learning algorithms to classify Gaussian distributed data, but it does not
achieve good results when it is applied to nonlinear manifold distributed data,
especially when a very limited amount of labeled samples are available. In this
paper, we propose a new graph-based NN algorithm which can effectively
handle both Gaussian distributed data and nonlinear manifold distributed data.
To achieve this goal, we first propose a constrained Tired Random Walk (TRW) by
constructing an -level nearest-neighbor strengthened tree over the graph,
and then compute a TRW matrix for similarity measurement purposes. After this,
the nearest neighbors are identified according to the TRW matrix and the class
label of a query point is determined by the sum of all the TRW weights of its
nearest neighbors. To deal with online situations, we also propose a new
algorithm to handle sequential samples based a local neighborhood
reconstruction. Comparison experiments are conducted on both synthetic data
sets and real-world data sets to demonstrate the validity of the proposed new
NN algorithm and its improvements to other version of NN algorithms.
Given the widespread appearance of manifold structures in real-world problems
and the popularity of the traditional NN algorithm, the proposed manifold
version NN shows promising potential for classifying manifold-distributed
data.Comment: 32 pages, 12 figures, 7 table
One-Class Classification: Taxonomy of Study and Review of Techniques
One-class classification (OCC) algorithms aim to build classification models
when the negative class is either absent, poorly sampled or not well defined.
This unique situation constrains the learning of efficient classifiers by
defining class boundary just with the knowledge of positive class. The OCC
problem has been considered and applied under many research themes, such as
outlier/novelty detection and concept learning. In this paper we present a
unified view of the general problem of OCC by presenting a taxonomy of study
for OCC problems, which is based on the availability of training data,
algorithms used and the application domains applied. We further delve into each
of the categories of the proposed taxonomy and present a comprehensive
literature review of the OCC algorithms, techniques and methodologies with a
focus on their significance, limitations and applications. We conclude our
paper by discussing some open research problems in the field of OCC and present
our vision for future research.Comment: 24 pages + 11 pages of references, 8 figure
Entropy-based active learning for object recognition
Most methods for learning object categories require large amounts of labeled training data. However, obtaining such data can be a difficult and time-consuming endeavor. We have developed a novel, entropy-based ldquoactive learningrdquo approach which makes significant progress towards this problem. The main idea is to sequentially acquire labeled data by presenting an oracle (the user) with unlabeled images that will be particularly informative when labeled. Active learning adaptively prioritizes the order in which the training examples are acquired, which, as shown by our experiments, can significantly reduce the overall number of training examples required to reach near-optimal performance. At first glance this may seem counter-intuitive: how can the algorithm know whether a group of unlabeled images will be informative, when, by definition, there is no label directly associated with any of the images? Our approach is based on choosing an image to label that maximizes the expected amount of information we gain about the set of unlabeled images. The technique is demonstrated in several contexts, including improving the efficiency of Web image-search queries and open-world visual learning by an autonomous agent. Experiments on a large set of 140 visual object categories taken directly from text-based Web image searches show that our technique can provide large improvements (up to 10 x reduction in the number of training examples needed) over baseline techniques
Multi-Class Classification for Identifying JPEG Steganography Embedding Methods
Over 725 steganography tools are available over the Internet, each providing a method for covert transmission of secret messages. This research presents four steganalysis advancements that result in an algorithm that identifies the steganalysis tool used to embed a secret message in a JPEG image file. The algorithm includes feature generation, feature preprocessing, multi-class classification and classifier fusion. The first contribution is a new feature generation method which is based on the decomposition of discrete cosine transform (DCT) coefficients used in the JPEG image encoder. The generated features are better suited to identifying discrepancies in each area of the decomposed DCT coefficients. Second, the classification accuracy is further improved with the development of a feature ranking technique in the preprocessing stage for the kernel Fisher s discriminant (KFD) and support vector machines (SVM) classifiers in the kernel space during the training process. Third, for the KFD and SVM two-class classifiers a classification tree is designed from the kernel space to provide a multi-class classification solution for both methods. Fourth, by analyzing a set of classifiers, signature detectors, and multi-class classification methods a classifier fusion system is developed to increase the detection accuracy of identifying the embedding method used in generating the steganography images. Based on classifying stego images created from research and commercial JPEG steganography techniques, F5, JP Hide, JSteg, Model-based, Model-based Version 1.2, OutGuess, Steganos, StegHide and UTSA embedding methods, the performance of the system shows a statistically significant increase in classification accuracy of 5%. In addition, this system provides a solution for identifying steganographic fingerprints as well as the ability to include future multi-class classification tools
Fundamental remote sensing science research program. Part 1: Status report of the mathematical pattern recognition and image analysis project
The Mathematical Pattern Recognition and Image Analysis (MPRIA) Project is concerned with basic research problems related to the study of the Earth from remotely sensed measurement of its surface characteristics. The program goal is to better understand how to analyze the digital image that represents the spatial, spectral, and temporal arrangement of these measurements for purposing of making selected inference about the Earth
- …