5,742 research outputs found
A Method of Protein Model Classification and Retrieval Using Bag-of-Visual-Features
In this paper we propose a novel visual method for protein model classification and retrieval. Different from the conventional methods, the key idea of the proposed method is to extract image features of proteins and measure the visual similarity between proteins. Firstly, the multiview images are captured by vertices and planes of a given octahedron surrounding the protein. Secondly, the local features are extracted from each image of the different views by the SURF algorithm and are vector quantized into visual words using a visual codebook. Finally, KLD is employed to calculate the similarity distance between two feature vectors. Experimental results show that the proposed method has encouraging performances for protein retrieval and categorization as shown in the comparison with other methods
Supervised cross-modal factor analysis for multiple modal data classification
In this paper we study the problem of learning from multiple modal data for
purpose of document classification. In this problem, each document is composed
two different modals of data, i.e., an image and a text. Cross-modal factor
analysis (CFA) has been proposed to project the two different modals of data to
a shared data space, so that the classification of a image or a text can be
performed directly in this space. A disadvantage of CFA is that it has ignored
the supervision information. In this paper, we improve CFA by incorporating the
supervision information to represent and classify both image and text modals of
documents. We project both image and text data to a shared data space by factor
analysis, and then train a class label predictor in the shared space to use the
class label information. The factor analysis parameter and the predictor
parameter are learned jointly by solving one single objective function. With
this objective function, we minimize the distance between the projections of
image and text of the same document, and the classification error of the
projection measured by hinge loss function. The objective function is optimized
by an alternate optimization strategy in an iterative algorithm. Experiments in
two different multiple modal document data sets show the advantage of the
proposed algorithm over other CFA methods
Multiple Instance Learning: A Survey of Problem Characteristics and Applications
Multiple instance learning (MIL) is a form of weakly supervised learning
where training instances are arranged in sets, called bags, and a label is
provided for the entire bag. This formulation is gaining interest because it
naturally fits various problems and allows to leverage weakly labeled data.
Consequently, it has been used in diverse application fields such as computer
vision and document classification. However, learning from bags raises
important challenges that are unique to MIL. This paper provides a
comprehensive survey of the characteristics which define and differentiate the
types of MIL problems. Until now, these problem characteristics have not been
formally identified and described. As a result, the variations in performance
of MIL algorithms from one data set to another are difficult to explain. In
this paper, MIL problem characteristics are grouped into four broad categories:
the composition of the bags, the types of data distribution, the ambiguity of
instance labels, and the task to be performed. Methods specialized to address
each category are reviewed. Then, the extent to which these characteristics
manifest themselves in key MIL application areas are described. Finally,
experiments are conducted to compare the performance of 16 state-of-the-art MIL
methods on selected problem characteristics. This paper provides insight on how
the problem characteristics affect MIL algorithms, recommendations for future
benchmarking and promising avenues for research
Adaptive Graph via Multiple Kernel Learning for Nonnegative Matrix Factorization
Nonnegative Matrix Factorization (NMF) has been continuously evolving in
several areas like pattern recognition and information retrieval methods. It
factorizes a matrix into a product of 2 low-rank non-negative matrices that
will define parts-based, and linear representation of nonnegative data.
Recently, Graph regularized NMF (GrNMF) is proposed to find a compact
representation,which uncovers the hidden semantics and simultaneously respects
the intrinsic geometric structure. In GNMF, an affinity graph is constructed
from the original data space to encode the geometrical information. In this
paper, we propose a novel idea which engages a Multiple Kernel Learning
approach into refining the graph structure that reflects the factorization of
the matrix and the new data space. The GrNMF is improved by utilizing the graph
refined by the kernel learning, and then a novel kernel learning method is
introduced under the GrNMF framework. Our approach shows encouraging results of
the proposed algorithm in comparison to the state-of-the-art clustering
algorithms like NMF, GrNMF, SVD etc.Comment: This paper has been withdrawn by the author due to the terrible
writin
On Classification with Bags, Groups and Sets
Many classification problems can be difficult to formulate directly in terms
of the traditional supervised setting, where both training and test samples are
individual feature vectors. There are cases in which samples are better
described by sets of feature vectors, that labels are only available for sets
rather than individual samples, or, if individual labels are available, that
these are not independent. To better deal with such problems, several
extensions of supervised learning have been proposed, where either training
and/or test objects are sets of feature vectors. However, having been proposed
rather independently of each other, their mutual similarities and differences
have hitherto not been mapped out. In this work, we provide an overview of such
learning scenarios, propose a taxonomy to illustrate the relationships between
them, and discuss directions for further research in these areas
- …