Stacked Generalization Approach to Improve Prediction of Molecular Atomization Energies
Machine learning holds the promise of learning the energy functional via
examples, bypassing the need to solve complicated quantum-chemical equations
and realizing efficient computing of molecular electronic properties.
Comment: 15 pages, 4 Figur
Inductive machine learning of optimal modular structures: Estimating solutions using support vector machines
Structural optimization is usually handled by iterative methods requiring repeated samples of a physics-based model, but this process can be computationally demanding. Given a set of previously optimized structures of the same topology, this paper uses inductive learning to replace this optimization process entirely by deriving a function that directly maps any given load to an optimal geometry. A support vector machine is trained to determine the optimal geometry of individual modules of a space frame structure given a specified load condition. Structures produced by learning are compared against those found by a standard gradient descent optimization, both as individual modules and then as a composite structure. The primary motivation for this is speed, and results show the process is highly efficient for cases in which similar optimizations must be performed repeatedly. The function learned by the algorithm can approximate the result of optimization very closely after sufficient training, and has also been found effective at generalizing the underlying optima to produce structures that perform better than those found by standard iterative methods.
High-Speed Tracking with Kernelized Correlation Filters
The core component of most modern trackers is a discriminative classifier,
tasked with distinguishing between the target and the surrounding environment.
To cope with natural image changes, this classifier is typically trained with
translated and scaled sample patches. Such sets of samples are riddled with
redundancies -- any overlapping pixels are constrained to be the same. Based on
this simple observation, we propose an analytic model for datasets of thousands
of translated patches. By showing that the resulting data matrix is circulant,
we can diagonalize it with the Discrete Fourier Transform, reducing both
storage and computation by several orders of magnitude. Interestingly, for
linear regression our formulation is equivalent to a correlation filter, used
by some of the fastest competitive trackers. For kernel regression, however, we
derive a new Kernelized Correlation Filter (KCF) that, unlike other kernel
algorithms, has the exact same complexity as its linear counterpart. Building on
it, we also propose a fast multi-channel extension of linear correlation
filters, via a linear kernel, which we call Dual Correlation Filter (DCF). Both
KCF and DCF outperform top-ranking trackers such as Struck or TLD on a
50-video benchmark, despite running at hundreds of frames per second and being
implemented in a few lines of code (Algorithm 1). To encourage further
developments, our tracking framework was made open-source.
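The circulant-matrix observation in this abstract can be illustrated with a small numerical check. The sketch below is not the authors' tracker code: it assumes a 1-D real signal, plain linear ridge regression, and a data matrix whose rows are all cyclic shifts of a base sample. The function names and the indexing convention are illustrative assumptions, and a different shift convention would move the complex conjugate in the Fourier-domain formula.

```python
import numpy as np

def circulant_rows(x):
    """Stack all cyclic shifts of x as rows: X[i, j] = x[(j - i) % n]."""
    return np.stack([np.roll(x, i) for i in range(len(x))])

def ridge_direct(x, y, lam):
    """Ridge regression on the explicit n x n matrix of shifted samples."""
    X = circulant_rows(x)
    n = len(x)
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

def ridge_fourier(x, y, lam):
    """Same solution, computed element-wise after a DFT.

    With the shift convention above, X.T @ y is a circular convolution and
    X.T @ X is circulant with spectrum |x_hat|^2, so the normal equations
    decouple into one scalar equation per frequency.
    """
    x_hat = np.fft.fft(x)
    y_hat = np.fft.fft(y)
    w_hat = x_hat * y_hat / (np.abs(x_hat) ** 2 + lam)
    return np.real(np.fft.ifft(w_hat))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(64)   # base sample
    y = rng.standard_normal(64)   # one regression target per shift
    gap = np.max(np.abs(ridge_direct(x, y, 0.1) - ridge_fourier(x, y, 0.1)))
    print("max abs difference between the two solutions:", gap)
```

The direct route builds and solves an n x n system, while the Fourier route only needs a few length-n transforms, which is the source of the storage and computation savings the abstract refers to.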
Recent Advance in Content-based Image Retrieval: A Literature Survey
The explosive increase and ubiquitous accessibility of visual data on the Web
have led to the prosperity of research activity in image search or retrieval.
Because they ignore visual content as a ranking cue, methods that apply text
search techniques to visual retrieval may suffer from inconsistency between the
text words and the visual content. Content-based image retrieval (CBIR), which
makes use of the representation of visual content to identify relevant images,
has attracted sustained attention in the recent two decades. Such a problem is
challenging due to the intention gap and the semantic gap problems. Numerous
techniques have been developed for content-based image retrieval in the last
decade. The purpose of this paper is to categorize and evaluate those
algorithms proposed during the period of 2003 to 2016. We conclude with several
promising directions for future research.
Comment: 22 page
Deep Learning and its Application to LHC Physics
Machine learning has played an important role in the analysis of high-energy
physics data for decades. The emergence of deep learning in 2012 allowed for
machine learning tools which could adeptly handle higher-dimensional and more
complex problems than previously feasible. This review is aimed at the reader
who is familiar with high energy physics but not machine learning. The
connections between machine learning and high energy physics data analysis are
explored, followed by an introduction to the core concepts of neural networks,
examples of the key results demonstrating the power of deep learning for
analysis of LHC data, and discussion of future prospects and concerns.
Comment: Posted with permission from the Annual Review of Nuclear and Particle
Science, Volume 68. (c) 2018 by Annual Reviews, http://www.annualreviews.or
3D Shape Estimation from 2D Landmarks: A Convex Relaxation Approach
We investigate the problem of estimating the 3D shape of an object, given a
set of 2D landmarks in a single image. To alleviate the reconstruction
ambiguity, a widely-used approach is to confine the unknown 3D shape within a
shape space built upon existing shapes. While this approach has proven to be
successful in various applications, a challenging issue remains, i.e., the
joint estimation of shape parameters and camera-pose parameters requires
solving a nonconvex optimization problem. The existing methods often adopt an
alternating minimization scheme to locally update the parameters, and
consequently the solution is sensitive to initialization. In this paper, we
propose a convex formulation to address this problem and develop an efficient
algorithm to solve the proposed convex program. We demonstrate the exact
recovery property of the proposed method, its merits compared to alternative
methods, and the applicability in human pose and car shape estimation.
Comment: In Proceedings of CVPR 201
A Triangle Algorithm for Semidefinite Version of Convex Hull Membership Problem
Given a subset […] of […], the set of […]
real symmetric matrices, we define its spectrahull as the
set […], where […] is the spectraplex, […]. We define spectrahull
membership (SHM) to be the problem of testing if a given […]
lies in […]. On the one hand, when the […]'s are diagonal matrices,
SHM reduces to convex hull membership (CHM), a fundamental problem in
LP. On the other hand, a bounded SDP feasibility is reducible to SHM. By
building on the Triangle Algorithm (TA) \cite{kalchar,kalsep}, developed
for CHM and its generalization, we design a TA for SHM, where given
[…], in […] iterations it either computes a
hyperplane separating […] from […], or […] such that […], […] maximum error over […]. Under certain conditions
iteration complexity improves to […] or even […]. The worst-case complexity of each iteration is […], plus
testing the existence of a pivot, shown to be equivalent to estimating the
least eigenvalue of a symmetric matrix. This, together with a semidefinite
version of Carathéodory's theorem, allows implementing TA as if solving a CHM,
resorting to the power method only as needed, thereby improving the
complexity of iterations. The proposed Triangle Algorithm for SHM is simple,
practical and applicable to general SDP feasibility and optimization. Also, it
extends to a spectral analogue of SVM for separation of two spectrahulls.
Comment: 18 page
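The abstract above mentions estimating the least eigenvalue of a symmetric matrix with the power method. The following is a minimal sketch of that one building block only, not of the Triangle Algorithm or its pivot test. It assumes the standard shift trick: power iteration on c*I - A, where c is an upper bound on the largest eigenvalue of A, converges toward the eigenvector of the smallest eigenvalue of A. The function name, the choice of the Frobenius norm as the shift, and the fixed iteration count are illustrative assumptions.

```python
import numpy as np

def least_eigenvalue(A, iters=500, seed=0):
    """Estimate the smallest eigenvalue of a symmetric matrix A.

    Runs power iteration on B = c*I - A, with c = ||A||_F (an upper bound on
    every |eigenvalue| of A), so B is positive semidefinite and its dominant
    eigenvector is the eigenvector of A's smallest eigenvalue.
    Returns the Rayleigh quotient estimate and the unit vector found.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    c = np.linalg.norm(A, "fro")
    B = c * np.eye(n) - A
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = B @ v
        norm = np.linalg.norm(w)
        if norm == 0.0:        # only happens when A is the zero matrix
            break
        v = w / norm
    return float(v @ A @ v), v

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    M = rng.standard_normal((5, 5))
    A = (M + M.T) / 2          # random symmetric test matrix
    estimate, _ = least_eigenvalue(A)
    print("power-method estimate:", estimate)
    print("reference (eigvalsh): ", np.linalg.eigvalsh(A)[0])
```

Convergence slows when the two smallest eigenvalues are close together, so a practical implementation would use a stopping tolerance rather than a fixed number of iterations.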
Identification of functionally related enzymes by learning-to-rank methods
Enzyme sequences and structures are routinely used in the biological sciences
as queries to search for functionally related enzymes in online databases. To
this end, one usually starts from some notion of similarity, comparing two
enzymes by looking for correspondences in their sequences, structures or
surfaces. For a given query, the search operation results in a ranking of the
enzymes in the database, from very similar to dissimilar enzymes, while
information about the biological function of annotated database enzymes is
ignored.
In this work we show that rankings of that kind can be substantially improved
by applying kernel-based learning algorithms. This approach enables the
detection of statistical dependencies between similarities of the active cleft
and the biological function of annotated enzymes. This is in contrast to
search-based approaches, which do not take annotated training data into
account. Similarity measures based on the active cleft are known to outperform
sequence-based or structure-based measures under certain conditions. We
consider the Enzyme Commission (EC) classification hierarchy for obtaining
annotated enzymes during the training phase. The results of a set of sizeable
experiments indicate a consistent and significant improvement for a set of
similarity measures that exploit information about small cavities in the
surface of enzymes.
Persistent-Homology-based Machine Learning and its Applications -- A Survey
A suitable feature representation that can both preserve the intrinsic
information of the data and reduce data complexity and dimensionality is key to the
performance of machine learning models. Deeply rooted in algebraic topology,
persistent homology (PH) provides a delicate balance between data
simplification and intrinsic structure characterization, and has been applied
to various areas successfully. However, the combination of PH and machine
learning has been hindered greatly by three challenges, namely topological
representation of data, PH-based distance measurements or metrics, and PH-based
feature representation. With the development of topological data analysis,
progress has been made on all three of these problems, but the results are
widely scattered across the literature. In this paper, we provide a systematic
review of PH and PH-based supervised and unsupervised models from a computational
perspective. Our emphasis is on the recent development of mathematical models
and tools, including PH software and PH-based functions, feature
representations, kernels, and similarity models. Essentially, this paper can
work as a roadmap for the practical application of PH-based machine learning
tools. Further, we consider different topological feature representations in
different machine learning models, and investigate their impacts on the protein
secondary structure classification.
Comment: 42 pages; 6 figures; 9 table
A Distributed Approach towards Discriminative Distance Metric Learning
Distance metric learning is successful in discovering intrinsic relations in
data. However, most algorithms are computationally demanding when the problem
size becomes large. In this paper, we propose a discriminative metric learning
algorithm, and develop a distributed scheme that learns metrics on moderate-sized
subsets of the data and aggregates the results into a global solution. The
technique leverages the power of parallel computation. The algorithm of the
aggregated distance metric learning (ADML) scales well with the data size and
can be controlled by the partition. We theoretically analyse and provide bounds
for the error induced by the distributed treatment. We have conducted
experimental evaluation of ADML, both on specially designed tests and on
practical image annotation tasks. Those tests have shown that ADML achieves
state-of-the-art performance at only a fraction of the cost incurred by most
existing methods.
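The partition-then-aggregate idea in this abstract can be sketched in a few lines. This is not the ADML algorithm: the abstract does not give its local learner or aggregation rule, so the sketch swaps in a simple stand-in (the inverse of a regularised within-class scatter matrix as a Mahalanobis metric) and aggregates by plain averaging. All function names, the regularisation constant, and the random partition are assumptions made for illustration.

```python
import numpy as np

def local_metric(X, y, reg=1e-3):
    """Stand-in local learner: inverse regularised within-class scatter."""
    d = X.shape[1]
    S = np.zeros((d, d))
    for label in np.unique(y):
        Xc = X[y == label]
        centered = Xc - Xc.mean(axis=0)
        S += centered.T @ centered
    S /= len(X)
    return np.linalg.inv(S + reg * np.eye(d))

def aggregated_metric(X, y, n_parts=4, seed=0):
    """Learn one metric per random subset, then average them.

    Each call to local_metric is independent of the others, so in a
    parallel setting every subset could be handled by a separate worker.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    parts = np.array_split(order, n_parts)
    metrics = [local_metric(X[idx], y[idx]) for idx in parts]
    return np.mean(metrics, axis=0)

def mahalanobis(M, a, b):
    """Distance between a and b under the metric matrix M."""
    diff = a - b
    return float(np.sqrt(diff @ M @ diff))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((400, 5))
    y = rng.integers(0, 3, size=400)
    M = aggregated_metric(X, y, n_parts=8)
    print("distance between first two points:", mahalanobis(M, X[0], X[1]))
```

Averaging keeps the result symmetric positive definite because each local matrix is, and the number of partitions trades per-subset cost against how much data each local estimate sees.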