Multidimensional Scaling by Deterministic Annealing with Iterative Majorization Algorithm
Multidimensional Scaling (MDS) is a dimension reduction method for information visualization that is formulated as a non-linear optimization problem. It is applicable to many data-intensive scientific problems, including studies of DNA sequences, but tends to get trapped in local minima. Deterministic Annealing (DA) has been applied to many optimization problems to avoid local minima. In this paper we apply a DA approach to the MDS problem and show that it improves the mapping quality and is highly reliable across a variety of experiments. Furthermore, its execution time is similar to that of the un-annealed approach. We use several data sets to compare the proposed DA approach with both a well-known algorithm called SMACOF and an MDS method with distance smoothing, which aims to avoid local optima. The proposed DA method outperforms both the SMACOF algorithm and the distance-smoothing MDS algorithm in terms of mapping quality and is much less sensitive to the initial configuration and stopping condition. We also investigate various temperature cooling parameters for our deterministic annealing method within an exponential cooling scheme.
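For orientation, the SMACOF baseline named in this abstract can be sketched in a few lines. This is a minimal, unweighted version written from the standard textbook formulation (raw stress minimized by repeated Guttman transforms). It is not the authors' DA variant and contains no annealing or cooling schedule. The function names, the random initialization, and the stopping rule are assumptions made for illustration.

```python
import math
import random


def distances(X):
    """Euclidean distance matrix of a list of points."""
    n = len(X)
    return [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]


def stress(delta, X):
    """Raw stress: sum over i < j of (d_ij(X) - delta_ij)^2."""
    d = distances(X)
    n = len(X)
    return sum((d[i][j] - delta[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def guttman_transform(delta, X):
    """One majorization step, X_new = B(X) X / n, for unit weights."""
    n, dim = len(X), len(X[0])
    d = distances(X)
    new_X = []
    for i in range(n):
        row = [0.0] * dim
        for j in range(n):
            if i == j or d[i][j] == 0.0:
                continue  # B_ij is taken as 0 when points coincide
            r = delta[i][j] / d[i][j]
            # Row i of B(X) X equals the sum over j of r_ij * (x_i - x_j).
            for k in range(dim):
                row[k] += r * (X[i][k] - X[j][k])
        new_X.append([v / n for v in row])
    return new_X


def smacof(delta, dim=2, max_iter=300, tol=1e-9, seed=0):
    """Iterate Guttman transforms from a random start (assumed defaults)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(len(delta))]
    current = stress(delta, X)
    for _ in range(max_iter):
        X = guttman_transform(delta, X)
        new = stress(delta, X)
        improvement = current - new
        current = new
        if improvement < tol:
            break
    return X, current


if __name__ == "__main__":
    square = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
    embedding, final_stress = smacof(distances(square))
    print(final_stress)
```

Each Guttman transform cannot increase the stress, but the result still depends on the starting configuration, which is the sensitivity the abstract says the DA approach reduces.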
Soft topographic map for clustering and classification of bacteria
In this work a new method for clustering and building a topographic representation of a bacteria taxonomy is presented. The method is based on the analysis of stable parts of the genome, the so-called “housekeeping genes”. The proposed method generates topographic maps of the bacteria taxonomy, where relations among different type strains can be visually inspected and verified. Two well-known DNA alignment algorithms are applied to the genomic sequences. Topographic maps are optimized to represent the similarity among the sequences according to their evolutionary distances. The experimental analysis is carried out on 147 type strains of the Gammaproteobacteria class by means of the 16S rRNA housekeeping gene. Complete sequences of the gene have been retrieved from the NCBI public database. In the experimental tests the maps show clusters of homologous type strains and present some singular cases potentially due to incorrect classification or erroneous annotations in the database.
Partitioning Relational Matrices of Similarities or Dissimilarities using the Value of Information
In this paper, we provide an approach to clustering relational matrices whose
entries correspond to either similarities or dissimilarities between objects.
Our approach is based on the value of information, a parameterized,
information-theoretic criterion that measures the change in costs associated
with changes in information. Optimizing the value of information yields a
deterministic annealing style of clustering with many benefits. For instance,
investigators need not specify the number of clusters a priori, as the
partitions naturally undergo phase changes, during the annealing process,
whereby the number of clusters changes in a data-driven fashion. The
global-best partition can also often be identified.
Comment: Submitted to the IEEE International Conference on Acoustics, Speech,
and Signal Processing (ICASSP
Practical implementation of nonlinear time series methods: The TISEAN package
Nonlinear time series analysis is becoming a more and more reliable tool for
the study of complicated dynamics from measurements. The concept of
low-dimensional chaos has proven to be fruitful in the understanding of many
complex phenomena despite the fact that very few natural systems have actually
been found to be low-dimensional deterministic in the sense of the theory. In
order to evaluate the long term usefulness of the nonlinear time series
approach as inspired by chaos theory, it will be important that the
corresponding methods become more widely accessible. This paper, while not a
proper review on nonlinear time series analysis, tries to make a contribution
to this process by describing the actual implementation of the algorithms, and
their proper usage. Most of the methods require the choice of certain
parameters for each specific time series application. We will try to give
guidance in this respect. The scope and selection of topics in this article, as
well as the implementational choices that have been made, correspond to the
contents of the software package TISEAN which is publicly available from
http://www.mpipks-dresden.mpg.de/~tisean . In fact, this paper can be seen as
an extended manual for the TISEAN programs. It fills the gap between the
technical documentation and the existing literature, providing the necessary
entry points for a more thorough study of the theoretical background.
Comment: 27 pages, 21 figures, downloadable software at
http://www.mpipks-dresden.mpg.de/~tisea
Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm
Over the past five decades, k-means has become the clustering algorithm of
choice in many application domains primarily due to its simplicity, time/space
efficiency, and invariance to the ordering of the data points. Unfortunately,
the algorithm's sensitivity to the initial selection of the cluster centers
remains its most serious drawback. Numerous initialization methods have
been proposed to address this drawback. Many of these methods, however, have
time complexity superlinear in the number of data points, which makes them
impractical for large data sets. On the other hand, linear methods are often
random and/or sensitive to the ordering of the data points. These methods are
generally unreliable in that the quality of their results is unpredictable.
Therefore, it is common practice to perform multiple runs of such methods and
take the output of the run that produces the best results. Such a practice,
however, greatly increases the computational requirements of the otherwise
highly efficient k-means algorithm. In this chapter, we investigate the
empirical performance of six linear, deterministic (non-random), and
order-invariant k-means initialization methods on a large and diverse
collection of data sets from the UCI Machine Learning Repository. The results
demonstrate that two relatively unknown hierarchical initialization methods due
to Su and Dy outperform the remaining four methods with respect to two
objective effectiveness criteria. In addition, a recent method due to Erisoglu
et al. performs surprisingly poorly.
Comment: 21 pages, 2 figures, 5 tables, Partitional Clustering Algorithms
(Springer, 2014). arXiv admin note: substantial text overlap with
arXiv:1304.7465, arXiv:1209.196
Restoration Ecology: Two-Sex Dynamics and Cost Minimization
We model spatially detailed, two-sex population dynamics to study the cost
of ecological restoration. We assume that cost is proportional to the number of
individuals introduced into a large habitat. We treat dispersal as homogeneous
diffusion. The local population dynamics depends on sex ratio at birth, and
allows mortality rates to differ between sexes. Furthermore, local density
dependence induces a strong Allee effect, implying that the initial population
must be sufficiently large to avert rapid extinction. We address three
different initial spatial distributions for the introduced individuals; for
each we minimize the associated cost, constrained by the requirement that the
species must be restored throughout the habitat. First, we consider spatially
inhomogeneous, unstable stationary solutions of the model's equations as
plausible candidates for small restoration cost. Second, we use numerical
simulations to find the smallest cluster size, enclosing a spatially
homogeneous population density, that minimizes the cost of assured restoration.
Finally, by employing simulated annealing, we minimize restoration cost among
all possible initial spatial distributions of females and males. For biased sex
ratios, or for a significant between-sex difference in mortality, we find that
sex-specific spatial distributions minimize the cost. But as long as the sex
ratio maximizes the local equilibrium density for given mortality rates, a
common homogeneous distribution for both sexes that spans a critical distance
yields a similarly low cost.
Optimizing Mean Mission Duration for Multiple-Payload Satellites
This thesis addresses the problem of optimally selecting and specifying satellite payloads for inclusion on a satellite bus to be launched into a constellation. The objective is to select and specify payloads so that the total lifetime utility of the constellation is maximized. The satellite bus is limited by finite power, weight, volume, and cost constraints. This problem is modeled as a classical knapsack problem in one and multiple dimensions, and dynamic programming and binary integer programming formulations are provided to solve it. Due to the computational complexity of the problem, the solution techniques include exact methods as well as four heuristic procedures: a greedy heuristic, two norm-based heuristics, and a simulated annealing heuristic. The performance of the exact and heuristic approaches is evaluated on the basis of solution quality and computation time by solving a series of notional and randomly generated problem instances. The numerical results indicate that, when an exact solution is required for a moderately sized constellation, the integer programming formulation is the most reliable in solving the problem to optimality. However, if the problem size is very large and near-optimal solutions are acceptable, then the simulated annealing algorithm performs best among the heuristic procedures.
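As a rough illustration of the one-dimensional case mentioned above, the sketch below solves a 0/1 knapsack exactly by dynamic programming and compares it with a simple value-per-weight greedy rule. It is a textbook formulation, not the thesis's model: the function names, the integer weights, and the example numbers are assumptions, and the greedy rule shown is only one plausible reading of "a greedy heuristic".

```python
def knapsack_dp(values, weights, capacity):
    """Exact 0/1 knapsack by dynamic programming (integer weights assumed).

    Returns (best total value, sorted indices of the chosen items).
    """
    n = len(values)
    # best[i][c] = best value using only the first i items within capacity c
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, w = values[i - 1], weights[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]          # skip item i-1
            if w <= c:                           # or take it, if it fits
                best[i][c] = max(best[i][c], best[i - 1][c - w] + v)

    # Walk back through the table to recover which items were taken.
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    chosen.reverse()
    return best[n][capacity], chosen


def knapsack_greedy(values, weights, capacity):
    """Hypothetical greedy rule: take items by descending value/weight ratio."""
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i], reverse=True)
    total, remaining, chosen = 0, capacity, []
    for i in order:
        if weights[i] <= remaining:
            chosen.append(i)
            remaining -= weights[i]
            total += values[i]
    return total, sorted(chosen)


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the thesis.
    utilities = [60, 100, 120]
    masses = [10, 20, 30]
    print(knapsack_dp(utilities, masses, 50))      # (220, [1, 2])
    print(knapsack_greedy(utilities, masses, 50))  # (160, [0, 1])
```

On this small instance the greedy rule stops at 160 while the exact table reaches 220, which shows why solution quality and computation time are evaluated as separate criteria.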