
Multidimensional Scaling by Deterministic Annealing with Iterative Majorization Algorithm

Author: Geoffrey C. Fox
Judy Qiu
Seung-hee Bae
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2010

Multidimensional Scaling (MDS) is a dimension reduction method for information visualization, which is set up as a non-linear optimization problem. It is applicable to many data-intensive scientific problems, including studies of DNA sequences, but tends to get trapped in local minima. Deterministic Annealing (DA) has been applied to many optimization problems to avoid local minima. We apply the DA approach to the MDS problem in this paper and show that our proposed DA approach improves the mapping quality and shows high reliability in a variety of experimental results. Furthermore, its execution time is similar to that of the un-annealed approach. We use different data sets to compare the proposed DA approach with both a well-known algorithm called SMACOF and an MDS method with distance smoothing, which aims to avoid local optima. Our proposed DA method outperforms the SMACOF algorithm and the distance smoothing MDS algorithm in terms of mapping quality and shows much less sensitivity to initial configurations and stopping conditions. We also investigate various temperature cooling parameters for our deterministic annealing method within an exponential cooling scheme.
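For orientation, the un-annealed baseline named in the abstract, SMACOF, can be sketched in a few lines. This is a minimal illustration of the standard unweighted SMACOF iteration (the Guttman transform), not the authors' DA method; the function names, the toy unit-square input, and the fixed iteration count are assumptions made for the example.

```python
import math
import random


def pairwise_distances(X):
    """Euclidean distance matrix of the points in X (list of coordinate lists)."""
    n = len(X)
    return [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]


def stress(X, delta):
    """Raw stress: squared mismatch between mapped distances and dissimilarities."""
    D = pairwise_distances(X)
    n = len(X)
    return sum((D[i][j] - delta[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def guttman_step(X, delta):
    """One unweighted SMACOF update, X_new = (1/n) * B(X) * X."""
    n, dim = len(X), len(X[0])
    D = pairwise_distances(X)
    new_X = []
    for i in range(n):
        row = [0.0] * dim
        for j in range(n):
            if i == j or D[i][j] == 0.0:
                continue  # coincident points contribute nothing
            ratio = delta[i][j] / D[i][j]
            for k in range(dim):
                row[k] += ratio * (X[i][k] - X[j][k])
        new_X.append([v / n for v in row])
    return new_X


def smacof(delta, dim=2, iters=100, seed=0):
    """Run a fixed number of SMACOF steps from a seeded random start."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(len(delta))]
    history = [stress(X, delta)]
    for _ in range(iters):
        X = guttman_step(X, delta)
        history.append(stress(X, delta))
    return X, history


# Toy input (assumed): dissimilarities between the corners of a unit square.
corners = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
delta = pairwise_distances(corners)
embedding, history = smacof(delta)
print(history[0], history[-1])
```

Each step cannot increase the stress, but the result still depends on the random start, which is the sensitivity to initial configurations that the abstract says the DA approach reduces.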

CiteSeerX

Crossref


Soft topographic map for clustering and classification of bacteria

Author: G.M. Garrity
H. Klock
I.T. Joliffe
J.D. Thompson
J.E. Clarridge III
K. Rose
M. Drancourt
M. Remm
P. Rice
S. Altschul
S. Dubnov
S. Kumar
S.B. Needleman
S.P. Luttrell
T. Graepel
T. Hofmann
T. Kohonen
T.H. Jukes
W.S. Torgerson
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2007

In this work, a new method for clustering and building a topographic representation of a bacteria taxonomy is presented. The method is based on the analysis of stable parts of the genome, the so-called “housekeeping genes”. The proposed method generates topographic maps of the bacteria taxonomy, where relations among different type strains can be visually inspected and verified. Two well-known DNA alignment algorithms are applied to the genomic sequences. Topographic maps are optimized to represent the similarity among the sequences according to their evolutionary distances. The experimental analysis is carried out on 147 type strains of the Gammaproteobacteria class by means of the 16S rRNA housekeeping gene. Complete sequences of the gene have been retrieved from the NCBI public database. In the experimental tests, the maps show clusters of homologous type strains and present some singular cases potentially due to incorrect classification or erroneous annotations in the database.

Central Archive at the University of Reading

CiteSeerX

Crossref

Archivio istituzionale della ricerca - Università di Palermo

Partitioning Relational Matrices of Similarities or Dissimilarities using the Value of Information

Author: Principe Jose C.
Sledge Isaac J.
Publication date: 27/10/2017

In this paper, we provide an approach to clustering relational matrices whose entries correspond to either similarities or dissimilarities between objects. Our approach is based on the value of information, a parameterized, information-theoretic criterion that measures the change in costs associated with changes in information. Optimizing the value of information yields a deterministic annealing style of clustering with many benefits. For instance, investigators need not specify the number of clusters a priori, because the partitions naturally undergo phase changes during the annealing process, whereby the number of clusters changes in a data-driven fashion. The global-best partition can also often be identified. Comment: Submitted to the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP

arXiv.org e-Print Archive

Crossref

Practical implementation of nonlinear time series methods: The TISEAN package

Author: Hegger Rainer
Kantz Holger
Schreiber Thomas
Publication venue: 'AIP Publishing'
Publication date: 30/09/1998

Nonlinear time series analysis is becoming a more and more reliable tool for the study of complicated dynamics from measurements. The concept of low-dimensional chaos has proven to be fruitful in the understanding of many complex phenomena despite the fact that very few natural systems have actually been found to be low-dimensional deterministic in the sense of the theory. In order to evaluate the long-term usefulness of the nonlinear time series approach as inspired by chaos theory, it will be important that the corresponding methods become more widely accessible. This paper, while not a proper review of nonlinear time series analysis, tries to make a contribution to this process by describing the actual implementation of the algorithms and their proper usage. Most of the methods require the choice of certain parameters for each specific time series application. We will try to give guidance in this respect. The scope and selection of topics in this article, as well as the implementational choices that have been made, correspond to the contents of the software package TISEAN, which is publicly available from http://www.mpipks-dresden.mpg.de/~tisean . In fact, this paper can be seen as an extended manual for the TISEAN programs. It fills the gap between the technical documentation and the existing literature, providing the necessary entry points for a more thorough study of the theoretical background. Comment: 27 pages, 21 figures, downloadable software at http://www.mpipks-dresden.mpg.de/~tisea

arXiv.org e-Print Archive

MPG.PuRe

Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

Over the past five decades, k-means has become the clustering algorithm of choice in many application domains primarily due to its simplicity, time/space efficiency, and invariance to the ordering of the data points. Unfortunately, the algorithm's sensitivity to the initial selection of the cluster centers remains its most serious drawback. Numerous initialization methods have been proposed to address this drawback. Many of these methods, however, have time complexity superlinear in the number of data points, which makes them impractical for large data sets. On the other hand, linear methods are often random and/or sensitive to the ordering of the data points. These methods are generally unreliable in that the quality of their results is unpredictable. Therefore, it is common practice to perform multiple runs of such methods and take the output of the run that produces the best results. Such a practice, however, greatly increases the computational requirements of the otherwise highly efficient k-means algorithm. In this chapter, we investigate the empirical performance of six linear, deterministic (non-random), and order-invariant k-means initialization methods on a large and diverse collection of data sets from the UCI Machine Learning Repository. The results demonstrate that two relatively unknown hierarchical initialization methods due to Su and Dy outperform the remaining four methods with respect to two objective effectiveness criteria. In addition, a recent method due to Erisoglu et al. performs surprisingly poorly. Comment: 21 pages, 2 figures, 5 tables, Partitional Clustering Algorithms (Springer, 2014). arXiv admin note: substantial text overlap with arXiv:1304.7465, arXiv:1209.196
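The sensitivity to initial centers described in this abstract is easy to reproduce on a toy data set. The sketch below is plain Lloyd-style k-means run from two hand-picked initializations; it does not implement any of the six initialization methods the chapter evaluates, and the data points, function names, and starting centers are assumptions chosen for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd(points, centers, max_iter=100):
    """Batch k-means from the given initial centers; returns (centers, SSE)."""
    centers = [[float(v) for v in c] for c in centers]
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            clusters[idx].append(p)
        # Update step: each center moves to the mean of its members.
        new_centers = [
            [sum(col) / len(members) for col in zip(*members)] if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments are stable, so the run has converged
        centers = new_centers
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse


# Toy data (assumed): three well-separated pairs on a line.
points = [[0.0], [1.0], [10.0], [11.0], [20.0], [21.0]]

_, sse_good = lloyd(points, [[0.0], [10.0], [20.0]])  # one center per pair
_, sse_bad = lloyd(points, [[0.0], [1.0], [10.0]])    # two centers in one pair
print(sse_good, sse_bad)
```

Both runs converge, but the second one stays in a much worse local optimum, which is why random initializations are usually repeated many times.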

arXiv.org e-Print Archive

Crossref

Restoration Ecology: Two-Sex Dynamics and Cost Minimization

Author: Caraco T.
Caragine C.
Korniss G.
Molnar Jr F.
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

We model spatially detailed, two-sex population dynamics to study the cost of ecological restoration. We assume that cost is proportional to the number of individuals introduced into a large habitat. We treat dispersal as homogeneous diffusion. The local population dynamics depend on sex ratio at birth and allow mortality rates to differ between sexes. Furthermore, local density dependence induces a strong Allee effect, implying that the initial population must be sufficiently large to avert rapid extinction. We address three different initial spatial distributions for the introduced individuals; for each we minimize the associated cost, constrained by the requirement that the species must be restored throughout the habitat. First, we consider spatially inhomogeneous, unstable stationary solutions of the model's equations as plausible candidates for small restoration cost. Second, we use numerical simulations to find the smallest cluster size, enclosing a spatially homogeneous population density, that minimizes the cost of assured restoration. Finally, by employing simulated annealing, we minimize restoration cost among all possible initial spatial distributions of females and males. For biased sex ratios, or for a significant between-sex difference in mortality, we find that sex-specific spatial distributions minimize the cost. But as long as the sex ratio maximizes the local equilibrium density for given mortality rates, a common homogeneous distribution for both sexes that spans a critical distance yields a similarly low cost.

arXiv.org e-Print Archive

CiteSeerX

Directory of Open Access Journals

PubMed Central

FigShare

Optimizing Mean Mission Duration for Multiple-Payload Satellites

Author: Flory John A.
Publication venue: AFIT Scholar
Publication date: 01/03/2006

This thesis addresses the problem of optimally selecting and specifying satellite payloads for inclusion on a satellite bus to be launched into a constellation. The objective is to select and specify payloads so that the total lifetime utility of the constellation is maximized. The satellite bus is limited by finite power, weight, volume, and cost constraints. This problem is modeled as a classical knapsack problem in one and multiple dimensions, and dynamic programming and binary integer programming formulations are provided to solve the problem. Due to the computational complexity of the problem, the solution techniques include exact methods as well as four heuristic procedures: a greedy heuristic, two norm-based heuristics, and a simulated annealing heuristic. The performance of the exact and heuristic approaches is evaluated on the basis of solution quality and computation time by solving a series of notional and randomly generated problem instances. The numerical results indicate that, when an exact solution is required for a moderately sized constellation, the integer programming formulation is most reliable in solving the problem to optimality. However, if the problem size is very large and near-optimal solutions are acceptable, then the simulated annealing algorithm performs best among the heuristic procedures.
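As a reminder of what the one-dimensional knapsack formulation looks like, here is a minimal dynamic programming sketch for the textbook 0/1 knapsack problem. It is not taken from the thesis: the single integer weight constraint, the function name, and the example utilities and weights are all assumptions, and the thesis's multi-dimensional constraints (power, weight, volume, cost) are not modeled.

```python
def knapsack(values, weights, capacity):
    """Maximum total value of items that fit within an integer capacity.

    Classic 0/1 knapsack: each item (a hypothetical payload here) is either
    included once or left out.
    """
    # best[c] holds the best value achievable with total weight at most c.
    best = [0] * (capacity + 1)
    for value, weight in zip(values, weights):
        # Iterate capacities downward so each item is used at most once.
        for cap in range(capacity, weight - 1, -1):
            best[cap] = max(best[cap], best[cap - weight] + value)
    return best[capacity]


# Hypothetical payload utilities and weights, with a weight budget of 50.
utilities = [60, 100, 120]
weights = [10, 20, 30]
print(knapsack(utilities, weights, 50))  # → 220
```

The table has one entry per unit of capacity, so the running time grows with the number of items times the capacity, which is one reason heuristics become attractive for very large instances.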

AFIT Scholar (Air Force Institute of Technology)