Parsimony-based genetic algorithm for haplotype resolution and block partitioning
This dissertation proposes a new algorithm for performing simultaneous haplotype resolution and block partitioning. The algorithm is based on a genetic algorithm approach and the parsimony principle. A multilocus LD measure (Normalized Entropy Difference) is used as the block identification criterion. The proposed algorithm incorporates missing data as a part of the model and allows blocks of arbitrary length. In addition, the algorithm provides scores for the block boundaries, which measure the strength of the boundaries at specific positions. The performance of the proposed algorithm was validated by running it on several publicly available data sets, including the HapMap data, and comparing the results to those of existing state-of-the-art algorithms. The results show that the accuracy of haplotype decomposition achieved by the proposed genetic algorithm falls within the range achieved by the other algorithms. The block structure output by our algorithm in general agrees with the block structure provided for the same data by the other algorithms. Thus, the proposed algorithm can be successfully used for block partitioning and haplotype phasing while providing some valuable new features, such as scores for block boundaries and fully incorporated treatment of missing data. In addition, the proposed algorithm for haplotyping and block partitioning is used in the development of a new clustering algorithm for two-population mixed genotype samples. The proposed clustering algorithm extracts from the given genotype sample two clusters with substantially different block structures and finds the haplotype resolution and block partitioning for each cluster
Memetic Multilevel Hypergraph Partitioning
Hypergraph partitioning has a wide range of important applications such as
VLSI design or scientific computing. With focus on solution quality, we develop
the first multilevel memetic algorithm to tackle the problem. Key components of
our contribution are new effective multilevel recombination and mutation
operations that provide a large amount of diversity. We perform a wide range of
experiments on a benchmark set containing instances from application areas such
as VLSI, SAT solving, social networks, and scientific computing. Compared to the
state-of-the-art hypergraph partitioning tools hMetis, PaToH, and KaHyPar, our
new algorithm computes the best result on almost all instances
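As a rough illustration of what "solution quality" can mean for a hypergraph partition, the sketch below evaluates two objectives commonly used in this area: the number of cut hyperedges and the connectivity metric (the sum over hyperedges of the number of blocks spanned minus one). The abstract does not state which objective the authors optimize, so both functions, their names, and the list-based data layout are assumptions made for illustration only.

```python
def cut_nets(hyperedges, block_of):
    """Count hyperedges whose vertices lie in more than one block.

    hyperedges: list of vertex-id lists (hypothetical layout)
    block_of:   block_of[v] is the block assigned to vertex v
    """
    return sum(1 for e in hyperedges if len({block_of[v] for v in e}) > 1)


def connectivity(hyperedges, block_of):
    """Sum over hyperedges of (number of distinct blocks touched - 1)."""
    return sum(len({block_of[v] for v in e}) - 1 for e in hyperedges)


# Toy instance: six vertices, three hyperedges, two blocks.
hyperedges = [[0, 1, 2], [2, 3], [3, 4, 5]]
block_of = [0, 0, 0, 1, 1, 1]
print(cut_nets(hyperedges, block_of), connectivity(hyperedges, block_of))
```

A memetic algorithm would use a function like one of these as the fitness of an individual; the recombination and mutation operators themselves are not sketched here.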
Genetic algorithm based two-mode clustering of metabolomics data
Metabolomics and other omics tools are generally characterized by large data sets with many variables obtained under different environmental conditions. Clustering methods, and more specifically two-mode clustering methods, are excellent tools for analyzing this type of data. Two-mode clustering methods allow for analysis of the behavior of subsets of metabolites under different experimental conditions. In addition, the results are easily visualized. In this paper we introduce a two-mode clustering method based on a genetic algorithm that uses a criterion that searches for homogeneous clusters. Furthermore, we introduce a cluster stability criterion to validate the clusters, and we provide an extended knee plot to select the optimal number of clusters in both the experimental and metabolite modes. The genetic algorithm-based two-mode clustering gave biologically relevant results when it was applied to two real-life metabolomics data sets. It was, for instance, able to identify a catabolic pathway for growth on several of the carbon sources
A niching memetic algorithm for simultaneous clustering and feature selection
Clustering is inherently a difficult task, and is made even more difficult when the selection of relevant features is also an issue. In this paper we propose an approach for simultaneous clustering and feature selection using a niching memetic algorithm. Our approach (which we call NMA_CFS) makes feature selection an integral part of the global clustering search procedure and attempts to overcome the problem of identifying less promising locally optimal solutions in both clustering and feature selection, without making any a priori assumption about the number of clusters. Within the NMA_CFS procedure, a variable composite representation is devised to encode both feature selection and cluster centers with different numbers of clusters. Further, local search operations are introduced to refine feature selection and cluster centers encoded in the chromosomes. Finally, a niching method is integrated to preserve the population diversity and prevent premature convergence. In an experimental evaluation we demonstrate the effectiveness of the proposed approach and compare it with other related approaches, using both synthetic and real data
Improving Table Compression with Combinatorial Optimization
We study the problem of compressing massive tables within the
partition-training paradigm introduced by Buchsbaum et al. [SODA'00], in which
a table is partitioned by an off-line training procedure into disjoint
intervals of columns, each of which is compressed separately by a standard,
on-line compressor like gzip. We provide a new theory that unifies previous
experimental observations on partitioning and heuristic observations on column
permutation, all of which are used to improve compression rates. Based on the
theory, we devise the first on-line training algorithms for table compression,
which can be applied to individual files, not just continuously operating
sources; and also a new, off-line training algorithm, based on a link to the
asymmetric traveling salesman problem, which improves on prior work by
rearranging columns prior to partitioning. We demonstrate these results
experimentally. On various test files, the on-line algorithms provide 35-55%
improvement over gzip with negligible slowdown; the off-line reordering
provides up to 20% further improvement over partitioning alone. We also show
that a variation of the table compression problem is MAX-SNP hard.
Comment: 22 pages, 2 figures, 5 tables, 23 references. Extended abstract
appears in Proc. 13th ACM-SIAM SODA, pp. 213-222, 200
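The partitioning idea described in this abstract can be sketched in a few lines: split a fixed-width table into disjoint intervals of columns and compress each interval on its own. The sketch below is a minimal illustration under stated assumptions, not the authors' training algorithm: the boundaries are supplied by hand rather than learned, rows are assumed to be equal-length byte strings, and Python's zlib (DEFLATE, the same method gzip uses) stands in for gzip. The function names are hypothetical.

```python
import zlib


def compress_by_column_intervals(rows, boundaries):
    """Compress each contiguous column interval of a table separately.

    rows:       list of equal-length bytes objects (one per table row)
    boundaries: cut positions, e.g. [0, 10, 14] gives columns 0-9 and 10-13
    """
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("all rows must have the same width")
    if boundaries[0] != 0 or boundaries[-1] != width:
        raise ValueError("boundaries must start at 0 and end at the row width")
    chunks = []
    for start, end in zip(boundaries, boundaries[1:]):
        # Gather this interval's bytes from every row, then compress them together.
        block = b"".join(r[start:end] for r in rows)
        chunks.append(zlib.compress(block))
    return chunks


def decompress_to_rows(chunks, boundaries, n_rows):
    """Invert compress_by_column_intervals and reassemble the original rows."""
    parts = []
    for chunk, (start, end) in zip(chunks, zip(boundaries, boundaries[1:])):
        block = zlib.decompress(chunk)
        w = end - start
        parts.append([block[i * w:(i + 1) * w] for i in range(n_rows)])
    return [b"".join(p[i] for p in parts) for i in range(n_rows)]


rows = [b"2024-01-01AAAA", b"2024-01-02BBBB", b"2024-01-03AAAA"]
boundaries = [0, 10, 14]
chunks = compress_by_column_intervals(rows, boundaries)
assert decompress_to_rows(chunks, boundaries, len(rows)) == rows
```

Whether a given partition beats compressing the whole table at once depends on the data; choosing good boundaries (and, in the off-line variant, a good column order) is the part the paper's training algorithms address.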
Recent Advances in Graph Partitioning
We survey recent trends in practical algorithms for balanced graph
partitioning together with applications and future research directions