19,883 research outputs found
Phase Transitions of the Typical Algorithmic Complexity of the Random Satisfiability Problem Studied with Linear Programming
Here we study the NP-complete -SAT problem. Although the worst-case
complexity of NP-complete problems is conjectured to be exponential, there
exist parametrized random ensembles of problems where solutions can typically
be found in polynomial time for suitable ranges of the parameter. In fact,
random -SAT, with as control parameter, can be solved quickly
for small enough values of . It shows a phase transition between a
satisfiable phase and an unsatisfiable phase. For branch and bound algorithms,
which operate in the space of feasible Boolean configurations, the empirically
hardest problems are located only close to this phase transition. Here we study
-SAT () and the related optimization problem MAX-SAT by a linear
programming approach, which is widely used for practical problems and allows
for polynomial run time. In contrast to branch and bound it operates outside
the space of feasible configurations. On the other hand, finding a solution
within polynomial time is not guaranteed. We investigated several variants like
including artificial objective functions, so called cutting-plane approaches,
and a mapping to the NP-complete vertex-cover problem. We observed several
easy-hard transitions, from where the problems are typically solvable (in
polynomial time) using the given algorithms, respectively, to where they are
not solvable in polynomial time. For the related vertex-cover problem on random
graphs these easy-hard transitions can be identified with structural properties
of the graphs, like percolation transitions. For the present random -SAT
problem we have investigated numerous structural properties also exhibiting
clear transitions, but they appear not be correlated to the here observed
easy-hard transitions. This renders the behaviour of random -SAT more
complex than, e.g., the vertex-cover problem.Comment: 11 pages, 5 figure
A Wolf Pack Optimization Theory Based Improved Density Peaks Clustering Approach
In view of the problem that the Density Peaks Clustering (DPC) algorithm needs to manually set the parameter cut-off distance (dc) we propose a Wolf Pack optimization theory based Density Peaks Clustering approach (WPA-DPC). Firstly, we introduce dc parameter into the Wolf Pack Algorithm (WPA) to speed up the search. Secondly, we introduce the WPA into the DPC algorithm; the cut-off distance is used as the location of the wolf group. Finally, we make silhouette index in the search process as the fitness value, and the optimal location of the wolf group is the parameter value at the end. The simulation results show that compared with the traditional Density Peaks Clustering algorithm, the proposed algorithm is closer to the true clustering number. According to the evaluation results of silhouette and f-measure, the quality of clustering and the accuracy are greatly improved
Fast k-means based on KNN Graph
In the era of big data, k-means clustering has been widely adopted as a basic
processing tool in various contexts. However, its computational cost could be
prohibitively high as the data size and the cluster number are large. It is
well known that the processing bottleneck of k-means lies in the operation of
seeking closest centroid in each iteration. In this paper, a novel solution
towards the scalability issue of k-means is presented. In the proposal, k-means
is supported by an approximate k-nearest neighbors graph. In the k-means
iteration, each data sample is only compared to clusters that its nearest
neighbors reside. Since the number of nearest neighbors we consider is much
less than k, the processing cost in this step becomes minor and irrelevant to
k. The processing bottleneck is therefore overcome. The most interesting thing
is that k-nearest neighbor graph is constructed by iteratively calling the fast
-means itself. Comparing with existing fast k-means variants, the proposed
algorithm achieves hundreds to thousands times speed-up while maintaining high
clustering quality. As it is tested on 10 million 512-dimensional data, it
takes only 5.2 hours to produce 1 million clusters. In contrast, to fulfill the
same scale of clustering, it would take 3 years for traditional k-means
- …