64,589 research outputs found
Multi-level algorithms for modularity clustering
Modularity is one of the most widely used quality measures for graph
clusterings. Maximizing modularity is NP-hard, and the runtime of exact
algorithms is prohibitive for large graphs. A simple and effective class of
heuristics coarsens the graph by iteratively merging clusters (starting from
singletons), and optionally refines the resulting clustering by iteratively
moving individual vertices between clusters. Several heuristics of this type
have been proposed in the literature, but little is known about their relative
performance.
This paper experimentally compares existing and new coarsening- and
refinement-based heuristics with respect to their effectiveness (achieved
modularity) and efficiency (runtime). Concerning coarsening, it turns out that
the most widely used criterion for merging clusters (modularity increase) is
outperformed by other simple criteria, and that a recent algorithm by Schuetz
and Caflisch is no improvement over simple greedy coarsening for these
criteria. Concerning refinement, a new multi-level algorithm is shown to
produce significantly better clusterings than conventional single-level
algorithms. A comparison with published benchmark results and algorithm
implementations shows that combinations of coarsening and multi-level
refinement are competitive with the best algorithms in the literature.Comment: 12 pages, 10 figures, see
http://www.informatik.tu-cottbus.de/~rrotta/ for downloading the graph
clustering softwar
Chemoinformatics Research at the University of Sheffield: A History and Citation Analysis
This paper reviews the work of the Chemoinformatics Research Group in the Department of Information Studies at the University of Sheffield, focusing particularly on the work carried out in the period 1985-2002. Four major research areas are discussed, these involving the development of methods for: substructure searching in databases of three-dimensional structures, including both rigid and flexible molecules; the representation and searching of the Markush structures that occur in chemical patents; similarity searching in databases of both two-dimensional and three-dimensional structures; and compound selection and the design of combinatorial libraries. An analysis of citations to 321 publications from the Group shows that it attracted a total of 3725 residual citations during the period 1980-2002. These citations appeared in 411 different journals, and involved 910 different citing organizations from 54 different countries, thus demonstrating the widespread impact of the Group's work
The Evaluation Of Molecular Similarity And Molecular Diversity Methods Using Biological Activity Data
This paper reviews the techniques available for quantifying the effectiveness of methods for molecule similarity and molecular diversity, focusing in particular on similarity searching and on compound selection procedures. The evaluation criteria considered are based on biological activity data, both qualitative and quantitative, with rather different criteria needing to be used depending on the type of data available
- …