Towards an Efficient Discovery of the Topological Representative Subgraphs
With the emergence of graph databases, the task of frequent subgraph
discovery has been extensively addressed. Although the approaches proposed in
the literature have made this task feasible, the number of discovered frequent
subgraphs is still too high for them to be used efficiently in any further
exploration. Feature selection for graph data is one way to reduce the number
of frequent subgraphs, based on exact or approximate structural similarity.
However, current structural similarity strategies are not efficient enough for
many real-world applications, and the combinatorial nature of graphs makes
them computationally very costly. In order to select a smaller yet structurally
non-redundant set of subgraphs, we propose a novel approach that mines the top-k
topological representative subgraphs among the frequent ones. Our approach
can detect hidden structural similarities that existing approaches are unable
to detect, such as similarities in the density or the diameter of the subgraph.
In addition, it can easily be extended with any user-defined structural or
topological attributes, depending on the sought properties. Empirical studies on
real and synthetic graph datasets show that our approach is fast and scalable.
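To make "topological attributes such as density or diameter" concrete, here is a minimal sketch, assuming each frequent subgraph is given as a connected, undirected adjacency dict. It computes a hypothetical descriptor (vertex count, density, diameter) and groups subgraphs whose descriptors match exactly. The names `descriptor` and `group_by_topology` are illustrative only; the paper's actual top-k representative selection is not reproduced here.

```python
from collections import deque


def diameter(adj):
    """Longest shortest-path length (BFS from every vertex); assumes a connected, undirected graph."""
    best = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best


def density(adj):
    """Edge density 2m / (n(n-1)) of an undirected graph."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def descriptor(adj):
    """Hypothetical topological descriptor: (vertex count, density, diameter)."""
    return (len(adj), round(density(adj), 3), diameter(adj))


def group_by_topology(subgraphs):
    """Group named subgraphs whose descriptors coincide (a simplification of representative mining)."""
    groups = {}
    for name, adj in subgraphs.items():
        groups.setdefault(descriptor(adj), []).append(name)
    return groups


path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
path4_relabelled = {"x": ["y"], "y": ["x", "z"], "z": ["y", "w"], "w": ["z"]}
star4 = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(group_by_topology({"a": path4, "b": path4_relabelled, "c": star4}))
```

The path and the star have the same vertex count, edge count, and density, yet their diameters (3 vs. 2) separate them, which is the kind of distinction a purely size-based comparison would miss.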
Segmentation Based Mesh Denoising
Feature-preserving mesh denoising has received considerable attention recently.
To preserve sharp features, many methods assign large weights to anisotropic
surfaces and small weights to isotropic surfaces. However, they often disregard
the fact that small weights still have a negative impact on the denoising
outcome. Furthermore, such weighting can make parameter tuning harder,
especially for users without any background knowledge. In this paper, we
propose a novel clustering method for mesh denoising, which avoids the
disturbance caused by anisotropic information and can easily be embedded into
commonly used mesh denoising frameworks. Extensive experiments have been
conducted to validate our method, and they demonstrate that it can remarkably
enhance the denoising results of some existing methods, both visually and
quantitatively. It also greatly eases parameter tuning for users by increasing
the stability of existing mesh denoising methods.
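The claim that small weights still harm the result can be illustrated with a minimal sketch of one bilateral-style face-normal averaging step. It assumes per-face unit normals, a face adjacency list, and segment labels that are already known; `filter_normal`, `sigma`, and `labels` are hypothetical names. This is not the paper's clustering method, only a demonstration that a tiny weight and a zero weight (neighbor excluded because it lies in another segment) give different results.

```python
import math


def filter_normal(i, normals, neighbors, sigma, labels=None):
    """One range-weighted averaging step for the normal of face i.

    If labels is given, faces in a different segment than face i are skipped
    entirely (weight exactly zero) instead of merely down-weighted.
    """
    ni = normals[i]
    acc = [0.0, 0.0, 0.0]
    for j in [i] + neighbors[i]:
        if labels is not None and labels[j] != labels[i]:
            continue  # segmentation: faces across the feature contribute nothing
        nj = normals[j]
        d2 = sum((a - b) ** 2 for a, b in zip(ni, nj))
        w = math.exp(-d2 / (2 * sigma * sigma))  # Gaussian range weight on normal difference
        for k in range(3):
            acc[k] += w * nj[k]
    norm = math.sqrt(sum(c * c for c in acc))
    return tuple(c / norm for c in acc)


# Face 2 lies across a sharp 90-degree edge from faces 0 and 1.
normals = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
neighbors = [[1, 2], [0, 2], [0, 1]]
blended = filter_normal(0, normals, neighbors, sigma=0.35)
segmented = filter_normal(0, normals, neighbors, sigma=0.35, labels=[0, 0, 1])
print(blended, segmented)
```

With weighting alone, the filtered normal of face 0 is tilted slightly toward the other side of the edge; repeated over many iterations such small contributions accumulate, whereas excluding the neighbor by segment keeps the normal exact.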
Median Graph Shift: A New Clustering Algorithm for Graph Domain
ISSN: 1051-4651; Print ISBN: 978-1-4244-7542-1. In the context of unsupervised clustering, a new clustering algorithm for the domain of graphs is introduced. The key idea of this paper is to adapt mean-shift clustering and its variants, originally proposed for the domain of feature vectors, to graph clustering. These algorithms have been applied successfully in image analysis and computer vision. The proposed algorithm works iteratively, shifting each graph towards the median graph of its neighborhood. Both the set median graph and the generalized median graph are tested for the shifting procedure. In the experiments, a set of cluster validation indices is used to evaluate the clustering algorithm, and a comparison with the well-known k-means algorithm is provided.
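The shifting procedure with the set median graph can be sketched as follows, assuming the pairwise graph distances (for example, graph edit distances) have already been computed into a matrix `D`, and that the neighborhood of a graph is every graph within a hypothetical `radius`. Only the set median variant is shown; the generalized median graph, which need not belong to the dataset, requires a separate search and is not covered. All function names are illustrative.

```python
def set_median(members, D):
    """Set median: the member minimizing the sum of distances to all members."""
    return min(members, key=lambda c: sum(D[c][j] for j in members))


def median_graph_shift(D, radius, max_iter=100):
    """Cluster graphs from a precomputed distance matrix D by repeated median shifts."""
    n = len(D)
    # Where each graph moves in one shift: the set median of its radius-neighborhood.
    shift = [set_median([j for j in range(n) if D[i][j] <= radius], D) for i in range(n)]
    reps = list(range(n))  # current position (representative graph) of every graph
    for _ in range(max_iter):
        new_reps = [shift[r] for r in reps]
        if new_reps == reps:
            break  # every graph sits on a fixed point of the shift
        reps = new_reps
    return reps  # graphs sharing a representative form one cluster


# Toy distances between six graphs forming two well-separated groups.
D = [
    [0, 1, 2, 10, 10, 10],
    [1, 0, 1, 10, 10, 10],
    [2, 1, 0, 10, 10, 10],
    [10, 10, 10, 0, 1, 2],
    [10, 10, 10, 1, 0, 1],
    [10, 10, 10, 2, 1, 0],
]
print(median_graph_shift(D, radius=2))
```

Because the algorithm only ever consults `D`, the same code works for any graph distance; the `max_iter` cap guards against the (unlikely) case where shifts oscillate instead of settling.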
Medoidshift clustering applied to genomic bulk tumor data.
Despite the enormous medical impact of cancers and intensive study of their biology, detailed characterization of tumor growth and development remains elusive. This difficulty occurs in large part because of enormous heterogeneity in the molecular mechanisms of cancer progression, both tumor-to-tumor and cell-to-cell within single tumors. Advances in genomic technologies, especially at the single-cell level, are improving the situation, but these approaches are held back by limitations of the biotechnologies for gathering genomic data from heterogeneous cell populations and of the computational methods for making sense of those data. One popular way to gain the advantages of whole-genome methods without the cost of single-cell genomics has been the use of computational deconvolution (unmixing) methods to reconstruct clonal heterogeneity from bulk genomic data. These methods, too, are limited by the difficulty of inferring genomic profiles of rare or subtly varying clonal subpopulations from bulk data, a problem that can be computationally reduced to that of reconstructing the geometry of point clouds of tumor samples in a genome space. Here, we present a new method to improve that reconstruction by better identifying subspaces corresponding to tumors produced from mixtures of distinct combinations of clonal subpopulations. We develop a nonparametric clustering method based on medoidshift clustering for identifying subgroups of tumors expected to correspond to distinct trajectories of evolutionary progression. We show on synthetic and real tumor copy-number data that this new method substantially improves our ability to resolve discrete tumor subgroups, a key step in the process of accurately deconvolving tumor genomic data and inferring clonal heterogeneity from bulk data.
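For readers unfamiliar with medoidshift, here is a minimal sketch of the generic algorithm, assuming only a matrix `D2` of squared pairwise distances between tumor samples and a Gaussian kernel with a hypothetical bandwidth `h`. Each point moves to the data point that minimizes the kernel-weighted sum of squared distances to all points; following these moves from every start point ends at a mode, and points that share a mode form one cluster. This is the textbook formulation, not the paper's specific tumor-deconvolution variant.

```python
import math


def medoidshift(D2, h):
    """Generic medoidshift on squared distances D2 with a Gaussian kernel of bandwidth h."""
    n = len(D2)
    # Kernel weight of point j seen from point i: phi(||x_j - x_i||^2).
    K = [[math.exp(-D2[j][i] / (2 * h * h)) for i in range(n)] for j in range(n)]
    # next[i] = argmin_c sum_j D2[c][j] * K[j][i]; computed once, reused on every path.
    nxt = []
    for i in range(n):
        cost = [sum(D2[c][j] * K[j][i] for j in range(n)) for c in range(n)]
        nxt.append(min(range(n), key=cost.__getitem__))
    labels = []
    for i in range(n):
        path, cur = [], i
        while cur not in path:  # follow shifts until a point repeats
            path.append(cur)
            cur = nxt[cur]
        cycle = path[path.index(cur):]  # normally a single fixed point (the mode)
        labels.append(min(cycle))  # assumption: label a cluster by its smallest index
    return labels


pts = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
D2 = [[(a - b) ** 2 for b in pts] for a in pts]
print(medoidshift(D2, h=0.5))
```

Because the next point depends only on the current point, the shifts form a forest over the data; the cost matrix is computed once and each point's mode is found by walking up its tree.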
Visual Landmark Recognition from Internet Photo Collections: A Large-Scale Evaluation
The task of a visual landmark recognition system is to identify photographed
buildings or objects in query photos and to provide the user with relevant
information on them. With their increasing coverage of the world's landmark
buildings and objects, Internet photo collections are now being used as a
source for building such systems in a fully automatic fashion. This process
typically consists of three steps: clustering large amounts of images by the
objects they depict; determining object names from user-provided tags; and
building a robust, compact, and efficient recognition index. To date,
however, there is little empirical information on how well current approaches
for those steps perform in a large-scale open-set mining and recognition task.
Furthermore, there is little empirical information on how recognition
performance varies for different types of landmark objects and where there is
still potential for improvement. With this paper, we intend to fill these gaps.
Using a dataset of 500k images from Paris, we analyze each component of the
landmark recognition pipeline in order to answer the following questions: How
many and what kinds of objects can be discovered automatically? How can we best
use the resulting image clusters to recognize the object in a query? How can
the object be efficiently represented in memory for recognition? How reliably
can semantic information be extracted? And finally: What are the limiting
factors in the resulting pipeline from query to semantics? We evaluate how
different choices of methods and parameters for the individual pipeline steps
affect overall system performance and examine their effects for different query
categories such as buildings, paintings, or sculptures.
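The naming step ("determining object names from user-provided tags") can be illustrated with a naive baseline, sketched here under the assumption that each image in a cluster carries a set of free-text tags. Each image votes once per distinct lowercased tag, a hypothetical stoplist removes generic tags such as the city name, and the most frequent remaining tag names the cluster. This is a hypothetical baseline for illustration, not one of the methods evaluated in the paper.

```python
from collections import Counter


def name_cluster(image_tags, stopwords=frozenset()):
    """Pick the tag used by the most images in a cluster (each image votes once per tag)."""
    votes = Counter()
    for tags in image_tags:
        votes.update({t.lower() for t in tags} - stopwords)
    if not votes:
        return None  # no usable tags in this cluster
    return votes.most_common(1)[0][0]


cluster = [{"Eiffel", "Paris"}, {"eiffel", "tower", "paris"}, {"EiffelTower", "paris"}]
print(name_cluster(cluster, stopwords=frozenset({"paris"})))
```

Counting per image rather than per tag occurrence limits the influence of any single heavily tagged photo; spelling variants such as "eiffeltower" are not merged, which is one reason real systems need more than simple voting.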
Community Detection Using Revised Medoid-Shift Based on KNN
Community detection has become an important problem with the boom of social
networks. Although Mean-Shift is an excellent clustering algorithm, it cannot be
applied directly to community detection, since Mean-Shift can only handle data
with coordinates, whereas the data in community detection problems are mostly
represented by a graph, which can be treated as data given by a distance matrix
(or similarity matrix). Fortunately, a clustering algorithm called Medoid-Shift
has been proposed. The Medoid-Shift algorithm preserves the benefits of
Mean-Shift and can be applied to problems based on a distance matrix, such as
community detection. One drawback of the Medoid-Shift algorithm is that there
may be no data points within the neighborhood region defined by a distance
parameter. To handle the community detection problem better, this work
proposes a new algorithm called Revised Medoid-Shift (RMS). When finding the
next medoid, the RMS algorithm uses a neighborhood defined by KNN (k-nearest
neighbors), whereas the original Medoid-Shift uses a neighborhood defined by a
distance parameter. Since the neighborhood defined by KNN is more stable than
the one defined by the distance parameter in terms of the number of data points
within the neighborhood, the RMS algorithm may converge more smoothly. In the
RMS method, each data point is shifted towards a medoid within the
neighborhood defined by KNN. After the iterative shifting process, each data
point converges to a cluster center, and the data points converging to the
same center are grouped into the same cluster.
Rapid Mode Estimation for 3D Brain MRI Tumor Segmentation
In this work, we develop a method for the efficient automated segmentation of brain tumors based on a rapid initialization method. Brain tumor segmentation is crucial for brain tumor resection planning, and a high-quality initialization may have a significant impact on segmentation quality. The main contribution of our work is an efficient method to initialize the segmentation by casting it as nonparametric density mode estimation, and developing a Branch-and-Bound-based method to efficiently find the mode (maximum) of the density function. Our technique is exact, has guaranteed convergence to the global optimum, and scales logarithmically in the volume dimensions by virtue of recursively subdividing the search space through Branch-and-Bound. Our method employs the Dual Tree data structure originally developed for nonparametric density estimation, and recently used for object detection with Branch-and-Bound. In this work, we "close the loop" and use the Dual Tree data structure to find the mode of a density. This estimated mode provides our system with an initial tumor hypothesis, which is then refined by graph cuts to provide a sharper outline of the tumor area. We demonstrate a 12-fold acceleration with respect to a standard mean-shift implementation, allowing us to accelerate tumor detection to a level that would facilitate high-quality brain tumor resection planning.
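The idea of finding a density mode by Branch-and-Bound can be illustrated with a simplified best-first search over axis-aligned boxes in 3D. For a Gaussian kernel density, an upper bound over a box is obtained by evaluating each kernel at the point's minimum distance to the box, and the density at the box center is a lower bound. This sketch computes both bounds by direct summation over all points rather than with the Dual Tree structure the paper uses, and `bb_mode`, `h`, and the absolute tolerance `eps` (on the unnormalized density) are hypothetical.

```python
import heapq
import math


def kde(x, points, h):
    """Unnormalized Gaussian kernel density at x."""
    return sum(math.exp(-sum((a - b) ** 2 for a, b in zip(x, p)) / (2 * h * h)) for p in points)


def upper_bound(lo, hi, points, h):
    """Bound on kde over the box [lo, hi]: each kernel evaluated at its closest point in the box."""
    total = 0.0
    for p in points:
        d2 = sum(max(l - c, 0.0, c - u) ** 2 for l, u, c in zip(lo, hi, p))
        total += math.exp(-d2 / (2 * h * h))
    return total


def bb_mode(points, lo, hi, h, eps=1e-2):
    """Best-first branch-and-bound for the maximum of kde inside the box [lo, hi]."""
    lo, hi = tuple(lo), tuple(hi)
    best_x = tuple((l + u) / 2 for l, u in zip(lo, hi))
    best_f = kde(best_x, points, h)
    heap = [(-upper_bound(lo, hi, points, h), lo, hi)]
    while heap:
        neg_ub, lo, hi = heapq.heappop(heap)
        if -neg_ub - best_f <= eps:
            break  # no remaining box can beat best_f by more than eps
        axis = max(range(len(lo)), key=lambda a: hi[a] - lo[a])  # split the longest side
        mid = (lo[axis] + hi[axis]) / 2
        children = ((lo, hi[:axis] + (mid,) + hi[axis + 1:]),
                    (lo[:axis] + (mid,) + lo[axis + 1:], hi))
        for clo, chi in children:
            c = tuple((l + u) / 2 for l, u in zip(clo, chi))
            fc = kde(c, points, h)  # lower bound: density at the box center
            if fc > best_f:
                best_x, best_f = c, fc
            ub = upper_bound(clo, chi, points, h)
            if ub - best_f > eps:  # prune boxes that cannot improve the incumbent
                heapq.heappush(heap, (-ub, clo, chi))
    return best_x, best_f


pts = [(1, 1, 1), (1.1, 1, 0.9), (0.9, 1.1, 1), (1, 0.9, 1.1), (3, 3, 3)]
x, f = bb_mode(pts, (0, 0, 0), (4, 4, 4), h=0.5)
print(x, f)
```

When the loop stops, the popped box has the largest remaining upper bound, so the returned density is within `eps` of the global maximum over the search box; pruning boxes whose bound cannot beat the incumbent is what avoids evaluating the density on a full voxel grid.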