403 research outputs found
Roto-Translation Covariant Convolutional Networks for Medical Image Analysis
We propose a framework for rotation and translation covariant deep learning
using group convolutions. The group product of the special Euclidean
motion group describes how a concatenation of two roto-translations
results in a net roto-translation. We encode this geometric structure into
convolutional neural networks (CNNs) via group convolutional layers,
which fit into the standard 2D CNN framework, and which allow to generically
deal with rotated input samples without the need for data augmentation.
We introduce three layers: a lifting layer which lifts a 2D (vector valued)
image to an -image, i.e., 3D (vector valued) data whose domain is
; a group convolution layer from and to an -image; and a
projection layer from an -image to a 2D image. The lifting and group
convolution layers are covariant (the output roto-translates with the
input). The final projection layer, a maximum intensity projection over
rotations, makes the full CNN rotation invariant.
We show with three different problems in histopathology, retinal imaging, and
electron microscopy that with the proposed group CNNs, state-of-the-art
performance can be achieved, without the need for data augmentation by rotation
and with increased performance compared to standard CNNs that do rely on
augmentation.Comment: 8 pages, 2 figures, 1 table, accepted at MICCAI 201
SMART: Unique splitting-while-merging framework for gene clustering
Copyright @ 2014 Fa et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted
use, distribution, and reproduction in any medium, provided the original author and source are credited.Successful clustering algorithms are highly dependent on parameter settings. The clustering performance degrades significantly unless parameters are properly set, and yet, it is difficult to set these parameters a priori. To address this issue, in this paper, we propose a unique splitting-while-merging clustering framework, named “splitting merging awareness tactics” (SMART), which does not require any a priori knowledge of either the number of clusters or even the possible range of this number. Unlike existing self-splitting algorithms, which over-cluster the dataset to a large number of clusters and then merge some similar clusters, our framework has the ability to split and merge clusters automatically during the process and produces the the most reliable clustering results, by intrinsically integrating many clustering techniques and tasks. The SMART framework is implemented with two distinct clustering paradigms in two algorithms: competitive learning and finite mixture model. Nevertheless, within the proposed SMART framework, many other algorithms can be derived for different clustering paradigms. The minimum message length algorithm is integrated into the framework as the clustering selection criterion. The usefulness of the SMART framework and its algorithms is tested in demonstration datasets and simulated gene expression datasets. Moreover, two real microarray gene expression datasets are studied using this approach. Based on the performance of many metrics, all numerical results show that SMART is superior to compared existing self-splitting algorithms and traditional algorithms. Three main properties of the proposed SMART framework are summarized as: (1) needing no parameters dependent on the respective dataset or a priori knowledge about the datasets, (2) extendible to many different applications, (3) offering superior performance compared with counterpart algorithms.National Institute for Health Researc
Recommended from our members
Genome-wide association study of primary open-angle glaucoma in continental and admixed African populations.
Primary open angle glaucoma (POAG) is a complex disease with a major genetic contribution. Its prevalence varies greatly among ethnic groups, and is up to five times more frequent in black African populations compared to Europeans. So far, worldwide efforts to elucidate the genetic complexity of POAG in African populations has been limited. We conducted a genome-wide association study in 1113 POAG cases and 1826 controls from Tanzanian, South African and African American study samples. Apart from confirming evidence of association at TXNRD2 (rs16984299; OR[T] 1.20; P = 0.003), we found that a genetic risk score combining the effects of the 15 previously reported POAG loci was significantly associated with POAG in our samples (OR 1.56; 95% CI 1.26-1.93; P = 4.79 × 10-5). By genome-wide association testing we identified a novel candidate locus, rs141186647, harboring EXOC4 (OR[A] 0.48; P = 3.75 × 10-8), a gene transcribing a component of the exocyst complex involved in vesicle transport. The low frequency and high degree of genetic heterogeneity at this region hampered validation of this finding in predominantly West-African replication sets. Our results suggest that established genetic risk factors play a role in African POAG, however, they do not explain the higher disease load. The high heterogeneity within Africans remains a challenge to identify the genetic commonalities for POAG in this ethnicity, and demands studies of extremely large size
On negative results when using sentiment analysis tools for software engineering research
Recent years have seen an increasing attention to social aspects of software engineering, including studies of emotions and sentiments experienced and expressed by the software developers. Most of these studies reuse existing sentiment analysis tools such as SentiStrength and NLTK. However, these tools have been trained on product reviews and movie reviews and, therefore, their results might not be applicable in the software engineering domain. In this paper we study whether the sentiment analysis tools agree with the sentiment recognized by human evaluators (as reported in an earlier study) as well as with each other. Furthermore, we evaluate the impact of the choice of a sentiment analysis tool on software engineering studies by conducting a simple study of differences in issue resolution times for positive, negative and neutral texts. We repeat the study for seven datasets (issue trackers and Stack Overflow questions) and different sentiment analysis tools and observe that the disagreement between the tools can lead to diverging conclusions. Finally, we perform two replications of previously published studies and observe that the results of those studies cannot be confirmed when a different sentiment analysis tool is used
A Quality Metric for Visualization of Clusters in Graphs
Traditionally, graph quality metrics focus on readability, but recent studies
show the need for metrics which are more specific to the discovery of patterns
in graphs. Cluster analysis is a popular task within graph analysis, yet there
is no metric yet explicitly quantifying how well a drawing of a graph
represents its cluster structure. We define a clustering quality metric
measuring how well a node-link drawing of a graph represents the clusters
contained in the graph. Experiments with deforming graph drawings verify that
our metric effectively captures variations in the visual cluster quality of
graph drawings. We then use our metric to examine how well different graph
drawing algorithms visualize cluster structures in various graphs; the results
con-firm that some algorithms which have been specifically designed to show
cluster structures perform better than other algorithms.Comment: Appears in the Proceedings of the 27th International Symposium on
Graph Drawing and Network Visualization (GD 2019
Ranked Adjusted Rand: integrating distance and partition information in a measure of clustering agreement
BACKGROUND: Biological information is commonly used to cluster or classify entities of interest such as genes, conditions, species or samples. However, different sources of data can be used to classify the same set of entities and methods allowing the comparison of the performance of two data sources or the determination of how well a given classification agrees with another are frequently needed, especially in the absence of a universally accepted "gold standard" classification. RESULTS: Here, we describe a novel measure – the Ranked Adjusted Rand (RAR) index. RAR differs from existing methods by evaluating the extent of agreement between any two groupings, taking into account the intercluster distances. This characteristic is relevant to evaluate cases of pairs of entities grouped in the same cluster by one method and separated by another. The latter method may assign them to close neighbour clusters or, on the contrary, to clusters that are far apart from each other. RAR is applicable even when intercluster distance information is absent for both or one of the groupings. In the first case, RAR is equal to its predecessor, Adjusted Rand (HA) index. Artificially designed clusterings were used to demonstrate situations in which only RAR was able to detect differences in the grouping patterns. A study with larger simulated clusterings ensured that in realistic conditions, RAR is effectively integrating distance and partition information. The new method was applied to biological examples to compare 1) two microbial typing methods, 2) two gene regulatory network distances and 3) microarray gene expression data with pathway information. In the first application, one of the methods does not provide intercluster distances while the other originated a hierarchical clustering. RAR proved to be more sensitive than HA in the choice of a threshold for defining clusters in the hierarchical method that maximizes agreement between the results of both methods. CONCLUSION: RAR has its major advantage in combining cluster distance and partition information, while the previously available methods used only the latter. RAR should be used in the research problems were HA was previously used, because in the absence of inter cluster distance effects it is an equally effective measure, and in the presence of distance effects it is a more complete one
Improving cluster recovery with feature rescaling factors
The data preprocessing stage is crucial in clustering. Features may describe entities using different scales. To rectify this, one usually applies feature normalisation aiming at rescaling features so that none of them overpowers the others in the objective function of the selected clustering algorithm. In this paper, we argue that the rescaling procedure should not treat all features identically. Instead, it should favour the features that are more meaningful for clustering. With this in mind, we introduce a feature rescaling method that takes into account the within-cluster degree of relevance of each feature. Our comprehensive simulation study, carried out on real and synthetic data, with and without noise features, clearly demonstrates that clustering methods that use the proposed data normalization strategy clearly outperform those that use traditional data normalization
- …