Geosocial Graph-Based Community Detection
We apply spectral clustering and multislice modularity optimization to a Los
Angeles Police Department field interview card data set. To detect communities
(i.e., cohesive groups of vertices), we use both geographic and social
information about stops involving street gang members in the LAPD district of
Hollenbeck. We then compare the algorithmically detected communities with known
gang identifications and argue that discrepancies are due to sparsity of social
connections in the data as well as complex underlying sociological factors that
blur distinctions between communities.
Comment: 5 pages, 4 figures. Workshop paper for the IEEE International
Conference on Data Mining 2012: Workshop on Social Media Analysis and Mining
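To make the notion of a "cohesive group of vertices" concrete, the sketch below computes the standard single-layer Newman modularity of a given partition. This is not the multislice variant used in the paper, and the function name `modularity` and the toy graph are illustrative assumptions, not taken from the paper or its data.

```python
def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges:     list of (u, v) pairs, one entry per undirected edge
    community: dict mapping each vertex to a community label
    (Both names are hypothetical; single-layer modularity only.)
    """
    m = len(edges)
    internal = {}    # number of edges with both endpoints in community c
    degree_sum = {}  # sum of vertex degrees in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    # Q = sum over communities of (fraction of internal edges)
    #     minus (expected fraction under the configuration null model)
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Toy graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {v: "a" for v in range(6)}
q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
```

Splitting the toy graph into its two triangles scores higher than lumping all vertices together, which is the kind of comparison a modularity optimizer performs over many candidate partitions.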
SACOC: A spectral-based ACO clustering algorithm
The application of ACO-based algorithms in data mining has grown over the last few years, and several supervised and unsupervised learning algorithms have been developed using this bio-inspired approach. Most recent work on unsupervised learning has focused on clustering, where ACO-based techniques have shown great potential. At the same time, new clustering techniques that seek the continuity of data, especially spectral-based approaches as opposed to classical centroid-based approaches, have attracted increasing research interest, an area still under study for ACO clustering techniques. This work presents a hybrid spectral-based ACO clustering algorithm inspired by the ACO Clustering (ACOC) algorithm. The proposed approach combines ACOC with the spectral Laplacian to generate a new search space for the algorithm in order to obtain more promising solutions. The new algorithm, called SACOC, has been compared against well-known algorithms (K-means and Spectral Clustering) and against ACOC. The experiments measure the accuracy of the algorithm on both synthetic datasets and real-world datasets extracted from the UCI Machine Learning Repository.
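As a rough illustration of the object a spectral Laplacian is built from, the sketch below constructs a Gaussian affinity matrix and the unnormalized graph Laplacian L = D - W for a small point set. The kernel choice, the `sigma` parameter, and both function names are assumptions made for illustration; the abstract does not say which similarity function or Laplacian normalization SACOC uses.

```python
import math


def gaussian_affinity(points, sigma=1.0):
    """Symmetric affinity matrix W with W[i][j] = exp(-||xi - xj||^2 / (2 sigma^2)).

    The diagonal is left at zero (no self-loops). Hypothetical helper.
    """
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            w[i][j] = w[j][i] = math.exp(-d2 / (2 * sigma ** 2))
    return w


def graph_laplacian(w):
    """Unnormalized Laplacian L = D - W, where D is the diagonal degree matrix."""
    n = len(w)
    return [
        [(sum(w[i]) if i == j else 0.0) - w[i][j] for j in range(n)]
        for i in range(n)
    ]


# Two well-separated pairs of points in the plane.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
W = gaussian_affinity(points, sigma=1.0)
L = graph_laplacian(W)
```

In a spectral method, the eigenvectors of L belonging to the smallest eigenvalues give the new coordinates in which clustering is then carried out; that eigendecomposition step is omitted here.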
Manifold Learning in MR spectroscopy using nonlinear dimensionality reduction and unsupervised clustering
Purpose: To investigate whether nonlinear dimensionality reduction improves unsupervised classification of 1H MRS brain tumor data compared with a linear method. Methods: In vivo single-voxel 1H magnetic resonance spectroscopy (55 patients) and 1H magnetic resonance spectroscopy imaging (MRSI) (29 patients) data were acquired from histopathologically diagnosed gliomas. Data reduction using Laplacian eigenmaps (LE) or independent component analysis (ICA) was followed by k-means clustering or agglomerative hierarchical clustering (AHC) for unsupervised learning to assess tumor grade and for tissue type segmentation of MRSI data. Results: An accuracy of 93% in classification of glioma grade II and grade IV, with 100% accuracy in distinguishing tumor and normal spectra, was obtained by LE with unsupervised clustering, but not with the combination of k-means and ICA. With 1H MRSI data, LE provided a more linear distribution of data for cluster analysis and better cluster stability than ICA. LE combined with k-means or AHC provided 91% accuracy for classifying tumor grade and 100% accuracy for identifying normal tissue voxels. Color-coded visualization of normal brain, tumor core, and infiltration regions was achieved with LE combined with AHC. Conclusion: The LE method is promising for unsupervised clustering to separate brain and tumor tissue with automated color-coding for visualization of 1H MRSI data after cluster analysis.
The Block Point Process Model for Continuous-Time Event-Based Dynamic Networks
We consider the problem of analyzing timestamped relational events between a
set of entities, such as messages between users of an on-line social network.
Such data are often analyzed using static or discrete-time network models,
which discard a significant amount of information by aggregating events over
time to form network snapshots. In this paper, we introduce a block point
process model (BPPM) for continuous-time event-based dynamic networks. The BPPM
is inspired by the well-known stochastic block model (SBM) for static networks.
We show that networks generated by the BPPM follow an SBM in the limit of a
growing number of nodes. We use this property to develop principled and
efficient local search and variational inference procedures initialized by
regularized spectral clustering. We fit BPPMs with exponential Hawkes processes
to analyze several real network data sets, including a Facebook wall post
network with over 3,500 nodes and 130,000 events.
Comment: To appear at The Web Conference 201
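For readers unfamiliar with exponential Hawkes processes, the sketch below evaluates the conditional intensity of a univariate process with an exponential kernel, lambda(t) = mu + sum over past events s < t of alpha * beta * exp(-beta * (t - s)). This parameterization and the name `hawkes_intensity` are assumptions for illustration; the abstract does not specify how the BPPM parameterizes its kernels or how they are assigned to block pairs.

```python
import math


def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Conditional intensity at time t of a univariate exponential Hawkes process.

    mu:    baseline event rate
    alpha: jump-size scale (expected number of follow-up events per event)
    beta:  decay rate of the excitation
    Only events strictly before t contribute. Hypothetical parameterization.
    """
    excitation = sum(
        alpha * beta * math.exp(-beta * (t - s))
        for s in event_times
        if s < t
    )
    return mu + excitation


# Intensity shortly after a burst of events versus long after it.
events = [1.0, 1.2, 1.3]
soon_after = hawkes_intensity(1.4, events, mu=0.5, alpha=0.8, beta=2.0)
long_after = hawkes_intensity(10.0, events, mu=0.5, alpha=0.8, beta=2.0)
```

Each event temporarily raises the rate of further events, and the effect decays back toward the baseline `mu`; this self-excitation is the timing information that aggregating events into snapshots discards.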
Discovering universal statistical laws of complex networks
Different network models have been suggested for the topology underlying
complex interactions in natural systems. These models are aimed at replicating
specific statistical features encountered in real-world networks. However, it
is rarely considered to which degree the results obtained for one particular
network class can be extrapolated to real-world networks. We address this issue
by comparing different classical and more recently developed network models
with respect to their generalisation power, which we identify with large
structural variability and absence of constraints imposed by the construction
scheme. After having identified the most variable networks, we address the
issue of which constraints are common to all network classes and are thus
suitable candidates for being generic statistical laws of complex networks. In
fact, we find that generic, not model-related dependencies between different
network characteristics do exist. This allows one, for instance, to infer global
features from local ones using regression models trained on networks with high
generalisation power. Our results confirm and extend previous findings
regarding the synchronisation properties of neural networks. Our method seems
especially relevant for large networks, which are difficult to map completely,
like the neural networks in the brain. The structure of such large networks
cannot be fully sampled with the present technology. Our approach provides a
method to estimate global properties of under-sampled networks with good
approximation. Finally, we demonstrate on three different data sets (C.
elegans' neuronal network, R. prowazekii's metabolic network, and a network of
synonyms extracted from Roget's Thesaurus) that real-world networks have
statistical relations compatible with those obtained using regression models
Poisson noise reduction with non-local PCA
Photon-limited imaging arises when the number of photons collected by a
sensor array is small relative to the number of detector elements. Photon
limitations are an important concern for many applications such as spectral
imaging, night vision, nuclear medicine, and astronomy. Typically a Poisson
distribution is used to model these observations, and the inherent
heteroscedasticity of the data combined with standard noise removal methods
yields significant artifacts. This paper introduces a novel denoising algorithm
for photon-limited images which combines elements of dictionary learning and
sparse patch-based representations of images. The method employs both an
adaptation of Principal Component Analysis (PCA) for Poisson noise and recently
developed sparsity-regularized convex optimization algorithms for
photon-limited images. A comprehensive empirical evaluation of the proposed
method helps characterize the performance of this approach relative to other
state-of-the-art denoising methods. The results reveal that, despite its
conceptual simplicity, Poisson PCA-based denoising appears to be highly
competitive in very low light regimes.
Comment: erratum: the image "man" is wrongly named "pepper" in the journal version
Cluster-scaling, chaotic order and coherence in DNA
Different numerical mappings of the DNA sequences have been studied using a
new cluster-scaling method and the well known spectral methods. It is shown, in
particular, that the nucleotide sequences in DNA molecules have robust
cluster-scaling properties. These properties are relevant to both types of
nucleotide base-pair interactions: hydrogen bonds and stacking interactions.
It is shown that taking into account the cluster-scaling properties can help to
improve heterogeneous models of the DNA dynamics. It is also shown that a
chaotic (deterministic) order, rather than a stochastic randomness, controls
the energy minima positions of the stacking interactions in the DNA sequences
on large scales. The chaotic order results in a large-scale chaotic coherence
between the two complementary sequences of the DNA duplex. A competition between
this broad-band chaotic coherence and the resonance coherence produced by
the genetic code is briefly discussed. The Arabidopsis plant genome (a
model plant for genome analysis) and two human genes, BRCA2 and NRXN1, are
considered as examples.
Comment: extended. arXiv admin note: substantial text overlap with
arXiv:1008.135
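A numerical mapping turns a nucleotide string into a number sequence that scaling or spectral methods can analyze. The sketch below uses one common choice, the strong/weak hydrogen-bond mapping (G and C to +1, A and T to -1), and accumulates it into a "DNA walk". The abstract does not list which mappings the paper studies, so this mapping, the names `STRONG_WEAK`, `map_sequence`, and `dna_walk`, and the toy sequence are all illustrative assumptions.

```python
# Strong (G-C, three hydrogen bonds) versus weak (A-T, two hydrogen bonds).
# Hypothetical mapping chosen for illustration only.
STRONG_WEAK = {"G": 1, "C": 1, "A": -1, "T": -1}


def map_sequence(seq, mapping=STRONG_WEAK):
    """Replace each nucleotide letter with its numerical value."""
    return [mapping[base] for base in seq.upper()]


def dna_walk(values):
    """Cumulative sum of the mapped sequence (a one-dimensional 'DNA walk')."""
    walk, total = [], 0
    for v in values:
        total += v
        walk.append(total)
    return walk


values = map_sequence("GATTACA")
walk = dna_walk(values)
```

Swapping in a different dictionary (for example purine versus pyrimidine) gives a different numerical sequence from the same DNA string, which is why results are typically compared across several mappings.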