Soft clustering analysis of galaxy morphologies: A worked example with SDSS
Context: The huge and still rapidly growing number of galaxies in modern sky
surveys raises the need for an automated and objective classification method.
Unsupervised learning algorithms are of particular interest, since they
discover classes automatically. Aims: We briefly discuss the pitfalls of
oversimplified classification methods and outline an alternative approach
called "clustering analysis". Methods: We categorise different classification
methods according to their capabilities. Based on this categorisation, we
present a probabilistic classification algorithm that automatically detects the
optimal classes preferred by the data. We explore the reliability of this
algorithm in systematic tests. Using a small sample of bright galaxies from the
SDSS, we demonstrate the performance of this algorithm in practice. We are able
to disentangle the problems of classification and parametrisation of galaxy
morphologies in this case. Results: We give physical arguments that a
probabilistic classification scheme is necessary. The algorithm we present
produces reasonable morphological classes and object-to-class assignments
without any prior assumptions. Conclusions: There are sophisticated automated
classification algorithms that meet all necessary requirements, but a lot of
work is still needed on the interpretation of the results.
Comment: 18 pages, 19 figures, 2 tables, submitted to A
The Impact of Network Flows on Community Formation in Models of Opinion Dynamics
We study the dynamics of opinion formation in a network of coupled agents. As the
network evolves to a steady state, opinions of agents within the same community
converge faster than those of other agents. This framework allows us to study
how network topology and network flow, which mediates the transfer of opinions
between agents, both affect the formation of communities. In traditional models
of opinion dynamics, agents are coupled via conservative flows, which result in
one-to-one opinion transfer. However, social interactions are often
non-conservative, resulting in one-to-many transfer of opinions. We study
opinion formation in networks using one-to-one and one-to-many interactions and
show that they lead to different community structures within the same network.
Comment: accepted for publication in The Journal of Mathematical Sociology.
arXiv admin note: text overlap with arXiv:1201.238
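To illustrate how the choice of flow alone can change the outcome on a fixed network, here is a minimal discrete-time consensus sketch on two four-node cliques joined by a single edge. The two update operators (the combinatorial Laplacian D - A and the random-walk Laplacian I - D^{-1}A), the step sizes and the toy graph are assumptions chosen as standard textbook stand-ins; they are not claimed to be the conservative and non-conservative models defined in the paper.

```python
def two_cliques():
    """Toy network: cliques {0,1,2,3} and {4,5,6,7} joined by the single edge (3, 4)."""
    adj = [[j for j in range(4) if j != i] for i in range(4)]
    adj += [[j for j in range(4, 8) if j != i] for i in range(4, 8)]
    adj[3].append(4)
    adj[4].append(3)
    return adj


def consensus_step(adj, x, eps, operator):
    """One explicit Euler step x <- x - eps * L x (illustrative operators, assumed).

    operator == "combinatorial": L = D - A, every neighbour's full difference is applied.
    operator == "random_walk":   L = I - D^{-1} A, each agent moves toward its neighbours' average.
    """
    new = []
    for i, nbrs in enumerate(adj):
        pull = sum(x[j] - x[i] for j in nbrs)  # equals -((D - A) x)_i
        if operator == "random_walk":
            pull /= len(nbrs)
        new.append(x[i] + eps * pull)
    return new


def simulate(adj, x0, eps, steps, operator):
    """Iterate consensus_step `steps` times starting from opinions x0."""
    x = list(x0)
    for _ in range(steps):
        x = consensus_step(adj, x, eps, operator)
    return x


def spread(values):
    """Range of a group of opinions."""
    return max(values) - min(values)


if __name__ == "__main__":
    adj = two_cliques()
    x0 = [0.0, 4.0, 1.0, 3.0, 8.0, 12.0, 9.0, 11.0]
    for op, eps in (("combinatorial", 0.1), ("random_walk", 0.5)):
        x = simulate(adj, x0, eps, 20, op)
        within = max(spread(x[:4]), spread(x[4:]))
        gap = sum(x[4:]) / 4 - sum(x[:4]) / 4
        print(f"{op}: within-clique spread {within:.3f}, between-clique gap {gap:.3f}")
```

With the combinatorial operator and step 0.1, the spread of opinions inside each clique shrinks far more, in relative terms, than the gap between the two clique averages, which is the "faster convergence within communities" effect in miniature. The two operators also preserve different quantities (the plain sum of opinions for D - A, the degree-weighted sum for I - D^{-1}A), so they can settle on different consensus values when degrees differ.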
Outlier Edge Detection Using Random Graph Generation Models and Applications
Outliers are samples that are generated by different mechanisms from other
normal data samples. Graphs, in particular social network graphs, may contain
nodes and edges that are made by scammers, malicious programs or mistakenly by
normal users. Detecting outlier nodes and edges is important for data mining
and graph analytics. However, previous research in the field has focused mainly
on detecting outlier nodes. In this article, we study the properties of edges
and propose outlier edge detection algorithms using two random graph generation
models. We found that the edge-ego-network, which can be defined as the induced
graph that contains two end nodes of an edge, their neighboring nodes and the
edges that link these nodes, contains critical information to detect outlier
edges. We evaluated the proposed algorithms by injecting outlier edges into
some real-world graph data. Experimental results show that the proposed
algorithms can effectively detect outlier edges. In particular, the algorithm
based on the Preferential Attachment Random Graph Generation model consistently
gives good performance regardless of the test graph data. Furthermore, the
proposed algorithms are not limited to outlier edge detection. We
demonstrate three different applications that benefit from the proposed
algorithms: 1) a preprocessing tool that improves the performance of graph
clustering algorithms; 2) an outlier node detection algorithm; and 3) a novel
noisy data clustering algorithm. These applications show the great potential of
the proposed outlier edge detection techniques.
Comment: 14 pages, 5 figures, journal paper
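To make the edge-ego-network idea concrete, the sketch below scores an edge by comparing the number of edges actually present among the nodes of its edge-ego-network with the number a random graph model would predict. It is a loose illustration, not the paper's algorithm: the degree-based null model P(a ~ b) = min(1, d_a d_b / 2m) is an assumed stand-in for a preferential-attachment model, and the function names and the "lower score means more suspicious" rule are likewise assumptions.

```python
from itertools import combinations


def build_graph(edges):
    """Undirected graph as a dict mapping each node to the set of its neighbors."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def edge_ego_nodes(adj, u, v):
    """Node set of the edge-ego-network of (u, v): both end nodes and their neighbors."""
    return {u, v} | adj[u] | adj[v]


def edge_score(adj, u, v):
    """Observed / expected edge count inside the edge-ego-network (lower = more suspicious)."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    pairs = list(combinations(edge_ego_nodes(adj, u, v), 2))
    observed = sum(1 for a, b in pairs if b in adj[a])
    # Assumed degree-based null model: P(a ~ b) = min(1, d_a * d_b / 2m).
    expected = sum(min(1.0, len(adj[a]) * len(adj[b]) / two_m) for a, b in pairs)
    return observed / expected


if __name__ == "__main__":
    edges = list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2)) + [(3, 4)]
    g = build_graph(edges)
    for e in sorted(edges, key=lambda e: edge_score(g, *e)):
        print(e, round(edge_score(g, *e), 3))
```

On two four-node cliques joined by one bridging edge, the bridge (3, 4) receives the lowest score of all thirteen edges, matching the intuition that an edge whose end nodes share no neighbors is a good outlier candidate.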
Fast Approximate Spectral Clustering for Dynamic Networks
Spectral clustering is a widely studied problem, yet its complexity is
prohibitive for dynamic graphs of even modest size. We claim that it is
possible to reuse information from past cluster assignments to expedite
computation. Our approach builds on a recent idea of sidestepping the main
bottleneck of spectral clustering, i.e., computing the graph eigenvectors, by
using fast Chebyshev graph filtering of random signals. We show that the
proposed algorithm achieves clustering assignments with quality approximating
that of spectral clustering and that it can yield significant complexity
benefits when the graph dynamics are appropriately bounded.
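The core trick mentioned above, replacing an eigendecomposition with Chebyshev filtering of random signals, can be sketched in pure Python for a static graph. Each random signal is passed through a polynomial approximation of an ideal low-pass filter on the graph Laplacian, so the filtered signals approximately span the smoothest eigenvectors; the rows of the filtered matrix then serve as node features for k-means. The cutoff, polynomial order, number of signals, the 2 * max-degree upper bound on the largest eigenvalue and the deterministic 2-means seeding are all assumptions of this sketch, and it does not implement the paper's reuse of past assignments across graph snapshots.

```python
import math
import random


def laplacian_matvec(adj, x):
    """(D - A) x for an undirected graph given as adjacency lists."""
    return [len(nbrs) * x[i] - sum(x[j] for j in nbrs) for i, nbrs in enumerate(adj)]


def lowpass_coefficients(cutoff, lam_max, order):
    """Chebyshev coefficients of h(lam) = 1 if lam <= cutoff else 0, for lam in [0, lam_max]."""
    theta_c = math.acos(2.0 * cutoff / lam_max - 1.0)
    coeffs = [(math.pi - theta_c) / math.pi]
    coeffs += [-2.0 * math.sin(k * theta_c) / (math.pi * k) for k in range(1, order + 1)]
    return coeffs


def chebyshev_filter(adj, x, coeffs, lam_max):
    """Approximate h(L) x with the recurrence T_{k+1} = 2 S T_k - T_{k-1}, S = 2L/lam_max - I."""
    def scaled(v):
        return [2.0 * a / lam_max - b for a, b in zip(laplacian_matvec(adj, v), v)]
    t_prev, t_curr = list(x), scaled(x)
    out = [coeffs[0] * a + coeffs[1] * b for a, b in zip(t_prev, t_curr)]
    for c in coeffs[2:]:
        t_next = [2.0 * a - b for a, b in zip(scaled(t_curr), t_prev)]
        out = [o + c * t for o, t in zip(out, t_next)]
        t_prev, t_curr = t_curr, t_next
    return out


def two_means(points, iters=20):
    """Lloyd iterations for k = 2, seeded with point 0 and the point farthest from it."""
    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    centres = [points[0], max(points, key=lambda p: d2(p, points[0]))]
    labels = []
    for _ in range(iters):
        labels = [0 if d2(p, centres[0]) <= d2(p, centres[1]) else 1 for p in points]
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centres[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def approx_spectral_bisection(adj, cutoff, n_signals=10, order=40, seed=0):
    """Split a graph in two using low-pass-filtered random signals as node features."""
    lam_max = 2.0 * max(len(nbrs) for nbrs in adj)  # upper bound on the largest Laplacian eigenvalue
    coeffs = lowpass_coefficients(cutoff, lam_max, order)
    rng = random.Random(seed)
    signals = [[rng.gauss(0.0, 1.0) for _ in adj] for _ in range(n_signals)]
    filtered = [chebyshev_filter(adj, s, coeffs, lam_max) for s in signals]
    features = [list(row) for row in zip(*filtered)]  # row i = embedding of node i
    return two_means(features)


if __name__ == "__main__":
    adj = [[j for j in range(4) if j != i] for i in range(4)]
    adj += [[j for j in range(4, 8) if j != i] for i in range(4, 8)]
    adj[3].append(4)
    adj[4].append(3)
    print(approx_spectral_bisection(adj, cutoff=2.0))
```

On two four-node cliques joined by one edge, the Laplacian eigenvalues are 0, about 0.35, and then 4 or more, so a cutoff of 2.0 keeps only the two smoothest eigenvectors and the sketch recovers the two cliques. In practice the cutoff itself has to be estimated, which this example skips.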
IMPACT: Investigation of Mobile-user Patterns Across University Campuses using WLAN Trace Analysis
We conduct the most comprehensive study of WLAN traces to date. Measurements
collected from four major university campuses are analyzed with the aim of
developing fundamental understanding of realistic user behavior in wireless
networks. Both individual user and inter-node (group) behaviors are
investigated and two classes of metrics are devised to capture the underlying
structure of such behaviors.
For individual user behavior we observe distinct patterns in which most users
are 'on' for a small fraction of the time, the number of access points visited
is very small and the overall on-line user mobility is quite low. We clearly
identify categories of heavy and light users. In general, users exhibit a high
degree of similarity over days and weeks.
For group behavior, we define metrics for encounter patterns and friendship.
Surprisingly, we find that a user, on average, encounters less than 6% of the
network user population within a month, and that encounter and friendship
relations are highly asymmetric. We establish that the number of encounters follows
a biPareto distribution, while friendship indexes follow an exponential
distribution. We capture the encounter graph using a small world model, the
characteristics of which reach a steady state after only one day.
We hope that our study will have a great impact on realistic modeling of network
usage and mobility patterns in wireless networks.
Comment: 16 pages, 31 figures
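The encounter and friendship metrics can be illustrated with a small session log. The schema (user, access point, start, end), the rule that two users "encounter" each other when their sessions at the same access point overlap in time, and the friendship index defined as shared time divided by the first user's own online time are hypothetical definitions for this sketch, not the exact metrics used in the study; they are chosen only to show why such relations come out asymmetric.

```python
from collections import defaultdict
from itertools import combinations


def encounter_stats(sessions):
    """sessions: iterable of (user, access_point, start, end) tuples (hypothetical schema)."""
    online = defaultdict(float)   # total online time per user
    by_ap = defaultdict(list)     # sessions grouped by access point
    for user, ap, start, end in sessions:
        online[user] += end - start
        by_ap[ap].append((user, start, end))

    shared = defaultdict(float)   # (u, v) -> time spent at the same AP simultaneously
    for records in by_ap.values():
        for (u, s1, e1), (v, s2, e2) in combinations(records, 2):
            overlap = min(e1, e2) - max(s1, s2)
            if u != v and overlap > 0:
                shared[(u, v)] += overlap
                shared[(v, u)] += overlap

    users = sorted(online)
    others = max(len(users) - 1, 1)
    # Fraction of the other users that each user encountered at least once.
    met_fraction = {u: sum((u, v) in shared for v in users) / others for u in users}
    # Hypothetical friendship index: shared time relative to u's own online time (asymmetric).
    friendship = {(u, v): t / online[u] for (u, v), t in shared.items()}
    return met_fraction, friendship


if __name__ == "__main__":
    log = [("a", "ap1", 0, 100), ("b", "ap1", 50, 60), ("c", "ap2", 0, 100)]
    met, friend = encounter_stats(log)
    print(met)
    print(friend)
```

In the toy log, user b spends its entire 10 seconds online alongside a, so b's index towards a is 1.0, while a's index towards b is only 0.1.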
EC3: Combining Clustering and Classification for Ensemble Learning
Classification and clustering algorithms have proven successful
individually in different contexts. Both of them have their own advantages and
limitations. For instance, although classification algorithms are more powerful
than clustering methods in predicting class labels of objects, they do not
perform well when there is a lack of sufficient manually labeled reliable data.
On the other hand, although clustering algorithms do not produce label
information for objects, they provide supplementary constraints (e.g., if two
objects are clustered together, it is more likely that the same label is
assigned to both of them) that one can leverage for label prediction of a set
of unknown objects. Therefore, systematic utilization of both these types of
algorithms together can lead to better prediction performance. In this paper,
we propose a novel algorithm, called EC3, that merges classification and
clustering together in order to support both binary and multi-class
classification. EC3 is based on a principled combination of multiple
classification and multiple clustering methods using an optimization function.
We theoretically show the convexity and optimality of the problem and solve it
using the block coordinate descent method. We additionally propose iEC3, a variant of
EC3 that handles imbalanced training data. We perform an extensive experimental
analysis by comparing EC3 and iEC3 with 14 baseline methods (7 well-known
standalone classifiers, 5 ensemble classifiers, and 2 existing methods that
merge classification and clustering) on 13 standard benchmark datasets. We show
that our methods outperform the other baselines on every dataset, achieving
up to 10% higher AUC. Moreover, our methods are faster (1.21 times faster than
the best baseline) and more resilient to noise and class imbalance than the best
baseline method.
Comment: 14 pages, 7 figures, 11 tables
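The abstract does not give EC3's objective, so the sketch below uses a deliberately simple convex stand-in to show how block coordinate descent can combine classifier outputs with clustering constraints. Each object's label distribution y_i is kept close to the averaged classifier output p_i while being pulled toward a per-cluster centre z for every clustering, i.e. it minimises sum_i ||y_i - p_i||^2 + alpha * sum_m sum_i ||y_i - z_{m, c_m(i)}||^2. Alternately minimising over the centres and over the y_i gives closed-form updates. The objective, the weight `alpha` and the function name are assumptions, not the paper's formulation.

```python
def blend_with_clusters(probs, clusterings, alpha=1.0, iters=50):
    """Block coordinate descent on an assumed convex objective (not EC3's own).

    probs: per-object class-probability vectors (e.g. averaged classifier outputs).
    clusterings: list of clusterings, each a list giving the cluster id of every object.
    """
    n, k, m = len(probs), len(probs[0]), len(clusterings)
    y = [list(row) for row in probs]
    for _ in range(iters):
        # Block 1 (Y fixed): the optimal centre of each cluster is the mean of its members' y.
        centres = []
        for labels in clusterings:
            sums, counts = {}, {}
            for i, c in enumerate(labels):
                acc = sums.setdefault(c, [0.0] * k)
                counts[c] = counts.get(c, 0) + 1
                for j in range(k):
                    acc[j] += y[i][j]
            centres.append({c: [s / counts[c] for s in acc] for c, acc in sums.items()})
        # Block 2 (Z fixed): setting the gradient to zero gives
        #   y_i = (p_i + alpha * sum_m z[m][c_m(i)]) / (1 + alpha * m)
        for i in range(n):
            for j in range(k):
                pull = sum(centres[t][clusterings[t][i]][j] for t in range(m))
                y[i][j] = (probs[i][j] + alpha * pull) / (1.0 + alpha * m)
    return y


if __name__ == "__main__":
    probs = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]
    clusterings = [[0, 0, 1]]
    for row in blend_with_clusters(probs, clusterings):
        print([round(v, 3) for v in row])
```

In this three-object example, object 1 is weakly predicted as class 1 by the classifiers but is clustered with a confident class-0 object; the blended distribution becomes [0.525, 0.475], flipping its predicted label. Because every p_i sums to one and the centres are averages, each y_i stays a valid probability vector.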