Comparing Community Structure to Characteristics in Online Collegiate Social Networks
We study the structure of social networks of students by examining the graphs
of Facebook "friendships" at five American universities at a single point in
time. We investigate each single-institution network's community structure and
employ graphical and quantitative tools, including standardized pair-counting
methods, to measure the correlations between the network communities and a set
of self-identified user characteristics (residence, class year, major, and high
school). We review the basic properties and statistics of the pair-counting
indices employed and recall, in simplified notation, a useful analytical
formula for the z-score of the Rand coefficient. Our study illustrates how to
examine different instances of social networks constructed in similar
environments, emphasizes the array of social forces that combine to form
"communities," and leads to comparative observations about online social lives
that can be used to infer comparisons about offline social structures. In our
illustration of this methodology, we calculate the relative contributions of
different characteristics to the community structure of individual universities
and subsequently compare these relative contributions at different
universities, measuring for example the importance of common high school
affiliation to large state universities and the varying degrees of influence
common major can have on the social structure at different universities. The
heterogeneity of communities that we observe indicates that these networks
typically have multiple organizing factors rather than a single dominant one. Comment: Version 3 (17 pages, 5 multi-part figures), accepted in SIAM Review
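The pair-counting comparison mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard definition of the Rand coefficient (the fraction of node pairs on which two partitions agree) and does not reproduce the paper's analytical z-score formula. The function names and the toy labels are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def pair_counts(labels_a, labels_b):
    """Classify every pair of nodes by whether each partition groups it together.

    Returns (w11, w10, w01, w00): pairs together in both partitions, together
    only in the first, together only in the second, and together in neither.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same nodes")
    w11 = w10 = w01 = w00 = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            w11 += 1
        elif same_a:
            w10 += 1
        elif same_b:
            w01 += 1
        else:
            w00 += 1
    return w11, w10, w01, w00


def rand_index(labels_a, labels_b):
    """Standard Rand coefficient: share of pairs on which the partitions agree."""
    w11, w10, w01, w00 = pair_counts(labels_a, labels_b)
    total = w11 + w10 + w01 + w00
    if total == 0:
        raise ValueError("need at least two nodes to form a pair")
    return (w11 + w00) / total


# Hypothetical toy data: detected communities versus a self-reported residence.
communities = [0, 0, 1, 1, 2, 2]
residences = ["A", "A", "B", "B", "B", "C"]
score = rand_index(communities, residences)
```

The raw coefficient depends on the number and sizes of the groups, which is presumably why the abstract refers to standardized indices and a z-score rather than to the raw value alone.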
Multilayer Networks
In most natural and engineered systems, a set of entities interact with each
other in complicated patterns that can encompass multiple types of
relationships, change in time, and include other types of complications. Such
systems include multiple subsystems and layers of connectivity, and it is
important to take such "multilayer" features into account to try to improve our
understanding of complex systems. Consequently, it is necessary to generalize
"traditional" network theory by developing (and validating) a framework and
associated tools to study multilayer systems in a comprehensive fashion. The
origins of such efforts date back several decades and arose in multiple
disciplines, and now the study of multilayer networks has become one of the
most important directions in network science. In this paper, we discuss the
history of multilayer networks (and related concepts) and review the exploding
body of work on such networks. To unify the disparate terminology in the large
body of recent work, we discuss a general framework for multilayer networks,
construct a dictionary of terminology to relate the numerous existing concepts
to each other, and provide a thorough discussion that compares, contrasts, and
translates between related notions such as multilayer networks, multiplex
networks, interdependent networks, networks of networks, and many others. We
also survey and discuss existing data sets that can be represented as
multilayer networks. We review attempts to generalize single-layer-network
diagnostics to multilayer networks. We also discuss the rapidly expanding
research on multilayer-network models and notions like community structure,
connected components, tensor decompositions, and various types of dynamical
processes on multilayer networks. We conclude with a summary and an outlook. Comment: Working paper; 59 pages, 8 figures
Empirical Comparative Analysis of 1-of-K Coding and K-Prototypes in Categorical Clustering
Clustering is a fundamental machine learning task that partitions data into homogeneous groups. K-means and its variants are the most widely used class of clustering algorithms today. However, the original k-means algorithm can only be applied to numeric data. Categorical data must first be converted into numeric data through 1-of-K coding, which itself causes many problems. K-prototypes, another clustering algorithm that originates from k-means, can handle categorical data by adopting a different notion of distance. In this paper, we systematically compare these two methods through an experimental analysis. Our analysis shows that K-prototypes is better suited to large-scale datasets, while the performance of k-means with 1-of-K coding is more stable. We believe these are useful heuristics for clustering methods working with highly categorical data.
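The two notions of distance contrasted in this abstract can be sketched as follows. The sketch assumes that 1-of-K coding means the usual one-hot encoding, and that the k-prototypes distance is the commonly cited mixed dissimilarity (squared Euclidean distance on numeric attributes plus a weight times the number of categorical mismatches). The abstract does not spell out either formula, so the function names, the weight `gamma`, and the toy values are illustrative assumptions rather than the paper's experimental setup.

```python
def one_hot(value, categories):
    """1-of-K coding: a vector with a single 1.0 at the position of `value`."""
    return [1.0 if value == c else 0.0 for c in categories]


def squared_euclidean(x, y):
    """Squared Euclidean distance, the distance that k-means minimizes."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def mixed_dissimilarity(num_x, cat_x, num_y, cat_y, gamma=1.0):
    """Assumed k-prototypes-style dissimilarity for mixed records.

    Numeric attributes contribute squared Euclidean distance; each categorical
    attribute contributes `gamma` when the two values differ.
    """
    mismatches = sum(a != b for a, b in zip(cat_x, cat_y))
    return squared_euclidean(num_x, num_y) + gamma * mismatches


# Hypothetical categorical attribute with three levels.
colors = ["red", "green", "blue"]
encoded_green = one_hot("green", colors)

# Under 1-of-K coding, any two distinct categories are equally far apart.
d_one_hot = squared_euclidean(one_hot("red", colors), one_hot("blue", colors))

# Under the mixed dissimilarity, the categorical mismatch is weighted by gamma.
d_mixed = mixed_dissimilarity([1.0], ["red"], [3.0], ["blue"], gamma=0.5)
```

One consequence visible in the sketch: 1-of-K coding adds one dimension per category level, whereas the mismatch count keeps each categorical attribute as a single term.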
Intrusion detection using clustering
As network environments grow, nearly everyone is connected to some system, so information must be secured against the many security threats present in such environments. A number of techniques are available for intrusion detection, and data mining is one of the more efficient ones. Data mining techniques may be supervised or unsupervised. Various authors have applied various clustering algorithms to intrusion detection, but all of these suffer from the class dominance, force assignment, and no-class problems. This paper proposes a hybrid model to overcome these problems. The performance of the proposed model is evaluated on the KDD Cup 1999 data set.