124,816 research outputs found
Stream-dashboard : a big data stream clustering framework with applications to social media streams.
Data mining is concerned with detecting patterns of data in raw datasets, which are then used to unearth knowledge that might not have been discovered using conventional querying or statistical methods. This discovered knowledge has been used to empower decision makers in countless applications spanning across many multi-disciplinary areas including business, education, astronomy, security and Information Retrieval to name a few. Many applications generate massive amounts of data continuously and at an increasing rate. This is the case for user activity over social networks such as Facebook and Twitter. This flow of data has been termed, appropriately, a Data Stream, and it introduced a set of new challenges to discover its evolving patterns using data mining techniques. Data stream clustering is concerned with detecting evolving patterns in a data stream using only the similarities between the data points as they arrive without the use of any external information (i.e. unsupervised learning). In this dissertation, we propose a complete and generic framework to simultaneously mine, track and validate clusters in a big data stream (Stream-Dashboard). The proposed framework consists of three main components: an online data stream clustering algorithm, a component for tracking and validation of pattern behavior using regression analysis, and a component that uses the behavioral information about the detected patterns to improve the quality of the clustering algorithm. As a first component, we propose RINO-Streams, an online clustering algorithm that incrementally updates the clustering model using robust statistics and incremental optimization. The second component is a methodology that we call TRACER, which continuously performs a set of statistical tests using regression analysis to track the evolution of the detected clusters, their characteristics and quality metrics. For the last component, we propose a method to build some behavioral profiles for the clustering model over time, that can be used to improve the performance of the online clustering algorithm, such as adapting the initial values of the input parameters. The performance and effectiveness of the proposed framework were validated using extensive experiments, and its use was demonstrated on a challenging real word application, specifically unsupervised mining of evolving cluster stories in one pass from the Twitter social media streams
Prediction of Emerging Technologies Based on Analysis of the U.S. Patent Citation Network
The network of patents connected by citations is an evolving graph, which
provides a representation of the innovation process. A patent citing another
implies that the cited patent reflects a piece of previously existing knowledge
that the citing patent builds upon. A methodology presented here (i) identifies
actual clusters of patents: i.e. technological branches, and (ii) gives
predictions about the temporal changes of the structure of the clusters. A
predictor, called the {citation vector}, is defined for characterizing
technological development to show how a patent cited by other patents belongs
to various industrial fields. The clustering technique adopted is able to
detect the new emerging recombinations, and predicts emerging new technology
clusters. The predictive ability of our new method is illustrated on the
example of USPTO subcategory 11, Agriculture, Food, Textiles. A cluster of
patents is determined based on citation data up to 1991, which shows
significant overlap of the class 442 formed at the beginning of 1997. These new
tools of predictive analytics could support policy decision making processes in
science and technology, and help formulate recommendations for action
Applications of Structural Balance in Signed Social Networks
We present measures, models and link prediction algorithms based on the
structural balance in signed social networks. Certain social networks contain,
in addition to the usual 'friend' links, 'enemy' links. These networks are
called signed social networks. A classical and major concept for signed social
networks is that of structural balance, i.e., the tendency of triangles to be
'balanced' towards including an even number of negative edges, such as
friend-friend-friend and friend-enemy-enemy triangles. In this article, we
introduce several new signed network analysis methods that exploit structural
balance for measuring partial balance, for finding communities of people based
on balance, for drawing signed social networks, and for solving the problem of
link prediction. Notably, the introduced methods are based on the signed graph
Laplacian and on the concept of signed resistance distances. We evaluate our
methods on a collection of four signed social network datasets.Comment: 37 page
Clustering and Community Detection in Directed Networks: A Survey
Networks (or graphs) appear as dominant structures in diverse domains,
including sociology, biology, neuroscience and computer science. In most of the
aforementioned cases graphs are directed - in the sense that there is
directionality on the edges, making the semantics of the edges non symmetric.
An interesting feature that real networks present is the clustering or
community structure property, under which the graph topology is organized into
modules commonly called communities or clusters. The essence here is that nodes
of the same community are highly similar while on the contrary, nodes across
communities present low similarity. Revealing the underlying community
structure of directed complex networks has become a crucial and
interdisciplinary topic with a plethora of applications. Therefore, naturally
there is a recent wealth of research production in the area of mining directed
graphs - with clustering being the primary method and tool for community
detection and evaluation. The goal of this paper is to offer an in-depth review
of the methods presented so far for clustering directed networks along with the
relevant necessary methodological background and also related applications. The
survey commences by offering a concise review of the fundamental concepts and
methodological base on which graph clustering algorithms capitalize on. Then we
present the relevant work along two orthogonal classifications. The first one
is mostly concerned with the methodological principles of the clustering
algorithms, while the second one approaches the methods from the viewpoint
regarding the properties of a good cluster in a directed network. Further, we
present methods and metrics for evaluating graph clustering results,
demonstrate interesting application domains and provide promising future
research directions.Comment: 86 pages, 17 figures. Physics Reports Journal (To Appear
Median evidential c-means algorithm and its application to community detection
Median clustering is of great value for partitioning relational data. In this
paper, a new prototype-based clustering method, called Median Evidential
C-Means (MECM), which is an extension of median c-means and median fuzzy
c-means on the theoretical framework of belief functions is proposed. The
median variant relaxes the restriction of a metric space embedding for the
objects but constrains the prototypes to be in the original data set. Due to
these properties, MECM could be applied to graph clustering problems. A
community detection scheme for social networks based on MECM is investigated
and the obtained credal partitions of graphs, which are more refined than crisp
and fuzzy ones, enable us to have a better understanding of the graph
structures. An initial prototype-selection scheme based on evidential
semi-centrality is presented to avoid local premature convergence and an
evidential modularity function is defined to choose the optimal number of
communities. Finally, experiments in synthetic and real data sets illustrate
the performance of MECM and show its difference to other methods
Complex networks analysis in socioeconomic models
This chapter aims at reviewing complex networks models and methods that were
either developed for or applied to socioeconomic issues, and pertinent to the
theme of New Economic Geography. After an introduction to the foundations of
the field of complex networks, the present summary adds insights on the
statistical mechanical approach, and on the most relevant computational aspects
for the treatment of these systems. As the most frequently used model for
interacting agent-based systems, a brief description of the statistical
mechanics of the classical Ising model on regular lattices, together with
recent extensions of the same model on small-world Watts-Strogatz and
scale-free Albert-Barabasi complex networks is included. Other sections of the
chapter are devoted to applications of complex networks to economics, finance,
spreading of innovations, and regional trade and developments. The chapter also
reviews results involving applications of complex networks to other relevant
socioeconomic issues, including results for opinion and citation networks.
Finally, some avenues for future research are introduced before summarizing the
main conclusions of the chapter.Comment: 39 pages, 185 references, (not final version of) a chapter prepared
for Complexity and Geographical Economics - Topics and Tools, P.
Commendatore, S.S. Kayam and I. Kubin Eds. (Springer, to be published
- …