Descriptive Modeling of Social Networks
In recent years, many analysis methods have been proposed to extract knowledge from social networks. As in the traditional data mining domain, these network-based approaches fall into two main families. Approaches based on predictive modelling encompass the techniques that analyse current and historical facts to make predictions about future or unknown events. Approaches based on descriptive modelling cover the techniques that aim to summarize the data by identifying relevant features in order to describe how things are organized and actually work. In this paper, we review the main descriptive modelling methods for social networks and show, for each of them, the resulting useful knowledge on a running example. We particularly emphasize the most recent methods that combine information available on both the network structure and the node attributes in order to provide original description models taking into account the context
Core Decomposition in Multilayer Networks: Theory, Algorithms, and Applications
Multilayer networks are a powerful paradigm to model complex systems, where
multiple relations occur between the same entities. Despite the keen interest
in a variety of tasks, algorithms, and analyses in this type of network, the
problem of extracting dense subgraphs has remained largely unexplored so far.
In this work we study the problem of core decomposition of a multilayer
network. The multilayer context is much more challenging, as no total order exists
among multilayer cores; rather, they form a lattice whose size is exponential
in the number of layers. In this setting we devise three algorithms which
differ in the way they visit the core lattice and in their pruning techniques.
We then move a step forward and study the problem of extracting the
inner-most (also known as maximal) cores, i.e., the cores that are not
dominated by any other core in terms of their core index in all the layers.
Inner-most cores are typically orders of magnitude fewer than all the cores.
Motivated by this, we devise an algorithm that effectively exploits the
maximality property and extracts inner-most cores directly, without first
computing a complete decomposition.
Finally, we showcase the multilayer core-decomposition tool in a variety of
scenarios and problems. We start by considering the problem of densest-subgraph
extraction in multilayer networks. We introduce a definition of multilayer
densest subgraph that trades off between high density and number of layers in
which the high density holds, and exploit multilayer core decomposition to
approximate this problem with quality guarantees. As further applications, we
show how to utilize multilayer core decomposition to speed up the extraction of
frequent cross-graph quasi-cliques and to generalize the community-search
problem to the multilayer setting
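As an illustration of the basic building block behind this abstract, the following is a minimal sketch of extracting a single multilayer core by iterative peeling. It assumes the usual definition in which a core is indexed by a vector `k` with one minimum-degree requirement per layer; the function name `multilayer_core` and the adjacency-dict input format are hypothetical choices made for this example, and the sketch is not one of the three lattice-visiting algorithms the paper proposes.

```python
from collections import deque


def multilayer_core(layers, k):
    """Return the vertex set of the core indexed by the vector k.

    layers: list of dicts, one per layer, mapping node -> set of neighbours
            (undirected, so adjacency is assumed to be symmetric).
    k:      list of ints; k[i] is the minimum degree required in layers[i].
    """
    nodes = set()
    for adj in layers:
        nodes.update(adj)
    alive = set(nodes)
    # Degree of every node in every layer, counted among surviving nodes.
    deg = [{v: len(adj.get(v, ())) for v in nodes} for adj in layers]

    # Start with every node that violates at least one layer constraint.
    queue = deque(
        v for v in nodes if any(deg[i][v] < k[i] for i in range(len(layers)))
    )
    queued = set(queue)

    while queue:
        v = queue.popleft()
        alive.discard(v)
        # Removing v lowers the degree of its surviving neighbours.
        for i, adj in enumerate(layers):
            for u in adj.get(v, ()):
                if u in alive:
                    deg[i][u] -= 1
                    if u not in queued and deg[i][u] < k[i]:
                        queued.add(u)
                        queue.append(u)
    return alive


# Toy two-layer network: a triangle a-b-c in both layers,
# plus a pendant node d attached to a in the first layer only.
example_layers = [
    {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}},
    {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}},
]
print(sorted(multilayer_core(example_layers, [2, 2])))
```

Because each vector `k` yields its own core, enumerating all cores means exploring a lattice of such vectors, which is why the abstract stresses pruning and direct extraction of the inner-most cores.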
Link communities reveal multiscale complexity in networks
Networks have become a key approach to understanding systems of interacting
objects, unifying the study of diverse phenomena including biological organisms
and human society. One crucial step when studying the structure and dynamics of
networks is to identify communities: groups of related nodes that correspond to
functional subunits such as protein complexes or social spheres. Communities in
networks often overlap such that nodes simultaneously belong to several groups.
Meanwhile, many networks are known to possess hierarchical organization, where
communities are recursively grouped into a hierarchical structure. However, the
fact that many real networks have communities with pervasive overlap, where
each and every node belongs to more than one group, has the consequence that a
global hierarchy of nodes cannot capture the relationships between overlapping
groups. Here we reinvent communities as groups of links rather than nodes and
show that this unorthodox approach successfully reconciles the antagonistic
organizing principles of overlapping communities and hierarchy. In contrast to
the existing literature, which has entirely focused on grouping nodes, link
communities naturally incorporate overlap while revealing hierarchical
organization. We find relevant link communities in many networks, including
major biological networks such as protein-protein interaction and metabolic
networks, and show that a large social network contains hierarchically
organized community structures spanning inner-city to regional scales while
maintaining pervasive overlap. Our results imply that link communities are
fundamental building blocks that reveal overlap and hierarchical organization
in networks to be two aspects of the same phenomenon.
BlogForever D2.6: Data Extraction Methodology
This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform
An Enhanced Web Data Learning Method for Integrating Item, Tag and Value for Mining Web Contents
The proposed system analyses the scope introduced by Web 2.0 and collaborative tagging systems; several challenges have to be addressed too, notably the problem of information overload. Recommender systems are among the most successful approaches for increasing the level of relevant content over the "noise." Traditional recommender systems fail to address the requirements presented in collaborative tagging systems. This paper considers the problem of item recommendation in collaborative tagging systems. It is proposed to model data from collaborative tagging systems with three-mode tensors, in order to capture the three-way correlations between users, tags, and items. By applying multiway analysis, latent correlations are revealed, which help to improve the quality of recommendations. Moreover, a hybrid scheme is proposed that additionally considers content-based information extracted from items. We propose an advanced data mining method using SVD that combines tag and value similarity with item and user preference. SVD automatically extracts data from query result pages by first identifying and segmenting the query result records in the query result pages and then aligning the segmented query result records into a table, in which the data values from the same attribute are put into the same column. Specifically, we propose new techniques to handle the case when the query result records based on user preferences, which may be due to the presence of auxiliary information, such as a comment, recommendation or advertisement, and for handling any nested structure that may exist in the query result records
A survey of frequent subgraph mining algorithms
Graph mining is an important research area within the domain of data mining. The field of study concentrates on the identification of frequent subgraphs within graph data sets. The research goals are directed at: (i) effective mechanisms for generating candidate subgraphs (without generating duplicates) and (ii) how best to process the generated candidate subgraphs so as to identify the desired frequent subgraphs in a way that is computationally efficient and procedurally effective. This paper presents a survey of current research in the field of frequent subgraph mining and proposes solutions to address the main research issues.
SALSA: A Novel Dataset for Multimodal Group Behavior Analysis
Studying free-standing conversational groups (FCGs) in unstructured social
settings (e.g., a cocktail party) is gratifying due to the wealth of information
available at the group (mining social networks) and individual (recognizing
native behavioral and personality traits) levels. However, analyzing social
scenes involving FCGs is also highly challenging due to the difficulty in
extracting behavioral cues such as target locations, their speaking activity
and head/body pose due to crowdedness and presence of extreme occlusions. To
this end, we propose SALSA, a novel dataset facilitating multimodal and
Synergetic sociAL Scene Analysis, and make two main contributions to research
on automated social interaction analysis: (1) SALSA records social interactions
among 18 participants in a natural, indoor environment for over 60 minutes,
under the poster presentation and cocktail party contexts presenting
difficulties in the form of low-resolution images, lighting variations,
numerous occlusions, reverberations and interfering sound sources; (2) To
alleviate these problems we facilitate multimodal analysis by recording the
social interplay using four static surveillance cameras and sociometric badges
worn by each participant, comprising microphone, accelerometer, Bluetooth
and infrared sensors. In addition to raw data, we also provide annotations
concerning individuals' personality as well as their position, head, body
orientation and F-formation information over the entire event duration. Through
extensive experiments with state-of-the-art approaches, we show (a) the
limitations of current methods and (b) how the recorded multiple cues
synergetically aid automatic analysis of social interactions. SALSA is
available at http://tev.fbk.eu/salsa.
Strategy and methodology for enterprise data warehouse development. Integrating data mining and social networking techniques for identifying different communities within the data warehouse.
Data warehouse technology has been successfully integrated into the information
infrastructure of major organizations as a potential solution for eliminating redundancy and
providing for comprehensive data integration. Realizing the importance of a data
warehouse as the main data repository within an organization, this dissertation addresses
different aspects related to the data warehouse architecture and performance issues.
Many data warehouse architectures have been presented by industry analysts and
research organizations. These architectures vary from the independent and physical
business unit centric data marts to the centralised two-tier hub-and-spoke data warehouse.
The operational data store is a third tier which was offered later to address the business
requirements for inter-day data loading. While the industry-available architectures are all
valid, I found them to be suboptimal in efficiency (cost) and effectiveness (productivity).
In this dissertation, I am advocating a new architecture (The Hybrid Architecture)
which encompasses the industry advocated architecture. The hybrid architecture demands
the acquisition, loading and consolidation of enterprise atomic and detailed data into a
single integrated enterprise data store (The Enterprise Data Warehouse) where business-unit-centric
Data Marts and Operational Data Stores (ODS) are built in the same instance
of the Enterprise Data Warehouse.
For the purpose of highlighting the role of data warehouses for different
applications, we describe an effort to develop a data warehouse for a geographical
information system (GIS). We further study the importance of data practices, quality and
governance for financial institutions by commenting on the RBC Financial Group case.
The development and deployment of the Enterprise Data Warehouse based on the
Hybrid Architecture spawned its own issues and challenges. Organic data growth and
business requirements to load additional new data will significantly increase the amount
of stored data. Consequently, the number of users will increase significantly. Enterprise
data warehouse obesity, performance degradation and navigation difficulties are chief
amongst the issues and challenges.
Association rules mining and social networks have been adopted in this thesis to
address the above mentioned issues and challenges. We describe an approach that uses
frequent pattern mining and social network techniques to discover different communities
within the data warehouse. These communities include sets of tables frequently accessed
together, sets of tables retrieved together most of the time and sets of attributes that
mostly appear together in the queries. We concentrate on tables in the discussion;
however, the model is general enough to discover other communities. We first build a
frequent pattern mining model by considering each query as a transaction and the tables
as items. Then, we mine closed frequent itemsets of tables; these itemsets include tables
that are mostly accessed together and hence should be treated as one unit in storage and
retrieval for better overall performance. We utilize social network construction and
analysis to find maximum-sized sets of related tables; this is a more robust approach as
opposed to a union of overlapping itemsets. We derive the Jaccard distance between the
closed itemsets and construct the social network of tables by adding links that represent
distance above a given threshold. The constructed network is analyzed to discover
communities of tables that are mostly accessed together. The reported test results are
promising and demonstrate the applicability and effectiveness of the developed approach
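
The first steps of the approach described above (queries as transactions, tables as items, closed frequent itemsets, Jaccard distance between itemsets) can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the table names, the query log, the support threshold, and the brute-force enumeration are all hypothetical choices for a tiny example, and the network-construction step is left out because the abstract does not specify it in enough detail.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration: map each frequent itemset to its support count."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            support = sum(1 for t in transactions if cset <= t)
            if support >= min_support:
                result[cset] = support
                found = True
        if not found:
            # No frequent itemset of this size, so none of a larger size either.
            break
    return result


def closed_itemsets(frequent):
    """Keep only itemsets that have no proper superset with the same support."""
    return {
        s: sup
        for s, sup in frequent.items()
        if not any(s < other and sup == frequent[other] for other in frequent)
    }


def jaccard_distance(a, b):
    """Jaccard distance between two non-empty sets."""
    return 1.0 - len(a & b) / len(a | b)


# Hypothetical query log: each query is the set of tables it touches.
queries = [
    frozenset({"orders", "customers"}),
    frozenset({"orders", "customers", "products"}),
    frozenset({"orders", "customers"}),
    frozenset({"products", "suppliers"}),
]
closed = closed_itemsets(frequent_itemsets(queries, min_support=2))
for itemset, support in closed.items():
    print(sorted(itemset), support)
```

Brute-force enumeration is exponential in the number of tables, so a real query log would call for a dedicated closed-itemset miner; the sketch only shows how the pieces fit together.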