StructMatrix: large-scale visualization of graphs by means of structure detection and dense matrices
Given a large-scale graph with millions of nodes and edges, how to reveal
macro patterns of interest, like cliques, bi-partite cores, stars, and chains?
Furthermore, how to visualize such patterns altogether getting insights from
the graph to support wise decision-making? Although there are many algorithmic
and visual techniques to analyze graphs, none of the existing approaches is
able to present the structural information of graphs at large scale. Hence,
this paper describes StructMatrix, a methodology aimed at highly scalable visual
inspection of graph structures with the goal of revealing macro patterns of
interest. StructMatrix combines algorithmic structure detection and adjacency
matrix visualization to present cardinality, distribution, and relationship
features of the structures found in a given graph. We performed experiments in
real, large-scale graphs with up to one million nodes and millions of edges.
StructMatrix revealed that graphs of high relevance (e.g., Web, Wikipedia and
DBLP) have characterizations that reflect the nature of their corresponding
domains; our findings have not been seen in the literature so far. We expect
that our technique will bring deeper insights into large graph mining,
leveraging their use for decision making.
Comment: To appear: 8 pages, paper to be published at the Fifth IEEE ICDM
Workshop on Data Mining in Networks, 2015 as Hugo Gualdron, Robson Cordeiro,
Jose Rodrigues (2015) StructMatrix: Large-scale visualization of graphs by
means of structure detection and dense matrices In: The Fifth IEEE ICDM
Workshop on Data Mining in Networks 1--8, IEE
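The abstract above describes combining structure detection with an adjacency matrix over the detected structures. A minimal sketch of that idea follows. It is not the authors' algorithm, which the abstract does not specify. It assumes integer node ids, detects only stars (a hub with at least `min_spokes` degree-1 neighbors), and builds a dense matrix whose entry (i, j) counts edges between structures i and j. All function names and the `min_spokes` parameter are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def find_stars(adj, min_spokes=3):
    """Return {hub: spokes}, where spokes are the hub's degree-1 neighbors.

    Assumption: a 'star' is any node with at least min_spokes such neighbors.
    """
    stars = {}
    for node, neighbors in adj.items():
        spokes = {n for n in neighbors if len(adj[n]) == 1}
        if len(spokes) >= min_spokes:
            stars[node] = spokes
    return stars


def structure_matrix(adj, structures):
    """Dense matrix M: M[i][j] counts edges between structures i and j.

    The diagonal M[i][i] counts edges internal to structure i.
    """
    owner = {}
    for index, members in enumerate(structures):
        for node in members:
            owner[node] = index
    size = len(structures)
    matrix = [[0] * size for _ in range(size)]
    for u, neighbors in adj.items():
        for v in neighbors:
            # u < v visits each undirected edge exactly once.
            if u < v and u in owner and v in owner:
                i, j = owner[u], owner[v]
                matrix[i][j] += 1
                if i != j:
                    matrix[j][i] += 1
    return matrix


# Two stars (hubs 0 and 10) joined by a single hub-to-hub edge.
edges = [(0, 1), (0, 2), (0, 3), (10, 11), (10, 12), (10, 13), (0, 10)]
adj = build_adjacency(edges)
stars = find_stars(adj)
structures = [{hub} | spokes for hub, spokes in sorted(stars.items())]
matrix = structure_matrix(adj, structures)
```

In this toy graph each star has three internal edges and one edge links the two stars, so the resulting matrix is `[[3, 1], [1, 3]]`.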
Mining Representative Unsubstituted Graph Patterns Using Prior Similarity Matrix
One of the most powerful techniques to study protein structures is to look
for recurrent fragments (also called substructures or spatial motifs), then use
them as patterns to characterize the proteins under study. An emergent trend
consists in parsing protein three-dimensional (3D) structures into graphs of
amino acids. Hence, the search for recurrent spatial motifs is formulated as a
process of frequent subgraph discovery where each subgraph represents a spatial
motif. In this scope, several efficient approaches for frequent subgraph
discovery have been proposed in the literature. However, the set of discovered
frequent subgraphs is too large to be efficiently analyzed and explored in any
further process. In this paper, we propose a novel pattern selection approach
that shrinks the large number of discovered frequent subgraphs by selecting the
representative ones. Existing pattern selection approaches do not exploit the
domain knowledge. Yet, in our approach we incorporate the evolutionary
information of amino acids defined in the substitution matrices in order to
select the representative subgraphs. We show the effectiveness of our approach
on a number of real datasets. The results of our experiments show that
our approach is able to considerably decrease the number of motifs while
enhancing their interestingness
Constraint-based sequence mining using constraint programming
The goal of constraint-based sequence mining is to find sequences of symbols
that are included in a large number of input sequences and that satisfy some
constraints specified by the user. Many constraints have been proposed in the
literature, but a general framework is still missing. We investigate the use of
constraint programming as a general framework for this task. We first identify
four categories of constraints that are applicable to sequence mining. We then
propose two constraint programming formulations. The first formulation
introduces a new global constraint called exists-embedding. This formulation is
the most efficient but does not support one type of constraint. To support such
constraints, we develop a second formulation that is more general but incurs
more overhead. Both formulations can use the projected database technique used
in specialised algorithms. Experiments demonstrate the flexibility towards
constraint-based settings and compare the approach to existing methods.
Comment: In Integration of AI and OR Techniques in Constraint Programming
(CPAIOR), 201
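The task stated in the abstract above (find sequences embedded in many input sequences, subject to user constraints) can be illustrated with a small sketch. This is a plain depth-first enumeration, not the paper's constraint programming formulation and not its exists-embedding global constraint. The function names, the `max_length` constraint, and the `allowed` symbol filter are hypothetical choices made for illustration.

```python
def has_embedding(pattern, sequence):
    """True if pattern occurs in sequence as a (possibly gapped) subsequence."""
    remaining = iter(sequence)
    # 'symbol in remaining' advances the iterator past the first match,
    # so later symbols must be found strictly after earlier ones.
    return all(symbol in remaining for symbol in pattern)


def support(pattern, database):
    """Number of input sequences that contain an embedding of pattern."""
    return sum(has_embedding(pattern, seq) for seq in database)


def mine(database, min_support, max_length, allowed=None):
    """Enumerate patterns with support >= min_support and length <= max_length.

    Assumption: 'allowed', if given, restricts which symbols may appear.
    Extending a pattern never raises its support, so an infrequent
    candidate can be pruned together with all of its extensions.
    """
    symbols = sorted({s for seq in database for s in seq})
    if allowed is not None:
        symbols = [s for s in symbols if s in allowed]
    results = {}

    def extend(prefix):
        for symbol in symbols:
            candidate = prefix + (symbol,)
            count = support(candidate, database)
            if count >= min_support:
                results[candidate] = count
                if len(candidate) < max_length:
                    extend(candidate)

    extend(())
    return results


database = ["abcb", "acb", "bc"]
patterns = mine(database, min_support=2, max_length=2)
```

Here `patterns` maps each frequent pattern to its support. For example, `("b", "c")` is embedded in "abcb" and "bc" but not in "acb", so its support is 2.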
Datamining for Web-Enabled Electronic Business Applications
Web-Enabled Electronic Business is generating a massive amount of data on customer purchases, browsing patterns, usage times and preferences at an increasing rate. Data mining techniques can be applied to the collected data to obtain useful information. This chapter attempts to present issues associated with data mining for web-enabled electronic-business