Exploring Communities in Large Profiled Graphs
Given a graph G and a vertex q, the community search (CS) problem
aims to efficiently find a subgraph of G whose vertices are closely related
to q. Communities are prevalent in social and biological networks, and can be
used in product advertisement and social event recommendation. In this paper,
we study profiled community search (PCS), where CS is performed on a profiled
graph. This is a graph in which each vertex has labels arranged in a
hierarchical manner. Extensive experiments show that PCS can identify
communities with themes that are common to their vertices, and is more
effective than existing CS approaches. As a naive solution for PCS is highly
expensive, we have also developed a tree index, which facilitates efficient and
online solutions for PCS.
Core Decomposition in Multilayer Networks: Theory, Algorithms, and Applications
Multilayer networks are a powerful paradigm to model complex systems, where
multiple relations occur between the same entities. Despite the keen interest
in a variety of tasks, algorithms, and analyses in this type of network, the
problem of extracting dense subgraphs has remained largely unexplored so far.
In this work we study the problem of core decomposition of a multilayer
network. The multilayer context is much more challenging as no total order exists
among multilayer cores; rather, they form a lattice whose size is exponential
in the number of layers. In this setting we devise three algorithms which
differ in the way they visit the core lattice and in their pruning techniques.
We then move a step forward and study the problem of extracting the
inner-most (also known as maximal) cores, i.e., the cores that are not
dominated by any other core in terms of their core index in all the layers.
Inner-most cores are typically orders of magnitude fewer than all the cores.
Motivated by this, we devise an algorithm that effectively exploits the
maximality property and extracts inner-most cores directly, without first
computing a complete decomposition.
Finally, we showcase the multilayer core-decomposition tool in a variety of
scenarios and problems. We start by considering the problem of densest-subgraph
extraction in multilayer networks. We introduce a definition of multilayer
densest subgraph that trades off high density against the number of layers in
which the high density holds, and exploit multilayer core decomposition to
approximate this problem with quality guarantees. As further applications, we
show how to utilize multilayer core decomposition to speed up the extraction of
frequent cross-graph quasi-cliques and to generalize the community-search
problem to the multilayer setting.
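As a rough illustration of the core notion used in the abstract above, the following sketch computes a single multilayer core by peeling. It assumes the usual definition, which the abstract does not spell out: for a vector k with one threshold per layer, the core is the largest vertex set in which every vertex has at least k[i] neighbours in layer i. The name `multilayer_core` and the edge-list input format are illustrative choices, and the sketch is not one of the three lattice-visiting algorithms the paper proposes.

```python
from collections import defaultdict


def multilayer_core(layers, k):
    """Return the vertex set of the multilayer core for threshold vector k.

    layers: list of edge lists (pairs of vertices), one list per layer.
    k: list of minimum degrees, one per layer (assumed same length as layers).
    Self-loops are assumed absent.
    """
    # Build one adjacency map per layer over a shared vertex set.
    adj = []
    vertices = set()
    for edges in layers:
        a = defaultdict(set)
        for u, v in edges:
            a[u].add(v)
            a[v].add(u)
            vertices.update((u, v))
        adj.append(a)

    alive = set(vertices)
    # Start from every vertex that violates the threshold of some layer.
    stack = [v for v in alive
             if any(len(adj[i][v]) < k[i] for i in range(len(k)))]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue
        alive.discard(v)
        # Removing v lowers its neighbours' degrees in every layer.
        for i, a in enumerate(adj):
            for w in a[v]:
                a[w].discard(v)
                if w in alive and len(a[w]) < k[i]:
                    stack.append(w)
            a[v].clear()
    return alive
```

Calling the function once per vector k is the naive way to explore the core lattice; the exponential number of such vectors is what motivates the pruning techniques mentioned in the abstract.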
DMCS: Density Modularity based Community Search
Community Search, or finding a connected subgraph (known as a community)
containing the given query nodes in a social network, is a fundamental problem.
Most of the existing community search models only focus on the internal
cohesiveness of a community. However, a high-quality community often has high
modularity, which means dense connections inside communities and sparse
connections to the nodes outside the community. In this paper, we conduct a
pioneering study on searching for a community with high modularity. We point out
that while modularity has been widely used in community detection (without query
nodes), it has, surprisingly, not been adopted for community search, and that its
application to community search (with query nodes) brings new
challenges. We address these challenges by designing a new graph modularity
function named Density Modularity. To the best of our knowledge, this is the
first work on the community search problem using graph modularity. The
community search based on density modularity, termed DMCS, aims to find a
community in a social network that contains all the query nodes and has high
density modularity. We prove that the DMCS problem is NP-hard. To address DMCS
efficiently, we present new algorithms that run in log-linear time in the
graph size. We conduct extensive experimental studies on real-world and
synthetic networks, which offer insights into the efficiency and effectiveness
of our algorithms. In particular, our algorithm achieves up to 8.5 times higher
accuracy in terms of NMI than baseline algorithms.
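The abstract does not define Density Modularity, so the sketch below only illustrates the classic modularity score of a single community, which is the quantity the paper sets out to adapt. It assumes an unweighted, undirected edge list without duplicates; the function name `community_modularity` is an illustrative choice. The score is the fraction of edges inside the community minus the fraction expected from the members' degrees.

```python
def community_modularity(edges, community):
    """Classic modularity contribution of one community.

    edges: list of undirected edges (u, v), each listed once.
    community: iterable of vertices.
    Returns l_C / m - (d_C / (2 m)) ** 2, where m is the number of edges,
    l_C the number of edges inside the community, and d_C the sum of the
    degrees of its members.
    """
    community = set(community)
    m = len(edges)
    internal = sum(1 for u, v in edges if u in community and v in community)
    # Each edge endpoint inside the community adds one to the degree sum.
    degree_sum = sum((u in community) + (v in community) for u, v in edges)
    return internal / m - (degree_sum / (2 * m)) ** 2
```

On two triangles joined by a single edge, one triangle scores higher than the pair of bridge endpoints, which matches the intuition of dense connections inside and sparse connections outside.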
Towards a "Swiss Army Knife" for Scalable User-Defined Temporal (k, X)-Core Analysis
Querying cohesive subgraphs on temporal graphs (e.g., social networks, finance
networks, etc.) under various conditions has attracted intensive research
interest recently. In this paper, we study a novel Temporal
(k, X)-Core Query (TXCQ) that extends a fundamental Temporal
k-Core Query (TCQ) proposed in our conference paper by optimizing or
constraining an arbitrary metric X of a k-core, such as size,
engagement, interaction frequency, time span, burstiness, periodicity, etc. Our
objective is to address specific TXCQ instances with conditions on different
X in a unified algorithm framework that guarantees scalability. To
that end, this journal paper proposes a taxonomy of the measurement X(·)
and achieves our objective using a two-phase framework
when X(·) is time-insensitive or time-monotonic. Specifically,
Phase 1 still leverages the query processing algorithm of TCQ to induce all
distinct k-cores during a given time range, and meanwhile locates the "time
zones" in which the cores emerge. Then, Phase 2 conducts fast local search and
X evaluation in each time zone with respect to the time
insensitivity or monotonicity of X(·). By revealing two
insightful concepts named tightest time interval and loosest time interval that
bound time zones, the redundant core induction and unnecessary X
evaluation in a zone can be reduced dramatically. Our experimental results
demonstrate that TXCQ can be addressed as efficiently as TCQ, which achieves
the latest state-of-the-art performance, by using a general algorithm framework
that leaves X(·) as a user-defined function.
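To make the two phases above concrete, here is a deliberately naive sketch, writing k for the degree threshold and X for the user-defined metric. It assumes that the temporal k-core of a time interval is the k-core of the graph formed by all edges whose timestamps fall in that interval; this definition, the function names, and the brute-force enumeration of sub-intervals are assumptions for illustration, not the paper's algorithm.

```python
from collections import defaultdict


def k_core(edges, k):
    """Vertex set of the k-core, by repeatedly dropping low-degree vertices."""
    edges = set(edges)
    while True:
        deg = defaultdict(int)
        for u, v in edges:
            deg[u] += 1
            deg[v] += 1
        low = {v for v, d in deg.items() if d < k}
        if not low:
            return frozenset(deg)
        edges = {(u, v) for u, v in edges if u not in low and v not in low}


def distinct_temporal_cores(temporal_edges, k, t_start, t_end):
    """Phase 1 (naive): map each distinct non-empty k-core to the
    sub-intervals of [t_start, t_end] that induce it.

    temporal_edges: list of (u, v, t) with integer timestamps t.
    """
    cores = defaultdict(list)
    for ts in range(t_start, t_end + 1):
        for te in range(ts, t_end + 1):
            window = {(u, v) for u, v, t in temporal_edges if ts <= t <= te}
            core = k_core(window, k)
            if core:
                cores[core].append((ts, te))
    return cores


def best_core(cores, metric):
    """Phase 2 (naive): pick the core that maximizes a user-defined metric."""
    return max(cores, key=metric)
```

Passing `len` as `metric` selects the largest core; any other function of the vertex set can be swapped in, which mirrors the idea of leaving the metric user-defined. The list of intervals stored per core is a crude stand-in for the "time zones" described above.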
Span-core Decomposition for Temporal Networks: Algorithms and Applications
When analyzing temporal networks, a fundamental task is the identification of
dense structures (i.e., groups of vertices that exhibit a large number of
links), together with their temporal span (i.e., the period of time for which
the high density holds). In this paper we tackle this task by introducing a
notion of temporal core decomposition where each core is associated with two
quantities, its coreness, which quantifies how densely it is connected, and its
span, which is a temporal interval: we call such cores span-cores.
For a temporal network defined on a discrete temporal domain T, the total
number of time intervals included in T is quadratic in |T|, so that the
total number of span-cores is potentially quadratic in |T| as well. Our first
main contribution is an algorithm that, by exploiting containment properties
among span-cores, computes all the span-cores efficiently. Then, we focus on
the problem of finding only the maximal span-cores, i.e., span-cores
that are not dominated by any other span-core in terms of both coreness
and span. We devise a very efficient algorithm that exploits theoretical
findings on the maximality condition to directly extract the maximal ones
without computing all span-cores.
Finally, as a third contribution, we introduce the problem of temporal
community search, where a set of query vertices is given as input, and the
goal is to find a set of densely connected subgraphs containing the query
vertices and covering the whole underlying temporal domain T. We derive a
connection between this problem and the problem of finding (maximal)
span-cores. Based on this connection, we show how temporal community search can
be solved in polynomial time via dynamic programming, and how the maximal
span-cores can be profitably exploited to significantly speed up the basic
algorithm.
Comment: ACM Transactions on Knowledge Discovery from Data (TKDD), 2020. arXiv admin note: substantial text overlap with arXiv:1808.0937
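The counting argument and the notion of a span-core can be sketched as follows, writing T for the temporal domain. A domain of n timestamps contains n(n + 1)/2 contiguous intervals, which is where the quadratic bound comes from. The sketch assumes that the span-core of an interval with coreness k is the k-core of the graph made of the edges present at every timestamp of that interval; this definition and all names below are assumptions for illustration, and the brute-force computation is not the paper's algorithm.

```python
from collections import defaultdict


def count_intervals(n):
    """Number of contiguous intervals in a domain of n timestamps."""
    return n * (n + 1) // 2


def k_core(edges, k):
    """Vertex set of the k-core, by repeatedly dropping low-degree vertices."""
    edges = set(edges)
    while True:
        deg = defaultdict(int)
        for u, v in edges:
            deg[u] += 1
            deg[v] += 1
        low = {v for v, d in deg.items() if d < k}
        if not low:
            return frozenset(deg)
        edges = {(u, v) for u, v in edges if u not in low and v not in low}


def span_core(snapshots, k, start, end):
    """Span-core of coreness k for the interval [start, end].

    snapshots: list of edge sets, one per timestamp; each edge is a tuple
    (u, v) written the same way in every snapshot.
    """
    # Keep only the edges that exist at every timestamp of the interval.
    common = set.intersection(*snapshots[start:end + 1])
    return k_core(common, k)
```

Because widening the interval can only shrink the set of common edges, the core for a longer interval is contained in the core for any interval it covers; containment properties of this kind are what the abstract says the efficient algorithms exploit.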
Privacy-Preserving Protocols for Protecting Network Traffic (Yksityisyyden turvaavia protokollia verkkoliikenteen suojaamiseen)
Digital technologies have become an essential part of our lives. In many parts of the world, activities such as socializing, health care, leisure, and education rely entirely or partially on the internet. Moreover, the COVID-19 pandemic has contributed significantly to our dependency on the online world.
While the advancement of the internet brings many advantages, it also brings disadvantages such as potential loss of privacy and security. While users surf the web, service providers may collect a variety of information about them, such as their location, gender, and religion. Moreover, attackers may try to violate users' security, for example, by infecting their devices with malware.
In this PhD dissertation, we propose several privacy-preserving protocols as a means of protecting networking. Our protocols empower internet users to obtain a variety of services while ensuring their privacy and security in the digital world. In other words, we design our protocols so that users share with the service providers only the information that is absolutely necessary to obtain the service. Moreover, our protocols add only minimal time and communication costs, while leveraging cryptographic schemes to ensure users' privacy and security.
The dissertation contains two main themes of protocols: privacy-preserving set operations and privacy-preserving graph queries. These protocols can be applied to a variety of application areas. We delve deeper into three application areas: privacy-preserving technologies for malware protection, protection of remote access, and protecting minors.
Identifying High-Coverage Communities in Edge-Weighted Networks
Over the past decade, community search has attracted wide interest in social and biological network analysis. Related studies use unweighted graphs to represent the underlying structures and aim to find highly cohesive communities. In parallel, recent work has focused on finding communities whose members 1) satisfy a set of predefined constraints and 2) collectively maximize the value of a score function. Although many real-world networks have both weighted edges and nodes associated with a set of attributes, these efforts focus mostly on unweighted networks without node attributes. In this thesis, we investigate a variant of the community search problem for undirected, edge-weighted, node-attributed networks. Given a weighted graph G, a query set of seed nodes Q, an upper bound h on the community size, and a lower bound s on cohesiveness, we aim to find a connected subgraph of G that: 1) contains the seed nodes, 2) has size at most h, 3) has a cohesiveness measure of at least s, and 4) maximizes the total number of distinct attributes covered by its nodes. We term this problem Weighted Covering Community Search (WCCS). We exploit the edge weights to quantify the minimum strength within each candidate subgraph, that is, the smallest sum of edge weights that any node has inside the subgraph. This minimum strength serves as our cohesiveness measure.
We show that WCCS is NP-hard on general networks and therefore propose three approaches to address it. Experimental results on six real-world datasets show that, despite the hardness of the problem, we can efficiently identify solutions that provide effective coverage.
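The four WCCS conditions can be checked for a given candidate node set with a few lines of code. The sketch below is only a feasibility test plus the coverage objective, not one of the three approaches the thesis proposes. It assumes that community size means the number of nodes and that a node's strength is the sum of the weights of its edges inside the candidate; the function name and input formats are illustrative.

```python
def coverage_if_feasible(weighted_edges, attributes, candidate, seeds, h, s):
    """Return the number of distinct attributes covered by the candidate,
    or None if the candidate violates a WCCS constraint.

    weighted_edges: list of (u, v, weight) for an undirected graph.
    attributes: dict mapping each node to an iterable of attributes.
    """
    candidate = set(candidate)
    # Constraints 1 and 2: contain the seeds, respect the size bound h.
    if not candidate or not set(seeds) <= candidate or len(candidate) > h:
        return None

    strength = {v: 0.0 for v in candidate}
    adj = {v: set() for v in candidate}
    for u, v, w in weighted_edges:
        if u in candidate and v in candidate:
            strength[u] += w
            strength[v] += w
            adj[u].add(v)
            adj[v].add(u)

    # Constraint 3: the minimum strength must reach the bound s.
    if min(strength.values()) < s:
        return None

    # The induced subgraph must be connected (depth-first search).
    start = next(iter(candidate))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    if seen != candidate:
        return None

    # Objective 4: count the distinct attributes covered.
    covered = set()
    for v in candidate:
        covered |= set(attributes.get(v, ()))
    return len(covered)
```

A search procedure would call such a check on many candidates and keep the feasible one with the highest coverage; the NP-hardness result explains why exhaustive enumeration of candidates does not scale.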
Attribute-Driven Community Search
Recently, community search over graphs has gained significant interest. In applications such as the analysis of protein-protein interaction (PPI) networks, citation graphs, and collaboration networks, nodes tend to have attributes. Unfortunately, most previous community search algorithms ignore attributes and return communities with poor cohesion w.r.t. their node attributes. In this paper, we study the problem of attribute-driven community search: given an undirected graph G where nodes are associated with attributes, and an input query Q consisting of nodes Vq and attributes Wq, find the communities containing Vq in which most community members are densely inter-connected and have similar attributes. We formulate this problem as finding attributed truss communities (ATC), i.e., finding connected and close k-truss subgraphs containing Vq with the largest attribute relevance score. We design a framework of desirable properties that a good score function should satisfy. We show that the problem is NP-hard. However, we develop an efficient greedy algorithmic framework that iteratively removes nodes with the least popular attributes and shrinks the graph into an ATC. In addition, we build an elegant index to maintain k-truss structure and attribute information, and propose efficient query processing algorithms. Extensive experiments on large real-world networks with ground-truth communities show that our algorithms significantly outperform the state of the art and demonstrate their efficiency and effectiveness.
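The k-truss structure that ATC builds on can be computed by a simple peeling loop. The sketch below uses the standard definition of a k-truss, the largest subgraph in which every edge lies in at least k - 2 triangles, and ignores the attribute score, the connectivity and closeness requirements, and the index described in the abstract. The function name and the edge-list input are illustrative, and self-loops are assumed absent.

```python
from collections import defaultdict


def k_truss(edges, k):
    """Return the edge set of the k-truss as a set of frozensets."""
    remaining = {frozenset(e) for e in edges}
    while True:
        # Rebuild adjacency for the edges that are still present.
        adj = defaultdict(set)
        for e in remaining:
            u, v = tuple(e)
            adj[u].add(v)
            adj[v].add(u)
        # An edge's support is the number of common neighbours of its ends,
        # i.e. the number of triangles it participates in.
        weak = set()
        for e in remaining:
            u, v = tuple(e)
            if len(adj[u] & adj[v]) < k - 2:
                weak.add(e)
        if not weak:
            return remaining
        remaining -= weak
```

A greedy scheme in the spirit of the abstract would alternate between deleting a node with poorly matching attributes and re-running such a peeling step to restore the truss property; recomputing from scratch as done here is the simplest but not the fastest way to do that.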