Enumerating Maximal Bicliques from a Large Graph using MapReduce
We consider the enumeration of maximal bipartite cliques (bicliques) from a
large graph, a task central to many practical data mining problems in social
network analysis and bioinformatics. We present novel parallel algorithms for
the MapReduce platform, and an experimental evaluation using Hadoop MapReduce.
Our algorithm is based on clustering the input graph into smaller sized
subgraphs, followed by processing different subgraphs in parallel. Our
algorithm uses two ideas that enable it to scale to large graphs: (1) the
redundancy in work between different subgraph explorations is minimized through
a careful pruning of the search space, and (2) the load on different reducers
is balanced through the use of an appropriate total order among the vertices.
Our evaluation shows that the algorithm scales to large graphs with millions of
edges and tens of millions of maximal bicliques. To our knowledge, this is
the first work on maximal biclique enumeration for graphs of this scale.
Comment: A preliminary version of the paper was accepted at the Proceedings of
the 3rd IEEE International Congress on Big Data 201
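The abstract above describes clustering and pruning ideas without giving the algorithm itself. As an illustration only (not the paper's MapReduce method), the core object can be shown with a brute-force sketch on a toy bipartite graph: a biclique is a pair (L, R) with every vertex of L adjacent to every vertex of R, and it is maximal when neither side can be extended. All names here are made up for the example.

```python
from itertools import combinations

# Toy bipartite graph: adjacency from left vertices to right vertices.
adj = {
    "a": {"x", "y"},
    "b": {"x", "y", "z"},
    "c": {"y", "z"},
}

def maximal_bicliques(adj):
    """Enumerate maximal bicliques (L, R) by brute force: for every
    non-empty subset L of left vertices, R is the set of common
    neighbours; extending L to every left vertex adjacent to all of R
    makes the pair maximal. Exponential in |L|, so toy-sized only."""
    found = set()
    left = sorted(adj)
    for k in range(1, len(left) + 1):
        for subset in combinations(left, k):
            right = set.intersection(*(adj[v] for v in subset))
            if not right:
                continue
            # Close the left side: take all left vertices covering `right`.
            full_left = frozenset(v for v in left if right <= adj[v])
            found.add((full_left, frozenset(right)))
    return found

for L, R in sorted(maximal_bicliques(adj), key=lambda p: sorted(p[0])):
    print(sorted(L), sorted(R))
```

The paper's contribution is precisely avoiding this exponential blow-up at scale, by partitioning the graph into subgraphs, pruning redundant explorations, and balancing reducer load via a total vertex order.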
The News Angler Project: Exploring the Next Generation of Journalistic Knowledge Platforms
The News Angler project aims to support journalists in finding new and unexpected connections and angles in the news. The project therefore explores how recent artificial intelligence (AI) techniques, such as knowledge graphs, natural-language processing (NLP) and machine learning (ML), can support high-quality journalism that exploits big and open data sources. A central contribution is News Hunter, a series of prototype journalistic knowledge platforms (JKPs).
From engineering models to knowledge graph: delivering new insights into models
Essential information on the early stages of a mission design is contained in Engineering Models. Yet these models are often difficult to visualise and query, let alone compare. This study demonstrates how Knowledge Graphs can overcome these data silos, interconnect information, provide a big-picture perspective, and infer new knowledge that would otherwise have remained hidden. Following the migration of CubeSat Engineering Models to a Knowledge Graph, two case studies are explored. The first case study illustrates how graph inference can derive implicit knowledge from existing explicit concepts. In the second case study, a Natural Language Processing layer is adjoined to the Knowledge Graph to enhance the analysis of textual content. The Natural Language Processing layer relies on the document embedding method doc2v
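The first case study's idea, deriving implicit knowledge from explicit concepts via graph inference, can be sketched with a minimal forward-chaining example over made-up triples (the entity names and the `partOf` predicate are assumptions for illustration, not taken from the study):

```python
# Toy triples: explicit facts one might extract from an engineering model.
triples = {
    ("battery", "partOf", "powerSubsystem"),
    ("powerSubsystem", "partOf", "cubesat"),
    ("antenna", "partOf", "commSubsystem"),
    ("commSubsystem", "partOf", "cubesat"),
}

def infer_transitive(triples, predicate):
    """Saturate the graph under transitivity of `predicate`,
    deriving implicit triples from explicit ones by naive
    forward chaining until no new fact appears."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        for a, p, b in list(facts):
            if p != predicate:
                continue
            for c, q, d in list(facts):
                if q == predicate and c == b and (a, p, d) not in facts:
                    facts.add((a, p, d))
                    changed = True
    return facts

inferred = infer_transitive(triples, "partOf") - triples
print(sorted(inferred))  # derives e.g. ("battery", "partOf", "cubesat")
```

Production knowledge-graph stores typically express such rules declaratively (e.g. as OWL transitive properties or SPARQL property paths) rather than hand-rolled loops; the sketch only shows the inference principle.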
Bench-Ranking: A Prescriptive Analysis Method for Querying Large Knowledge Graphs
Leveraging relational Big Data (BD) processing frameworks to process large knowledge graphs yields great interest in optimizing query performance. Modern BD systems are, however, complicated data systems whose configurations notably affect performance. Benchmarking different frameworks and configurations provides the community with best practices for better performance. However, most of these benchmarking efforts can be classified only as descriptive and diagnostic analytics.
Moreover, there is no standard for comparing these benchmarks based on quantitative ranking techniques. In addition, designing mature pipelines for processing big graphs entails further design decisions that emerge with the non-native (relational) graph processing paradigm. Those design decisions cannot be made automatically, e.g., the choice of relational schema, partitioning technique, and storage formats. In this thesis, we discuss how our work fills this timely research gap. First, we show the impact of those design decisions' trade-offs on the replicability of BD systems' performance when querying large knowledge graphs. We then show the limitations of descriptive and diagnostic analyses of BD frameworks' performance for querying large graphs. Finally, we investigate how to enable prescriptive analytics via ranking functions and multi-dimensional optimization techniques (called "Bench-Ranking"). This approach abstracts away the complexity of descriptive performance analysis, guiding the practitioner directly to actionable, informed decisions.
https://www.ester.ee/record=b553332
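The core Bench-Ranking idea, turning raw benchmark numbers into an ordering of configurations across several evaluation dimensions, can be sketched with a simple rank-aggregation function. The configuration names, dimensions, and numbers below are invented for illustration and do not reproduce the thesis's actual experiments or ranking functions:

```python
# Hypothetical benchmark results: runtimes in seconds (lower is better)
# for three storage formats along two query-workload dimensions.
results = {
    "csv":     {"single_triple_queries": 40.0, "join_queries": 120.0},
    "parquet": {"single_triple_queries": 12.0, "join_queries": 30.0},
    "orc":     {"single_triple_queries": 15.0, "join_queries": 25.0},
}

def bench_rank(results):
    """Rank configurations within each dimension, then aggregate the
    rank positions into one overall score (lower score = better).
    This is a basic Borda-style aggregation standing in for the
    thesis's multi-dimensional ranking functions."""
    dims = next(iter(results.values())).keys()
    scores = {cfg: 0 for cfg in results}
    for d in dims:
        order = sorted(results, key=lambda cfg: results[cfg][d])
        for rank, cfg in enumerate(order):
            scores[cfg] += rank
    return sorted(scores, key=scores.get)

print(bench_rank(results))  # best-to-worst overall ordering
```

The prescriptive step is exactly this last line: instead of leaving the practitioner with per-dimension tables to interpret, the aggregated ranking directly recommends a configuration.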
Decentralized Dictionary Learning Over Time-Varying Digraphs
This paper studies Dictionary Learning problems wherein the learning task is
distributed over a multi-agent network, modeled as a time-varying directed
graph. This formulation is relevant, for instance, in Big Data scenarios where
massive amounts of data are collected/stored in different locations (e.g.,
sensors, clouds) and aggregating and/or processing all data in a fusion center
might be inefficient or unfeasible, due to resource limitations, communication
overheads or privacy issues. We develop a unified decentralized algorithmic
framework for this class of nonconvex problems, which is proved to converge to
stationary solutions at a sublinear rate. The new method hinges on Successive
Convex Approximation techniques, coupled with a decentralized tracking
mechanism aiming at locally estimating the gradient of the smooth part of the
sum-utility. To the best of our knowledge, this is the first provably
convergent decentralized algorithm for Dictionary Learning and, more generally,
bi-convex problems over (time-varying) (di)graphs.
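The decentralized tracking mechanism the abstract mentions, agents cooperatively estimating the gradient of the smooth part of the sum-utility, can be illustrated on a deliberately simplified problem. The sketch below is not the paper's algorithm (no dictionary learning, no time-varying digraph): it runs plain gradient tracking on a 3-agent undirected ring with scalar quadratic losses, where every quantity can be checked by hand. The mixing matrix, step size, and targets are all assumptions for the example.

```python
# Each agent i privately holds a_i and minimises f_i(x) = (x - a_i)^2;
# the network objective sum_i f_i(x) is minimised at the mean of the a_i.
a = [1.0, 2.0, 6.0]
n = len(a)
W = [[0.50, 0.25, 0.25],     # doubly stochastic mixing matrix
     [0.25, 0.50, 0.25],     # for a fully connected 3-agent ring
     [0.25, 0.25, 0.50]]

def grad(x, i):
    """Local gradient of f_i at x."""
    return 2.0 * (x - a[i])

x = [0.0] * n
y = [grad(x[i], i) for i in range(n)]  # tracker starts at local gradients
alpha = 0.05
for _ in range(300):
    # Consensus step on the iterates, then a step along the tracked gradient.
    x_new = [sum(W[i][j] * x[j] for j in range(n)) - alpha * y[i]
             for i in range(n)]
    # Gradient tracking: mix the trackers, then add the local gradient change,
    # so that the average of y keeps tracking the average local gradient.
    y = [sum(W[i][j] * y[j] for j in range(n))
         + grad(x_new[i], i) - grad(x[i], i)
         for i in range(n)]
    x = x_new

print([round(v, 3) for v in x])  # all agents close to the mean, 3.0
```

The paper's setting replaces these quadratics with nonconvex (bi-convex) dictionary-learning losses and couples the tracking with Successive Convex Approximation updates, which is what makes its sublinear convergence guarantee nontrivial.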