Enumerating Maximal Bicliques from a Large Graph using MapReduce
We consider the enumeration of maximal bipartite cliques (bicliques) from a
large graph, a task central to many practical data mining problems in social
network analysis and bioinformatics. We present novel parallel algorithms for
the MapReduce platform, and an experimental evaluation using Hadoop MapReduce.
Our algorithm is based on clustering the input graph into smaller sized
subgraphs, followed by processing different subgraphs in parallel. Our
algorithm uses two ideas that enable it to scale to large graphs: (1) the
redundancy in work between different subgraph explorations is minimized through
a careful pruning of the search space, and (2) the load on different reducers
is balanced through the use of an appropriate total order among the vertices.
Our evaluation shows that the algorithm scales to large graphs with millions of
edges and tens of millions of maximal bicliques. To our knowledge, this is
the first work on maximal biclique enumeration for graphs of this scale.
Comment: A preliminary version of the paper was accepted at the Proceedings of
the 3rd IEEE International Congress on Big Data 201
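The abstract above describes clustering and pruning ideas without giving the algorithm itself. As an illustration only (not the paper's MapReduce method), the core object can be shown with a brute-force sketch on a toy bipartite graph: a biclique is a pair (L, R) with every vertex of L adjacent to every vertex of R, and it is maximal when neither side can be extended. All names here are made up for the example.

```python
from itertools import combinations

# Toy bipartite graph: adjacency from left vertices to right vertices.
adj = {
    "a": {"x", "y"},
    "b": {"x", "y", "z"},
    "c": {"y", "z"},
}

def maximal_bicliques(adj):
    """Enumerate maximal bicliques (L, R) by brute force: for every
    non-empty subset L of left vertices, R is the set of common
    neighbours; extending L to every left vertex adjacent to all of R
    makes the pair maximal. Exponential in |L|, so toy-sized only."""
    found = set()
    left = sorted(adj)
    for k in range(1, len(left) + 1):
        for subset in combinations(left, k):
            right = set.intersection(*(adj[v] for v in subset))
            if not right:
                continue
            # Close the left side: take all left vertices covering `right`.
            full_left = frozenset(v for v in left if right <= adj[v])
            found.add((full_left, frozenset(right)))
    return found

for L, R in sorted(maximal_bicliques(adj), key=lambda p: sorted(p[0])):
    print(sorted(L), sorted(R))
```

The paper's contribution is precisely avoiding this exponential blow-up at scale, by partitioning the graph into subgraphs, pruning redundant explorations, and balancing reducer load via a total vertex order.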
The News Angler Project: Exploring the Next Generation of Journalistic Knowledge Platforms
The News Angler project aims to support journalists in finding new and unexpected connections and angles in the news. The project therefore explores how recent artificial intelligence (AI) techniques, such as knowledge graphs, natural-language processing (NLP) and machine learning (ML), can support high-quality journalism that exploits big and open data sources. A central contribution is News Hunter, a series of prototype journalistic knowledge platforms (JKPs).
From engineering models to knowledge graph: delivering new insights into models
Essential information on the early stages of a mission design is contained in Engineering Models. Yet these models are often difficult to visualise and query, let alone compare. This study demonstrates how Knowledge Graphs can overcome these data silos, interconnect information, provide a big-picture perspective, and infer new knowledge that would otherwise have remained hidden. Following the migration of CubeSat Engineering Models to a Knowledge Graph, two case studies are explored. The first case study illustrates how graph inference can derive implicit knowledge from existing explicit concepts. In the second case study, a Natural Language Processing layer is adjoined to the Knowledge Graph to enhance the analysis of textual content. The Natural Language Processing layer relies on the document embedding method doc2v
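The first case study's idea, deriving implicit knowledge from explicit concepts via graph inference, can be sketched with a minimal forward-chaining example over made-up triples (the entity names and the `partOf` predicate are assumptions for illustration, not taken from the study):

```python
# Toy triples: explicit facts one might extract from an engineering model.
triples = {
    ("battery", "partOf", "powerSubsystem"),
    ("powerSubsystem", "partOf", "cubesat"),
    ("antenna", "partOf", "commSubsystem"),
    ("commSubsystem", "partOf", "cubesat"),
}

def infer_transitive(triples, predicate):
    """Saturate the graph under transitivity of `predicate`,
    deriving implicit triples from explicit ones by naive
    forward chaining until no new fact appears."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        for a, p, b in list(facts):
            if p != predicate:
                continue
            for c, q, d in list(facts):
                if q == predicate and c == b and (a, p, d) not in facts:
                    facts.add((a, p, d))
                    changed = True
    return facts

inferred = infer_transitive(triples, "partOf") - triples
print(sorted(inferred))  # derives e.g. ("battery", "partOf", "cubesat")
```

Production knowledge-graph stores typically express such rules declaratively (e.g. as OWL transitive properties or SPARQL property paths) rather than hand-rolled loops; the sketch only shows the inference principle.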
Bench-Ranking: A Prescriptive Analysis Method for Querying Large Knowledge Graphs
Leveraging relational Big Data (BD) processing frameworks to process large knowledge graphs yields great interest in optimizing query performance. Modern BD systems are, however, complicated data systems whose configurations notably affect performance. Benchmarking different frameworks and configurations provides the community with best practices for better performance. However, most of these benchmarking efforts can be classified only as descriptive and diagnostic analytics.
Moreover, there is no standard for comparing these benchmarks based on quantitative ranking techniques. In addition, designing mature pipelines for processing big graphs entails further design decisions that emerge with the non-native (relational) graph processing paradigm. Those design decisions cannot be made automatically, e.g., the choice of relational schema, partitioning technique, and storage formats. In this thesis, we discuss how our work fills this timely research gap. First, we show the impact of those design decisions' trade-offs on the replicability of BD systems' performance when querying large knowledge graphs. We then show the limitations of descriptive and diagnostic analyses of BD frameworks' performance for querying large graphs. Finally, we investigate how to enable prescriptive analytics via ranking functions and multi-dimensional optimization techniques (called "Bench-Ranking"). This approach abstracts away the complexity of descriptive performance analysis, guiding the practitioner directly to actionable, informed decisions.
https://www.ester.ee/record=b553332
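The core Bench-Ranking idea, turning raw benchmark numbers into an ordering of configurations across several evaluation dimensions, can be sketched with a simple rank-aggregation function. The configuration names, dimensions, and numbers below are invented for illustration and do not reproduce the thesis's actual experiments or ranking functions:

```python
# Hypothetical benchmark results: runtimes in seconds (lower is better)
# for three storage formats along two query-workload dimensions.
results = {
    "csv":     {"single_triple_queries": 40.0, "join_queries": 120.0},
    "parquet": {"single_triple_queries": 12.0, "join_queries": 30.0},
    "orc":     {"single_triple_queries": 15.0, "join_queries": 25.0},
}

def bench_rank(results):
    """Rank configurations within each dimension, then aggregate the
    rank positions into one overall score (lower score = better).
    This is a basic Borda-style aggregation standing in for the
    thesis's multi-dimensional ranking functions."""
    dims = next(iter(results.values())).keys()
    scores = {cfg: 0 for cfg in results}
    for d in dims:
        order = sorted(results, key=lambda cfg: results[cfg][d])
        for rank, cfg in enumerate(order):
            scores[cfg] += rank
    return sorted(scores, key=scores.get)

print(bench_rank(results))  # best-to-worst overall ordering
```

The prescriptive step is exactly this last line: instead of leaving the practitioner with per-dimension tables to interpret, the aggregated ranking directly recommends a configuration.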
Decentralized Dictionary Learning Over Time-Varying Digraphs
This paper studies Dictionary Learning problems wherein the learning task is
distributed over a multi-agent network, modeled as a time-varying directed
graph. This formulation is relevant, for instance, in Big Data scenarios where
massive amounts of data are collected/stored in different locations (e.g.,
sensors, clouds) and aggregating and/or processing all data in a fusion center
might be inefficient or unfeasible, due to resource limitations, communication
overheads or privacy issues. We develop a unified decentralized algorithmic
framework for this class of nonconvex problems, which is proved to converge to
stationary solutions at a sublinear rate. The new method hinges on Successive
Convex Approximation techniques, coupled with a decentralized tracking
mechanism aiming at locally estimating the gradient of the smooth part of the
sum-utility. To the best of our knowledge, this is the first provably
convergent decentralized algorithm for Dictionary Learning and, more generally,
bi-convex problems over (time-varying) (di)graphs.
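The decentralized tracking mechanism the abstract mentions, agents cooperatively estimating the gradient of the smooth part of the sum-utility, can be illustrated on a deliberately simplified problem. The sketch below is not the paper's algorithm (no dictionary learning, no time-varying digraph): it runs plain gradient tracking on a 3-agent undirected ring with scalar quadratic losses, where every quantity can be checked by hand. The mixing matrix, step size, and targets are all assumptions for the example.

```python
# Each agent i privately holds a_i and minimises f_i(x) = (x - a_i)^2;
# the network objective sum_i f_i(x) is minimised at the mean of the a_i.
a = [1.0, 2.0, 6.0]
n = len(a)
W = [[0.50, 0.25, 0.25],     # doubly stochastic mixing matrix
     [0.25, 0.50, 0.25],     # for a fully connected 3-agent ring
     [0.25, 0.25, 0.50]]

def grad(x, i):
    """Local gradient of f_i at x."""
    return 2.0 * (x - a[i])

x = [0.0] * n
y = [grad(x[i], i) for i in range(n)]  # tracker starts at local gradients
alpha = 0.05
for _ in range(300):
    # Consensus step on the iterates, then a step along the tracked gradient.
    x_new = [sum(W[i][j] * x[j] for j in range(n)) - alpha * y[i]
             for i in range(n)]
    # Gradient tracking: mix the trackers, then add the local gradient change,
    # so that the average of y keeps tracking the average local gradient.
    y = [sum(W[i][j] * y[j] for j in range(n))
         + grad(x_new[i], i) - grad(x[i], i)
         for i in range(n)]
    x = x_new

print([round(v, 3) for v in x])  # all agents close to the mean, 3.0
```

The paper's setting replaces these quadratics with nonconvex (bi-convex) dictionary-learning losses and couples the tracking with Successive Convex Approximation updates, which is what makes its sublinear convergence guarantee nontrivial.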