
    Graph Summarization

    The continuous and rapid growth of highly interconnected datasets, which are both voluminous and complex, calls for adequate processing and analytical techniques. One method for condensing and simplifying such datasets is graph summarization. It denotes a family of application-specific algorithms designed to transform graphs into more compact representations while preserving structural patterns, query answers, or specific property distributions. As this problem is common to several areas studying graph topologies, different approaches, such as clustering, compression, sampling, or influence detection, have been proposed, primarily based on statistical and optimization methods. The focus of our chapter is to pinpoint the main graph summarization methods, and especially the most recent approaches and novel research trends on this topic not yet covered by previous surveys. Comment: To appear in the Encyclopedia of Big Data Technologies.
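To make the idea concrete, here is a minimal sketch of one classic summarization strategy: collapsing nodes with identical neighbour sets into supernodes and keeping one superedge per connected pair of groups. This is a hypothetical illustration of the general technique, not a specific algorithm from the chapter.

```python
from collections import defaultdict

def summarize(edges):
    """Group nodes with identical neighbour sets into supernodes and
    keep one superedge per connected pair of groups."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Nodes sharing the same neighbour set collapse into one supernode.
    groups = defaultdict(list)
    for node, nbrs in adj.items():
        groups[frozenset(nbrs)].append(node)
    supernode = {}
    for gid, members in enumerate(groups.values()):
        for n in members:
            supernode[n] = gid
    superedges = {tuple(sorted((supernode[u], supernode[v]))) for u, v in edges}
    return supernode, superedges
```

On a star graph, for instance, the three leaves share the neighbour set of the centre node and collapse into a single supernode, leaving a two-node summary.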

    Coverage-Based Summaries for RDF KBs

    As more and more data become available as linked data, the need for efficient and effective methods for their exploration becomes apparent. Semantic summaries try to extract meaning from data while reducing its size. State-of-the-art structural semantic summaries focus primarily on the graph structure of the data, trying to maximize the summary's utility for query answering, i.e. the query coverage. In this poster paper, we present an algorithm that tries to maximize the aforementioned query coverage using ideas borrowed from result diversification. The key idea of our algorithm is to push node selection toward the perimeter of the graph instead of focusing only on the “central” nodes. Our experiments show the potential of our algorithm and demonstrate the considerable advantages gained for answering larger fragments of user queries.
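The selection strategy described above, maximizing query coverage while pushing picks toward the perimeter, can be sketched as a greedy loop that trades off marginal coverage against distance from already-selected nodes. Function and parameter names here are hypothetical; the poster's actual algorithm may differ.

```python
def select_summary_nodes(coverage, dist, k, lam=0.5):
    """Greedy selection: at each step pick the node whose marginal query
    coverage, plus a diversity bonus (distance to the already-selected
    nodes), is largest. coverage: node -> set of query ids it answers;
    dist: (u, v) -> hop distance between nodes u and v."""
    selected, covered = [], set()
    candidates = set(coverage)
    while candidates and len(selected) < k:
        def score(n):
            gain = len(coverage[n] - covered)          # new queries answered
            spread = min((dist.get((n, s), dist.get((s, n), 0))
                          for s in selected), default=0)
            return gain + lam * spread                 # coverage + diversity
        best = max(candidates, key=score)
        selected.append(best)
        covered |= coverage[best]
        candidates.remove(best)
    return selected, covered
```

The `lam` weight controls how strongly selection is pushed away from nodes that are already in the summary, mirroring the result-diversification idea.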

    RDF graph summarization: principles, techniques and applications (tutorial)

    The explosion in the amount of RDF data on the Web has led to the need to explore, query and understand such data sources. The task is challenging due to the complex and heterogeneous structure of RDF graphs which, unlike relational databases, do not come with a structure-dictating schema. Summarization has been applied to RDF data to facilitate these tasks. Its purpose is to extract concise and meaningful information from RDF knowledge bases, representing their content as faithfully as possible. There is no single concept of an RDF summary, and not a single but many approaches to build such summaries; the summarization goal, and the main computational tools employed for summarizing graphs, are the main factors behind this diversity. This tutorial presents a structured analysis and comparison of existing works in the area of RDF summarization; it is based upon a recent survey which we co-authored with colleagues [3]. We present the concepts at the core of each approach and outline their main technical aspects and implementation. We conclude by identifying the most pertinent summarization method for different usage scenarios, and discussing areas where future effort is needed.

    Instance-Based Lossless Summarization of Knowledge Graph With Optimized Triples and Corrections (IBA-OTC)

    Knowledge graph (KG) summarization facilitates efficient information retrieval over complex structural data: it reduces redundancy, saves computational time during data retrieval, reduces storage space, eases in-memory visualization, and can preserve structure after summarization. State-of-the-art approaches summarize a given KG by preserving its structure at the cost of information loss, while approaches that do not preserve the underlying structure compromise the summarization ratio by compressing only specific regions. In this way, existing approaches either fail to preserve the original facts or wrongly predict inferred information. To solve these problems, we present a novel framework for generating a lossless summary that preserves the structure through super signatures and their corresponding corrections. The proposed approach summarizes only the naturally overlapping instances while maintaining their information and preserving the underlying Resource Description Framework (RDF) graph. The resultant summary is composed of triples with positive, negative, and star corrections, optimized by the smart invocation of two novel functions, namely merge and disperse. To evaluate the effectiveness of our proposed approach, we perform experiments on nine publicly available real-world knowledge graphs and obtain a better summarization ratio than state-of-the-art approaches by a margin of 10% to 30%, while achieving completeness, correctness, and compactness. In this way, the retrieval of common events and groups by queries is accelerated in the resultant graph.
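The correction mechanism behind lossless summarization can be illustrated with a toy sketch: a superedge between two groups asserts all cross edges, and positive/negative correction sets repair the difference so the original graph is recoverable exactly. This is a simplified illustration assuming no intra-group edges; the paper's super signatures and star corrections are richer than this.

```python
from itertools import combinations

def lossless_summary(edges, groups):
    """Given a node grouping, emit superedges plus corrections so that the
    original edge set can be rebuilt exactly. A superedge between two
    groups implies every cross pair; negative corrections delete implied
    edges that are absent, positive corrections add the rest."""
    edge_set = {frozenset(e) for e in edges}
    superedges, pos, neg = set(), set(), set()
    for ga, gb in combinations(sorted(groups), 2):
        pairs = {frozenset((u, v)) for u in groups[ga] for v in groups[gb]}
        present = pairs & edge_set
        if len(present) > len(pairs) / 2:   # cheaper to encode as superedge
            superedges.add((ga, gb))
            neg |= pairs - present
        else:
            pos |= present
    return superedges, pos, neg

def expand(superedges, pos, neg, groups):
    """Reconstruct the original edges from the summary."""
    edges = set(pos)
    for ga, gb in superedges:
        edges |= {frozenset((u, v)) for u in groups[ga] for v in groups[gb]}
    return edges - neg
```

The round trip `expand(*lossless_summary(edges, groups), groups)` returns exactly the input edges, which is what makes the scheme lossless.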

    The Case of Wikidata

    Since its launch in 2012, Wikidata has grown to become the largest open knowledge base (KB), containing more than 100 million data items and over 6 million registered users. Wikidata serves as the structured data backbone of Wikipedia, addressing data inconsistencies and adhering to the motto of “serving anyone anywhere in the world,” a vision realized through the diversity of knowledge. Despite being a collaboratively contributed platform, the Wikidata community heavily relies on bots, i.e. automated accounts with batch and speedy editing rights, for a majority of edits. As Wikidata approaches its first decade, the question arises: how close is Wikidata to achieving its vision of becoming a global KB, and how diverse is it in serving the global population? This dissertation investigates the current status of Wikidata's diversity, the role of bot interventions on diversity, and how bots can be leveraged to improve diversity within the context of Wikidata. The methodologies used in this study are a mapping study and content analysis, which led to the development of three datasets: 1) the Wikidata Research Articles Dataset, covering the literature on Wikidata from its first decade of existence, sourced from online databases to inspect its current status; 2) the Wikidata Requests-for-Permissions Dataset, based on the pages requesting bot rights on the Wikidata website, to explore bots from a community perspective; and 3) the Wikidata Revision History Dataset, compiled from the edit history of Wikidata to investigate bot editing behavior and its impact on diversity. All three datasets are freely available online. The insights gained from the mapping study reveal the growing popularity of Wikidata in the research community and its various application areas, indicative of its progress toward the ultimate goal of reaching the global community.
However, there is currently no research addressing the topic of diversity in Wikidata, which could shed light on its capacity to serve a diverse global population. To address this gap, this dissertation proposes a diversity measurement concept that defines diversity in a KB context in terms of variety, balance, and disparity, and is capable of assessing diversity in a KB from two main angles: user and data. The application of this concept to the domains and classes of the Wikidata Revision History Dataset exposes an imbalanced content distribution across Wikidata domains, which indicates low data diversity in Wikidata domains. Further analysis discloses that bots have been active since the inception of Wikidata and that the community embraces their involvement in content editing tasks, often importing data from Wikipedia, which shows a low diversity of sources in bot edits. Bots and human users engage in similar editing tasks but exhibit distinct editing patterns. The findings of this thesis confirm that bots possess the potential to influence diversity within Wikidata by contributing substantial amounts of data to specific classes and domains, leading to an imbalance. However, this potential can also be harnessed to enhance coverage in classes with limited content and restore balance, thus improving diversity. Hence, this study proposes enhancing diversity through automation and demonstrates the practical implementation of the recommendations using a specific use case. In essence, this research enhances our understanding of diversity in relation to a KB, elucidates the influence of automation on data diversity, and sheds light on diversity improvement within a KB context through the use of automation.
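Two of the three diversity facets named above, variety and balance, have standard formulations that can be sketched directly from per-class edit or item counts. This is an illustrative reading of the concept, not necessarily the dissertation's exact metrics.

```python
import math

def variety(counts):
    """Variety: the number of distinct classes that have at least one item."""
    return sum(1 for c in counts.values() if c > 0)

def balance(counts):
    """Balance as Shannon evenness in [0, 1]: 1 means items are spread
    uniformly across classes; values near 0 mean a few classes dominate."""
    vals = [c for c in counts.values() if c > 0]
    total = sum(vals)
    if len(vals) < 2:
        return 1.0
    h = -sum(c / total * math.log(c / total) for c in vals)
    return h / math.log(len(vals))   # normalize by maximum entropy
```

Applied to per-domain counts, a low `balance` score flags exactly the kind of imbalanced content distribution the dissertation reports for Wikidata domains.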

    Statistically-driven generation of multidimensional analytical schemas from linked data

    The ever-increasing Linked Data (LD) initiative has given rise to open, large amounts of semi-structured and rich data published on the Web. However, effective analytical tools that aid users in their analysis and go beyond browsing and querying are still lacking. To address this issue, we propose the automatic generation of multidimensional analytical stars (MDAS). The success of the multidimensional (MD) model for data analysis has been in great part due to its simplicity. Therefore, in this paper we aim at automatically discovering MD conceptual patterns that summarize LD. These patterns resemble the MD star schema typical of relational data warehousing. The underlying foundation of our method is a statistical framework that takes into account both concept and instance data. We present an implementation that makes use of the statistical framework to generate the MDAS, and we have performed several experiments that assess and validate the statistical approach with two well-known and large LD sets. This research has been partially funded by the Ministerio de Economía y Competitividad with contract number TIN2014-55335-R. Victoria Nebot was supported by the UJI Postdoctoral Fellowship program with reference PI14490.
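A crude statistical heuristic in the spirit described, classifying each property as a dimension or a measure from instance data, could be sketched as follows. This is hypothetical and far simpler than the paper's framework, which also exploits concept-level statistics.

```python
from collections import defaultdict

def discover_star(triples):
    """Toy star-schema discovery: properties whose values are mostly
    numeric become candidate measures; the rest become dimensions.
    triples: iterable of (subject, property, object)."""
    values = defaultdict(list)
    for _s, p, o in triples:
        values[p].append(o)
    dims, measures = [], []
    for p, vals in values.items():
        numeric = sum(isinstance(v, (int, float)) for v in vals)
        (measures if numeric / len(vals) > 0.5 else dims).append(p)
    return sorted(dims), sorted(measures)
```

A fuller version would also use cardinality and co-occurrence statistics to pick the fact concept and dimension hierarchies, which is where the concept-level analysis comes in.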

    Visual Question Answering: A Survey of Methods and Datasets

    Visual Question Answering (VQA) is a challenging task that has received increasing attention from both the computer vision and the natural language processing communities. Given an image and a question in natural language, it requires reasoning over visual elements of the image and general knowledge to infer the correct answer. In the first part of this survey, we examine the state of the art by comparing modern approaches to the problem. We classify methods by their mechanism to connect the visual and textual modalities. In particular, we examine the common approach of combining convolutional and recurrent neural networks to map images and questions to a common feature space. We also discuss memory-augmented and modular architectures that interface with structured knowledge bases. In the second part of this survey, we review the datasets available for training and evaluating VQA systems. The various datasets contain questions at different levels of complexity, which require different capabilities and types of reasoning. We examine in depth the question/answer pairs from the Visual Genome project, and evaluate the relevance of the structured annotations of images with scene graphs for VQA. Finally, we discuss promising future directions for the field, in particular the connection to structured knowledge bases and the use of natural language processing models. Comment: 25 pages.
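The joint-embedding approach mentioned above, mapping both modalities to a common feature space and fusing them, reduces in its simplest form to an elementwise product followed by a linear answer classifier. The sketch below uses precomputed feature vectors standing in for CNN and RNN outputs; all names and shapes are illustrative.

```python
def fuse_and_score(img_feat, q_feat, answer_weights):
    """Joint-embedding baseline: combine image and question features by
    elementwise product, then score each candidate answer with a dot
    product (a linear classifier over the fused vector)."""
    fused = [i * q for i, q in zip(img_feat, q_feat)]
    scores = {a: sum(f * w for f, w in zip(fused, ws))
              for a, ws in answer_weights.items()}
    return max(scores, key=scores.get)   # highest-scoring answer wins
```

Real systems replace the elementwise product with richer fusion (bilinear pooling, attention) and learn the classifier weights, but the pipeline shape is the same.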

    Validation and Evaluation

    In this technical report, we present prototypical implementations of innovative tools and methods for personalized and contextualized (multimedia) search, collaborative ontology evolution, ontology evaluation and cost models, and dynamic access and trends in distributed (semantic) knowledge, developed according to the working plan outlined in Technical Report TR-B-12-04. The prototypes complete the next milestone on the path to an integral Corporate Semantic Web architecture based on the three pillars Corporate Ontology Engineering, Corporate Semantic Collaboration, and Corporate Semantic Search, as envisioned in TR-B-08-09.
