8 research outputs found

Statix - statistical type inference on linked data

Author: Cudré-Mauroux Philippe
Khayati Mourad
Lutov Artem
Roshankish Soheil
Publication date: 16/02/2019

Large knowledge bases typically contain data adhering to various schemas with incomplete and/or noisy type information. This seriously complicates further integration and post-processing efforts, as type information is crucial in correctly handling the data. In this paper, we introduce a novel statistical type inference method, called StaTIX, to effectively infer instance types in Linked Data sets in a fully unsupervised manner. Our inference technique leverages a new hierarchical clustering algorithm that is robust, highly effective, and scalable. We introduce a novel approach to reduce the processing complexity of the similarity matrix specifying the relations between various instances in the knowledge base. This approach speeds up the inference process while also improving the correctness of the inferred types due to the noise attenuation in the input data. We further optimize the clustering process by introducing a dedicated hash function that speeds up the inference process by orders of magnitude without negatively affecting its accuracy. Finally, we describe a new technique to identify representative clusters from the multi-scale output of our clustering algorithm to further improve the accuracy of the inferred types. We empirically evaluate our approach on several real-world datasets and compare it to the state of the art. Our results show that StaTIX is more efficient than existing methods (both in terms of speed and memory consumption) as well as more effective. StaTIX reduces the F1-score error of the predicted types by about 40% on average compared to the state of the art and improves the execution time by orders of magnitude

arXiv.org e-Print Archive

RERO DOC Digital Library

From Text to Knowledge with Graphs: modelling, querying and exploiting textual content

Author: Alves Mirian Halfeld Ferrari
Forst Anne-Lyse Minard
Vargas-Solar Genoveva
Publication date: 09/10/2023

This paper highlights the challenges, current trends, and open issues related to the representation, querying and analytics of content extracted from texts. The internet contains vast text-based information on various subjects, including commercial documents, medical records, scientific experiments, engineering tests, and events that impact urban and natural environments. Extracting knowledge from this text involves understanding the nuances of natural language and accurately representing the content without losing information. This allows knowledge to be accessed, inferred, or discovered. To achieve this, combining results from various fields, such as linguistics, natural language processing, knowledge representation, data storage, querying, and analytics, is necessary. The vision in this paper is that graphs can be a well-suited text content representation once annotated and the right querying and analytics techniques are applied. This paper discusses this hypothesis from the perspective of linguistics, natural language processing, graph models and databases and artificial intelligence provided by the panellists of the DOING session in the MADICS Symposium 2022

arXiv.org e-Print Archive

Overview of query optimization in XML database systems

Author: Abdel Kader R.
van Keulen Maurice
Publication venue: Centre for Telematics and Information Technology (CTIT)
Publication date: 12/11/2007

University of Twente Research Information

Superhuman, Transhuman, Post/Human: Mapping the Production and Reception of the Posthuman Body

Author: Jeffery Scott W.
Publication venue: University of Stirling
Publication date: 01/01/2013

The figure of the cyborg, or more latterly, the posthuman body has been an increasingly familiar presence in a number of academic disciplines. The majority of such studies have focused on popular culture, particularly the depiction of the posthuman in science-fiction, fantasy and horror. To date, however, few studies have focused on the posthuman and the comic book superhero, despite their evident corporeality, and none have questioned comics’ readers about their responses to the posthuman body. This thesis presents a cultural history of the posthuman body in superhero comics along with the findings from twenty-five, two-hour interviews with readers. By way of literature reviews this thesis first provides a new typography of the posthuman, presenting it not as a stable bounded subject but as what Deleuze and Guattari (1987) describe as a ‘rhizome’. Within the rhizome of the posthuman body are several discursive plateaus that this thesis names Superhumanism (the representation of posthuman bodies in popular culture), Post/Humanism (a critical-theoretical stance that questions the assumptions of Humanism) and Transhumanism (the philosophy and practice of human enhancement with technology). With these categories in mind the thesis explores the development of the posthuman body in the Superhuman realm of comic books. Exploring the body-types most prominent during the Golden (1938-1945), Silver (1958-1974) and contemporary Ages of superheroes it presents three explorations of what I term the Perfect Body, Cosmic Body and Military-Industrial Body respectively. These body types are presented as ‘assemblages’ (Deleuze and Guattari, 1987) that display rhizomatic connections to the other discursive realms of the Post/Human and Transhuman. This investigation reveals how the depiction of the Superhuman body developed and diverged from, and sometimes back into, these realms as each attempted to territorialise the meaning and function of the posthuman body.
Ultimately it describes how, in spite of attempts by nationalistic or economic interests to control Transhuman enhancement in real-world practices, the realms of Post/Humanism and Superhumanism share a more critical approach. The final section builds upon this cultural history of the posthuman body by addressing readers’ relationship with these images. This begins by refuting some of the common assumptions in comics studies about superheroes and bodily representations. Readers stated that they viewed such imagery as iconographic rather than representational, whether it was the depiction of bodies or technology. Moreover, regular or committed readers of superhero comics were generally suspicious of the notion of human enhancement, displaying a belief in the same binary categories (artificial/natural, human/non-human) that critical Post/Humanism seeks to problematize. The thesis concludes that while superhero comics remain ultimately too human to be truly Post/Humanist texts, it is nevertheless possible to conceptualise the relationship between reader, text, producer and so on in Post/Humanist terms as reading-assemblage, and that such a cyborgian fusing of human and comic book allows both bodies to ‘become other’, to move in new directions and form new assemblages not otherwise possible when considered separately

Stirling Online Research Repository

An Algebraic Approach to XQuery Optimization

Author: May Norman
Publication venue: Universität Mannheim
Publication date: 01/01/2007

As more data is stored in XML and more applications need to process this data, XML query optimization becomes performance critical. While optimization techniques for relational databases have been developed over the last thirty years, the optimization of XML queries poses new challenges. Query optimizers for XQuery, the standard query language for XML data, need to consider both document order and sequence order. Nevertheless, algebraic optimization proved powerful in query optimizers in relational and object oriented databases. Thus, this dissertation presents an algebraic approach to XQuery optimization. In this thesis, an algebra over sequences is presented that allows for a simple translation of XQuery into this algebra. The formal definitions of the operators in this algebra allow us to reason formally about algebraic optimizations. This thesis leverages the power of this formalism when unnesting nested XQuery expressions. In almost all cases unnesting nested queries in XQuery reduces query execution times from hours to seconds or milliseconds. Moreover, this dissertation presents three basic algebraic patterns of nested queries. For every basic pattern a decision tree is developed to select the most effective unnesting equivalence for a given query. Query unnesting extends the search space that can be considered during cost-based optimization of XQuery. As a result, substantially more efficient query execution plans may be detected. This thesis presents two more important cases where the number of plan alternatives leads to substantially shorter query execution times: join ordering and reordering location steps in path expressions. Our algebraic framework detects cases where document order or sequence order is destroyed. However, state-of-the-art techniques for order optimization in cost-based query optimizers have efficient mechanisms to repair order in these cases. 
The results obtained for query unnesting and cost-based optimization of XQuery underline the need for an algebraic approach to XQuery optimization for efficient XML query processing. Moreover, they are applicable to optimization in relational databases where order semantics are considered

MAnnheim DOCument Server

Exploring a striped XML world

Author: Makalias Savvas
Publication venue: The University of Edinburgh
Publication date: 01/01/2010

EXtensible Markup Language, XML, was designed as a markup language for structuring, storing and transporting data on the World Wide Web. The focus of XML is on data content; arbitrary markup is used to describe data. This versatile, self-describing data representation has established XML as the universal data format and the de facto standard for information exchange on the Web. This has gradually given rise to the need for efficient storage and querying of large XML repositories. To that end, we propose a new model for building a native XML store which is based on a generalisation of vertical decomposition. Nodes of a document satisfying the same label-path are extracted and stored together in a single container, a Stripe. Stripes make use of a labelling scheme allowing us to maintain full structural information. Over this new representation, we introduce various evaluation techniques, which allow us to handle a large fragment of XPath 2.0. We also focus on the optimisation opportunities that arise from our decomposition model during any query evaluation phase. During query validation, we present an input minimisation process that exploits the proposed model for identifying input that is only relevant to the given query, in terms of Stripes. We also define query equivalence rules for query rewriting over our proposed model. Finally, during query optimisation, we deal with whether and under which circumstances certain evaluation algorithms can be replaced by others having lower I/O and/or CPU cost. We propose three storage schemes under our general decomposition technique. The schemes differ in the compression method imposed on the structural part of the XML document. The first storage scheme imposes no compression. The second storage scheme exploits structural regularities of the document to minimise storage and, thus, I/O cost during query evaluation.
Finally, the third storage scheme performs structure-agnostic compression of the document structure which results in minimised storage, regardless of the actual XML structure. We experiment on XML repositories of varying size, recursion and structural regularity. We consider query input size, execution plan size and query response time as metrics for our experimental results. We process query workloads by applying each of the proposed optimisations in isolation and then all of their combinations. In addition, we apply the same execution pipeline for all proposed storage schemes. As a reference to our proposed query evaluation pipeline, we use the current state-of-the-art system for XML query processing. Our results demonstrate that:
• Our proposed data model provides the infrastructure for efficiently selecting the parts of the document that are relevant to a given query.
• The application of query rewriting, combined with input minimisation, reduces query input size as well as the number of physical operators used. In addition, when evaluation algorithms are specialised to the decomposition method, query response time is further reduced.
• Query evaluation performance is largely affected by the storage schemes, which are closely related to the structural properties of the data. The achieved compression ratio greatly affects storage size and therefore, query response times

Edinburgh Research Archive

Bowdoin Orient v.116, no.1-27 (1986-1987)

Author: The Bowdoin Orient
Publication venue: Bowdoin Digital Commons
Publication date: 08/01/1987

Bowdoin College