1,217 research outputs found
Co-evolution of RDF Datasets
Linking Data initiatives have fostered the publication of large number of RDF
datasets in the Linked Open Data (LOD) cloud, as well as the development of
query processing infrastructures to access these data in a federated fashion.
However, different experimental studies have shown that availability of LOD
datasets cannot be always ensured, being RDF data replication required for
envisioning reliable federated query frameworks. Albeit enhancing data
availability, RDF data replication requires synchronization and conflict
resolution when replicas and source datasets are allowed to change data over
time, i.e., co-evolution management needs to be provided to ensure consistency.
In this paper, we tackle the problem of RDF data co-evolution and devise an
approach for conflict resolution during co-evolution of RDF datasets. Our
proposed approach is property-oriented and allows for exploiting semantics
about RDF properties during co-evolution management. The quality of our
approach is empirically evaluated in different scenarios on the DBpedia-live
dataset. Experimental results suggest that proposed proposed techniques have a
positive impact on the quality of data in source datasets and replicas.Comment: 18 pages, 4 figures, Accepted in ICWE, 201
Federated Query Processing over Heterogeneous Data Sources in a Semantic Data Lake
Data provides the basis for emerging scientific and interdisciplinary data-centric applications with the potential of improving the quality of life for citizens. Big Data plays an important role in promoting both manufacturing and scientific development through industrial digitization and emerging interdisciplinary research. Open data initiatives have encouraged the publication of Big Data by exploiting the decentralized nature of the Web, allowing for the availability of heterogeneous data generated and maintained by autonomous data providers. Consequently, the growing volume of data consumed by different applications raise the need for effective data integration approaches able to process a large volume of data that is represented in different format, schema and model, which may also include sensitive data, e.g., financial transactions, medical procedures, or personal data. Data Lakes are composed of heterogeneous data sources in their original format, that reduce the overhead of materialized data integration. Query processing over Data Lakes require the semantic description of data collected from heterogeneous data sources. A Data Lake with such semantic annotations is referred to as a Semantic Data Lake. Transforming Big Data into actionable knowledge demands novel and scalable techniques for enabling not only Big Data ingestion and curation to the Semantic Data Lake, but also for efficient large-scale semantic data integration, exploration, and discovery. Federated query processing techniques utilize source descriptions to find relevant data sources and find efficient execution plan that minimize the total execution time and maximize the completeness of answers. Existing federated query processing engines employ a coarse-grained description model where the semantics encoded in data sources are ignored. Such descriptions may lead to the erroneous selection of data sources for a query and unnecessary retrieval of data, affecting thus the performance of query processing engine. In this thesis, we address the problem of federated query processing against heterogeneous data sources in a Semantic Data Lake. First, we tackle the challenge of knowledge representation and propose a novel source description model, RDF Molecule Templates, that describe knowledge available in a Semantic Data Lake. RDF Molecule Templates (RDF-MTs) describes data sources in terms of an abstract description of entities belonging to the same semantic concept. Then, we propose a technique for data source selection and query decomposition, the MULDER approach, and query planning and optimization techniques, Ontario, that exploit the characteristics of heterogeneous data sources described using RDF-MTs and provide a uniform access to heterogeneous data sources. We then address the challenge of enforcing privacy and access control requirements imposed by data providers. We introduce a privacy-aware federated query technique, BOUNCER, able to enforce privacy and access control regulations during query processing over data sources in a Semantic Data Lake. In particular, BOUNCER exploits RDF-MTs based source descriptions in order to express privacy and access control policies as well as their automatic enforcement during source selection, query decomposition, and planning. Furthermore, BOUNCER implements query decomposition and optimization techniques able to identify query plans over data sources that not only contain the relevant entities to answer a query, but also are regulated by policies that allow for accessing these relevant entities. Finally, we tackle the problem of interest based update propagation and co-evolution of data sources. We present a novel approach for interest-based RDF update propagation that consistently maintains a full or partial replication of large datasets and deal with co-evolution
RDF-TR: Exploiting structural redundancies to boost RDF compression
The number and volume of semantic data have grown impressively over the last decade, promoting compression as an essential tool for RDF preservation, sharing and management. In contrast to universal compressors, RDF compression techniques are able to detect and exploit specific forms of redundancy in RDF data. Thus, state-of-the-art RDF compressors excel at exploiting syntactic and semantic redundancies, i.e., repetitions in the serialization format and information that can be inferred implicitly. However, little attention has been paid to the existence of structural patterns within the RDF dataset; i.e. structural redundancy. In this paper, we analyze structural regularities in real-world datasets, and show three schema-based sources of redundancies that underpin the schema-relaxed nature of RDF. Then, we propose RDF-Tr (RDF Triples Reorganizer), a preprocessing technique that discovers and removes this kind of redundancy before the RDF dataset is effectively compressed. In particular, RDF-Tr groups subjects that are described by the same predicates, and locally re-codes the objects related to these predicates. Finally, we integrate
RDF-Tr with two RDF compressors, HDT and k2-triples. Our experiments show that using RDF-Tr with these compressors improves by up to 2.3 times their original effectiveness, outperforming the most prominent state-of-the-art techniques
ARPA Whitepaper
We propose a secure computation solution for blockchain networks. The
correctness of computation is verifiable even under malicious majority
condition using information-theoretic Message Authentication Code (MAC), and
the privacy is preserved using Secret-Sharing. With state-of-the-art multiparty
computation protocol and a layer2 solution, our privacy-preserving computation
guarantees data security on blockchain, cryptographically, while reducing the
heavy-lifting computation job to a few nodes. This breakthrough has several
implications on the future of decentralized networks. First, secure computation
can be used to support Private Smart Contracts, where consensus is reached
without exposing the information in the public contract. Second, it enables
data to be shared and used in trustless network, without disclosing the raw
data during data-at-use, where data ownership and data usage is safely
separated. Last but not least, computation and verification processes are
separated, which can be perceived as computational sharding, this effectively
makes the transaction processing speed linear to the number of participating
nodes. Our objective is to deploy our secure computation network as an layer2
solution to any blockchain system. Smart Contracts\cite{smartcontract} will be
used as bridge to link the blockchain and computation networks. Additionally,
they will be used as verifier to ensure that outsourced computation is
completed correctly. In order to achieve this, we first develop a general MPC
network with advanced features, such as: 1) Secure Computation, 2) Off-chain
Computation, 3) Verifiable Computation, and 4)Support dApps' needs like
privacy-preserving data exchange
A Survey of Satisfiability Modulo Theory
Satisfiability modulo theory (SMT) consists in testing the satisfiability of
first-order formulas over linear integer or real arithmetic, or other theories.
In this survey, we explain the combination of propositional satisfiability and
decision procedures for conjunctions known as DPLL(T), and the alternative
"natural domain" approaches. We also cover quantifiers, Craig interpolants,
polynomial arithmetic, and how SMT solvers are used in automated software
analysis.Comment: Computer Algebra in Scientific Computing, Sep 2016, Bucharest,
Romania. 201
Rule mining on extended knowledge graphs
Masteroppgave i informatikkINF399MAMN-PROGMAMN-IN
Scientific Knowledge fit for society - Scoring scientific accuracy in climate change related news articles
The quantity of information is increasing exponentially, and there is a vast amount of content viewed on the internet that lacks an indicator as to whether it is scientifically accurate and correct or scientifically inaccurate and incorrect. This thesis proposes the development of an indicator of scientific accuracy in online media. This should help in public debates and help in the detection of misinformation. The thesis presents a baseline score and clear interfaces for further improvement. The necessity for such a score has been validated by a user survey, and the employed methodologies were evaluated and updated through interviews with experts from the ORKG team. Furthermore, an overview of the knowledge required to conduct research in this field and a discussion for future work is provided.Die Menge an Informationen nimmt exponentiell zu, und es gibt eine riesige Menge
an Inhalten im Internet, fĂĽr die es keinen Indikator dafĂĽr gibt, ob sie wissenschaftlich
genau und korrekt oder wissenschaftlich ungenau und falsch sind. In dieser Arbeit
wird die Entwicklung eines Indikators fĂĽr wissenschaftliche Genauigkeit in Online-
Medien vorgeschlagen. Dies sollte in öffentlichen Debatten und bei der Aufdeckung
von Fehlinformationen helfen. Die Arbeit legt einen Grundstein und schafft klare
Schnittstellen fĂĽr weitere Verbesserungen. Die Notwendigkeit eines solchen Scores
wurde durch eine Nutzerbefragung validiert, und die verwendeten Methoden wurden
durch Interviews mit Experten aus dem ORKG-Team evaluiert und aktualisiert.
DarĂĽber hinaus wird ein Ăśberblick ĂĽber das fĂĽr die Forschung in diesem Bereich
erforderliche Wissen gegeben und eine Diskussion ĂĽber kĂĽnftige Arbeiten gefĂĽhrt
- …