Search CORE

844 research outputs found

Ambiguous correlation

Author: Epstein Larry G.
Halevy Yoram
Publication venue: 'Oxford University Press (OUP)'
Publication date: 21/02/2018

Many decisions are made in environments where outcomes are determined by the realization of multiple random events. A decision maker may be uncertain how these events are related. We identify and experimentally substantiate behavior that intuitively reflects a lack of confidence in their joint distribution. Our findings suggest a dimension of ambiguity which is different from that in the classical distinction between risk and "Knightian uncertainty"

Boston University Institutional Repository (OpenBU)

An XML Query Engine for Network-Bound Data

Author: Halevy Alon Y
Ives Zachary G
Weld Daniel S
Publication venue: ScholarlyCommons
Publication date: 01/01/2001

XML has become the lingua franca for data exchange and integration across administrative and enterprise boundaries. Nearly all data providers are adding XML import or export capabilities, and standard XML Schemas and DTDs are being promoted for all types of data sharing. The ubiquity of XML has removed one of the major obstacles to integrating data from widely disparate sources -- namely, the heterogeneity of data formats. However, general-purpose integration of data across the wide area also requires a query processor that can query data sources on demand, receive streamed XML data from them, and combine and restructure the data into new XML output -- while providing good performance for both batch-oriented and ad-hoc, interactive queries. This is the goal of the Tukwila data integration system, the first system that focuses on network-bound, dynamic XML data sources. In contrast to previous approaches, which must read, parse, and often store entire XML objects before querying them, Tukwila can return query results even as the data is streaming into the system. Tukwila is built with a new system architecture that extends adaptive query processing and relational-engine techniques into the XML realm, as facilitated by a pair of operators that incrementally evaluate a query's input path expressions as data is read. In this paper, we describe the Tukwila architecture and its novel aspects, and we experimentally demonstrate that Tukwila provides better overall query performance and faster initial answers than existing systems, and has excellent scalability

CiteSeerX

ScholarlyCommons@Penn

Piazza: Data Management Infrastructure for Semantic Web Applications

Author: Halevy Alon Y
Ives Zachary G
Mork Peter
Tatarinov Igor
Publication venue: ScholarlyCommons
Publication date: 20/05/2003

The Semantic Web envisions a World Wide Web in which data is described with rich semantics and applications can pose complex queries. To this point, researchers have defined new languages for specifying meanings for concepts and developed techniques for reasoning about them, using RDF as the data model. To flourish, the Semantic Web needs to be able to accommodate the huge amounts of existing data and the applications operating on them. To achieve this, we are faced with two problems. First, most of the world's data is available not in RDF but in XML; XML and the applications consuming it rely not only on the domain structure of the data, but also on its document structure. Hence, to provide interoperability between such sources, we must map between both their domain structures and their document structures. Second, data management practitioners often prefer to exchange data through local point-to-point data translations, rather than mapping to common mediated schemas or ontologies. This paper describes the Piazza system, which addresses these challenges. Piazza offers a language for mediating between data sources on the Semantic Web, which maps both the domain structure and document structure. Piazza also enables interoperation of XML data with RDF data that is accompanied by rich OWL ontologies. Mappings in Piazza are provided at a local scale between small sets of nodes, and our query answering algorithm is able to chain sets of mappings together to obtain relevant data from across the Piazza network. We also describe an implemented scenario in Piazza and the lessons we learned from it

CiteSeerX

ScholarlyCommons@Penn

Crop Knowledge Discovery Based on Agricultural Big Data Integration

Author: Halevy A.
Lenzerini M.
Majumdar J.
Ngo V. M.
Schuetz C. G.
Publication date: 10/03/2020

Nowadays, agricultural data can be generated through various sources, such as Internet of Things (IoT), sensors, satellites, weather stations, robots, farm equipment, agricultural laboratories, farmers, government agencies and agribusinesses. The analysis of this big data enables farmers, companies and agronomists to extract valuable business and scientific knowledge, improving their operational processes and product quality. However, before analysing this data, different data sources need to be normalised, homogenised and integrated into a unified data representation. In this paper, we propose an agricultural data integration method using a constellation schema which is designed to be flexible enough to incorporate other datasets and big data models. We also apply some methods to extract knowledge with a view to improving crop yield; these include finding suitable quantities of soil properties, herbicides and insecticides for both increasing crop yield and protecting the environment

arXiv.org e-Print Archive

Crossref

Research Repository UCD

Ambiguous Correlation

Author: Larry G Epstein
Yoram Halevy
Publication date: 24/04/2020

Many decisions are made in environments where outcomes are determined by the realization of multiple random events. A decision maker may be uncertain how these events are related. We identify and experimentally substantiate behavior that intuitively reflects a lack of confidence in their joint distribution. Our findings suggest a dimension of ambiguity which is different from that in the classical distinction between risk and "Knightian uncertainty." Boston University, [email protected] and University of Toronto, [email protected]. We are grateful for discussions with Kyoungwon Seo and Chew Soo Hong, for detailed comments from Aurelien Baillon and especially Peter Wakker, for insightful and thoughtful suggestions by two referees and the editor, Dimitri Vayanos, and for comments from audiences in several conferences and workshops

CiteSeerX

Towards a large-area RPWELL detector: design optimization and performance

Author: Bressler S.
de Vito-Halevy F.
Jash A.
Moleri L.
Sela G.
Zavazieva D.
Publication date: 18/07/2023

We present a new design and assembly procedure of a large-area gas-avalanche Resistive-Plate WELL (RPWELL) detector. A $50\times50~\mathrm{cm^2}$ prototype was tested in $\mathrm{80~GeV/c}$ muon beam at CERN-SPS, presenting improved performances compared to previous ones: MIP detection efficiency over 96% with 3% uniformity across the entire detector area, a charge gain of $\approx 7.5 \times 10^3$ with a uniformity of 22%, and discharge probability below $10^{-6}$ with a few single hotspots attributed to production imperfections. These results pave the way towards further up-scaling detectors of this kind

arXiv.org e-Print Archive

GUN: An Efficient Execution Strategy for Querying the Web of Data

Author: A. Schwarte
A.Y. Halevy
C. Bizer
D. Izquierdo
G. Ladwig
G. Wiederhold
J.-F. Baget
J.D. Ullman
M. Acosta
M.-E. Vidal
O. Hartig
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Local-As-View (LAV) mediators provide a uniform interface to a federation of heterogeneous data sources, attempting to execute queries against the federation. LAV mediators rely on query rewriters to translate mediator queries into equivalent queries on the federated data sources. The query rewriting problem in LAV mediators has been shown to be NP-complete, and there may be an exponential number of rewritings, making the execution, or even the generation, of all the rewritings infeasible for some queries. The problem becomes particularly hard when queries and data sources are described using SPARQL conjunctive queries, for which millions of rewritings could be generated. We aim at providing an efficient solution to the problem of executing LAV SPARQL query rewritings while keeping the gathered answer as complete as possible. We formulate the Result-Maximal k-Execution problem (ReMakE) as the problem of maximizing the query results obtained from the execution of only k rewritings. Additionally, a novel query execution strategy called GUN is proposed to solve the ReMakE problem. Our experimental evaluation demonstrates that GUN outperforms traditional techniques in terms of answer completeness and execution time

Crossref

VBN

Schema Mediation for Large-Scale Semantic Data Sharing

Author: Halevy Alon Y
Ives Zachary G
Suciu Dan
Tatarinov Igor
Publication venue: ScholarlyCommons
Publication date: 01/03/2005

Intuitively, data management and data integration tools should be well suited for exchanging information in a semantically meaningful way. Unfortunately, they suffer from two significant problems: they typically require a common and comprehensive schema design before they can be used to store or share information, and they are difficult to extend because schema evolution is heavyweight and may break backward compatibility. As a result, many large-scale data sharing tasks are more easily facilitated by non-database-oriented tools that have little support for semantics. The goal of the peer data management system (PDMS) is to address this need: we propose the use of a decentralized, easily extensible data management architecture in which any user can contribute new data, schema information, or even mappings between other peers' schemas. PDMSs represent a natural step beyond data integration systems, replacing their single logical schema with an interlinked collection of semantic mappings between peers' individual schemas. This paper considers the problem of schema mediation in a PDMS. Our first contribution is a flexible language for mediating between peer schemas that extends known data integration formalisms to our more complex architecture. We precisely characterize the complexity of query answering for our language. Next, we describe a reformulation algorithm for our language that generalizes both global-as-view and local-as-view query answering algorithms. Then we describe several methods for optimizing the reformulation algorithm and an initial set of experiments studying its performance. Finally, we define and consider several global problems in managing semantic mappings in a PDMS

ScholarlyCommons@Penn

SemLAV: Local-As-View Mediation for SPARQL Queries

Author: A Doan
AY Halevy
C Bizer
D Calvanese
D Izquierdo
F Goasdoué
G Montoya
G Wiederhold
H Gupta
JD Ullman
M Taheriyan
M-E Vidal
R Chirkova
R Pottinger
S Abiteboul
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

The Local-As-View (LAV) integration approach aims at querying heterogeneous data in dynamic environments. In LAV, data sources are described as views over a global schema which is used to pose queries. Query processing requires generating and executing query rewritings, but for SPARQL queries, the LAV query rewritings may not be generated or executed in a reasonable time. In this paper, we present SemLAV, an alternative technique to process SPARQL queries over a LAV integration system without generating rewritings. SemLAV executes the query against a partial instance of the global schema which is built on-the-fly with data from the relevant views. The paper presents an experimental study for SemLAV, and compares its performance with traditional LAV-based query processing techniques. The results suggest that SemLAV scales up to SPARQL queries even over a large number of views, while it significantly outperforms traditional solutions

Crossref