242 research outputs found

Search and Result Presentation in Scientific Workflow Repositories

Author: Davidson Susan B.
Huang Xiaocheng
Stoyanovich Julia
Yuan Xiaojie
Publication date: 01/01/2013

We study the problem of searching a repository of complex hierarchical workflows whose component modules, both composite and atomic, have been annotated with keywords. Since keyword search does not use the graph structure of a workflow, we develop a model of workflows using context-free bag grammars. We then give efficient polynomial-time algorithms that, given a workflow and a keyword query, determine whether some execution of the workflow matches the query. Based on these algorithms we develop a search and ranking solution that efficiently retrieves the top-k grammars from a repository. Finally, we propose a novel result presentation method for grammars matching a keyword query, based on representative parse-trees. The effectiveness of our approach is validated through an extensive experimental evaluation.

arXiv.org e-Print Archive

Crossref

ScholarlyCommons@Penn

Taming Technical Bias in Machine Learning Pipelines

Author: Schelter S.
Stoyanovich J.
Publication date: 01/12/2020

Machine Learning (ML) is commonly used to automate decisions in domains as varied as credit and lending, medical diagnosis, and hiring. These decisions are consequential, imploring us to carefully balance the benefits of efficiency with the potential risks. Much of the conversation about the risks centers around bias, a term that is used by the technical community ever more frequently but that is still poorly understood. In this paper we focus on technical bias, a type of bias that has so far received limited attention and that the data engineering community is well-equipped to address. We discuss dimensions of technical bias that can arise through the ML lifecycle, particularly when it is due to preprocessing decisions or post-deployment issues. We present results of our recent work, and discuss future research directions. Our overall goal is to support the development of systems that expose the knobs of responsibility to data scientists, allowing them to detect instances of technical bias and to mitigate it when possible.

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Rule-Based Application Development using Webdamlog

Author: Abiteboul Serge
Antoine Émilien
Miklau Gerome
Stoyanovich Julia
Testard Jules
Publication date: 01/01/2013

We present the WebdamLog system for managing distributed data on the Web in a peer-to-peer manner. We demonstrate the main features of the system through an application called Wepic for sharing pictures between attendees of the SIGMOD conference. Using Wepic, the attendees will be able to share, download, rate and annotate pictures in a highly decentralized manner. We show how WebdamLog handles heterogeneity of the devices and services used to share data in such a Web setting. We exhibit the simple rules that define the Wepic application and show how to easily modify the Wepic application. Comment: SIGMOD - Special Interest Group on Management Of Data (2013)

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

HAL-Rennes 1

Symmetric Relations and Cardinality-Bounded Multisets in Database Systems

Author: Stoyanovich J.
Ross K.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2007

Crossref

Introducing Access Control in Webdamlog

Author: Abiteboul Serge
Antoine Émilien
Miklau Gerome
Moffitt Vera Zaychik
Stoyanovich Julia
Publication date: 31/07/2013

We survey recent work on the specification of an access control mechanism in a collaborative environment. The work is presented in the context of the WebdamLog language, an extension of Datalog to a distributed context. We discuss a fine-grained access control mechanism for intentional data based on provenance as well as a control mechanism for delegation, i.e., for deploying rules at remote peers. Comment: Proceedings of the 14th International Symposium on Database Programming Languages (DBPL 2013), August 30, 2013, Riva del Garda, Trento, Italy

arXiv.org e-Print Archive

CiteSeerX

INRIA a CCSD electronic archive server

HAL-Rennes 1

Schema Polynomials and Applications

Author: Ross Kenneth A.
Stoyanovich Julia
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2007

Conceptual complexity is emerging as a new bottleneck as database developers, application developers, and database administrators struggle to design and comprehend large, complex schemas. The simplicity and conciseness of a schema depends critically on the idioms available to express the schema. We propose a formal conceptual schema representation language that combines different design formalisms, and allows schema manipulation that exposes the strengths of each of these formalisms. We demonstrate how the schema factorization framework can be used to generate relational, object-oriented, and faceted physical schemas, allowing a wider exploration of physical schema alternatives than traditional methodologies. We illustrate the potential practical benefits of schema factorization by showing that simple heuristics can significantly reduce the size of a real-world schema description. We also propose the use of schema polynomials to model and derive alternative representations for complex relationships with constraints.

Columbia University Academic Commons