Search and Result Presentation in Scientific Workflow Repositories
We study the problem of searching a repository of complex hierarchical
workflows whose component modules, both composite and atomic, have been
annotated with keywords. Since keyword search does not use the graph structure
of a workflow, we develop a model of workflows using context-free bag grammars.
We then give efficient polynomial-time algorithms that, given a workflow and a
keyword query, determine whether some execution of the workflow matches the
query. Based on these algorithms we develop a search and ranking solution that
efficiently retrieves the top-k grammars from a repository. Finally, we propose
a novel result presentation method for grammars matching a keyword query, based
on representative parse-trees. The effectiveness of our approach is validated
through an extensive experimental evaluation.
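The matching question in this abstract (does some execution of a workflow cover every keyword in a query?) can be illustrated with a small sketch. It is not the paper's algorithm: it assumes a hypothetical encoding in which `productions` maps each composite module to its alternative bags of child modules and `keywords` maps modules to their annotations, and it computes a simple fixpoint over subsets of the query, which is exponential in the query size.

```python
def some_execution_matches(productions, keywords, start, query):
    """Return True if some execution derived from `start` covers all query keywords.

    productions: dict mapping a composite module to a list of bodies, each body
                 being a list (bag) of child modules (hypothetical encoding).
    keywords:    dict mapping a module to an iterable of annotation keywords.
    """
    query = frozenset(query)
    modules = set(productions) | set(keywords) | {start}
    for bodies in productions.values():
        for body in bodies:
            modules.update(body)

    # cover[m]: sets of query keywords covered by some finite execution of m.
    cover = {m: set() for m in modules}
    changed = True
    while changed:  # least fixpoint, so recursive productions are handled
        changed = False
        for m in modules:
            own = frozenset(keywords.get(m, ())) & query
            if m not in productions:  # atomic module
                found = {own}
            else:
                found = set()
                for body in productions[m]:
                    partial = {own}
                    for child in body:
                        partial = {p | c for p in partial for c in cover[child]}
                    found |= partial
            if not found <= cover[m]:
                cover[m] |= found
                changed = True
    return query in cover[start]


# Illustrative workflow: W expands to either {A, B} or {C}; A expands to {D}.
example_productions = {"W": [["A", "B"], ["C"]], "A": [["D"]]}
example_keywords = {"B": ["blast"], "C": ["blast"], "D": ["align"]}
print(some_execution_matches(example_productions, example_keywords, "W", ["blast", "align"]))
```

Tracking which keyword subsets each module can cover, rather than a single union, matters because alternative productions of the same module cannot be combined within one execution.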
Most Expected Winner: An Interpretation of Winners over Uncertain Voter Preferences
It remains an open question how to determine the winner of an election when
voter preferences are incomplete or uncertain. One option is to assume some
probability space over the voting profile and select the Most Probable Winner
(MPW) -- the candidate or candidates with the best chance of winning. In this
paper, we propose an alternative winner interpretation, selecting the Most
Expected Winner (MEW) according to the expected performance of the candidates.
We separate the uncertainty in voter preferences into the generation step and
the observation step, which gives rise to a unified voting profile combining
both incomplete and probabilistic voting profiles. We use this framework to
establish the theoretical hardness of MEW over incomplete voter preferences,
and then identify a collection of tractable cases for a variety of voting
profiles, including those based on the popular Repeated Insertion Model (RIM)
and its special case, the Mallows model. We develop solvers customized for
various voter preference types to quantify the candidate performance for the
individual voters, and propose a pruning strategy that optimizes computation.
The performance of the proposed solvers and pruning strategy is evaluated
extensively on real and synthetic benchmarks, showing that our methods are
practical. Comment: This is the technical report of the following paper: Haoyue Ping and
Julia Stoyanovich. 2023. Most Expected Winner: An Interpretation of Winners
over Uncertain Voter Preferences. Proc. ACM Manag. Data, 1, N1, Article 22
(May 2023), 33 pages. https://doi.org/10.1145/358870
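To make the MEW idea concrete, here is a minimal sketch that estimates each candidate's expected score when every voter's preference follows a Mallows model, sampled through repeated insertion. It is an illustration under stated assumptions, not the solvers described in the abstract: the Borda rule, the Monte Carlo estimate, and all function names are choices made for this example.

```python
import random


def sample_mallows(reference, phi, rng):
    """Draw one ranking from Mallows(reference, phi) by repeated insertion."""
    ranking = []
    for i, item in enumerate(reference, start=1):
        # The i-th reference item lands at position j (1..i) with
        # probability proportional to phi ** (i - j).
        weights = [phi ** (i - j) for j in range(1, i + 1)]
        j = rng.choices(range(1, i + 1), weights=weights)[0]
        ranking.insert(j - 1, item)
    return ranking


def estimate_expected_borda(voters, samples=2000, seed=0):
    """Estimate each candidate's expected total Borda score.

    voters: list of (reference_ranking, phi) pairs, one per voter.
    By linearity of expectation, the total is the sum of per-voter expectations.
    """
    rng = random.Random(seed)
    totals = {}
    for reference, phi in voters:
        m = len(reference)
        for _ in range(samples):
            ranking = sample_mallows(reference, phi, rng)
            for position, candidate in enumerate(ranking):
                # Borda: m - 1 points for first place, 0 for last.
                totals[candidate] = totals.get(candidate, 0) + (m - 1 - position)
    return {c: total / samples for c, total in totals.items()}


def most_expected_winners(voters, samples=2000, seed=0):
    """Return the candidate(s) with the highest estimated expected score."""
    scores = estimate_expected_borda(voters, samples, seed)
    best = max(scores.values())
    return sorted(c for c, s in scores.items() if s == best)


# Illustrative profile: two voters centered on a > b > c, one on b > a > c.
profile = [(["a", "b", "c"], 0.5), (["a", "b", "c"], 0.5), (["b", "a", "c"], 0.5)]
print(most_expected_winners(profile))
```

Because the expected total score decomposes voter by voter, each voter's contribution can be computed independently, which is the property the per-voter solvers mentioned in the abstract rely on.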
Schema Polynomials and Applications
Conceptual complexity is emerging as a new bottleneck as database developers, application developers, and database administrators struggle to design and comprehend large, complex schemas. The simplicity and conciseness of a schema depend critically on the idioms available to express the schema. We propose a formal conceptual schema representation language that combines different design formalisms, and allows schema manipulation that exposes the strengths of each of these formalisms. We demonstrate how the schema factorization framework can be used to generate relational, object-oriented, and faceted physical schemas, allowing a wider exploration of physical schema alternatives than traditional methodologies. We illustrate the potential practical benefits of schema factorization by showing that simple heuristics can significantly reduce the size of a real-world schema description. We also propose the use of schema polynomials to model and derive alternative representations for complex relationships with constraints.
MutaGeneSys: Making Diagnostic Predictions Based on Genome-Wide Genotype Data in Association Studies
Summary: We present MutaGeneSys: a system that uses genomewide genotype data for disease prediction. Our system integrates three data sources: the International HapMap project, whole-genome marker correlation data and the Online Mendelian Inheritance in Man (OMIM) database. It accepts SNP data of individuals as query input and delivers disease susceptibility hypotheses even if the original set of typed SNPs is incomplete. Our system is scalable and flexible: it operates in real time and can be configured on the fly to produce population, technology, and confidence-specific predictions. Availability: Efforts are underway to deploy our system as part of the NCBI Reference Assembly. Meanwhile, the system may be obtained from the authors. Contact: [email protected]
Rank-Aware Subspace Clustering for Structured Datasets
In online applications such as Yahoo! Personals and Trulia.com, users define structured profiles in order to find potentially interesting matches. Typically, profiles are evaluated against large datasets and produce thousands of matches. In addition to filtering, users also specify ranking in their profile, and matches are returned in the form of a ranked list. Top results in ranked lists are typically homogeneous, which hinders data exploration. For example, a user looking for 1- or 2-bedroom apartments sorted by price will see a large number of cheap 1-bedrooms in undesirable neighborhoods before seeing any apartment with different characteristics. An alternative to ranking is to group matches on common attribute values (e.g., cheap 1-bedrooms in good neighborhoods, 2-bedrooms with 2 baths). However, not all groups will be of interest to the user given the ranking criteria. We argue here that neither single-list ranking nor attribute-based grouping is adequate for effective exploration of ranked datasets. We formalize rank-aware clustering and develop a novel rank-aware bottom-up subspace clustering algorithm. We evaluate the performance of our algorithm over large datasets from a leading online dating site, and present an experimental evaluation of its effectiveness.
Rule-Based Application Development using Webdamlog
We present the WebdamLog system for managing distributed data on the Web in a
peer-to-peer manner. We demonstrate the main features of the system through an
application called Wepic for sharing pictures between attendees of the SIGMOD
conference. Using Wepic, the attendees will be able to share, download, rate
and annotate pictures in a highly decentralized manner. We show how WebdamLog
handles heterogeneity of the devices and services used to share data in such a
Web setting. We exhibit the simple rules that define the Wepic application and
show how to easily modify the Wepic application. Comment: SIGMOD - Special Interest Group on Management Of Data (2013)