Approximating Graph Pattern Queries Using Views

Authors: Cao Yang; Li Jia; Liu Xudong
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 24/10/2016

Crossref

Edinburgh Research Explorer

A Full Probabilistic Model for Yes/No Type Crowdsourcing in Multi-Class Classification

Authors: Pichara Karim; Protopapas Pavlos; Saldias Belen
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 13/08/2019

Crowdsourcing has become widely used in supervised scenarios where training sets are scarce and difficult to obtain. Most crowdsourcing models in the literature assume labelers can provide answers to full questions. In classification contexts, full questions require a labeler to discern among all possible classes. Unfortunately, discernment is not always easy in realistic scenarios. Labelers may not be experts in differentiating all classes. In this work, we provide a full probabilistic model for a shorter type of queries. Our shorter queries only require "yes" or "no" responses. Our model estimates a joint posterior distribution of matrices related to labelers' confusions and the posterior probability of the class of every object. We developed an approximate inference approach, using Monte Carlo Sampling and Black Box Variational Inference, which provides the derivation of the necessary gradients. We built two realistic crowdsourcing scenarios to test our model. The first scenario queries for irregular astronomical time-series. The second scenario relies on the image classification of animals. We achieved results that are comparable with those of full query crowdsourcing. Furthermore, we show that modeling labelers' failures plays an important role in estimating true classes. Finally, we provide the community with two real datasets obtained from our crowdsourcing experiments. All our code is publicly available. Comment: SIAM International Conference on Data Mining (SDM19), 9 official pages, 5 supplementary pages.

arXiv.org e-Print Archive

Crossref

Inapproximability of Combinatorial Optimization Problems

Author: Trevisan Luca
Publication date: 01/01/2004

We survey results on the hardness of approximating combinatorial optimization problems.

arXiv.org e-Print Archive

CiteSeerX

Approximating the Permanent of a Random Matrix with Vanishing Mean

Authors: Eldar Lior; Mehraban Saeed
Publication date: 09/10/2018

We show an algorithm for computing the permanent of a random matrix with vanishing mean in quasi-polynomial time. Special cases include the Gaussian and biased-Bernoulli random matrices with mean 1/lnln(n)^{1/8}. In addition, we can compute the permanent of a random matrix with mean 1/poly(ln(n)) in time 2^{O(n^{ε})} for any small constant ε > 0. Our algorithm counters the intuition that the permanent is hard because of the "sign problem", namely the interference between entries of a matrix with different signs. A major open question then remains whether one can provide an efficient algorithm for random matrices of mean 1/poly(n), whose conjectured #P-hardness is one of the baseline assumptions of the BosonSampling paradigm.

arXiv.org e-Print Archive

Crossref

Explain3D: Explaining Disagreements in Disjoint Datasets

Authors: Wang Xiaolan; Meliou Alexandra
Publication date: 24/02/1911

Data plays an important role in applications, analytic processes, and many aspects of human activity. As data grows in size and complexity, we are met with an imperative need for tools that promote understanding and explanations over data-related operations. Data management research on explanations has focused on the assumption that data resides in a single dataset, under one common schema. But the reality of today's data is that it is frequently un-integrated, coming from different sources with different schemas. When different datasets provide different answers to semantically similar questions, understanding the reasons for the discrepancies is challenging and cannot be handled by the existing single-dataset solutions. In this paper, we propose Explain3D, a framework for explaining the disagreements across disjoint datasets (3D). Explain3D focuses on identifying the reasons for the differences in the results of two semantically similar queries operating on two datasets with potentially different schemas. Our framework leverages the queries to perform a semantic mapping across the relevant parts of their provenance; discrepancies in this mapping point to causes of the queries' differences. Exploiting the queries gives Explain3D an edge over traditional schema matching and record linkage techniques, which are query-agnostic. Our work makes the following contributions: (1) We formalize the problem of deriving optimal explanations for the differences of the results of semantically similar queries over disjoint datasets. (2) We design a 3-stage framework for solving the optimal explanation problem. (3) We develop a smart-partitioning optimizer that improves the efficiency of the framework by orders of magnitude. (4) We experiment with real-world and synthetic data to demonstrate that Explain3D can derive precise explanations efficiently.

arXiv.org e-Print Archive

Trinity College