
    Crowdsourcing complex workflows under budget constraints

    We consider the problem of task allocation in crowdsourcing systems with multiple complex workflows, each consisting of a set of interdependent micro-tasks. We propose Budgeteer, an algorithm that solves this problem under a budget constraint. In particular, our algorithm first calculates an efficient way to allocate the budget across workflows. It then determines the number of interdependent micro-tasks and the price to pay for each task within each workflow, given the corresponding budget constraint. We empirically evaluate Budgeteer on a well-known crowdsourcing-based text correction workflow using Amazon Mechanical Turk, and show that it achieves accuracy comparable to current benchmarks while being on average 45% cheaper.
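    The two-stage allocation described above can be sketched roughly as follows; the greedy marginal-utility rule and the toy utility functions are illustrative assumptions for demonstration, not the actual Budgeteer algorithm:

    ```python
    import math

    def allocate_budget(workflows, total_budget, step=1.0):
        """Greedily assign budget increments to the workflow with the
        highest marginal utility until the total budget is spent."""
        alloc = {w: 0.0 for w in workflows}
        spent = 0.0
        while spent + step <= total_budget:
            # workflows[w](b) is an assumed utility: expected quality of
            # workflow w given budget b (diminishing returns).
            best = max(
                workflows,
                key=lambda w: workflows[w](alloc[w] + step) - workflows[w](alloc[w]),
            )
            alloc[best] += step
            spent += step
        return alloc

    # Toy diminishing-returns utilities for two hypothetical workflows.
    workflows = {
        "text_correction": lambda b: math.log1p(b),
        "image_labeling": lambda b: 0.5 * math.log1p(b),
    }
    print(allocate_budget(workflows, total_budget=10.0))
    ```

    The second stage (choosing the number of micro-tasks and the per-task price inside each workflow, given its share of the budget) would then run independently per workflow.
    
    
    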

    SERIMI: Class-Based Matching for Instance Matching Across Heterogeneous Datasets

    State-of-the-art instance matching approaches do not perform well when matching instances across heterogeneous datasets. This shortcoming stems from their reliance on direct matching, i.e., a direct comparison of instances in the source dataset with instances in the target dataset. Direct matching is not suitable when the overlap between the datasets is small. To resolve this problem, we propose a new paradigm called class-based matching. Given a class of instances from the source dataset, called the class of interest, and a set of candidate matches retrieved from the target, class-based matching refines the candidates by filtering out those that do not belong to the class of interest. This refinement uses only data in the target, i.e., no direct comparison between source and target is involved. In extensive experiments on public benchmarks, we show that our approach greatly improves the quality of state-of-the-art systems, especially on difficult matching tasks.
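    The target-only refinement step can be illustrated with a minimal sketch; the shared-attribute overlap test below is an assumed stand-in for SERIMI's actual class-membership scoring:

    ```python
    def refine_candidates(candidates, target_data, class_attributes, threshold=0.5):
        """Keep only candidates whose target-side attributes overlap
        sufficiently with attributes typical of the class of interest.
        Note: no source instance is compared to any target instance."""
        refined = []
        for cand in candidates:
            attrs = target_data.get(cand, set())
            overlap = (
                len(attrs & class_attributes) / len(class_attributes)
                if class_attributes else 0.0
            )
            if overlap >= threshold:
                refined.append(cand)
        return refined

    # Toy target dataset: candidate id -> attributes observed in the target.
    target_data = {
        "t1": {"capital", "population", "country"},
        "t2": {"genre", "director"},
        "t3": {"population", "country"},
    }
    # Attributes assumed typical of the class of interest (e.g. "Country").
    class_attrs = {"capital", "population", "country"}
    print(refine_candidates(["t1", "t2", "t3"], target_data, class_attrs))
    ```

    Here "t2" is filtered out because its target-side attributes suggest it belongs to a different class, even though no source instance was consulted.
    
    
    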

    Structure Functions of Nuclei at Small x and Diffraction at HERA

    Gribov theory is applied to investigate shadowing effects in the structure functions of nuclei. In this approach, these effects are related to the process of diffractive dissociation of a virtual photon. A model for this diffractive process, which describes the HERA data well, is used to calculate the shadowing in nuclear structure functions. A reasonable description of the x, Q^2 and A-dependence of nuclear shadowing is achieved.
    Comment: TeX, 10 pages, 7 figures in 6 ps-file

    Hard Diffraction at HERA and the Gluonic Content of the Pomeron

    We show that the previously introduced CKMT model, based on conventional Regge theory, gives a good description of the HERA data on the structure function F_2^D for large rapidity gap (diffractive) events. These data allow us not only to determine the valence and sea quark content of the Pomeron, but also, through their Q^2 dependence, to obtain information on its gluonic content. Using DGLAP evolution, we find that the gluon distribution in the Pomeron is very hard and that the gluons carry more momentum than the quarks. This indicates that the Pomeron, unlike ordinary hadrons, is a mostly gluonic object. With our definition of the Pomeron flux factor, the total momentum carried by quarks and gluons turns out to be 0.3-0.4, strongly violating the momentum sum rule.
    Comment: C-Shell archive of a PostScript file containing a 20 page paper with text and 12 figures in i

    Deep Metric Learning Meets Deep Clustering: A Novel Unsupervised Approach for Feature Embedding

    Unsupervised Deep Distance Metric Learning (UDML) aims to learn sample similarities in the embedding space from an unlabeled dataset. Traditional UDML methods usually use a triplet or pairwise loss, which requires mining positive and negative samples with respect to anchor data points. This is, however, challenging in an unsupervised setting, as label information is not available. In this paper, we propose a new UDML method that overcomes this challenge. In particular, we use a deep clustering loss to learn centroids, i.e., pseudo labels, that represent semantic classes. During learning, these centroids are also used to reconstruct the input samples, which ensures the representativeness of the centroids: each centroid represents visually similar samples. The centroids therefore give information about positive (visually similar) and negative (visually dissimilar) samples. Based on these pseudo labels, we propose a novel unsupervised metric loss that enforces positive concentration and negative separation of samples in the embedding space. Experimental results on benchmark datasets show that the proposed approach outperforms other UDML methods.
    Comment: Accepted in BMVC 202
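    The pseudo-label metric loss idea (pull samples toward their own centroid, push them away from the others) can be sketched as follows; the hinge-style margin formulation and the toy data are illustrative assumptions, not the paper's exact loss:

    ```python
    import numpy as np

    def pseudo_label_metric_loss(embeddings, labels, centroids, margin=1.0):
        """Hinge-style sketch: the distance to a sample's own centroid
        (positive concentration) should beat its distance to every other
        centroid (negative separation) by at least `margin`."""
        loss = 0.0
        for x, y in zip(embeddings, labels):
            pos = np.linalg.norm(x - centroids[y])      # own-centroid distance
            for k, c in enumerate(centroids):
                if k == y:
                    continue
                neg = np.linalg.norm(x - c)             # other-centroid distance
                loss += max(0.0, margin + pos - neg)
        return loss / len(embeddings)

    # Toy setup: two well-separated centroids and samples clustered near them,
    # so the loss should be zero.
    rng = np.random.default_rng(0)
    centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
    labels = np.array([0, 0, 1, 1])
    embeddings = centroids[labels] + 0.1 * rng.standard_normal((4, 2))
    print(pseudo_label_metric_loss(embeddings, labels, centroids))
    ```

    In the method itself the centroids come from the deep clustering branch and the loss gradients flow back into the embedding network; here they are fixed arrays purely for illustration.
    
    
    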