    Conditional Hardness of Earth Mover Distance

    The Earth Mover Distance (EMD) between two sets of points $A, B \subseteq \mathbb{R}^d$ with $|A| = |B|$ is the minimum total Euclidean distance of any perfect matching between A and B. One of its generalizations is asymmetric EMD, which is the minimum total Euclidean distance of any matching of size $|A|$ between sets of points $A, B \subseteq \mathbb{R}^d$ with $|A| \le |B|$. The problems of computing EMD and asymmetric EMD are well-studied and have many applications in computer science, some of which also ask for the EMD-optimal matching itself. Unfortunately, all known algorithms require at least quadratic time to compute EMD exactly. Approximation algorithms with nearly linear time complexity in n are known (even for finding approximately optimal matchings), but suffer from exponential dependence on the dimension. In this paper we show that significant improvements in exact and approximate algorithms for EMD would contradict conjectures in fine-grained complexity. In particular, we prove the following results:
    - Under the Orthogonal Vectors Conjecture, there is some $c > 0$ such that EMD in $\Omega(c^{\log^* n})$ dimensions cannot be computed in truly subquadratic time.
    - Under the Hitting Set Conjecture, for every $\delta > 0$, no truly subquadratic time algorithm can find a $(1 + 1/n^{\delta})$-approximate EMD matching in $\omega(\log n)$ dimensions.
    - Under the Hitting Set Conjecture, for every $\eta = 1/\omega(\log n)$, no truly subquadratic time algorithm can find a $(1 + \eta)$-approximate asymmetric EMD matching in $\omega(\log n)$ dimensions.
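    The definition of (asymmetric) EMD in this abstract can be made concrete with a minimal brute-force sketch. It is not an algorithm from the paper: it simply tries every injective assignment of points of A to points of B, so it takes factorial time and is only usable on tiny inputs. The function name `emd_brute_force` and the tuple-based point representation are assumptions made for illustration.

```python
from itertools import permutations
from math import dist, inf


def emd_brute_force(a, b):
    """Return the (asymmetric) EMD between point lists a and b, with len(a) <= len(b).

    Illustrative only: enumerates every way to match each point of `a`
    to a distinct point of `b` and keeps the cheapest total Euclidean
    distance. When len(a) == len(b) this is the ordinary EMD.
    """
    if len(a) > len(b):
        raise ValueError("expected len(a) <= len(b)")
    best = inf
    # Each ordered selection of len(a) points from b is one candidate matching.
    for chosen in permutations(b, len(a)):
        cost = sum(dist(p, q) for p, q in zip(a, chosen))
        best = min(best, cost)
    return best


if __name__ == "__main__":
    A = [(0.0, 0.0), (1.0, 0.0)]
    B = [(0.0, 1.0), (1.0, 1.0)]
    print(emd_brute_force(A, B))
```

    With equal-sized inputs the loop ranges over perfect matchings. With a smaller first set it ranges over matchings of size |A|, which corresponds to the asymmetric variant described in the abstract.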

    Generalized linear latent variable modeling analysis for multi-group studies

    Latent variable modeling is commonly used in the behavioral, medical and social sciences. The models used in such analyses relate all observed variables to latent common factors. In many applications, the observed variables are in polytomous form. The existing procedures for models with polytomous outcomes are lacking in several respects, especially for multi-sample situations. We incorporate a new generalized linear latent variable modeling approach for developing statistically sound procedures that furnish meaningful interpretation and can incorporate many types of outcome variables. In the special case of polytomous outcomes, we also propose a model that incorporates response errors. A rather simple model parameterization used in our approach is appropriate for multi-sample analysis and leads to practically useful inference procedures. A Monte Carlo EM algorithm is developed for computing the full maximum likelihood estimates. Simulation studies are presented to validate the benefits of the new approach and to compare its performance to other methods. The new approach is also applied to analyze data from two substance abuse prevention studies.

    Increasing Cheat Robustness of Crowdsourcing Tasks

    Crowdsourcing is on its way to becoming a widely used means of collecting large-scale scientific corpora. Many research fields, including Information Retrieval, rely on this novel way of data acquisition. However, it seems to be undermined by a significant share of workers who are primarily interested in producing quick generic answers rather than correct ones in order to optimise their time-efficiency and, in turn, earn more money. Recently, we have seen numerous sophisticated schemes for identifying such workers. Those, however, often require additional resources or introduce artificial limitations to the task. In this work, we take a different approach by investigating means of making crowdsourced tasks more resistant to cheaters a priori.

    Comparing Rule-based, Feature-based and Deep Neural Methods for De-identification of Dutch Medical Records

    Unstructured information in electronic health records provides an invaluable resource for medical research. To protect the confidentiality of patients and to conform to privacy regulations, de-identification methods automatically remove personally identifying information from these medical records. However, due to the unavailability of labeled data, most existing research is constrained to English medical text and little is known about the generalizability of de-identification methods across languages and domains. In this study, we construct a varied dataset consisting of the medical records of 1260 patients by sampling data from 9 institutes and three domains of Dutch healthcare. We test the generalizability of three de-identification methods across languages and domains. Our experiments show that an existing rule-based method specifically developed for the Dutch language fails to generalize to this new data. Furthermore, a state-of-the-art neural architecture performs strongly across languages and domains, even with limited training data. Compared to feature-based and rule-based methods, the neural method requires significantly less configuration effort and domain knowledge. We make all code and pre-trained de-identification models available to the research community, allowing practitioners to apply them to their datasets and enabling future benchmarks. Comment: Proceedings of the 1st ACM WSDM Health Search and Data Mining Workshop (HSDM2020), 202

    Separated at Birth: An Inquiry on the Conceptual Independence of the Entrepreneurship and the Leadership Constructs

    Entrepreneurship and leadership may flow from the same genealogical source, and the apparent separation of the two constructs may be due to differences in the contexts through which the root phenomenon flows. Entrepreneurship and leadership are figuratively different manifestations of the need to create. To better understand the origin of entrepreneurship and leadership, research must first focus on the combinations or hierarchy of traits that are necessary, but perhaps not sufficient, to stimulate the two constructs. Factors that trigger a drive to create or take initiative within the individual in the context of a particular circumstance should be identified, and the situational factors that move the individual toward more traditional leader or classic entrepreneurial-type behaviors need to be understood.

    Exploiting User Comments for Audio-Visual Content Indexing and Retrieval

    State-of-the-art content sharing platforms often require users to assign tags to pieces of media in order to make them easily retrievable. Since this task is sometimes perceived as tedious or boring, annotations can be sparse. Commenting, on the other hand, is a frequently used means of expressing user opinion towards shared media items. This work makes use of time series analyses in order to infer potential tags and indexing terms for audio-visual content from user comments. In this way, we mitigate the vocabulary gap between queries and document descriptors. Additionally, we show how large-scale encyclopaedias such as Wikipedia can aid the task of tag prediction by serving as surrogates for high-coverage natural language vocabulary lists. Our evaluation is conducted on a corpus of several million real-world user comments from the popular video sharing platform YouTube, and demonstrates significant improvements in retrieval performance.

    ULearn: personalized medical learning on the web for patient empowerment

    Health literacy constitutes an important step towards patient empowerment, and the Web is presently the biggest repository of medical information and, thus, the biggest medical resource to be used in the learning process. However, at present, web medical information is mainly accessed through generic search engines that do not take into account the user's specific needs and starting knowledge, and so they are not able to support learning activities tailored to the specific user requirements. This work presents “ULearn”, a meta engine that supports access, understanding and learning on the Web in the medical domain based on specific user requirements and knowledge levels, towards what we call “balanced learning”. Balanced learning allows users to perform learning activities based on specific user requirements (understanding, deepening, widening and exploring) towards their empowerment. We have designed and developed ULearn to suggest search keywords correlated to the different user requirements, and we have carried out some preliminary experiments to evaluate the effectiveness of the provided information.