    Towards a semantic and statistical selection of association rules

    The increasing growth of databases raises an urgent need for more accurate methods to better understand the stored data. In this scope, association rules have been used extensively for the analysis and comprehension of huge amounts of data. However, the number of generated rules is too large to be efficiently analyzed and explored in any further process. Association rule selection is a classical topic that addresses this issue, yet innovative approaches are still required in order to help decision makers. Hence, many interestingness measures have been defined to statistically evaluate and filter the association rules. However, these measures present two major problems. On the one hand, they do not allow eliminating irrelevant rules; on the other hand, their abundance leads to heterogeneous evaluation results, which causes confusion in decision making. In this paper, we propose a two-winged approach to select statistically interesting and semantically incomparable rules. Our statistical selection helps discover interesting association rules without favoring or excluding any measure. The semantic comparability helps to decide whether the considered association rules are semantically related, i.e., comparable. The outcomes of our experiments on real datasets show promising results in terms of reduction in the number of rules.
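
To make the notion of an interestingness measure concrete, the following is a minimal sketch that computes three widely used measures (support, confidence, and lift) for a single rule. It is not the selection method proposed in the paper; the function names and the toy transactions are assumptions made for illustration only.

```python
from typing import Dict, Iterable, List, Set


def support(transactions: List[Set[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def rule_measures(
    transactions: List[Set[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> Dict[str, float]:
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    s_a = support(transactions, a)
    s_c = support(transactions, c)
    s_ac = support(transactions, a | c)
    # Confidence estimates P(consequent | antecedent).
    confidence = s_ac / s_a if s_a else 0.0
    # Lift compares that estimate with the baseline frequency of the consequent.
    lift = confidence / s_c if s_c else 0.0
    return {"support": s_ac, "confidence": confidence, "lift": lift}


# Hypothetical toy data, not taken from the paper.
toy_transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
measures = rule_measures(toy_transactions, {"bread"}, {"milk"})
```

Different measures can rank the same rule differently (here the confidence is fairly high while the lift is below 1), which is the kind of disagreement among measures that the abstract describes.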

    Learning to select data for transfer learning with Bayesian Optimization

    Domain similarity measures can be used to gauge adaptability and select suitable data for transfer learning, but existing approaches define ad hoc measures that are deemed suitable for respective tasks. Inspired by work on curriculum learning, we propose to learn data selection measures using Bayesian Optimization and evaluate them across models, domains and tasks. Our learned measures outperform existing domain similarity measures significantly on three tasks: sentiment analysis, part-of-speech tagging, and parsing. We show the importance of complementing similarity with diversity, and that learned measures are -- to some degree -- transferable across models, domains, and even tasks. Comment: EMNLP 2017. Code available at: https://github.com/sebastianruder/learn-to-select-dat
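
As an illustration of what a fixed, hand-picked domain similarity measure can look like, the sketch below ranks candidate source examples by the Jensen-Shannon divergence between their unigram distributions and that of the target data. The abstract does not say which measures are compared, so the choice of Jensen-Shannon divergence, the function names, and the toy data are all assumptions; this is not the learned measure from the paper and it involves no Bayesian Optimization.

```python
import math
from collections import Counter
from typing import List, Sequence


def term_distribution(tokens: Sequence[str], vocab: Sequence[str]) -> List[float]:
    """Relative frequency of each vocabulary word in `tokens`."""
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocab)
    return [counts[w] / total if total else 0.0 for w in vocab]


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x: Sequence[float], y: Sequence[float]) -> float:
        # Terms with x_i = 0 contribute nothing; y_i > 0 wherever x_i > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def rank_by_similarity(
    target_tokens: Sequence[str], candidates: List[Sequence[str]]
) -> List[int]:
    """Indices of `candidates`, most similar to the target first."""
    vocab = sorted(set(target_tokens).union(*[set(c) for c in candidates]))
    target = term_distribution(target_tokens, vocab)
    scores = [js_divergence(target, term_distribution(c, vocab)) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i])


# Hypothetical toy data for illustration.
target_tokens = "good great film".split()
candidates = ["great good movie".split(), "parse tree node".split()]
ranking = rank_by_similarity(target_tokens, candidates)
```

A purely similarity-based ranking like this one ignores diversity among the selected examples, which the abstract identifies as an important complement.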

    Are We There Yet? A Communications Evaluation Guide

    Most foundation and nonprofit communicators can speak at length about the work they do and what it's intended to achieve. But when it comes to describing exactly what their efforts are achieving, few can offer specifics. This guide helps foundation and nonprofit communicators learn whether their communications are effective and what is being achieved -- and determine if any course corrections are necessary. Among the reasons stressed for evaluating communication efforts are these: Evaluation improves the effectiveness of communications. Evaluation can help organizations more effectively engage with intended audiences. Situations change - strategies and tactics may need to change as well. Evaluation ensures wise allocation of resources. The guide points out that evaluation need not be limited to large-scale campaigns or major outreach activities, but should also be conducted for efforts to raise awareness of an organization or an issue. And once an evaluation is underway, the guide suggests findings be shared with those who may benefit from what is learned, such as team members, the board, colleagues and peers. The guide includes: Background on why evaluation can contribute to good communications. Four case studies of evaluation in action from the Lumina Foundation for Education, the Robert Wood Johnson Foundation, the Neimand Collaborative, and the California HealthCare Foundation. A worksheet for creating an evaluation plan.

    FLA Toolbox: Fair Hiring Processes

    This document is part of a digital collection provided by the Martin P. Catherwood Library, ILR School, Cornell University, pertaining to the effects of globalization on the workplace worldwide. Special emphasis is placed on labor rights, working conditions, labor market changes, and union organizing. FLA__toolbox_HIRING.pdf: 32 downloads, before Oct. 1, 2020

    Nested Hierarchical Dirichlet Processes

    We develop a nested hierarchical Dirichlet process (nHDP) for hierarchical topic modeling. The nHDP is a generalization of the nested Chinese restaurant process (nCRP) that allows each word to follow its own path to a topic node according to a document-specific distribution on a shared tree. This alleviates the rigid, single-path formulation of the nCRP, allowing a document to more easily express thematic borrowings as a random effect. We derive a stochastic variational inference algorithm for the model, in addition to a greedy subtree selection method for each document, which allows for efficient inference using massive collections of text documents. We demonstrate our algorithm on 1.8 million documents from The New York Times and 3.3 million documents from Wikipedia. Comment: To appear in IEEE Transactions on Pattern Analysis and Machine Intelligence, Special Issue on Bayesian Nonparametric

    Accuracy-based scoring for DOT: towards direct error minimization for data-oriented translation

    In this work we present a novel technique to rescore fragments in the Data-Oriented Translation model based on their contribution to translation accuracy. We describe three new rescoring methods, and present the initial results of a pilot experiment on a small subset of the Europarl corpus. This work is a proof-of-concept, and is the first step in directly optimizing translation decisions based solely on the hypothesized accuracy of potential translations resulting from those decisions.