Towards a semantic and statistical selection of association rules
The increasing growth of databases raises an urgent need for more accurate
methods to better understand the stored data. In this scope, association rules
have been used extensively to analyze and comprehend huge amounts of data.
However, the number of generated rules is too large to be efficiently analyzed
and explored in any further process. Association rule selection is a classical
way to address this issue, yet new, innovative approaches are required to
support decision makers. Hence, many interestingness measures have been defined
to statistically evaluate and filter association rules. However, these measures
present two major problems. On the one hand, they do not eliminate irrelevant
rules; on the other hand, their abundance leads to heterogeneous evaluation
results, which causes confusion in decision making. In this paper, we propose a
two-winged approach to select statistically interesting and semantically
incomparable rules. Our statistical selection helps discover interesting
association rules without favoring or excluding any measure. The semantic
comparability helps decide whether the considered association rules are
semantically related, i.e., comparable. The outcomes of our experiments on real
datasets show promising results in terms of reduction in the number of rules.
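The interestingness measures the abstract refers to are typically statistical scores over a transaction database. As a minimal illustration (not the paper's selection method), the sketch below computes three classic measures, support, confidence, and lift, for one rule over a hypothetical toy database; all item names and data are invented for the example.

```python
# Toy transaction database (hypothetical data for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]

def support(itemset, db):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(1 for t in db if itemset <= t) / len(db)

def confidence(lhs, rhs, db):
    """Estimate of P(rhs | lhs): support of the union over support of the antecedent."""
    return support(lhs | rhs, db) / support(lhs, db)

def lift(lhs, rhs, db):
    """Confidence normalized by the consequent's base rate;
    values above 1 suggest a positive association, near 1 suggest independence."""
    return confidence(lhs, rhs, db) / support(rhs, db)

lhs, rhs = frozenset({"bread"}), frozenset({"milk"})
print(support(lhs | rhs, transactions))    # rule support: 3 of 5 transactions
print(confidence(lhs, rhs, transactions))  # 0.6 / 0.8 = 0.75
print(lift(lhs, rhs, transactions))        # 0.75 / 0.8 = 0.9375
```

The "abundance" problem the abstract describes arises because many such measures exist (lift, conviction, leverage, chi-squared, and dozens more), and they can rank the same rule set differently.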
Learning to select data for transfer learning with Bayesian Optimization
Domain similarity measures can be used to gauge adaptability and select
suitable data for transfer learning, but existing approaches define ad hoc
measures that are deemed suitable for respective tasks. Inspired by work on
curriculum learning, we propose to \emph{learn} data selection measures using
Bayesian Optimization and evaluate them across models, domains and tasks. Our
learned measures outperform existing domain similarity measures significantly
on three tasks: sentiment analysis, part-of-speech tagging, and parsing. We
show the importance of complementing similarity with diversity, and that
learned measures are -- to some degree -- transferable across models, domains,
and even tasks. Comment: EMNLP 2017. Code available at:
https://github.com/sebastianruder/learn-to-select-dat
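The core idea, learning a weighted data-selection measure rather than fixing one ad hoc, can be sketched in miniature. The snippet below is a toy stand-in, not the paper's system: it uses random search where the paper uses Bayesian Optimization, and a synthetic objective where the paper trains and evaluates a real model; the feature values and the `task_score` function are invented for illustration.

```python
import random

random.seed(0)

# Hypothetical setup: each candidate training example carries two features,
# similarity to the target domain and diversity. We search for scoring
# weights w = (w_sim, w_div) used to rank and select a training subset.
examples = [(random.random(), random.random()) for _ in range(200)]

def task_score(weights, k=50):
    """Synthetic stand-in for 'train on the selected subset, evaluate on
    the target task'. Rewards subsets that are similar AND diverse,
    mirroring the abstract's point that similarity alone is not enough."""
    w_sim, w_div = weights
    ranked = sorted(examples, key=lambda e: w_sim * e[0] + w_div * e[1],
                    reverse=True)
    picked = ranked[:k]
    sim = sum(e[0] for e in picked) / k
    div = sum(e[1] for e in picked) / k
    return sim * div

# Random search as a minimal proxy for the outer Bayesian Optimization
# loop: propose weights, evaluate the downstream score, keep the best.
best_w, best_s = None, float("-inf")
for _ in range(100):
    w = (random.random(), random.random())
    s = task_score(w)
    if s > best_s:
        best_w, best_s = w, s

print("best weights:", best_w, "score:", round(best_s, 3))
```

In the real setting, each evaluation of `task_score` is expensive (a full training run), which is exactly why a sample-efficient optimizer such as Bayesian Optimization is preferred over random search.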
Are We There Yet? A Communications Evaluation Guide
Most foundation and nonprofit communicators can speak at length about the work they do and what it's intended to achieve. But when it comes to describing exactly what their efforts are achieving, few can offer specifics. This guide helps foundation and nonprofit communicators learn whether their communications are effective and what is being achieved, and determine if any course corrections are necessary. Among the reasons stressed for evaluating communication efforts are these: evaluation improves the effectiveness of communications; evaluation can help organizations more effectively engage with intended audiences; situations change, so strategies and tactics may need to change as well; and evaluation ensures wise allocation of resources. The guide points out that evaluation need not be limited to large-scale campaigns or major outreach activities, but should also be conducted for efforts to raise awareness of an organization or an issue. And once an evaluation is underway, the guide suggests findings be shared with those who may benefit from what is learned, such as team members, the board, colleagues, and peers. The guide includes: background on why evaluation can contribute to good communications; four case studies of evaluation in action from the Lumina Foundation for Education, the Robert Wood Johnson Foundation, the Neimand Collaborative, and the California HealthCare Foundation; and a worksheet for creating an evaluation plan.
FLA Toolbox: Fair Hiring Processes
This document is part of a digital collection provided by the Martin P. Catherwood Library, ILR School, Cornell University, pertaining to the effects of globalization on the workplace worldwide. Special emphasis is placed on labor rights, working conditions, labor market changes, and union organizing. FLA__toolbox_HIRING.pdf: 32 downloads, before Oct. 1, 2020
Nested Hierarchical Dirichlet Processes
We develop a nested hierarchical Dirichlet process (nHDP) for hierarchical
topic modeling. The nHDP is a generalization of the nested Chinese restaurant
process (nCRP) that allows each word to follow its own path to a topic node
according to a document-specific distribution on a shared tree. This alleviates
the rigid, single-path formulation of the nCRP, allowing a document to more
easily express thematic borrowings as a random effect. We derive a stochastic
variational inference algorithm for the model, in addition to a greedy subtree
selection method for each document, which allows for efficient inference using
massive collections of text documents. We demonstrate our algorithm on 1.8
million documents from The New York Times and 3.3 million documents from
Wikipedia. Comment: To appear in IEEE Transactions on Pattern Analysis and
Machine Intelligence, Special Issue on Bayesian Nonparametric
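The key structural difference between the nCRP and the nHDP can be shown with a toy sampler. The sketch below is only a caricature of the generative step the abstract describes, not the model itself: the tree, topic names, and document weights are invented, and the real nHDP places nonparametric priors over both the tree and the per-document path distribution.

```python
import random

random.seed(1)

# Hypothetical shared topic tree, represented as root-to-leaf paths.
paths = [
    ("root", "sports", "football"),
    ("root", "sports", "tennis"),
    ("root", "politics", "elections"),
]

def ncrp_assign(num_words):
    """nCRP-style rigidity: one path is chosen per document,
    and every word in the document must use it."""
    doc_path = random.choice(paths)
    return [doc_path for _ in range(num_words)]

def nhdp_assign(num_words, doc_weights):
    """nHDP-style flexibility: each word draws its own path from a
    document-specific distribution over the shared tree, so a single
    document can borrow themes from several subtrees."""
    return random.choices(paths, weights=doc_weights, k=num_words)

single = ncrp_assign(5)
mixed = nhdp_assign(5, doc_weights=[0.6, 0.3, 0.1])
print(len(set(single)))  # always 1: the rigid single-path formulation
print(len(set(mixed)))   # frequently >1: thematic borrowing across subtrees
```

This is the "random effect" the abstract mentions: under the nHDP, cross-subtree borrowing is a per-word draw governed by the document's own distribution rather than a single hard commitment.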
Accuracy-based scoring for DOT: towards direct error minimization for data-oriented translation
In this work we present a novel technique to rescore fragments in the Data-Oriented Translation model based on their contribution to translation accuracy. We describe
three new rescoring methods, and present the initial results of a pilot experiment on a small subset of the Europarl corpus. This work is a proof-of-concept, and
is the first step toward directly optimizing translation decisions based solely
on the hypothesized accuracy of the potential translations resulting from those
decisions.
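The general shape of accuracy-based rescoring can be illustrated with a toy proxy. The sketch below is not the paper's DOT fragment-rescoring method: it scores whole candidate translations by unigram-overlap F1 against a reference (a crude stand-in for a real accuracy metric), and the sentences are invented examples.

```python
from collections import Counter

def unigram_f1(candidate, reference):
    """Toy accuracy proxy: F1 over unigram overlap between a candidate
    translation and a reference translation."""
    c, r = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((c & r).values())  # clipped per-token match counts
    if overlap == 0:
        return 0.0
    prec = overlap / sum(c.values())
    rec = overlap / sum(r.values())
    return 2 * prec * rec / (prec + rec)

reference = "the committee approved the proposal"
candidates = [
    "the committee has approved the proposal",
    "committee approve proposal the",
    "a decision was made",
]

# Rescore candidates by hypothesized accuracy and pick the best one,
# mirroring the idea of choosing translation decisions by their
# expected contribution to accuracy.
best = max(candidates, key=lambda c: unigram_f1(c, reference))
print(best)  # the first candidate: highest unigram overlap with the reference
```

In the paper's setting, the objects being rescored are DOT fragments rather than full hypotheses, but the principle is the same: score by contribution to downstream translation accuracy instead of by corpus frequency alone.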