Active Learning for Natural Language Generation
The field of text generation suffers from a severe shortage of labeled data
due to the extremely expensive and time-consuming process involved in manual
annotation. A natural approach for coping with this problem is active learning
(AL), a well-known machine learning technique for improving annotation
efficiency by selectively choosing the most informative examples to label.
However, while AL has been well-researched in the context of text
classification, its application to text generation has remained largely unexplored.
In this paper, we present a first systematic study of active learning for text
generation, considering a diverse set of tasks and multiple leading AL
strategies. Our results indicate that existing AL strategies, despite their
success in classification, are largely ineffective for the text generation
scenario, and fail to consistently surpass the baseline of random example
selection. We highlight some notable differences between the classification and
generation scenarios, and analyze the selection behaviors of existing AL
strategies. Our findings motivate exploring novel approaches for applying AL to
NLG tasks.
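To make the selection step concrete, here is a minimal sketch of one common AL strategy from the classification setting, entropy-based uncertainty sampling, next to the random-selection baseline. It is an illustration only, not the strategies or code evaluated in the paper; the function names and the toy probabilities are hypothetical.

```python
import math
import random


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(pool_probs, batch_size):
    """Return indices of the pool examples with the highest predictive entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:batch_size]


def select_random(pool_size, batch_size, seed=0):
    """Random-selection baseline: sample indices uniformly without replacement."""
    rng = random.Random(seed)
    return rng.sample(range(pool_size), batch_size)


# Hypothetical model predictions over three classes for four unlabeled examples.
pool_probs = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.40, 0.35, 0.25],  # fairly uncertain
    [0.70, 0.20, 0.10],  # moderately confident
    [0.34, 0.33, 0.33],  # close to uniform, most uncertain
]

uncertain_batch = select_most_uncertain(pool_probs, batch_size=2)
random_batch = select_random(len(pool_probs), batch_size=2)
```

In a generation task there is no single class distribution per example, which is one reason a classification-style score like this does not transfer directly.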
Efficient Benchmarking (of Language Models)
The increasing versatility of language models (LMs) has given rise to a new
class of benchmarks that comprehensively assess a broad range of capabilities.
Such benchmarks are associated with massive computational costs, reaching
thousands of GPU hours per model. However, the efficiency aspect of these
evaluation efforts has raised little discussion in the literature. In this work,
we present the problem of Efficient Benchmarking, namely, intelligently reducing
the computation costs of LM evaluation without compromising reliability. Using
the HELM benchmark as a test case, we investigate how different benchmark design
choices affect the computation-reliability trade-off. We propose to evaluate the
reliability of such decisions using a new measure, Decision Impact on
Reliability (DIoR for short). We find, for example, that the current leader on
HELM may change by merely removing a low-ranked model from the benchmark, and
observe that a handful of examples suffice to obtain the correct benchmark ranking.
Conversely, a slightly different choice of HELM scenarios varies the ranking widely.
Based on our findings, we outline a set of concrete recommendations for more
efficient benchmark design and utilization practices, leading to dramatic cost
savings with minimal loss of benchmark reliability, often reducing computation
by 100x or more.
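The claim that removing a low-ranked model can change the leader is easy to reproduce on toy data. The sketch below assumes the benchmark aggregates scores by mean win rate (the fraction of other models a model beats, averaged over scenarios); that aggregation rule, the model names, and the scores are assumptions for illustration and are not taken from the abstract. It does not implement DIoR.

```python
def mean_win_rate(scores):
    """Map each model to its mean win rate.

    scores: {model_name: [score on scenario 0, score on scenario 1, ...]}
    The win rate in one scenario is the fraction of other models outscored.
    """
    models = list(scores)
    n_scenarios = len(next(iter(scores.values())))
    rates = {}
    for m in models:
        others = [o for o in models if o != m]
        per_scenario = [
            sum(scores[m][s] > scores[o][s] for o in others) / len(others)
            for s in range(n_scenarios)
        ]
        rates[m] = sum(per_scenario) / n_scenarios
    return rates


def leader(scores):
    """Return the model with the highest mean win rate."""
    rates = mean_win_rate(scores)
    return max(rates, key=rates.get)


# Hypothetical scores on five scenarios. Model C never beats A, and beats B
# only where A also beats B.
scores = {
    "A": [0.9, 0.9, 0.8, 0.8, 0.8],
    "B": [0.7, 0.7, 0.9, 0.9, 0.9],
    "C": [0.8, 0.8, 0.1, 0.1, 0.1],
}

with_c = leader(scores)
without_c = leader({m: v for m, v in scores.items() if m != "C"})
```

With all three models, A leads because C's presence costs B a win in the first two scenarios; once C (the lowest-ranked model) is dropped, B wins three of five head-to-head comparisons and takes the lead.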
Corpus Wide Argument Mining -- a Working Solution
One of the main tasks in argument mining is the retrieval of argumentative
content pertaining to a given topic. Most previous work addressed this task by
retrieving a relatively small number of relevant documents as the initial
source for such content. This line of research yielded moderate success, which
is of limited use in a real-world system. Furthermore, for such a system to
yield a comprehensive set of relevant arguments, over a wide range of topics,
it requires leveraging a large and diverse corpus in an appropriate manner.
Here we present a first end-to-end high-precision, corpus-wide argument mining
system. This is made possible by combining sentence-level queries over an
appropriate indexing of a very large corpus of newspaper articles, with an
iterative annotation scheme. This scheme addresses the inherent label bias in
the data and pinpoints the regions of the sample space whose manual labeling is
required to obtain high precision among top-ranked candidates.
Label Sleuth: From Unlabeled Text to a Classifier in a Few Hours
Text classification can be useful in many real-world scenarios, saving a lot
of time for end users. However, building a custom classifier typically requires
coding skills and ML knowledge, which poses a significant barrier for many
potential users. To lift this barrier, we introduce Label Sleuth, a free open
source system for labeling and creating text classifiers. This system is unique
for (a) being a no-code system, making NLP accessible to non-experts, (b)
guiding users through the entire labeling process until they obtain a custom
classifier, making the process efficient -- from cold start to classifier in a
few hours, and (c) being open for configuration and extension by developers. By
open sourcing Label Sleuth we hope to build a community of users and developers
that will broaden the utilization of NLP models.
Comment: 7 pages, 2 figures
Land/Homeland, Story/History: the social landscapes of the Southern Levant from Alexander to Augustus
This material has been published in revised form in The Social Archaeology of the Levant from Prehistory to the Present, edited by Assaf Yasur-Landau, Eric H. Cline, and Yorke Rowan, https://doi.org/10.1017/9781316661468.024. This version is free to view and download for private research and study only. Not for re-distribution or re-use. © Cambridge University Press
The Hellenistic era opens with Alexander the Great's triumph over Achaemenid Persia, an event that inaugurates a millennium of western political hegemony over the Levant and paves the way for an infusion of western cultural ideas. I examine the social repercussions of this juncture of politics and culture for five self-identifying ethnoi within the region: Phoenicians (meaning Tyrians and Sidonians), Samaritans, Judeans, Idumeans, and Nabateans. I consider physical and written evidence as reflections of agency, opportunity, status, and authority, in order to reconstruct how people defined and presented themselves, and how they jockeyed for position and security in a crowded region and a volatile world. Fortunes fluctuated along with changes in imperial rule. The Ptolemies instituted a rapacious system of resource extraction, under which only the most nimble or removed kept their footing (i.e., Phoenicians, Nabateans). The Seleucids followed in the more magnanimous footsteps of the Achaemenids, offering a measure of economic and legal autonomy, an approach that placated some (e.g., Samaritans) and empowered others (e.g., Judeans). As Seleucid control weakened, groups used various means to claim status and authority. Samaritans, Judeans, and Idumeans deployed history and geography; Phoenicians and Nabateans depended on economic connections and cultural currency. Waning imperial powers in the later second century BCE left the region's ethnoi effectively autonomous. Phoenicians and Nabateans became wealthy cosmopolitans connected to Mediterranean markets. Judeans unleashed an aggressive program of territorial acquisition, first successfully against Idumeans and Samaritans, then less so against Tyrians and Nabateans. Contemporary writers turned these events into historical narratives -- divinely countenanced (1 Maccabees, Dead Sea Scrolls) vs. opportunistic circumstance (2 Maccabees, Tacitus, Josephus). These accounts offered people differing templates by which to situate themselves in place and history -- templates ill-suited for co-existence. By the time Roman authorities established their imperial presence here in the mid-first century BCE, the social landscape was mined and ready to erupt.
Accepted manuscript
Labor Division with Movable Walls: Composing Executable Specifications with Machine Learning and Search (Blue Sky Idea)
Artificial intelligence (AI) techniques, including, e.g., machine learning, multi-agent collaboration, planning, and heuristic search, are emerging as ever-stronger tools for solving hard problems in real-world applications. Executable specification (ES) techniques, including, e.g., Statecharts and scenario-based programming, are a promising development approach, offering intuitiveness, ease of enhancement, compositionality, and amenability to formal analysis. We propose an approach for integrating AI and ES techniques in developing complex intelligent systems, which can greatly simplify agile/spiral development and maintenance processes. The approach calls for automated detection of whether certain goals and sub-goals are met; a clear division between sub-goals solved with AI and those solved with ES; compositional and incremental addition of AI-based or ES-based components, each focusing on a particular gap between a current capability and a well-stated goal; and iterative refinement of sub-goals solved with AI into smaller sub-sub-goals, some of which are solved with ES and some with AI. We describe the principles of the approach and its advantages, as well as key challenges and suggestions for how to tackle them.
The Concise Guide to PHARMACOLOGY 2013/14: G protein-coupled receptors
The Concise Guide to PHARMACOLOGY 2013/14 provides concise overviews of the key properties of over 2000 human drug targets with their pharmacology, plus links to an open access knowledgebase of drug targets and their ligands (www.guidetopharmacology.org), which provides more detailed views of target and ligand properties. The full contents can be found at http://onlinelibrary.wiley.com/doi/10.1111/bph.12444/full. G protein-coupled receptors are one of the seven major pharmacological targets into which the Guide is divided, with the others being ligand-gated ion channels, ion channels, catalytic receptors, nuclear hormone receptors, transporters and enzymes. These are presented with nomenclature guidance and summary information on the best available pharmacological tools, alongside key references and suggestions for further reading. A new landscape format has easy-to-use tables comparing related targets. It is a condensed version of material contemporary to late 2013, which is presented in greater detail and constantly updated on the website www.guidetopharmacology.org, superseding data presented in previous Guides to Receptors and Channels. It is produced in conjunction with NC-IUPHAR and provides the official IUPHAR classification and nomenclature for human drug targets, where appropriate. It consolidates information previously curated and displayed separately in IUPHAR-DB and the Guide to Receptors and Channels, providing a permanent, citable, point-in-time record that will survive database updates.